olivier giroux

2.9K posts

@__simt__

Dismantling difficulty in concurrent programming.

California, USA · Joined July 2014
165 Following · 2.7K Followers
Pinned Tweet
olivier giroux (@__simt__):
@ernire I personally think that people overestimate the market potential of hard-to-program computer systems. Most computers have to be programmable to most programmers. Most programs have to be programmed by most programmers. There is a very significant human factor in computing.
olivier giroux (@__simt__):
@blelbach What!? I was *just* trying to tell you this story and you shut me down!
olivier giroux retweeted
Gokhan Avkarogullari (@gavkar):
We brought significant architectural advancements and feature set improvements to A19/M5 GPUs.
Scalable GPU Neural Accelerators
2nd Gen Dynamic Caching Shader Architecture
3rd Gen Ray Tracing
The best GPU Driven Pipeline
New graphics features, rate and perf increases.
olivier giroux (@__simt__):
Hey, @trebolloc @awnihannun, I wonder if you know of some member of the MLX community who would like to work on neural accelerators directly.
olivier giroux (@__simt__):
@never_released My kids are forced to use those, and I'm forced to buy it for them, which is really traumatic for me I gotta say. Look at screen and type all day? -> Here's the cheapest screen and keyboard China can make, kid. Have an old MBA you could use? -> Nope, not allowed.
Longhorn (@never_released):
tbh giving Chromebooks to pupils/students might be doing them a disservice imo :/ and Google doesn't seem too interested in fixing that
olivier giroux retweeted
Awni Hannun (@awnihannun):
M5 Max is a local AI powerhouse in a laptop form factor. So awesome to see this thing released. Up to 8x faster prefill / image generation compared to M1 Max. Benchmarks done with MLX / mlx-lm.
olivier giroux (@__simt__):
@never_released Can I ask you to write some words about the really nice things you see becoming unblocked by that?
Longhorn (@never_released):
The feature ask at the very top of my list for Metal is to have a way to have the GPU device-side address match the host-side one. It'd actually unblock a number of really nice things.
olivier giroux (@__simt__):
@never_released C++ doesn’t support virtual aliases either; at best it happens to work sometimes. The proxy model is a view on what C++ itself might need to do.
Longhorn (@never_released):
GPUs and caches not handling aliasing: docs.nvidia.com/cuda/cuda-prog… (#virtual-aliasing-support)
> If accessing same allocation through different “proxies” is required in the same kernel, a fence.proxy.alias can be used between the two accesses. The above example can thus be made legal with inline PTX assembly
olivier giroux (@__simt__):
@jimmyjames_tech @jonmasters Thanks for the kind words but all I contributed to this result is some mentorship and a fresh eye. This team had all the good ideas ready to go from the start of this cycle. I almost feel like I observed.
🦊 (@jimmyjames_tech):
@jonmasters They have been making great strides on the gpu front for years, and @__simt__ has only accelerated that.
Jon Masters 🏴‍☠️ (@jonmasters):
Anyone else appreciate how cool it is that @Arm v8 load-acquire/store-release OMCA semantics fit perfectly with C11 sequential consistency requirements?
Hernan Ponce De Leon (@h_poncedeleon):
The timing seems right to share that our paper "Towards Unified Analysis of GPU Consistency" has been accepted to ASPLOS. hernanponcedeleon.github.io/pdfs/asplos202…
Quoting Reese Levine (@reeselevine):
Just noticed the new sections on memory ordering/synchronization in MSL 3.2 (section 6.15 of developer.apple.com/metal/Metal-Sh…) Finally adding some useful primitives that open up the potential for a lot of interesting GPU compute algorithms on Apple silicon!

olivier giroux retweeted
@ericniebler.bsky.social (@ericniebler):
BREAKING: P2300 has been voted into C++26! 🎉
JF Bastien (@jfbastien):
@code_report It’s wild that zipping the sources of K actually creates a BIGGER file!!! 🤯
Conor Hoekstra (@code_report):
Arthur Whitney for the first time ever has open sourced (under MIT license) K. This K is not the full K9 but is known as "K Junior." 4 files, 270 lines, 17K characters. You can find the zipped folder on shakti.com but I have put it on GitHub: github.com/codereport/kju…
David Goldblatt (@davidtgoldblatt):
Interested in the complicated and byzantine rules surrounding the C/C++ memory model? *Also* interested in the complicated and byzantine rules surrounding the C/C++ provenance model? Well, then have I got a C++ paper for you: wg21.link/P3292R0
olivier giroux (@__simt__):
@FelixCLC_ Nothing wrong with that. I have a talk on SIMT coming up and I feel the same way.
olivier giroux (@__simt__):
@cdiggins @lemire @Love2Code Opinion : it's the most important work in vectorization in this millennium, and its lessons need to be absorbed by implementers in order for anyone to be able to top it.