kyle yu
@brrrkyle

53 posts

building gpu visualizations

Joined April 2026
31 Following · 163 Followers

Pinned Tweet
kyle yu @brrrkyle
this is how i wish i learned GPU fundamentals: not a lengthy textbook, not a static image. every concept is an interactive visualization, covering the SM architecture, memory coalescing, synchronization, and more. what concepts do you want to see next? brrrviz.com
3 replies · 8 reposts · 142 likes · 23.7K views
Pramod Goyal @goyal__pramod
It's a crime that more people have not read these beautiful blogs! Beautiful visuals, simple explanations, code anyone can understand. I have a new bar for my future blogs now...
[image]
9 replies · 75 reposts · 902 likes · 37K views
kyle yu @brrrkyle
Chapter 9 of BrrrViz walks you through both scenarios. brrrviz.com
0 replies · 0 reposts · 0 likes · 93 views
kyle yu @brrrkyle
The cost: serialization. Threads queue at the address one at a time. The more threads contend for the same location, the more your parallelism collapses into a bottleneck. This is why real GPU kernels accumulate locally in registers first, then do a single atomicAdd at the end.
1 reply · 0 reposts · 1 like · 105 views
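A minimal sketch of that pattern (the kernel and names are illustrative, not taken from BrrrViz): each thread folds its share of the input into a private register with a grid-stride loop, then issues exactly one atomicAdd, so contention scales with the number of threads rather than the number of elements.

```cuda
#include <cuda_runtime.h>

// Sum n floats into *out. Each thread accumulates privately, then
// contends for the shared address only once at the very end.
__global__ void sum_kernel(const float* in, float* out, int n) {
    int idx    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    float local = 0.0f;                  // per-thread accumulator in a register
    for (int i = idx; i < n; i += stride)
        local += in[i];                  // no contention: private accumulation

    atomicAdd(out, local);               // one serialized update per thread
}
```

A block-level reduction in shared memory before the atomic cuts the contended updates further, to one per block.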
kyle yu @brrrkyle
Most GPU bugs don't crash your program. They just give you the wrong answer. Silently. When thousands of threads try to update the same memory address simultaneously, each one does three things:
📖 read the current value
⚡ execute their computation
✍ write back the result
[image]
1 reply · 1 repost · 2 likes · 184 views
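A minimal sketch of that failure mode (an assumed example, not code from the thread): 65,536 threads each run the three steps above on one counter without atomics, the interleaved reads and writes clobber each other, and the result comes back far short of 65,536 with no error raised.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread does a non-atomic read-modify-write on the same address.
__global__ void racy_count(int* counter) {
    int v = *counter;   // 1. read the current value
    v = v + 1;          // 2. execute the computation
    *counter = v;       // 3. write back the result (may overwrite other threads)
}

int main() {
    int* counter;
    cudaMallocManaged(&counter, sizeof(int));
    *counter = 0;
    racy_count<<<256, 256>>>(counter);   // 65,536 threads, one address
    cudaDeviceSynchronize();
    printf("expected 65536, got %d\n", *counter);  // silently wrong
    cudaFree(counter);
    return 0;
}
```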
Zak 🦈 (e/acc) @ZakShark
Train yourself in inference/kernel engineering. Knowing how to properly optimize GPU kernels in inference workloads is worth its weight in gold. Mastering CUDA or Triton, vLLM, SGLang, TensorRT-LLM is a real plus if you want to stand out as an AI/ML Engineer in 2026-2027.
11 replies · 47 reposts · 499 likes · 21.1K views
Banshee @Banshee2507
Started learning CUDA today. Parallel computing feels like a whole new mindset. Any tips, resources, or beginner pitfalls I should know about? #CUDA #GPU #Learning
[image]
20 replies · 8 reposts · 144 likes · 8.6K views
himanshu @retr0sushi_
always a beginner :) ps: if you have resources or roadmaps, don't be shy to share them with me pls!
[images]
6 replies · 1 repost · 42 likes · 3K views
kyle yu @brrrkyle
Chasing utilization without this perspective often means optimizing the wrong thing. Understanding where your kernel sits on this diagram helps you choose the right optimizations. Find it in chapter 3 of BrrrViz 👉 brrrviz.com
0 replies · 0 reposts · 0 likes · 58 views
kyle yu @brrrkyle
Memory-bound means your hardware is waiting on data. Fix data movement, locality, and reuse. Compute-bound means the data is there, but the math is slow on the hardware. Fix precision, use tensor cores, or change the instruction path.
[image]
1 reply · 0 reposts · 0 likes · 63 views
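A quick worked check of which regime a kernel is in (the hardware numbers below are illustrative assumptions, not a specific card): compare its arithmetic intensity, FLOPs per byte moved, against the machine's ridge point, peak FLOP/s divided by memory bandwidth.

```latex
% SAXPY (y_i <- a * x_i + y_i) does 2 FLOPs per element and moves 12 bytes:
% read x_i, read y_i, write y_i, at 4 bytes each.
\[
  I_{\mathrm{SAXPY}} = \frac{2\ \mathrm{FLOPs}}{12\ \mathrm{bytes}}
                     \approx 0.17\ \mathrm{FLOP/byte},
  \qquad
  I_{\mathrm{ridge}} = \frac{60\ \mathrm{TFLOP/s}}{3\ \mathrm{TB/s}}
                     = 20\ \mathrm{FLOP/byte}
\]
% 0.17 << 20, so SAXPY sits under the bandwidth roof: memory-bound.
% A large matmul with intensity well above 20 would instead be compute-bound.
```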
kyle yu @brrrkyle
Stop tuning the wrong bottleneck. GPU optimization isn't one ceiling; it's memory bandwidth vs peak compute. The roofline plots both, so you see which one limits your kernel.
[image]
1 reply · 0 reposts · 2 likes · 88 views
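The model behind that plot, in one line (the standard roofline formulation, with symbols defined here rather than in the tweet): attainable throughput at a given arithmetic intensity is capped by whichever roof is lower.

```latex
% P_max: peak compute (FLOP/s), B: memory bandwidth (byte/s),
% I: arithmetic intensity of the kernel (FLOP/byte).
\[
  P(I) = \min\bigl(P_{\max},\; B \cdot I\bigr),
  \qquad
  I_{\mathrm{ridge}} = \frac{P_{\max}}{B}
\]
% Left of the ridge point, the bandwidth roof binds (memory-bound);
% right of it, the compute roof binds (compute-bound).
```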