levi
@levidiamode
180 posts

365 days of GPU programming ▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 75/365
out of memory · Joined June 2025
256 Following · 474 Followers
Vikram@msharmavikram·
The first prints of Programming Massively Parallel Processors (PMPP), 5th Edition, are here. For those at @nvidia GTC'26, I will be hosting @WenmeiHwu in "CWE PMPP Edition 5 Unveiled: Ask Dr. Wen-Mei Your Questions" [QA81637], March 19, 11:00-11:50 a.m. PDT, SJCC LL21F (LL).
levi@levidiamode·
Day 75/365 of GPU Programming

I took a first stab at an interactive 3D GPU playground. It starts at the package level and lets you traverse the H100 from the die, to the GPCs, to groups of TPCs, to each SM, and into individual tensor cores; going deeper at every level.

It's by no means perfect yet: it still has missing components and open questions (e.g. at what level do you stop, the transistor or the CUDA/tensor core? what are the actual dimensions and proportions inside the chip?). Unfortunately I don't have access to all the data Nvidia or @dylan522p @SemiAnalysis_ have, so some design choices, like block boundaries, are just educated guesses (someone please tell me if I'm wrong though).

That said, the 3D interactions have already helped me internalize the different layers of the GPU and how they interact during specific execution phases of a kernel. Next, I'm going to do a few more audits of the model and add some interactive walk-throughs before putting the public website out. Maybe I'll take a couple of examples from @rasbt's LLM book or @elliotarledge's CUDA book and visualize them in a dedicated section. If there are any specific features you would like to see, I'd be super keen to hear.
levi@levidiamode
[quoted post: Day 74/365 of GPU Programming, shown in full below]
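The drill-down levels described in the post (package → die → GPC → TPC → SM → tensor core) can be sketched as a tiny Python fan-out table. The per-level counts below are rough public figures for a full GH100 die (8 GPCs, 9 TPCs per GPC, 2 SMs per TPC, 4 tensor cores per SM); they are assumptions for illustration, exactly the kind of proportions the post says are guesses, not verified die data.

```python
# Drill-down levels from the playground, with a fan-out count per level.
# Counts are rough full-GH100-die figures (assumptions, not measured data).
HIERARCHY = [
    ("package", 1),
    ("die", 1),
    ("GPC", 8),          # graphics processing clusters per die
    ("TPC", 9),          # texture processing clusters per GPC
    ("SM", 2),           # streaming multiprocessors per TPC
    ("tensor_core", 4),  # tensor cores per SM
]

def total_units(level_name: str) -> int:
    """Total units of `level_name` on one package (product of fan-outs)."""
    total = 1
    for name, count in HIERARCHY:
        total *= count
        if name == level_name:
            return total
    raise ValueError(f"unknown level: {level_name}")

def drill_path(*indices: int) -> str:
    """Readable path for a drill-down starting at the GPC level."""
    parts = []
    for (name, count), i in zip(HIERARCHY[2:], indices):
        if not 0 <= i < count:
            raise IndexError(f"{name} index {i} out of range")
        parts.append(f"{name} {i}")
    return " -> ".join(parts)

print(total_units("SM"))           # 8 * 9 * 2 = 144 on a full die
print(total_units("tensor_core"))  # 144 * 4 = 576
print(drill_path(3, 1, 0))         # GPC 3 -> TPC 1 -> SM 0
```

Keeping the fan-out counts in one table like this also makes it easy to swap in a different SKU (e.g. the 132-SM H100 SXM) and regenerate the whole tree.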
levi@levidiamode·
@fellix_bg this is so cool. how did you come across this shot?
DuckDodgers@fellix_bg·
@levidiamode History of Nvidia GPUs in silicon: flickr.com/photos/1535492…@N02/52181918538
levi@levidiamode·
Day 74/365 of GPU Programming

I always found die shots and SM diagrams beautiful but difficult to map mentally, so I've been trying to find a way to interact with GPUs in 3D. This is what I have so far: a single input goes through a simplified H100 execution pipeline so you can see what the silicon is doing at each step, from CPU-side tokenization and embedding lookup, through matmuls on tensor cores, to the final softmax output.

My current plan is to make this an interactive playground that lets you zoom in and out through levels of depth (package → die → GPC → SM → tensor core), with step-through examples similar to Brendan Bycroft's 3D LLM visualization. Ideally this should make exploring the architecture side just as easy as mapping CUDA abstractions onto the actual hardware processes. I'm starting with an H100, but it would be fun to expand this to more GPUs and highlight the differences between generations.

This was largely inspired by @srush_nlp's GPU puzzles, @JayAlammar's Illustrated Transformer, and @karpathy's makemore series, which made me think about how to study and visualize GPUs from the ground up.
levi@levidiamode

Day 73/365 of GPU Programming

Wanted to understand FP4 better and came across a great @Cohere_Labs talk on training LLMs with MXFP4, and @juliarturc's amazing series on quantization. So fascinating to learn what makes low precision work for LLM training and inference.
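For context on the MXFP4 format mentioned here: the core idea, per the OCP Microscaling (MX) spec, is one shared power-of-two scale per block of 32 FP4 (E2M1) elements. Below is a rough pure-Python sketch of that idea only; it is not the training recipe from the talk, and zero-block handling, rounding, and saturation are simplified assumptions.

```python
import math

# FP4 (E2M1) representable magnitudes, per the OCP MX spec.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32  # one shared scale per 32 elements

def quantize_block(values):
    """Pick a shared power-of-two scale for the block, then round each
    element to the nearest FP4 magnitude (saturating at FP4's max, 6.0)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0, [0.0] * len(values)
    # E8M0-style scale: floor(log2(amax)) minus E2M1's max exponent (2).
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for v in values:
        mag = min(abs(v) / scale, 6.0)  # saturate at FP4 max
        q = min(FP4_VALUES, key=lambda f: abs(f - mag))
        out.append(math.copysign(q, v))
    return scale, out

def dequantize_block(scale, quantized):
    return [scale * q for q in quantized]

vals = [0.01 * i for i in range(BLOCK)]
scale, q = quantize_block(vals)
print(scale)                           # 0.0625
print(dequantize_block(scale, q)[12])  # 0.12 quantizes to 2.0 * 0.0625 = 0.125
```

The point of the shared per-block scale is that each element only needs 4 bits while the block as a whole still tracks the local dynamic range, which is what makes such low precision workable.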

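As a mental model of the simplified pipeline in the Day 74 post (embedding lookup → matmul → softmax), here is a toy pure-Python sketch. The vocabulary size, model width, and weights are all made up for illustration; on the real chip the matmul would be tiled across SMs and tensor cores, which this deliberately ignores.

```python
import math
import random

random.seed(0)
VOCAB, D_MODEL = 16, 4  # toy sizes, nowhere near a real model

# Made-up weights standing in for a trained embedding table and output head.
embedding = [[random.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(VOCAB)]
out_proj = [[random.uniform(-1, 1) for _ in range(VOCAB)] for _ in range(D_MODEL)]

def matvec(vec, mat):
    """Row vector (1 x n) times matrix (n x m): the matmul step."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

def softmax(logits):
    """Numerically stable softmax over the output logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(token_id):
    """One token through the simplified pipeline."""
    hidden = embedding[token_id]       # embedding lookup
    logits = matvec(hidden, out_proj)  # matmul (tensor-core territory on-chip)
    return softmax(logits)             # final softmax output

probs = forward(3)
print(len(probs), round(sum(probs), 6))  # 16 1.0
```

Each line of `forward` corresponds to one stage you would highlight in the 3D walk-through, which is what makes a step-through visualization map so cleanly onto the code.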
levi@levidiamode·
@srush_nlp @sheerluck_io If you had more time, what educational resource or open-source project would you tackle next?
Sasha Rush@srush_nlp·
@sheerluck_io Yeah people like the GPU ones. I think autodiff is fun too.
Sahil@sheerluck_io·
Bro, I think I got a goldmine. Damn, the journey starts today. Thank you so much @srush_nlp for providing this stuff. I'll start with the GPU series.
Tinkerbell@Tinkerbellcodes·
@levidiamode share the resources that are helping you learn this stuff, sir!
levi@levidiamode·
@Alan_Ma_ thanks Alan! your Unwrapping TPUs series was so sick. excited to see what you cook up next
levi@levidiamode·
@gpusteve maybe everyone else does too 😭
steve@gpusteve·
went to a party in sf