levi
@levidiamode
365 days of GPU programming ▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 75/365



Day 74/365 of GPU Programming

I always found die shots and SM diagrams beautiful but difficult to map mentally, so I've been trying to find a way to interact with GPUs in 3D. This is what I have so far: a single input goes through a simplified H100 execution pipeline so you can see what the silicon is doing at each step, from CPU-side tokenization and embedding lookup, through matmuls on the tensor cores, to the final softmax output.

My current plan is to turn this into an interactive playground that lets you zoom in and out through levels of depth (package → die → GPC → SM → tensor core), with step-through examples similar to Brendan Bycroft's LLM 3D visualization. Ideally this makes exploring the architectural side just as easy as mapping CUDA abstractions onto the actual hardware. I'm starting with the H100, but it would be fun to expand to more GPUs and highlight the differences between generations.

This was largely inspired by @srush_nlp's GPU Puzzles, @JayAlammar's Illustrated Transformer and @karpathy's makemore series, which made me think about how to study and visualize GPUs from the ground up.
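The pipeline stages being visualized (tokenize → embedding lookup → matmul → softmax) can be sketched in plain NumPy. Everything here is invented for illustration: a toy four-word vocabulary and random weights stand in for a real model, and on actual hardware the matmul step is what gets tiled across the tensor cores.

```python
import numpy as np

# Toy vocabulary and random weights, purely for illustration.
VOCAB = {"the": 0, "gpu": 1, "is": 2, "fast": 3}
rng = np.random.default_rng(0)
EMBED = rng.standard_normal((len(VOCAB), 8))   # embedding table
W_OUT = rng.standard_normal((8, len(VOCAB)))   # output projection

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(text):
    # 1. CPU-side tokenization: text -> token ids
    ids = [VOCAB[w] for w in text.lower().split()]
    # 2. Embedding lookup: ids -> vectors
    x = EMBED[ids]
    # 3. Matmul (on real hardware: tiled across tensor cores)
    logits = x @ W_OUT
    # 4. Softmax over the vocabulary at the last position
    return softmax(logits[-1])

probs = forward("the gpu is")
```

Each numbered comment corresponds to one stage of the visualized pipeline, which makes it a natural place to hang a "step-through" view.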


Day 73/365 of GPU Programming

Wanted to understand FP4 better and came across this great @Cohere_Labs talk on training LLMs with MXFP4, and @juliarturc's amazing series on quantization. So fascinating to learn what makes low precision work for LLM training and inference.
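The core MXFP4 idea can be sketched in a few lines of NumPy: a 32-element block shares one power-of-two scale, and each element is rounded to the nearest FP4 (E2M1) magnitude. This is a simplified illustration of the format's structure, not a spec-faithful implementation (real MXFP4 stores the scale as an E8M0 byte and packs two FP4 codes per byte).

```python
import numpy as np

# FP4 E2M1 nonnegative magnitudes; with a sign bit that's 16 codes total.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize(block):
    """Quantize one 32-element block: one shared power-of-two scale
    plus per-element FP4 E2M1 values. Simplified sketch."""
    amax = np.abs(block).max()
    if amax == 0.0:
        return 1.0, np.zeros_like(block)
    # Shared scale maps the block max near the largest FP4 magnitude (6).
    scale = 2.0 ** np.floor(np.log2(amax / FP4_GRID[-1]))
    scaled = block / scale
    # Round each element to the nearest FP4 magnitude, keeping its sign.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return scale, q

def mxfp4_dequantize(scale, q):
    return scale * q

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
scale, q = mxfp4_quantize(x)
x_hat = mxfp4_dequantize(scale, q)
```

The shared scale is what lets a 4-bit element format cover a wide dynamic range: each block only has to represent values relative to its own maximum.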