
Reiner Pope
182 posts

@reinerpope
CEO and founder, @MatXComputing, developing high throughput chips tailored for LLMs














Did a very different format with @reinerpope: a blackboard lecture where he walks through how frontier LLMs are trained and served.

It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.

It's a bit technical, but I encourage you to hang in there. It's really worth it.

Fewer than a handful of people understand the full stack of AI, from chip design to model architecture, as well as Reiner does. It was a real delight to learn from him.

I recommend watching this one on YouTube so you can see the chalkboard.

0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, "As we now know, pipelining is not wise."
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography


Intelligence per picojoule, with @itsclivetime and @dylan522p

(0:00) Intro
(1:22) What is codesign?
(2:49) Codesign example: Swish vs ReLU
(4:22) Are DeepSeek papers codesign?
(6:45) Predicting where ML research will go
(8:06) Should researchers hate your chips?
(9:34) Can you codesign too much?
(13:23) Picking the right grain size for specialization
(16:22) How much hardware flexibility for The Age of Research?
(20:05) Did reasoning and RL disrupt hardware roadmaps?
(23:09) Cerebras/Groq: unexpected wins on reasoning and RL
(25:34) Disaggregating MLP and attention
(29:06) The right metrics for quantization and codesign papers










