Chris Lattner
@clattner_llvm
2.1K posts
Building beautiful things like Mojo🔥 and MAX @Modular, lifting the world of production AI/ML software into a new phase of innovation. We’re hiring! 🚀🧠

Joined June 2014
149 Following · 92.9K Followers
Chris Lattner@clattner_llvm·
Cool to see that Tesla Full Self Driving has adopted the @LLVMFoundation MLIR stack, and is seeing 20% faster reaction time as a result. It is quite likely that a modern compiler and runtime implementation is the breakthrough that robotaxi and FSD have been waiting for!
Teslahubs@Teslahubs

FSD (Supervised) v14.3 (HW4; Models S/3/X/Y/CT) rewrites the AI runtime with MLIR for a 20% faster reaction time and improves emergency-vehicle & small-animal handling — meaning fewer disengagements and safer supervised driving. teslahubs.com/blogs/tips/tes… #FSD

Mark Saroufim@marksaroufim·
After 5 amazing years, I’m leaving the PyTorch team at Meta. I did my best work there and got to work with some of the smartest, most OSS pilled engineers in the industry. More soon on what’s next: still systems, still OSS (but not everything), a smaller team with a lot of GPUs
Chris Lattner@clattner_llvm·
@Hassan_Abedi Good news, MAX is free and open source: you can pip install and run this for free. Check out the open source tab on Modular.com and have fun!
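Getting started is meant to be a one-liner; a minimal sketch, assuming the PyPI package is named `modular` and the CLIs are `max` and `mojo` (verify the current package name and steps on modular.com before running):

```shell
# Install MAX and Mojo into a fresh virtual environment.
# The package name "modular" is an assumption -- check modular.com/docs
# for the current install instructions.
python3 -m venv .venv
source .venv/bin/activate
pip install modular

# Smoke-test the toolchain (CLI entry-point names also assumed).
max --version
mojo --version
```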
Chris Lattner@clattner_llvm·
Google DeepMind's impressive, fully open Gemma 4 is live day-zero on Modular Cloud. Modular provides the fastest performance on NVIDIA Blackwell and AMD MI355X, thanks to MAX and Mojo🔥. The team took this impressive new model to production inference in days.🚀
Chris Lattner reposted
Google DeepMind@GoogleDeepMind·
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵
Chris Lattner@clattner_llvm·
Two models are live: Gemma 4 31B (dense, 256K context) and Gemma 4 26B A4B (MoE, 4B active params per pass). Both multimodal. Available in Modular Cloud and in our open source MAX nightlies. I'd love to hear what you're building!👇 modular.com/blog/day-zero-…
Chris Lattner@clattner_llvm·
How is this possible? MAX was built for rapid development and portability from the start. When a new architecture drops, you're not rewriting kernels or waiting on upstream support or maintaining separate vendor codepaths. You get great leverage from an open modern stack built for GenAI. 🚀
Chris Lattner@clattner_llvm·
@meep414 Yes it does, but we frequently see performance wins from moving to higher level abstractions. It's hard to keep all the details straight when code size balloons
Zeke@meep414·
@clattner_llvm Does the solver recover the exact same schedule as flash attention? It would be cool if there were multiple optimal solutions or if it found something faster.
Chris Lattner@clattner_llvm·
Pipelining AI kernels is required to get full perf/utilization out of modern chips. However, no one has been able to crack "full control over the hardware" without "having to micromanage it". Let's crack this open: kernel authors deserve a powerful scheduler they can control. 💪
Modular@Modular

FA4 on Blackwell: 14 ops, 5 hardware units, 28 dependency edges. One wrong sync = a race condition sanitizers won't catch. We built a constraint solver that derives the pipeline schedule automatically, in Mojo 🔥 Part 1 of our series is out → modular.com/blog/software-…
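The solver described here is written in Mojo and formulated as a constraint problem; purely to illustrate the underlying idea (a legal issue schedule derived automatically from dependency edges and per-unit structural hazards, instead of hand-placed syncs), here is a toy greedy list scheduler in Python. The op names, hardware units, and latencies below are invented, not taken from FA4.

```python
# Toy illustration of constraint-driven pipeline scheduling: given ops with
# latencies, the hardware unit each op occupies, and dependency edges, derive
# the earliest cycle each op can issue. This is a greedy list scheduler, far
# simpler than the solver in the blog post; all names and numbers are made up.

def schedule(ops, deps):
    """ops: {name: (unit, latency)}; deps: [(producer, consumer), ...].
    Returns {name: issue_cycle} respecting data deps and one op per unit
    per cycle. Assumes the dependency graph is a DAG."""
    preds = {name: [] for name in ops}
    for producer, consumer in deps:
        preds[consumer].append(producer)

    issue = {}          # op name -> issue cycle
    unit_busy = {}      # unit -> set of cycles already occupied
    while len(issue) < len(ops):
        for name, (unit, _lat) in ops.items():
            if name in issue or any(p not in issue for p in preds[name]):
                continue
            # Data dependency: issue only after every producer's result is ready.
            t = max((issue[p] + ops[p][1] for p in preds[name]), default=0)
            # Structural hazard: each unit issues at most one op per cycle.
            taken = unit_busy.setdefault(unit, set())
            while t in taken:
                t += 1
            taken.add(t)
            issue[name] = t
    return issue

# Tiny attention-shaped fragment: load -> QK matmul -> softmax -> PV matmul -> store.
ops = {
    "load_kv": ("tma",    4),
    "qk_mma":  ("tensor", 8),
    "softmax": ("alu",    6),
    "pv_mma":  ("tensor", 8),
    "store_o": ("tma",    4),
}
deps = [("load_kv", "qk_mma"), ("qk_mma", "softmax"),
        ("softmax", "pv_mma"), ("pv_mma", "store_o")]
sched = schedule(ops, deps)
```

Every ordering decision a kernel author would otherwise encode as a manual sync falls out of the dependency edges; swapping the greedy loop for an ILP or SMT formulation changes the quality of the schedule, not the interface.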

Jackmin@jackminong·
why does everyone want an IR?
Chris Lattner@clattner_llvm·
@matt_dz Hey Matt, I agree with you that there are many interesting approaches. We're not beholden to ILP or any other specific algorithm. This is why it should be a library and not hard coded into a compiler!
Matt@matt_dz·
@clattner_llvm Super nifty! Out of curiosity, modulo scheduling approach immediately reminded me of Twill (arxiv.org/abs/2512.18134); are the main diffs that you pose this as ILP (vs SMT) problem and standalone (vs joint) SWP/WS optimization?
Chris Lattner@clattner_llvm·
@navneet_rabdiya Why use "yet another DSL" with attendant poor tooling - when you can have a language built for enabling powerful tools like this as libraries? :-)
Navneet@navneet_rabdiya·
The tension between control and abstraction is real. Most current ML frameworks either give you bare metal control (hand-rolling your own CUDA kernels) or hide everything behind high-level ops. We need something in between - maybe a declarative scheduling DSL that can hint intent?
Chris Lattner@clattner_llvm·
@wildpinesai 100%: how much pain and suffering has lack of proper abstractions caused us all?
WildPinesAI@wildpinesai·
@clattner_llvm compute-sanitizer can't even track TMA or async WGMMA. you cannot debug your way to correct pipelining. a constraint solver is the only sane path when the safety net doesn't exist
Chris Lattner reposted
This Week in AI@ThisWeeknAI·
GOOGLE SIGNS $5B DEAL WITH ANTHROPIC @Jason: Who's Nvidia's biggest competitor? @clattner_llvm: "Google... They are way better already and have the opportunity to add a couple trillion to their market cap." From episode 6 of This Week in AI.
Chris Lattner reposted
Modular@Modular·
130 lines instead of 870. That's the difference between our conv2d implementation on Blackwell and CUTLASS's. We broke kernels into three swappable pieces: one for moving data, one for coordinating the pipeline, one for compute. When you need a new kernel, you only change the piece that actually needs to change. Part 3 of our Structured Mojo Kernels series walks through the details: modular.com/blog/structure…
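The actual kernels are Mojo and operate on GPU memory; as a language-agnostic sketch of the decomposition described (data movement, pipeline coordination, and compute as independently swappable pieces), here is a toy Python analogue. The class names and interfaces are invented for illustration.

```python
# Sketch of a kernel split into three swappable pieces, mirroring the
# structure the post describes for conv2d. All interfaces are invented;
# the real implementation is Mojo on GPU memory, not Python lists.

class CopyLoader:
    """Data movement: stage one input tile (a plain list slice here)."""
    def load(self, data, tile_size, i):
        return data[i * tile_size:(i + 1) * tile_size]

class SumCompute:
    """Compute: the math applied to one staged tile (a stand-in for MMA work)."""
    def run(self, tile):
        return sum(tile)

class SerialPipeline:
    """Coordination: decides when tiles are loaded and computed.
    Swapping this for, say, a double-buffered version would leave the
    loader and compute pieces untouched."""
    def __init__(self, loader, compute):
        self.loader, self.compute = loader, compute

    def execute(self, data, tile_size):
        n_tiles = (len(data) + tile_size - 1) // tile_size
        out = []
        for i in range(n_tiles):
            tile = self.loader.load(data, tile_size, i)
            out.append(self.compute.run(tile))
        return out

pipeline = SerialPipeline(CopyLoader(), SumCompute())
result = pipeline.execute(list(range(8)), tile_size=4)  # two tiles of four
```

A new kernel that changes only the math swaps `SumCompute`; a new memory path swaps only `CopyLoader` — which is how a 130-line implementation replaces an 870-line monolith.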
Chris Lattner reposted
Modular@Modular·
2 days ago we shipped image generation in <1s 🔥 Today, we make that <300ms 🤯 NVIDIA + AMD⚡️ Full demo below ⬇️