Pranjal

323 posts

@pranjalssh

@xai

Office · Joined January 2021
690 Following · 3.5K Followers
Pinned Tweet
Pranjal (@pranjalssh)
I implemented H100 cuda matmul kernel from scratch, taking inspiration from @Si_Boehm's blog. Our final kernel outperforms cuBLAS by 7% for N=4096. It fits in a single C++ file without any dependencies. Full-blown blog post with all details: cudaforfun.substack.com/p/outperformin…
32
30
287
49.8K
Pranjal retweeted
Guodong Zhang (@Guodzh)
Last day at xAI. Wild journey past three years but excited about next chapter. Thanks all for the love and support yesterday. So many friends made along the way and I will miss you all!
236
61
2.5K
651.5K
Erik Bernhardsson (@bernhardsson)
It’s ironic that Blackwells have been out since 2024 but people still prefer Hoppers because the kernels aren’t Blackwell-optimized yet, and now the Hopper prices are going up.
25
22
429
122K
Pranjal (@pranjalssh)
Actually, engram explains deja vu
0
0
3
815
Pranjal (@pranjalssh)
@vikhyatk You can do better baselines. Show the bandwidth/flops!
0
0
2
240
Pranjal retweeted
Pranjal (@pranjalssh)
@tetsuo_cpp No no no the world needs one more dsl, the one i write will be best i promise
3
1
19
4.7K
tetsuo.cpp (no slop) (@tetsuo_cpp)
Oh, you're writing CUDA kernels? Everyone's on Triton now. Just kidding, we're all on Mojo. We're using cuTile. We're using ROCm. We have an in-house DSL compiler targeting the NVGPU MLIR dialect but wait, Tile IR just dropped so we're going to target that instead. Our PM is on TileLang. The team lead was on CuTe but now she's back to handwriting PTX. If you're not on Pallas, you're ngmi. Our intern is building on TT-Metalium for our Wormholes. Our CFO approved an order for some big chungus wafer-scale chips so now we're porting our kernels to CSL. Our CTO is working on a kernel-less graph compiler so we won't need to write kernels anymore. Our CEO thinks we're talking about the Linux kernel. We're building Claude for dogs.
67
179
2.8K
181.3K
Pranjal (@pranjalssh)
@vikhyatk Also much fewer lines and everything can fit in one file
0
0
4
394
vik (@vikhyatk)
i feel like cuda c++ is easier to work with than cutedsl. did a relative import and everything stopped working. still excited about cutedsl but i worry it might be too early for production use
6
0
42
23.2K
Pranjal (@pranjalssh)
@nearcyan Sounds like a South Park episode idea
0
0
3
1.2K
near (@nearcyan)
if doordash was ran like early-uber they'd have sent a seal team to novo nordisk a decade ago and stopped ozempic at its source
28
107
5.1K
174.5K
Pranjal (@pranjalssh)
@Teknium Yea I used the opportunity to rant myself instead of offering a solution.
0
0
5
392
Teknium (e/λ) (@Teknium)
@pranjalssh Yes but there are people who only work on data and need to train some dang models and not be burdened by infra woes for 98% of their work hours ^_^
2
0
18
879
Teknium (e/λ) (@Teknium)
can someone make flash-attn not so brittle please its been like 2 years
12
1
144
16.2K
Pranjal (@pranjalssh)
Less known PTX instructions
[Image attached]
0
0
13
1.6K
Pranjal retweeted
skcd (@skcd42)
we are opening 3 new roles at xAI to shape the future of software engineering. Join us in product, infrastructure, or post-training for the grok-code-team and help push the frontier forward with the best team in the world 🚀
20
61
478
36K
Pranjal retweeted
Bryan Johnson (@bryan_johnson)
I’m not arguing for immortality. I’m observing that our tolerance and embrace of death is the greatest existential risk we face as a species. For it then rationalizes our other self destructive tendencies, framing them as virtues.
199
88
1.5K
117.2K