Jędrzej Maczan

85 posts

@jedmaczan

Low level deep learning: torch-webgpu / tiny-vllm / pytorch compiler / @pagedout_zine https://t.co/zM6gQ2L7VL cover art by https://t.co/rs0nMCNnN6

Kaszëbë, Poland · Joined May 2023
253 Following · 513 Followers
Pinned Tweet
Jędrzej Maczan @jedmaczan
I built a tiny-vllm in C++ and CUDA
- paged attention
- continuous batching
- educational
- 100% human-written™
And now I'm writing a course where you will build your own vLLM yourself. Still a work in progress; I'll finish by the end of April. All for free ofc, just a GitHub repo
15 replies · 30 reposts · 593 likes · 17.9K views
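For readers wondering what "paged attention" refers to above: the KV cache is stored in fixed-size physical blocks addressed through a per-sequence block table, rather than one contiguous buffer per sequence, which is also what makes continuous batching practical. Below is a minimal host-side sketch of that bookkeeping in C++; every name in it (KVBlockPool, BLOCK_TOKENS, etc.) is invented here for illustration and is not taken from tiny-vllm.

    // Sketch of a paged KV-cache block table (hypothetical names, not
    // tiny-vllm's actual API). Each sequence maps its logical token
    // positions to fixed-size physical blocks drawn from a shared pool,
    // so sequences can grow and free memory independently -- the core
    // bookkeeping behind paged attention and continuous batching.
    #include <cstdio>
    #include <utility>
    #include <vector>

    constexpr int BLOCK_TOKENS = 16; // tokens per physical KV block

    struct KVBlockPool {
        std::vector<int> free_blocks; // indices of unused physical blocks
        explicit KVBlockPool(int num_blocks) {
            for (int i = num_blocks - 1; i >= 0; --i) free_blocks.push_back(i);
        }
        int alloc() { int b = free_blocks.back(); free_blocks.pop_back(); return b; }
        void release(int b) { free_blocks.push_back(b); }
    };

    struct Sequence {
        std::vector<int> block_table; // logical block -> physical block
        int num_tokens = 0;

        // Append one token, grabbing a new physical block when the last one fills.
        void append_token(KVBlockPool& pool) {
            if (num_tokens % BLOCK_TOKENS == 0) block_table.push_back(pool.alloc());
            ++num_tokens;
        }
        // Where token `pos`'s K/V lives: (physical block, offset within block).
        std::pair<int, int> locate(int pos) const {
            return {block_table[pos / BLOCK_TOKENS], pos % BLOCK_TOKENS};
        }
        void free_all(KVBlockPool& pool) {
            for (int b : block_table) pool.release(b);
            block_table.clear();
            num_tokens = 0;
        }
    };

    int main() {
        KVBlockPool pool(64);
        Sequence seq;
        for (int t = 0; t < 40; ++t) seq.append_token(pool);
        auto [blk, off] = seq.locate(37);
        std::printf("token 37 -> block %d, offset %d (%zu blocks used)\n",
                    blk, off, seq.block_table.size());
        seq.free_all(pool); // blocks return to the pool for other sequences
    }

An attention kernel then gathers K and V through the block table instead of assuming contiguity, so the scheduler can admit new sequences and reclaim finished ones block by block.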
ICE @ICE257_
@jedmaczan Followed, keep me updated
1 reply · 0 reposts · 1 like · 30 views
Yuvraj Singh @YuvrajS9886
@jedmaczan What are the exact prerequisites, like C or CUDA? Cus I know nothing ☠️
1 reply · 0 reposts · 2 likes · 244 views
Jędrzej Maczan @jedmaczan
Andrej, I got inspired by your llm.c and how you explain things from scratch. I'd love it if you took a look at my project and the course (I won't lie, I try to complement LLM101n a bit) @karpathy
0 replies · 0 reposts · 3 likes · 548 views
Yacine Mahdid @yacinelearning
yo guys what you working on this week???
[image attached]
47 replies · 0 reposts · 124 likes · 10K views
G K @gauravkaul
@jedmaczan Great work 👏 very well documented
1 reply · 0 reposts · 2 likes · 219 views
anirudh bv @anirudhbv_ce
Finally got my Softmax kernels running on a @nvidia Blackwell B300 today! A single-pass tiled Softmax and a two-pass streaming Online Softmax. Writing ct.load() feels like cheating compared to manual Triton pointer math when mapping directly to TMA hardware.
[4 images attached]
9 replies · 12 reposts · 122 likes · 6.3K views
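The two-pass streaming Online Softmax mentioned above can be sketched in scalar C++ (no tiling, TMA, or warp reductions; the names here are mine, not from the tweet's kernels): the first pass streams the row once while maintaining a running max and a running sum that is rescaled whenever the max changes, and the second pass normalizes.

    // Scalar sketch of two-pass streaming online softmax (the
    // Milakov & Gimelshein formulation). Pass 1 keeps a running max m
    // and a running sum s of exp(x[i] - m), rescaling s whenever a new
    // max appears; pass 2 normalizes. One read per pass, with no
    // separate max-reduction sweep up front.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    void online_softmax(const std::vector<float>& x, std::vector<float>& y) {
        float m = -INFINITY; // running max
        float s = 0.0f;      // running sum of exp(x[i] - m)
        for (float v : x) {  // pass 1: single streaming pass
            float m_new = std::max(m, v);
            s = s * std::exp(m - m_new) + std::exp(v - m_new); // rescale old sum
            m = m_new;
        }
        y.resize(x.size());
        for (size_t i = 0; i < x.size(); ++i) // pass 2: normalize
            y[i] = std::exp(x[i] - m) / s;
    }

    int main() {
        std::vector<float> logits = {1.0f, 3.0f, 2.0f, 5.0f}, probs;
        online_softmax(logits, probs);
        for (float p : probs) std::printf("%.4f ", p); // sums to 1
        std::printf("\n");
    }

A single-pass tiled variant instead folds the normalization into the accumulation itself, the same trick FlashAttention-style kernels rely on.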
Jędrzej Maczan reposted
Tri Dao @tri_dao
The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now so fast that attn fwd is bottlenecked by the exponential, and attn bwd is bottlenecked by shared memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, a new online softmax that avoids 90% of softmax rescaling, and 2-CTA MMA instructions that let two thread blocks share operands to reduce smem traffic.
Ted Zadouri @tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast that exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! Joint work w/ Markus Hoehnerbach, Jay Shah (@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__), Tri Dao (@tri_dao) 1/

30 replies · 230 reposts · 1.8K likes · 185.4K views
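Of the tricks listed, "exponential emulation with polynomials" is the easiest to illustrate: split x into an integer part, which 2^n handles exactly through the float exponent bits, and a fractional part in [0, 1) approximated by a low-degree polynomial, trading the SFU's ex2 instruction for a few FMAs. The degree and coefficients below are a generic textbook-style construction, not FA4's actual kernel code.

    // Sketch of exp2 emulation with a polynomial (generic construction,
    // not FA4's coefficients). Split x = n + f with n an integer and f
    // in [0, 1); 2^n is exact via the float exponent bits, and 2^f is
    // approximated by a degree-3 polynomial in Horner form.
    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    float exp2_poly(float x) {
        float n = std::floor(x);
        float f = x - n; // fractional part in [0, 1)
        // Degree-3 polynomial approximating 2^f on [0, 1)
        // (illustrative near-minimax coefficients).
        float p = 0.0792f;
        p = p * f + 0.2249f;
        p = p * f + 0.6958f;
        p = p * f + 1.0000f;
        // Build 2^n by placing (n + 127) into the exponent field of a float.
        uint32_t bits = static_cast<uint32_t>(static_cast<int>(n) + 127) << 23;
        float scale;
        std::memcpy(&scale, &bits, sizeof(scale));
        return scale * p;
    }

    int main() {
        for (float x : {-3.7f, -0.5f, 0.0f, 1.25f, 4.2f})
            std::printf("x=%5.2f  poly=%9.5f  exact=%9.5f\n",
                        x, exp2_poly(x), std::exp2(x));
    }

This construction is good to roughly three to four decimal digits on the unit interval, the kind of accuracy budget a low-precision attention pipeline can plausibly tolerate.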
Jędrzej Maczan reposted
Leonardo de Moura @Leonard41111588
AI is writing a growing share of the world's software. No one is formally verifying any of it. New essay: "When AI Writes the World's Software, Who Verifies It?" leodemoura.github.io/blog/2026/02/2…
41 replies · 248 reposts · 1.6K likes · 420.4K views