ayush (@ayushrgarg) - Twitter Profile

ayush retweeted

I added KV caching and INT8 KV quantization to our transformer inference, improving throughput by 35x.

All of this was done from scratch in Rust + CUDA, on top of a homemade ML framework.

On a 4-token prompt with 252 generated tokens:
- Original: 0.76 tok/s
- KV cache fp32: 27.21 tok/s
- KV cache int8 (quantized): 27.29 tok/s

Try it out yourself here: mni-ml.github.io/demos/kv-cache/

In practice:
- KV caching gave us about a 35x end-to-end speedup
- INT8 KV cache kept roughly the same speed as fp32 but cut KV cache memory by 3.78x

FP32 cache used 4.5 MB in this run while the INT8 cache used only 1.19 MB.

This simple change to inference created a huge impact on performance. To learn more about the KV cache and other optimizations like this, check out the blog at mni.ml!
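As a quick check on the arithmetic in the post: 27.21 / 0.76 ≈ 35.8 and 4.5 / 1.19 ≈ 3.78, consistent with the quoted 35x and 3.78x. The sketch below illustrates the two ideas in plain Python. It is not the Rust + CUDA implementation, and it assumes a single attention layer with one head, a toy width of 8, random stand-in hidden states, and symmetric per-vector INT8 quantization with one fp32 scale per cached vector. All names (`KVCache`, `quantize_int8`, `decode_with_cache`, ...) are hypothetical. Without a cache, every decode step re-projects K and V for all earlier tokens. With a cache, only the newest token is projected and appended.

```python
import math
import random

def matvec(W, x):
    """Dense matrix-vector product (stands in for the Q/K/V projections)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query over all cached positions."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
    m = max(scores)                                  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def quantize_int8(vec):
    """Symmetric per-vector INT8: integer codes in [-127, 127] plus one float scale."""
    amax = max(abs(x) for x in vec)
    scale = amax / 127 if amax > 0 else 1.0
    return [max(-127, min(127, round(x / scale))) for x in vec], scale

def dequantize_int8(codes, scale):
    return [c * scale for c in codes]

class KVCache:
    """Append-only per-token K/V store; with int8=True each vector is kept as (codes, scale)."""
    def __init__(self, int8=False):
        self.int8 = int8
        self.keys, self.values = [], []

    def append(self, k, v):
        if self.int8:
            k, v = quantize_int8(k), quantize_int8(v)
        self.keys.append(k)
        self.values.append(v)

    def read(self):
        if not self.int8:
            return self.keys, self.values
        return ([dequantize_int8(*k) for k in self.keys],
                [dequantize_int8(*v) for v in self.values])

    def nbytes(self):
        """Approximate storage: 4 bytes per fp32 value, or 1 byte per code + a 4-byte scale."""
        if not self.keys:
            return 0
        dim = len(self.keys[0][0]) if self.int8 else len(self.keys[0])
        per_vector = dim + 4 if self.int8 else 4 * dim
        return 2 * len(self.keys) * per_vector

random.seed(0)
D, STEPS = 8, 16
Wq, Wk, Wv = ([[random.gauss(0, 0.3) for _ in range(D)] for _ in range(D)] for _ in range(3))
hidden = [[random.gauss(0, 1) for _ in range(D)] for _ in range(STEPS)]  # stand-in token states

def decode_without_cache(hidden):
    outs = []
    for t in range(len(hidden)):
        # No cache: re-project K and V for every earlier token at every step.
        keys = [matvec(Wk, h) for h in hidden[: t + 1]]
        values = [matvec(Wv, h) for h in hidden[: t + 1]]
        outs.append(attend(matvec(Wq, hidden[t]), keys, values))
    return outs

def decode_with_cache(hidden, int8=False):
    cache, outs = KVCache(int8), []
    for h in hidden:
        # With a cache: project only the newest token, append its K/V, attend over the cache.
        cache.append(matvec(Wk, h), matvec(Wv, h))
        keys, values = cache.read()
        outs.append(attend(matvec(Wq, h), keys, values))
    return outs, cache

ref = decode_without_cache(hidden)
fp32_out, fp32_cache = decode_with_cache(hidden)
int8_out, int8_cache = decode_with_cache(hidden, int8=True)
print(fp32_out == ref, fp32_cache.nbytes(), int8_cache.nbytes())  # True 1024 384
print(max(abs(a - b) for r, o in zip(ref, int8_out) for a, b in zip(r, o)))
```

With the fp32 cache, the outputs match the recompute path exactly. The INT8 cache adds a small rounding error. In this toy setup, the per-vector scale is a large share of an 8-element vector, so the saving is 1024 → 384 bytes (about 2.7x). As vectors get wider, the scale overhead shrinks and the ratio approaches the ideal 4x.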

ayush@ayushrgarg·2d

@_reesechong @MankyDankyBanky @srush_nlp my roommate 🤞

Reese Chong@_reesechong·4d

Behind the scenes of mni-ml:

January 4th 2026 - my roommate @MankyDankyBanky and I wanted to do a big project together. “maybe we should try to build pytorch from scratch”

We found @srush_nlp's minitorch curriculum and committed to grinding through it Jan to April.

February - autodiff and tensor internals done. lots of late night PR reviews, stacked diffs, Kinton ramen runs to Toronto when I'd visit Aadi at Shopify. We started posting on X to keep ourselves accountable.

March - the month of parallelization: Aadi shipped tiled matmul using the same algo @nvidia teaches in their CUDA guide, wrapped by end of month - pooling, conv1d/2d forward+backward, softmax, dropout.

March 22-23 - @socraticainfo symposium & we see the tinytpu team on the stage which filled us with determination 🫡 cc: @evanliin @XanderChin @suryasure05 @kennykgguo

March 24 - chose the mni-ml brand and started the educational blog

March 30 - minitorch is DONE ahead of schedule. now we build on top of the framework.

April 5-6 - cuBLAS matmul via koffi FFI. buffer pooling, strided batched GEMM, kernel optimizations. CUDA backend takes shape.

April 7 - huge day. cross-platform CI pipeline, prebuilt npm binaries, v0.3.0, CUDA live on @npmjs. flatten the monorepo, add @WebGPU + Windows CUDA build targets by eod.

April 12 - flash attention CUDA kernel ships. we caught a bug where head dim > 32 was truncating.

April 14 (during exam season) - we recorded the demo in @Shopify recording studio during Aadi’s lunch break. Everything over the last 4mo finally came together. Cc: @fnthawar @tobi @alspee

April 17 - launch post and bought the domain mni.ml and we’re just getting started. We have so much in store for this summer, stay tuned 🫡 cc: @sundeep @GavinSherry

Aadi Kulshrestha@MankyDankyBanky

I trained a 12M parameter LLM on my own ML framework using a Rust backend and CUDA kernels for flash attention, AdamW, and more. Wrote the full transformer architecture and BPE tokenizer from scratch.

The framework features:
- Custom CUDA kernels (Flash Attention, fused LayerNorm, fused GELU) for 3x increased throughput
- Automatic WebGPU fallback for non-NVIDIA devices
- TypeScript API with Rust compute backend
- One npm install to get started, prebuilt binaries for every platform

Try out the model for yourself: mni-ml.github.io/demos/transfor…

Built with @_reesechong. Check out the repos and blog if you want to learn more.

Shoutout to @modal for the compute credits allowing me to train on 2 A100 GPUs without going broke cc @sundeep @GavinSherry
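The post mentions a Flash Attention kernel but does not show it. FlashAttention-style kernels are usually built on the online-softmax recurrence, and the sketch below shows that idea. It is not the mni-ml kernel. Keys and values are visited one block at a time, and only a running max, a running normalizer, and a running weighted sum are kept, so the full row of attention scores never has to be stored. This is plain Python with a single query and made-up sizes. The names `naive_attention` and `online_softmax_attention` and the `block` parameter are illustrative assumptions.

```python
import math
import random

def naive_attention(q, keys, values):
    """Reference: materialize all scores, softmax them, then take the weighted sum of values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(len(values[0]))]

def online_softmax_attention(q, keys, values, block=4):
    """Visit K/V one block at a time, keeping only a running max, normalizer and output sum."""
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf                 # running max of all scores seen so far
    denom = 0.0                   # running sum of exp(score - m)
    acc = [0.0] * len(values[0])  # running sum of exp(score - m) * value
    for start in range(0, len(keys), block):
        k_blk, v_blk = keys[start:start + block], values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) * scale for k in k_blk]
        m_new = max(m, max(scores))
        # Rescale the old statistics to the new max; exp(-inf) == 0.0 on the first block.
        correction = math.exp(m - m_new)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            p = math.exp(s - m_new)
            denom += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / denom for a in acc]

random.seed(1)
D, N = 16, 10
q = [random.gauss(0, 1) for _ in range(D)]
keys = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
values = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
ref = naive_attention(q, keys, values)
for block in (1, 3, 4, 32):
    out = online_softmax_attention(q, keys, values, block)
    print(block, max(abs(a - b) for a, b in zip(out, ref)))
```

Every block size agrees with the naive version up to floating-point rounding. In a GPU kernel, each block would be a tile of K/V staged in on-chip shared memory. That reduces memory traffic, which is where the speedup comes from.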

ayush@ayushrgarg·3d

@VishnuSatish_ my goat

ayush retweeted

Vishnu Satish@VishnuSatish_·3d

I built and trained a ~6M parameter GPT-2 entirely from scratch in C++, and it actually generates English text with mostly correct grammar! No PyTorch and no external dependencies. Just pure C++ 20. More info, GitHub link, and screenshots below!

ayush@ayushrgarg·4d

@adiprasadd unironically pu 9 pm tn lets run it 😭

adi@adiprasadd·4d

@ayushrgarg next step is quit poker

ayush@ayushrgarg·4d

99% of gamblers quit before they win I won

ayush@ayushrgarg·4d

@MankyDankyBanky my goats @_reesechong and @MankyDankyBanky this is insane

ayush retweeted

Aadi Kulshrestha@MankyDankyBanky·4d

I trained a 12M parameter LLM on my own ML framework using a Rust backend and CUDA kernels for flash attention, AdamW, and more. Wrote the full transformer architecture and BPE tokenizer from scratch.

The framework features:
- Custom CUDA kernels (Flash Attention, fused LayerNorm, fused GELU) for 3x increased throughput
- Automatic WebGPU fallback for non-NVIDIA devices
- TypeScript API with Rust compute backend
- One npm install to get started, prebuilt binaries for every platform

Try out the model for yourself: mni-ml.github.io/demos/transfor…

Built with @_reesechong. Check out the repos and blog if you want to learn more.

Shoutout to @modal for the compute credits allowing me to train on 2 A100 GPUs without going broke cc @sundeep @GavinSherry

ayush@ayushrgarg·4d

@liao_lucas check your DMs!

lucas liao@liao_lucas·4d

@ayushrgarg @ayushrgarg slide me a referral tho

ayush@ayushrgarg·5d

is ts tuff

ayush@ayushrgarg·5d

@KlausCodes u coming?

Satyam@KlausCodes·5d

@ayushrgarg TUFF

ayush@ayushrgarg·Apr 14

we do things a lil diff around here

ayush@ayushrgarg·Apr 13

@forwarddeploy ill bring the parle-g

Umesh Khanna 🇨🇦🇺🇸@forwarddeploy·Apr 13

Thinking of hosting more chai and samosas in SF at ours ☕️ Want to come hang out with good people and have fun snacks, lmk below! 🙌

ayush retweeted

Modal@modal·Apr 8

The future of artificial intelligence is physical. @physical_int runs robotic control inference on Modal with >2x lower latency than the lag between your brain and your finger.

ayush@ayushrgarg·Apr 5

@ksgat_ yep

gabe@ksgat_·Apr 5

@ayushrgarg quads?

ayush@ayushrgarg·Apr 5

anybody really good @ flying drones located in the bay? will pay you $$$ to fly drones all day

free lunch + unlimited snacks & drinks

ayush@ayushrgarg·Apr 5

@PoG_Shmerb DM me proof of your drone skills; as for the job itself you'll fly a drone in an environment we choose (we'll provide drones + transmitters and anything else you may need) & we'll collect that data

Shmerb@PoG_Shmerb·Apr 5

@ayushrgarg What kind of footage do you want to capture?

ayush@ayushrgarg·Apr 4

@krupaad lets chat, we're building autonomous drones

krupa@krupaad·Apr 4

bit late to the recruiting cycle, but looking for a summer internship in ML/hardware/inference!!

i've been working on CUDA kernel writing, FPGA acceleration and RTL. would love to find a team doing similar work this summer

dual US/Canada citizen, can relocate anywhere

DMs open :)

ayush@ayushrgarg·Apr 2

@NehaKasoju ts not tuff
