Parsa Ahmadnezhad

75 posts

@parsaxa

building stuff @uwaterloo

Toronto, Ontario · Joined January 2021
117 Following · 113 Followers
Parsa Ahmadnezhad retweeted
anirudh bv (@anirudhbv_ce)
Parsa and I ran into @dwarkesh_sp in the lobby LOL! I told him that I loved his podcast episode with Jensen Huang. Can't wait for the next episode :)
South Beach, San Francisco 🇺🇸 English
3
1
105
4.2K
Parsa Ahmadnezhad retweeted
anirudh bv (@anirudhbv_ce)
Today, I demo'd turboquant-gpu at @agihouse_org in Hillsborough. The founder of @Waymo asked me about the next steps and where I'll take it. I said I'll show him in two weeks and took his email. Stay tuned. Super super excited for what's coming. ✌️
San Francisco, CA 🇺🇸 English
15
5
212
10.5K
justinwu (@byjustinwu)
joining @Tesla in hawthorne, california this fall :)
English
34
1
187
7.7K
anirudh bv (@anirudhbv_ce)
I presented TurboQuant-GPU at @Shopify Toronto today :) Had a blast talking about KV cache compression and discussing future implementations for agents 🔥 Thank you @fnthawar for inviting me and hosting the talk today 🙏 pip install turboquant-gpu ✌️ ( 1.4k downloads!! )
Kensington-Chinatown, Toronto 🇨🇦 English
19
8
339
20.8K
Parsa Ahmadnezhad retweeted
Nengjia Li (@nengjiali)
I spent 4 brutal months building a full Stereo Visual SLAM system from scratch in C++17 + CUDA. NO pre-made libraries. NO black-box magic. I just wanted to understand the underlying mechanics behind SLAM. Here’s the intuitive breakdown (the full SLAM pipeline, the math that almost broke me, and the real KITTI footage)👇 cc @aelluswamy your work in this space is a massive inspiration for tackling this from absolute scratch
English
41
135
1.3K
65.7K
Parsa Ahmadnezhad retweeted
Matthew Wu (@Matthew_Wu_)
A mains-powered isolated phone charger that I designed fully from discrete analog logic:
- Offline isolated flyback topology, 15W of USB output
- Custom discrete type II compensation loop
- Compensated for 62° of phase margin and 12 dB of gain margin
- Schmitt trigger integrator ramp generator + comparator-based PWM control
- Designed for IEC 62368-1/CISPR compliance
English
3
3
25
1.4K
anirudh bv (@anirudhbv_ce)
pip install turboquant-gpu
5.02x KV cache compression for ANY GPU (RTX, H100, A100, B200)
- works over @huggingface transformers
- dead-simple API: compress + generate in 3 lines
- 3-bit Lloyd-Max fused KV compression (0.98 cosine similarity)
- outperforms MXFP4 (3.76x) and NVFP4 (3.56x) on compression
Ran Mistral-7B: 1,408 KB → 275 KB KV cache (5.02x)
Quickstart: github.com/DevTechJr/turb…
Written in cuTile (CUDA 12, 13) with PyTorch fallbacks
English
79
271
2.3K
157K
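For readers unfamiliar with the "3-bit Lloyd-Max" idea mentioned in the post above, here is a minimal pure-Python sketch of a scalar Lloyd-Max quantizer and the cosine-similarity check used to judge reconstruction quality. It is not the turboquant-gpu API or its kernels: the function names (fit_lloyd_max, quantize, dequantize, cosine_similarity) are hypothetical, the input is synthetic Gaussian data rather than real KV cache tensors, and it leaves out everything else the library may do.

```python
import bisect
import math
import random


def fit_lloyd_max(samples, bits=3, iters=30):
    """Fit 2**bits reconstruction levels to 1-D samples (hypothetical helper)."""
    k = 2 ** bits
    ordered = sorted(samples)
    n = len(ordered)
    # Start from evenly spaced quantiles of the data.
    levels = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        # Nearest-level rule: cell boundaries are midpoints between levels.
        bounds = [(levels[i] + levels[i + 1]) / 2 for i in range(k - 1)]
        sums = [0.0] * k
        counts = [0] * k
        for x in samples:
            cell = bisect.bisect_left(bounds, x)
            sums[cell] += x
            counts[cell] += 1
        # Centroid rule: each level moves to the mean of its cell.
        levels = [sums[i] / counts[i] if counts[i] else levels[i]
                  for i in range(k)]
    return levels


def quantize(values, levels):
    """Map each value to the index of its nearest level (a 3-bit code for 8 levels)."""
    bounds = [(levels[i] + levels[i + 1]) / 2 for i in range(len(levels) - 1)]
    return [bisect.bisect_left(bounds, v) for v in values]


def dequantize(codes, levels):
    """Replace each code with its reconstruction level."""
    return [levels[c] for c in codes]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for one channel of cached keys or values (an assumption).
    data = [random.gauss(0.0, 1.0) for _ in range(4000)]
    levels = fit_lloyd_max(data, bits=3)
    restored = dequantize(quantize(data, levels), levels)
    print("cosine similarity:", cosine_similarity(data, restored))
```

Storing a 3-bit code per value instead of a 16-bit float is where the bulk of the compression in a scheme like this comes from. The levels and any per-block metadata add overhead, so the end-to-end ratio depends on details this sketch does not model.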
Parsa Ahmadnezhad retweeted
anirudh bv (@anirudhbv_ce)
I implemented @GoogleResearch's TurboQuant as a CUDA-native compression engine on Blackwell B200. 5x KV cache compression on Qwen 2.5-1.5B, near-lossless attention scores, generating live from compressed memory.
5 custom cuTile CUDA kernels ft:
- fused attention (with QJL corrections)
- online softmax
- on-chip cache decompression
- pipelined TMA loads
Try it out: devtechjr.github.io/turboquant_cut…
s/o @blelbach and the cuTile team at @nvidia for lending me Blackwell GPU access :) cc @sundeep @GavinSherry
English
145
308
3.3K
789.5K
Parsa Ahmadnezhad retweeted
anirudh bv (@anirudhbv_ce)
Finally got my Softmax kernels running on a @nvidia Blackwell B300 today! A single-pass tiled Softmax and a two-pass streaming Online Softmax. Writing ct.load() feels like cheating compared to manual Triton pointer math when mapping directly to TMA hardware.
English
9
12
126
8.2K
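The "two-pass streaming Online Softmax" mentioned in the post above can be sketched in plain Python. This is a CPU illustration of the general technique only, not the author's cuTile kernel. The names online_softmax and softmax_reference and the tile parameter are hypothetical, and a tile here is just a list slice standing in for a block of data loaded on the GPU.

```python
import math


def softmax_reference(xs):
    """Textbook softmax: one pass for the max, one for the sum, one to normalize."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def online_softmax(xs, tile=4):
    """Streaming softmax: max and sum are tracked together in a single sweep."""
    running_max = float("-inf")
    running_sum = 0.0
    # Pass 1: visit the input tile by tile, never holding more than one tile.
    for start in range(0, len(xs), tile):
        chunk = xs[start:start + tile]
        new_max = max(running_max, max(chunk))
        # Rescale the old sum to the new max before adding this tile's terms.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + sum(math.exp(x - new_max) for x in chunk))
        running_max = new_max
    # Pass 2: normalize each element with the final max and sum.
    return [math.exp(x - running_max) / running_sum for x in xs]


if __name__ == "__main__":
    scores = [0.5, -1.0, 3.0, 2.0, 10.0, -4.0, 0.0, 1.5, 7.0]
    print(online_softmax(scores, tile=4))
```

The rescaling step is what lets the maximum change mid-stream without revisiting earlier tiles, which is why this formulation suits fused attention kernels that consume scores block by block.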
justinwu (@byjustinwu)
we built git for video editing. edit in parallel inside davinci resolve.
English
14
9
131
9.7K
Lucas Jin (@lucashjin)
we just built git for video editing.
English
566
1.1K
17K
2M