Akshit Pareek

219 posts

@apareek05

deep learning kernels and ML inference @TXinstruments

Bengaluru · Joined June 2012

457 Following · 95 Followers

Pinned Tweet

Akshit Pareek@apareek05·17 May

hacked around with Franka Panda on MuJoCo over the last couple of weeks, trying to get it to pick books off a table and drop them in a box. limited myself to SmolVLA and ACT since i was on my own 5070 Ti.

108

12.5K

Akshit Pareek retweeted

Fabrizio Romano@FabrizioRomano·2d

❤️🤍🏆 Arsenal lift Premier League trophy after winning the title 2025/26! ✨

1.5K

14.3K

98.1K

1.2M

Akshit Pareek@apareek05·3d

@aryanmadhaverma 👌

amv@aryanmadhaverma·3d

takes me back to my microprocessors courses at uni. a productive weekend break :")

Dwarkesh Patel@dwarkesh_sp

New blackboard lecture w @reinerpope

How do chips actually work – starting with basic logic gates, and working up to why GPUs, TPUs, FPGAs, and the human brain each look the way they do.

0:00:00 – Building a multiply-accumulate from logic gates
0:16:20 – Muxes and the cost of data movement
0:25:59 – How systolic arrays work
0:39:00 – Clock cycles and pipeline registers
0:51:40 – FPGAs vs ASICs
1:03:14 – Cache vs scratchpad
1:07:16 – Why CPU cores are much bigger than GPU cores
1:11:49 – Brains vs chips
1:15:22 – A GPU is just a bunch of tiny TPUs

Look up Dwarkesh Podcast on YouTube/Spotify/etc to watch. Enjoy!

2.1K

Akshit Pareek retweeted

Stepan Feduniak@FeduniakS·19 May

Spent last week benchmarking policy speedup methods. Then we just collected faster data and it beat all baselines... It's obvious, but it turns out the first step to speed up your policy is … collect faster data.

21.6K

Akshit Pareek@apareek05·6d

@cneuralnetwork top 3 last month here :)

108

neural nets.@cneuralnetwork·6d

use claude 4.7 opus xhigh so much that I reached top 50 on cisco leaderboards for ai usage on day 1 itself 😭😭😭

745

24.9K

Akshit Pareek@apareek05·18 May

@richnanophd didn’t need to for SmolVLA: it fit comfortably with bf16 weights during inference on 16 GB of VRAM, though during training I needed to make some sacrifices. But yeah, for running MolmoAct2 as just local inference on the 5070 Ti, I’ll be trying both 4-bit and 8-bit quantizations.

Dr. Richard@richnanophd·17 May

@apareek05 Love this. I fight the same VRAM limits with local sims. Did you quantize SmolVLA to squeeze it onto the 50 Ti? Solid work 👍

Akshit Pareek@apareek05·17 May

@sakurayukiai I was able to train SmolVLA in bf16, as only 100M params were trainable, but I did have to settle for a batch size of 16. card was almost at the limit though

115

Sakura Yuki@sakurayukiai·17 May

@apareek05 5070 Ti gang 🤝 People sleep on what a single consumer card can actually do. Did you have to drop to a 4-bit quant to save VRAM for the KV cache, or did it squeeze into bf16?

155

Akshit Pareek@apareek05·17 May

tried to consolidate everything into this blog: akshitpareek.com/posts/two-week…

438

Akshit Pareek@apareek05·17 May

then tried a harder scene from scratch. bigger rack, fancier prompts, 240 fresh demos. SmolVLA stopped following the prompt entirely. 0/10. next: renting an A100 to try pi 0.5 or MolmoAct2.

350

Akshit Pareek@apareek05·28 Apr

@jino_rohit learn the hardware

107

Jino Rohit@jino_rohit·28 Apr

cuda, triton, cutlass, cute, tilelang, thunderkittens, mojo, helion. so which one do you even learn at this point?

255

17.7K

Akshit Pareek retweeted

TC@totalcristiano·15 Apr

There’s only one Cristiano Ronaldo.

130

19.8K

421.8K

Akshit Pareek retweeted

amv@aryanmadhaverma·10 Apr

wrote about my recent behavior cloning experiment, digging a bit into the internals of how action chunking transformers work and the distributed inference over raspi and my mac

aryanmadhavverma.com/tech/2026/04/0…

imo, writing is one of the best methods of spaced repetition. you're forced to look back at everything you did, identify knowledge gaps, think hard and find more things to dig deep into

I was writing a section on cross-attention which got me curious about what the queries are actually learning and what information the decoder space held

wrote a visualisation script and realised the cross-attention queries had developed temporal coherence on their own: early queries learned to attend to the arm, later queries to the target object, and this attention pattern shifted dynamically with each frame, always keeping the robot's relevant parts in focus.

no one told them which timestep matters or where to look, they just figured it out from 50 demos!

1.1K

Akshit Pareek@apareek05·27 Mar

@aryanmadhaverma @vinitsarode_ 27 😭

amv@aryanmadhaverma·27 Mar

@vinitsarode_ @apareek05 your time to shine

2.2K

Vinit Sarode@vinitsarode_·27 Mar

who's the hottest man in blr who we can cast in our launch video. imagine a young 27 yo harvey specter. we shoot this weekend.

180

49.4K

Akshit Pareek@apareek05·18 Mar

@aryanmadhaverma @claudeai openrouter/hunter-alpha

amv@aryanmadhaverma·18 Mar

this is not what I pay you 100$/month for @claudeai

374

Akshit Pareek retweeted

Peer Richelsen@peer_rich·16 Mar

the male urge to build a GPU cluster at home

237

80.4K

Akshit Pareek@apareek05·15 Mar

@sudoingX RTX 5070 Ti, running Qwen 3.5 9B Q4 with 264K context

202

Sudo su@sudoingX·15 Mar

drop your GPU below. i'll tell you exactly what model and config to run on it. here's what i've tested and verified on real hardware:

RTX 3060 12GB - Qwen 3.5 9B Q4 - 50 tok/s - 128K context
RTX 3090 24GB - Qwen 3.5 27B Q4 - 35 tok/s - 300K context
RTX 3090 24GB - Qwen 3.5 35B MoE Q4 - 112 tok/s - 262K context
2x RTX 3090 - Qwen3-Coder 80B Q4 - 46 tok/s - full VRAM

all running llama.cpp with flash attention. every number is real. every config is tested. if your card isn't on this list drop it below and i'll tell you what fits.

724

101

1.6K

192.8K

Akshit Pareek@apareek05·12 Mar

@aryanmadhaverma new ones coming tomorrow

amv@aryanmadhaverma·12 Mar

filling in the knowledge gaps ASAP

211
