Chidhambararajan R (a.k.a Chidha)

454 posts

@TheSeriousProg

Just another serious programmer : )

Joined November 2021
252 Following · 46 Followers
Justus Mattern@MatternJustus·
Currently in BLR to help build our model training team! With our recent momentum, it's clear to me that we have a shot at building a frontier research org in India. For this, we need more research talent. If you want to work on post-training 100B+ param models, I'd love to meet
Justus Mattern@MatternJustus

Planning my next BLR trip rn - a big focus this time will be recruiting for @ProximalHQ! We have a super talent-dense team in our Bangalore office - some of our teammates are ex YC founders that have successfully sold companies or worked as quants at companies like Jane Street!

26 replies · 15 reposts · 400 likes · 25K views
aditya@adxtyahq·
just found out that arr[i] and i[arr] both compile in C++ and return the same result
[image]
133 replies · 128 reposts · 3.1K likes · 847.7K views
Enze Xie@xieenze_jr·
🚀 Excited to share Sol-RL (Speed-of-Light RL) — a new high-efficiency preference alignment method for Diffusion RL, primarily developed by first author Yitong Li (@yitongli165665 )! It uses a smart two-stage design: FP4 for ultra-fast massive rollouts and quick filtering of high-contrast samples, followed by BF16 high-precision optimization on the selected data only. Achieves up to 4.64× faster convergence while delivering better alignment results on SANA, FLUX.1 & SD3.5-L. 📄 Paper: arxiv.org/abs/2604.06916 Let’s push Diffusion RL forward together! 🔥
2 replies · 16 reposts · 133 likes · 17.5K views
Poke@interaction·
Starting today, personal superintelligence is just one tap away. No download, no signup. Text Poke for free now: Poke.com 🌴 — 0:00 – What's Poke? 0:50 – Introducing Poke Recipes 1:25 –  Create a Recipe in 10 seconds 1:43 – Earn on Poke 2:44 – Build with npx poke 12:58 – Recap 13:36 – Parisian Love
187 replies · 130 reposts · 1.5K likes · 846.7K views
Chidhambararajan R (a.k.a Chidha)@TheSeriousProg·
There are a couple of issues with your proposal. I remember attempting QKV attention pruning in my at-home experiments a while ago. I did PCA on the QKV vectors with calibration data, which often shows high compressibility (similar to your approach). The results were something like:

Top 4 dims: 95% energy
Top 16 dims: 98% energy
Top 32 dims: 99% energy

and the decay is roughly exponential.

To avoid biases from the calibration data, I also did some SVD analysis on the QKV weight matrices, and it showed that far more dims are required to maintain similar energy for the same vectors.

I then decided to drop the idea, because attention QKV reduction is analogous to vector-DB retrieval: in a vector DB, the compressed vector's recall against the original should be at least 99.9% for it to be acceptable. Bringing the same ideology to QKV matrices, the energy retained should be at least 99% or 99.9% for it to be close to lossless. That requirement means preserving a higher number of dims after the compression logic, which can potentially reduce the wins.

You did mention in your paper that you calibrated on WikiText and the perplexity drop was minimal, but perplexity often doesn't paint a good picture of performance drops. Google's TurboQuant achieving close-to-lossless performance at 3-3.5 bits makes sense, since many papers have mentioned these models can't store more than 3 bits of information per float. A bit more emphasis on benchmarks would help.

I made this comment on LinkedIn too; posting it again here in case you are not active there.
0 replies · 0 reposts · 1 like · 368 views
Harsh Chourasia@hrshc7·
Everyone’s busy arguing about “which model is best”… meanwhile a tiny 4B model is out here quietly doing the job 👀 Just saw Gemma-4-E4B casually identify sea animals from images in a single session, no drama, no insane setup, just working. And that’s the part people are missing. We’ve been conditioned to think: bigger = smarter, cloud = necessary, expensive = better. But this flips all of that. A small model, running locally, handling vision tasks end to end… without begging for APIs or burning money per request. Not perfect, not magical. But good enough to be useful, and that’s way more important. Because once something is fast, private, and basically free, it stops being a “tool” and starts becoming part of your daily workflow. The shift isn’t loud. It’s practical, and already happening. But sure… keep debating benchmarks while this runs on someone’s laptop 🚀
Victor M@victormustar

Watch Gemma-4-E4B casually identify sea animals by classifying images in a single agentic session using its vision capabilities. (impressive for a 4B model 🚀) I'm convinced: the agents of tomorrow are local, free, fast, and run on every computer!

1 reply · 0 reposts · 6 likes · 2.4K views
xAI@xai·
Introducing Quality mode on Grok Imagine – powered by our most advanced image generation model. Quality mode gives you enhanced details, stronger text rendering, and higher levels of creative control. Now available on web and mobile. Try it at grok.com/imagine
4.8K replies · 2.7K reposts · 20.3K likes · 4.5M views
Omar Khattab@lateinteraction·
overwhelming evidence for late interaction / multi-vector models yet again :-) > even after finetuning, single-vector models lag far behind multi-vector embeddings, which achieve significant performance gains and exhibit greater robustness to catastrophic forgetting.
Sumit@_reachsumit

On Strengths and Limitations of Single-Vector Embeddings Microsoft shows that dimensionality alone cannot explain poor retrieval performance of single-vector embeddings, identifying domain shift and the "drowning in documents" paradox as key factors. 📝 arxiv.org/abs/2603.29519

4 replies · 7 reposts · 89 likes · 8.4K views
Chidhambararajan R (a.k.a Chidha)@TheSeriousProg·
@lateinteraction Until recent progress from a particular DB company, weren't the RAM and storage demands for multi-vector embeddings pretty high? Yeah, I agree on the transformer vs ConvNet example though
1 reply · 0 reposts · 1 like · 73 views
Omar Khattab@lateinteraction·
@TheSeriousProg It’s kind of like saying in 2026 that “transformers like BERT are too expensive so we use a ConvNet”. It’s an incorrect/lazy excuse.
1 reply · 0 reposts · 1 like · 85 views
Dirhousssi Amine@DirhousssiAmine·
Been going down a massive rabbit hole with numerical stability in RL training lately. 🕵️‍♂️🕵️ Take a look at these two GRPO sanity runs. Exact same model, identical task. One climbs perfectly, the other completely flatlines. The only difference? The dead run is in bf16, the successful one is fp32. What do you think the problem is with these runs? Drop your best guesses below!
[image]
13 replies · 10 reposts · 160 likes · 33K views
Chidhambararajan R (a.k.a Chidha)@TheSeriousProg·
@Yuchenj_UW Imo, in the future software engineers will presumably be valued more for the average number of runs per line of code which they write. This is just going way too far into the hype loop
0 replies · 0 reposts · 0 likes · 23 views
Yuchen Jin@Yuchenj_UW·
If you had two software engineering offers: > One pays you $500k/year salary, but covers zero LLM tokens. > One pays you $400k/year salary, but gives you $500/day free LLM tokens. Which one are you taking?
394 replies · 18 reposts · 2.2K likes · 539.5K views