Chidhambararajan R (a.k.a Chidha)

454 posts

@TheSeriousProg

Just another serious programmer : )

Joined November 2021
252 Following · 46 Followers
Justus Mattern@MatternJustus·
Currently in BLR to help build our model training team! With our recent momentum, it's clear to me that we have a shot at building a frontier research org in India. For this, we need more research talent. If you want to work on post-training 100B+ param models, I'd love to meet
Justus Mattern@MatternJustus

Planning my next BLR trip rn - a big focus this time will be recruiting for @ProximalHQ! We have a super talent-dense team in our Bangalore office - some of our teammates are ex YC founders that have successfully sold companies or worked as quants at companies like Jane Street!

26 replies · 15 reposts · 400 likes · 25K views
aditya@adxtyahq·
just found out that arr[i] and i[arr] both compile in C++ and return the same result
[image]
133 replies · 128 reposts · 3.1K likes · 847.7K views
Enze Xie@xieenze_jr·
🚀 Excited to share Sol-RL (Speed-of-Light RL) — a new high-efficiency preference alignment method for Diffusion RL, primarily developed by first author Yitong Li (@yitongli165665 )! It uses a smart two-stage design: FP4 for ultra-fast massive rollouts and quick filtering of high-contrast samples, followed by BF16 high-precision optimization on the selected data only. Achieves up to 4.64× faster convergence while delivering better alignment results on SANA, FLUX.1 & SD3.5-L. 📄 Paper: arxiv.org/abs/2604.06916 Let’s push Diffusion RL forward together! 🔥
2 replies · 16 reposts · 133 likes · 17.5K views
Poke@interaction·
Starting today, personal superintelligence is just one tap away. No download, no signup. Text Poke for free now: Poke.com 🌴 — 0:00 – What's Poke? 0:50 – Introducing Poke Recipes 1:25 –  Create a Recipe in 10 seconds 1:43 – Earn on Poke 2:44 – Build with npx poke 12:58 – Recap 13:36 – Parisian Love
187 replies · 130 reposts · 1.5K likes · 846.7K views
Chidhambararajan R (a.k.a Chidha)@TheSeriousProg·
There are a couple of issues with your proposal. I remember attempting QKV attention pruning in my at-home experiments a while ago. I did PCA on the QKV vectors with calibration data, which often shows high compressibility (similar to your approach). The results were something like:

Top 4 dims: 95% energy
Top 16 dims: 98% energy
Top 32 dims: 99% energy

and the decay is roughly exponential.

To avoid biases from the calibration data, I also did some SVD analysis on the QKV weight matrices, and it showed that far more dims are required to maintain similar energy for the same vectors.

I then decided to drop the idea, because attention QKV reduction is analogous to vector-DB retrieval: in a vector DB, the compressed vector's recall against the original should be at least 99.9% for it to be acceptable. Bringing the same ideology to QKV matrices, the energy retained should be at least 99% or 99.9% for it to be close to lossless. That requirement means preserving a higher number of dims after the compression logic, which can potentially reduce the wins.

You did mention in your paper that you calibrated on WikiText and the perplexity drop was minimal, but perplexity often doesn't paint a good picture of performance drops. Google's TurboQuant achieving close-to-lossless performance at 3-3.5 bits makes sense, since many papers have mentioned these models can't store more than 3 bits of information per float. A bit more emphasis on benchmarks would help.

I made this comment on LinkedIn too; posting it again here in case you are not active there.
0 replies · 0 reposts · 1 like · 368 views
Harsh Chourasia@hrshc7·
Everyone’s busy arguing about “which model is best”… meanwhile a tiny 4B model is out here quietly doing the job 👀 Just saw Gemma-4-E4B casually identify sea animals from images in a single session, no drama, no insane setup, just working. And that’s the part people are missing. We’ve been conditioned to think: bigger = smarter, cloud = necessary, expensive = better. But this flips all of that. A small model, running locally, handling vision tasks end to end… without begging for APIs or burning money per request. Not perfect, not magical. But good enough to be useful, and that’s way more important. Because once something is fast, private, and basically free, it stops being a “tool” and starts becoming part of your daily workflow. The shift isn’t loud. It’s practical, and already happening. But sure… keep debating benchmarks while this runs on someone’s laptop 🚀
Victor M@victormustar

Watch Gemma-4-E4B casually identify sea animals by classifying images in a single agentic session using its vision capabilities. (impressive for a 4B model 🚀) I'm convinced: the agents of tomorrow are local, free, fast, and run on every computer!

1 reply · 0 reposts · 6 likes · 2.4K views
xAI@xai·
Introducing Quality mode on Grok Imagine – powered by our most advanced image generation model. Quality mode gives you enhanced details, stronger text rendering, and higher levels of creative control. Now available on web and mobile. Try it at grok.com/imagine
4.8K replies · 2.7K reposts · 20.3K likes · 4.5M views
Omar Khattab@lateinteraction·
overwhelming evidence for late interaction / multi-vector models yet again :-) > even after finetuning, single-vector models lag far behind multi-vector embeddings, which achieve significant performance gains and exhibit greater robustness to catastrophic forgetting.
Sumit@_reachsumit

On Strengths and Limitations of Single-Vector Embeddings Microsoft shows that dimensionality alone cannot explain poor retrieval performance of single-vector embeddings, identifying domain shift and the "drowning in documents" paradox as key factors. 📝 arxiv.org/abs/2603.29519

4 replies · 7 reposts · 89 likes · 8.4K views
Chidhambararajan R (a.k.a Chidha)@TheSeriousProg·
@lateinteraction Until recent progress from a particular DB company, weren't the RAM and storage demands for multi-vector embeddings pretty high? Yeah, I agree on the transformer vs ConvNet example though
1 reply · 0 reposts · 1 like · 73 views
Omar Khattab@lateinteraction·
@TheSeriousProg It’s kind of like saying in 2026 that “transformers like BERT are too expensive so we use a ConvNet”. It’s an incorrect/lazy excuse.
1 reply · 0 reposts · 1 like · 85 views
Dirhousssi Amine@DirhousssiAmine·
Been going down a massive rabbit hole with numerical stability in RL training lately. 🕵️‍♂️🕵️ Take a look at these two GRPO sanity runs. Exact same model, identical task. One climbs perfectly, the other completely flatlines. The only difference? The dead run is in bf16, the successful one is fp32. What do you think the problem is with these runs? Drop your best guesses below!
[image]
13 replies · 10 reposts · 160 likes · 33K views
Chidhambararajan R (a.k.a Chidha)@TheSeriousProg·
@Yuchenj_UW Imo, in the future software engineers will presumably be valued more for the average number of runs per line of code which they write. This is just going way too far into the hype loop
0 replies · 0 reposts · 0 likes · 23 views
Yuchen Jin@Yuchenj_UW·
If you had two software engineering offers: > One pays you $500k/year salary, but covers zero LLM tokens. > One pays you $400k/year salary, but gives you $500/day free LLM tokens. Which one are you taking?
394 replies · 18 reposts · 2.2K likes · 539.5K views