Sachin

2.3K posts

@sachdh

cooking reasoning models and agents at @AthenaAgentRL - a narrow intelligence lab

Joined April 2019

838 Following · 4K Followers

Pinned Tweet

Sachin@sachdh·22 Jul

Excited to share Aryabhatta 1.0, our leading model that scores 90.2% on JEE Mains, outperforming frontier models like o4 mini and Gemini Flash 2.5. Trained by us at @AthenaAgentRL, in collaboration with @physics__wallah, using custom RLVR training on 130K+ curated JEE problems. 7B parameters and 4K context is all you need to crack JEE. Also, you don’t need to blindly follow GRPO. Custom objective functions make a huge difference. Details below 👇

108

191

1.9K

197.8K

Sachin@sachdh·3h

@leonardtang_ @joey00072fp4

QAM

Leonard Tang@leonardtang_·22h

opportunity cost is insanely high for exceptional talent

266

20.3K

Sachin@sachdh·11h

@natolambert Please join and leave soon 😜

392

Nathan Lambert@natolambert·17h

I am confidentially not joining Anthropic

529

53.9K

Sachin@sachdh·1d

@_arohan_ GRPO variants from last year will say hi

917

rohan anil@_arohan_·1d

I don’t know what the phenomenon is called: sometimes the field mines improvements near a local neighborhood. Like Adam -> (badam, dadam, madam), Shampoo -> Muon -> (Duon, Buon, Luon), the last few made up, instead of questioning whether the original formulation itself is the right question. You get so much math explaining these variants, bordering on slop. Same happened with Transformers too. Mathematically sophisticated but solving the wrong problem.

196

17.9K

Sachin retweeted

tokenbender@tokenbender·2d

We are releasing a fully reproducible early preprint of "Prism: Unlocking Language Model Capability Extraction". A trained language model knows many things at once, but deployment usually asks for one behavior at a time. In enterprise scenarios, a few products, workflows, features, or use cases often matter disproportionately. Prism asks and answers a simple question: "Is it possible to isolate and deploy only capabilities that are driven by the Pareto principle and cut down costs by a huge margin while preserving most of the performance?" This paper discusses a novel approach to efficiency and to understanding model behavior, and opens up capability extraction.

211

20.9K

Sachin@sachdh·28 May

@ar0cket1 @ChinmayKak i agree about importance. yes, group rewards are a monte carlo estimate used to reduce variance. but the lack or presence of clipping and KL regularization decides whether it is REINFORCE or PPO
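
The distinction drawn in this reply can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the objective used by anyone in the thread: it works at the sequence level rather than per token, the function names (`group_advantages`, `reinforce_group_loss`, `grpo_style_loss`) and the defaults `clip=0.2` and `beta=0.04` are hypothetical, and the KL penalty uses one common non-negative estimator. Both losses share the same group-normalized advantages; only the second adds a PPO-style clipped ratio and a KL term.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: reward minus the group mean, scaled by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def reinforce_group_loss(logp_new, advantages):
    """REINFORCE with a group baseline: -mean(A * log pi). No clipping, no KL."""
    return -mean([a * lp for a, lp in zip(advantages, logp_new)])


def grpo_style_loss(logp_new, logp_old, logp_ref, advantages, clip=0.2, beta=0.04):
    """Same advantages, plus a PPO-style clipped ratio and a KL penalty.

    clip and beta are illustrative defaults (assumptions), not values from the thread.
    """
    terms = []
    for lp, lo, lr, a in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(lp - lo)                      # pi_new / pi_old
        clipped = max(min(ratio, 1.0 + clip), 1.0 - clip)
        surrogate = min(ratio * a, clipped * a)        # pessimistic (clipped) objective
        d = lr - lp                                    # log(pi_ref / pi_new)
        kl = math.exp(d) - d - 1.0                     # non-negative KL estimator
        terms.append(surrogate - beta * kl)
    return -mean(terms)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]        # e.g. verifier outcomes for 4 samples of one prompt
    adv = group_advantages(rewards)
    logp = [-1.0, -2.0, -1.5, -0.5]       # made-up sequence log-probs
    print(reinforce_group_loss(logp, adv))
    print(grpo_style_loss(logp, logp, logp, adv))
```

When `logp_new`, `logp_old`, and `logp_ref` are identical (the first on-policy step), the ratio is 1 and the KL term is 0, so both losses have the same gradient with respect to `logp_new`. The two only diverge once the policy has moved away from the sampling policy or the reference.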

ar0cket1@ar0cket1·28 May

@sachdh @ChinmayKak i would say it's more similar to grouped REINFORCE. imo the source of reward estimation is more important than clipping and KL regularization (you can add clipping and KL on anything, the source of reward is the more defining feature)

ar0cket1@ar0cket1·28 May

weird that they found a REINFORCE-like algorithm as the best

Poolside@poolsideai

Today we’re publishing the technical report behind Laguna M.1 and Laguna XS.2. This report opens up more of what went into them: Model Factory, pre-training data, distributed training, post-training, agent RL, quantization, and evaluation. poolside.ai/assets/laguna/…

4.8K

Sachin@sachdh·28 May

@ar0cket1 @ChinmayKak GRPO is Group Rewards + PPO

ar0cket1@ar0cket1·28 May

@sachdh @ChinmayKak that's basically GRPO

121

Sachin@sachdh·28 May

@ar0cket1 @ChinmayKak you can do group rewards with REINFORCE

125

ar0cket1@ar0cket1·28 May

@ChinmayKak yeh I’ve seen that too, but GRPO is generally significantly more informative than REINFORCE so I personally wouldn’t do REINFORCE

Sachin@sachdh·27 May

@rronak_ @MichaelElabd @QuantumArjun Congrats @rronak_

499

Ronak Malde@rronak_·27 May

Today, @MichaelElabd, @QuantumArjun, and I are excited to announce Trajectory. We are a research lab and product company building the platform for Continual Learning. Our platform unlocks the signal already sitting in product usage, so companies can continuously post-train large-scale agentic models that outperform the frontier. @trajectorylabs We’ve raised $15M from @Conviction, @BessemerVP, @radicalvcfund, @jeffdean, @drfeifei and more. We’re partnering with some of the best AI-native companies: @ClayRunHQ @Harvey, @DecagonAI, @mercor_ai, @RogoAI to power their agentic systems, some of which we are already in production with. We’ve brought together a world class research team from DeepMind, OpenAI, Apple, Meta Superintelligence, Amazon AGI, Scale AI, and an elite product team from Stripe and Figma. AI will never again start on day one. Every correction, every retry, every edit will make products smarter. This is Continual Learning.

244

154

1.4K

1.8M

Sachin@sachdh·27 May

@HappyyPablo @ZeroGPU_AI @huggingface congrats bhai!

312

Shubham Sharma@HappyyPablo·27 May

Super happy that a bunch of people are finding marlin useful. Thank you for the inference support @ZeroGPU_AI @huggingface 🥰🤝 We’ve got hands on more compute now so we’ll also release a series of blogs and benchmarks for the Open source community to use for dense captioning and retrieval

5.2K

Sachin@sachdh·23 May

@joey00072fp4 not for long

134

joey00072@joey00072fp4·23 May

i love being a small account again

755

Sachin@sachdh·21 May

@ycombinator @getfuchsia @togao0 @joey00072fp4

QAM

Y Combinator@ycombinator·21 May

Fuchsia (@getfuchsia) is the fastest way to get your hardware certified. AI agents automate the grunt work that takes consultants weeks, and experts sign off. Congrats on the launch, @togao0! ycombinator.com/launches/QSi-f…

137

19.8K

Sachin@sachdh·20 May

@kingofknowwhere @levelheaded_94

QAM

480

Ankit Jxa@kingofknowwhere·20 May

Recently learnt that one of my relatives in this space has quit his job, bought 10k+ phones and is already up 2 Crore by selling egocentric data. I think this is extremely bullish for Urban Company and the likes.

Ayush Tiwari@sighyush

An American firm is recording Indian workers inside factories and selling their videos to Big Tech to train robots. @raghavKakkar30 and I report for @scroll_in: scroll.in/article/109296…

19.3K

Sachin@sachdh·20 May

@ycombinator @memorydotstore Congrats @IshitaJindal17 and @diwanksingh

Y Combinator@ycombinator·19 May

Memory Store (@memorydotstore) gives your team and AI agents a shared company brain. Your team's knowledge & decisions are scattered across slack, emails, and people's heads. Memory Store turns them into a living wiki for your agents and teammates. Congrats on the launch, @ishitajindal17 & @diwanksingh! ycombinator.com/launches/QPs-m…

549

238.3K

Sachin retweeted

sankalp@dejavucoder·20 May

boosting teor's tweet here... also joey's old account is hacked so don't respond to DMs from it till it gets restored

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Why doesn't X have a dedicated "this account got hacked" report option? I can only report Joey as an impersonator of himself but I want ownership to be restored, not the whole thing to be nuked. @nikitabier

1.1K

Sachin retweeted

joey00072@joey00072fp4·19 May

i lost everything job, old phone, twitter account, old guitar

7.8K

Sachin@sachdh·20 May

@NirantK @render @ojusave

QAM

Nirant@NirantK·20 May

Folks also seem to be recommending @render, reading up on it now

Nirant@NirantK

My entire co runs on @railway, and we want to move to @cloudflare. Ask: Stack: FastAPI which for request routes + auth to @e2b sbx What should I understand about Durable Objects to use that as a Redis replacement?

2.5K

Sachin retweeted

render@infinterenders·18 May

report this asshole hacker. he hacked the @shxf0072 handle, and @joey00072fp4 (this is the real person's account, he created a new one)

1.6K

Sachin@sachdh·16 May

@reach_vb no reset yet ..:(

Vaibhav (VB) Srivastav@reach_vb·16 May

PSA: rate limits reset across all plans, models and surfaces! Set your /goals and let the tokens rip!!

Tibo@thsottiaux

Codex usage limits have now been reset across all paid plans. Enjoy the weekend!

190

15.1K

Sachin@sachdh·16 May

@neural_avb it was a goated team sir. most fun I had in a job + the Unity office had lots of games. so game playing was kinda work 😜

AVB@neural_avb·16 May

@sachdh 🙏🏼🙏🏼🙏🏼🙏🏼🙏🏼🙏🏼 goat

AVB@neural_avb·16 May

This is what you can achieve with 5-6 hours of Self-Play RL training, by the way. Each actor views the projectiles with lidar scans, picks an action using a PPO policy, and competes against past versions of itself in an iterative self-improvement loop. Made in Unity with MLAgents.

Dwarkesh Patel@dwarkesh_sp

New blackboard lecture w @ericjang11. He walks through how to build AlphaGo from scratch, but with modern AI tools.
Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.
Once he explained how AlphaGo works, it gave us the context to have a discussion about how RL works in LLMs and how it could work better – naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.
Eric also kickstarted an Autoresearch loop on his project. And it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). Informative to all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.
Timestamps:
0:00:00 – Basics of Go
0:08:06 – Monte Carlo Tree Search
0:31:53 – What the neural network does
1:00:22 – Self-play
1:25:27 – Alternative RL approaches
1:45:36 – Why doesn’t MCTS work for LLMs
2:00:58 – Off-policy training
2:11:51 – RL is even more information inefficient than you thought
2:22:05 – Automated AI researchers

429

97.9K

Sachin@sachdh·16 May

@neural_avb checked ... it is still under active development. MLAgents was my last internship in 2018

AVB@neural_avb·16 May

@sachdh This is like 3-4 years ago... I have no idea what's happening with MLAgents now!

186
