Saurabh Srivastava

228 posts

@_saurabh

Code Lead @ Essential AI - code data, pre/post training, evals; Previously: 2x YC (W15, S18); PhD + Postdoc in Code Synthesis

San Francisco, CA · Joined November 2008
1.4K Following · 1.1K Followers
Pinned Tweet
Saurabh Srivastava @_saurabh
Code is an amazing petri dish for measuring and building intelligence. We released Rnj-1 yesterday, an 8B code & science model that you can run offline on your laptop. It got 20.8% (!) on the SWE-bench software engineering benchmark. For context: a) that beats all comparable 8B open models by 10x, and b) punches way above its weight. It beats Google's Gemini 2.0 Flash and Alibaba's Qwen2.5 Coder 32B. This performance is close to OpenAI’s GPT-4o and their 120B open model despite being much smaller. This is where Essential AI is starting. Tons to come over the next year. Key to Rnj-1 was extremely strong pre-training (data, optimizers, mixing, infra, evals). Tons of distributed data work. Daily ablations on 0.2–2B models to find the many hyperparams. New data distributions that teach the model code semantics beyond text, including leveraging PL ideas. Does code & science intelligence translate to other domains? Try it, tune it, and let us know. We're as excited to hear where it fails as where it does well.
1
1
16
1.1K
Saurabh Srivastava retweeted
Erik Kuna 🚀 @erikkuna
This is the shot you can’t get from the press site. This camera was sitting a few football fields from the SLS rocket at Pad 39B for days before launch, baking in the Florida sun, surviving rain, humidity, and whatever else the Cape threw at it. No photographer behind the viewfinder. Just a camera, a sound trigger, and a bet. The way pad remotes work: you set your camera up days in advance, dial in your composition, lock everything down, and walk away. You don’t touch it again until after the launch. The shutter fires on sound activation with a @MiopsTrigger smart+ trigger. With SLS, the four RS-25 engines ignite six seconds before the solid rocket boosters, so the camera is already firing before the vehicle even leaves the pad. You get home, pull the card, and find out if you nailed it or if a bird landed on your lens two days ago and left you a present and you got 400 photos of something crappy. There’s no formula for protecting your gear this close. Some photographers build wooden boxes with doors that pop open. Some use plastic bags and tape. Some do plastic or metal barn door rigs on hinges. I tend to leave mine open just in plastic rain covers because boxes limit my composition and setup time, but that means your cameras are more exposed to the elements and whatever energy and debris comes off the pad. You’re basically gambling a camera body every time you set one. That’s what I love about this genre. There’s no playbook. You make it up as you go. Every time is an adventure. 📸 credit: me for @SuperclusterHQ - Artemis II pad remote | ~1,000 ft from Pad 39B | Kennedy Space Center
705
5.3K
44.4K
1.1M
Saurabh Srivastava retweeted
Leonardo de Moura @Leonard41111588
Whenever I give a talk, people ask me: "What makes Lean different?", "Why did it succeed?" I finally wrote it down. Four things I believe, one honest weakness, and why "I fucking love this shit" keeps happening. leodemoura.github.io/blog/2026-4-2-…
5
52
275
23.3K
Saurabh Srivastava retweeted
Leonardo de Moura @Leonard41111588
Cray Distinguished Colloquium at UMN, next Monday. An AI converted zlib to Lean and proved it correct. 10 AI agents built a verified DSL in a weekend. Three IMO teams, no competing platform. The slides are written in Verso: checked by Lean. leodemoura.github.io/static/minneso…
8
38
128
10.1K
Saurabh Srivastava @_saurabh
The community together solved 8/10 of First Proof. spec + reviews are the new bottlenecks in both code and math. expect to see solutions to the latter this year. Litt: “I actually expect to be doing the best work I’ve ever done, because I’ll have these amazing tools.” “Current AIs, it turns out, are frequently wrong but convincingly confident.” both simultaneously true. for the moment.
Harvard Department of Mathematics @HarvardMath

First Proof is an effort to see whether LLMs can contribute meaningfully to pure mathematics research. The dust has settled on round one, and the results are surprising. Another round is commencing. scientificamerican.com/article/as-ai-…

0
0
3
292
Saurabh Srivastava retweeted
Jonathan Gorard @getjonwithit
I think one of the conclusions we should draw from the tremendous success of LLMs is how much of human knowledge and society exists at very low levels of Kolmogorov complexity. We are entering an era where the minimal representation of a human cultural artifact... (1/12)
192
498
4.5K
760K
Saurabh Srivastava retweeted
Dwayne @CtrlAltDwayne
The best argument for Rust in 2026 is not memory safety or performance. It is that AI writes better Rust than it writes C++. The compiler feedback loop is so tight that models self-correct in real time. Every error message is a free training signal. Rust was accidentally designed for AI-assisted development 10 years before anyone knew that mattered.
110
172
2.5K
171.6K
Saurabh Srivastava retweeted
Rohan Pandey @khoomeik
labs will publish details on arch, optim, objectives, scaling, kernels, literally everything except data and academia will be astounded for the hundredth time, wondering to itself where the secret sauce is
27
83
1.2K
69.1K
Saurabh Srivastava retweeted
Andrej Karpathy @karpathy
ah yes, this is what post-agi feels like :) i didn't touch anything. brb sauna
73
62
1K
162.3K
Saurabh Srivastava retweeted
Essential AI @essential_ai
Rnj-1’s performance is especially good in correctness and abstention in its weight class, which are the two most important metrics for this work.
1
2
7
5.6K
Saurabh Srivastava retweeted
Essential AI @essential_ai
Rnj-1 has outperformed other open models in its weight class in the largest open-source AI initiative in telecom to date 🚀
1
5
32
6.7K
Saurabh Srivastava @_saurabh
Don Knuth co-solving an open problem with human-AI collaboration. Calling it "Claude cycles" feels like the right attribution. We should note: Noticing that a narrower version of the problem can be solved is an important cognitive ability! A model identifying the right narrowing is impressive. www-cs-faculty.stanford.edu/~knuth/papers/… IIUC, two keys: a) collaborator Filip Stappers designing a protocol where Claude was asked to explore but log each exploration attempt so human review was possible, b) Claude writing a constructive proof that worked for a limited version of the problem (odd cases; even remains an open problem). It appears Knuth/Stappers took the construction and validated it working until 101, and then wrote the proof based on this existence proof. So Claude didn't prove it all, but its explorations gave insights for humans to take it to the finish line!
0
0
5
438
Saurabh Srivastava retweeted
Thang Luong @lmthang
It is the latter. AI models such as DeepThink currently can't quite invent new theories, but are very good at connecting ideas, e.g., across subfields in maths. Problem #7 of FirstProof is special; Aletheia can solve it with very heavy machinery according to our experts x.com/lmthang/status…
2
1
16
1.8K
Saurabh Srivastava @_saurabh
What we have signs of life on:
-> formalization. it's hard but getting there.
-> aesthetics of representation. humans state the crux of the problem way more clearly.
-> long horizon tasks. math research projects can take years.
-> critical re-examination of a prior confident answer.
-> learning on the job. recent concepts have less literature/data. you want the most recent results to be more weighted, but that's not what the data distribution looks like.
0
0
0
29
Saurabh Srivastava @_saurabh
What is still missing:
-> hypothesis generation: “discovering the statements of crucial lemmas is often much more difficult than proving them”
-> goal seeking: humans are truth seeking but may make errors. models currently are not similarly goal oriented.
-> theory building: answering questions is getting solved. theory building and creative conjecturing are still missing.
1
0
0
28