Saurabh Srivastava

228 posts

@_saurabh

Code Lead @ Essential AI - code data, pre/post training, evals; Previously: 2x YC (W15, S18); PhD + Postdoc in Code Synthesis

San Francisco, CA · Joined November 2008

1.4K Following · 1.1K Followers

Pinned Tweet

Saurabh Srivastava @_saurabh · 7 Dec

Code is an amazing petri dish for measuring and building intelligence. We released Rnj-1 yesterday: an 8B code & science model that you can run offline on your laptop. It got 20.8% (!) on the SWEBench software engineering benchmark. For context: a) that beats all comparable 8B open models by 10x, and b) punches way above its weight. It beats Google's Gemini 2.0 Flash and Alibaba's Qwen2.5 Coder 32B. This performance is close to OpenAI’s GPT-4o and their 120B open model despite being much smaller. This is where Essential AI is starting. Tons to come over the next year. Key to Rnj-1 was extremely strong pre-training (data, optimizers, mixing, infra, evals). Tons of distributed data work. Daily ablations on 0.2–2B models to find the many hyperparams. New data distributions that teach the model code semantics beyond text, including leveraging PL ideas. Does code & science intelligence translate to other domains? Try it, tune it, and let us know. We're as excited to hear where it fails as where it does well.

Saurabh Srivastava retweeted

Erik Kuna 🚀 @erikkuna · 1d

This is the shot you can’t get from the press site. This camera was sitting a few football fields from the SLS rocket at Pad 39B for days before launch, baking in the Florida sun, surviving rain, humidity, and whatever else the Cape threw at it. No photographer behind the viewfinder. Just a camera, a sound trigger, and a bet. The way pad remotes work: you set your camera up days in advance, dial in your composition, lock everything down, and walk away. You don’t touch it again until after the launch. The shutter fires on sound activation with a @MiopsTrigger smart+ trigger. With SLS, the four RS-25 engines ignite six seconds before the solid rocket boosters, so the camera is already firing before the vehicle even leaves the pad. You get home, pull the card, and find out if you nailed it or if a bird landed on your lens two days ago and left you a present and you got 400 photos of something crappy. There’s no formula for protecting your gear this close. Some photographers build wooden boxes with doors that pop open. Some use plastic bags and tape. Some do plastic or metal barn door rigs on hinges. I tend to leave mine open just in plastic rain covers because boxes limit my composition and setup time, but that means your cameras are more exposed to the elements and whatever energy and debris comes off the pad. You’re basically gambling a camera body every time you set one. That’s what I love about this genre. There’s no playbook. You make it up as you go. Every time is an adventure. 📸 credit: me for @SuperclusterHQ - Artemis II pad remote | ~1,000 ft from Pad 39B | Kennedy Space Center

Saurabh Srivastava retweeted

Leonardo de Moura @Leonard41111588 · 1d

Whenever I give a talk, people ask me: "What makes Lean different?", "Why did it succeed?" I finally wrote it down. Four things I believe, one honest weakness, and why "I fucking love this shit" keeps happening. leodemoura.github.io/blog/2026-4-2-…

Saurabh Srivastava retweeted

Leonardo de Moura @Leonard41111588 · 5d

Cray Distinguished Colloquium at UMN, next Monday. An AI converted zlib to Lean and proved it correct. 10 AI agents built a verified DSL in a weekend. Three IMO teams, no competing platform. The slides are written in Verso: checked by Lean. leodemoura.github.io/static/minneso…

Saurabh Srivastava retweeted

Derek @dhsorens · 23 Mar

Lean, formalized maths, and formal verification are about to come screaming into the mainstream

Carlos E. Perez @IntuitMachine

I guess everybody needs to learn Lean now.

Saurabh Srivastava retweeted

Bojan Tunguz @tunguz · 17 Mar

Everything is coding

The Wall Street Journal @WSJ

Exclusive: OpenAI’s top executives are finalizing plans for a major strategy shift to refocus the company around coding and business users. on.wsj.com/3N6CFyr

Saurabh Srivastava @_saurabh · 17 Mar

The community together solved 8/10 of First Proof. Spec + reviews are the new bottlenecks in both code and math. Expect to see solutions to the latter this year. Litt: “I actually expect to be doing the best work I’ve ever done, because I’ll have these amazing tools.” “Current AIs, it turns out, are frequently wrong but convincingly confident.” Both simultaneously true. For the moment.

Harvard Department of Mathematics @HarvardMath

First Proof is an effort to see whether LLMs can contribute meaningfully to pure mathematics research. The dust has settled on round one, and the results are surprising. Another round is commencing. scientificamerican.com/article/as-ai-…

Saurabh Srivastava retweeted

Jonathan Gorard @getjonwithit · 14 Mar

I think one of the conclusions we should draw from the tremendous success of LLMs is how much of human knowledge and society exists at very low levels of Kolmogorov complexity. We are entering an era where the minimal representation of a human cultural artifact... (1/12)

Saurabh Srivastava retweeted

Dwayne @CtrlAltDwayne · 13 Mar

The best argument for Rust in 2026 is not memory safety or performance. It is that AI writes better Rust than it writes C++. The compiler feedback loop is so tight that models self-correct in real time. Every error message is a free training signal. Rust was accidentally designed for AI-assisted development 10 years before anyone knew that mattered.

Saurabh Srivastava retweeted

Rohan Pandey @khoomeik · 13 Mar

labs will publish details on arch, optim, objectives, scaling, kernels, literally everything except data and academia will be astounded for the hundredth time, wondering to itself where the secret sauce is

Saurabh Srivastava retweeted

Andrej Karpathy @karpathy · 6 Mar

ah yes, this is what post-agi feels like :) i didn't touch anything. brb sauna

Saurabh Srivastava retweeted

Essential AI @essential_ai · 7 Mar

Rnj-1’s performance is especially good in correctness and abstention in its weight class, which are the two most important metrics for this work.

Saurabh Srivastava retweeted

Essential AI @essential_ai · 7 Mar

Rnj-1 has outperformed other open models in its weight class in the largest open-source AI initiative in telecom to date 🚀

Saurabh Srivastava retweeted

Max Hodak @maxhodak_ · 5 Mar

excited to announce that we've raised a $230 million series c to get our retinal prosthesis to market and our biohybrid and vessel technologies into the clinic!

Science Corporation @ScienceCorp_

We’ve closed a $230 million Series C financing with participation from @khoslaventures, @lightspeedvp, @ycombinator, IQT, and @QuietCapital, all pre-existing investors in Science, among others.

Saurabh Srivastava @_saurabh · 4 Mar

Don Knuth co-solving an open problem with human-AI collaboration. Calling it "Claude cycles" feels like the right attribution. We should note: Noticing that a narrower version of the problem can be solved is an important cognitive ability! A model identifying the right narrowing is impressive. www-cs-faculty.stanford.edu/~knuth/papers/… IIUC, two keys: a) collaborator Filip Stappers designing a protocol where Claude was asked to explore but log each exploration attempt so that human review was possible, b) Claude writing a constructive proof that worked for a limited version of the problem (odd cases; even remains an open problem). It appears Knuth/Stappers took the construction and validated it working until 101, and then wrote the proof based on this existence proof. So Claude didn't prove it all, but its explorations gave insights for humans to take it to the finish line!

Saurabh Srivastava retweeted

Dimitris Papailiopoulos @DimitrisPapail · 3 Mar

x.com/i/article/2028…

Saurabh Srivastava retweeted

Thang Luong @lmthang · 26 Feb

It is the latter. AI models such as DeepThink currently can't quite invent new theories, but are very good at connecting ideas, e.g., across subfields in maths. Problem #7 of FirstProof is special: Aletheia can solve it with very heavy machinery, according to our experts. x.com/lmthang/status…

Saurabh Srivastava @_saurabh · 25 Feb

I agree that some form of this is inevitable by 2027. I conjecture that we might also see progress in what I call automated *hypothesis generation*. Business needs are driving inference (speed/cost) close to zero. That will unlock directions in hypothesis generation.

Taelin @VictorTaelin

proofs will be worthless and theorem proving will be fully automated. only definitions will still be human-driven, at least in the nearish term, since AI still harshly lacks out-of-the-box thinking. yet, whenever a human comes up with a new cool definition, the AI will be able to quickly explore the full mathematical landscape that unfolds from it, AND use / apply it on existing proofs to a certain extent, I believe. so, math will proceed at unprecedented speed, because all mathematicians have to do is think about these cool definitions that will break major walls down the line. their time is 20x more efficient because they spend 0 time proving now

Saurabh Srivastava @_saurabh · 24 Feb

What we have signs of life on:
-> formalization. it's hard but getting there.
-> aesthetics of representation. humans state the crux of the problem way more clearly.
-> long horizon tasks. math research projects can take years.
-> critical re-examination of a prior confident answer.
-> learning on the job. recent concepts have less literature/data. you want the most recent results to be more weighted, but that's not what the data distribution looks like.

Saurabh Srivastava @_saurabh · 24 Feb

What is still missing:
-> hypothesis generation: “discovering the statements of crucial lemmas is often much more difficult than proving them”
-> goal seeking: humans are truth seeking but may make errors. models currently are not similarly goal oriented.
-> theory building: answering questions is getting solved. theory building and creative conjecturing are still missing.

Saurabh Srivastava @_saurabh · 24 Feb

@littmath's essay is a good read. Daniel is revising his ETA for autonomous AI mathematicians. We got here by mapping math into (Lean) code. Contrary to opinions that code is solved, we are just getting started. Using code as the domain language has outsized benefits. Worth reading for: "what's still missing", and "what we are making progress on"

Daniel Litt @littmath

Some thoughts on AI and mathematics, inspired by "First Proof."
