David Dohan

571 posts

David Dohan

@dmdohan

reducing perplexity @openai | past: probabilistic programs, proteins, science & reasoning @ google brain 🧠

Katılım Ağustos 2011

1.6K Takip Edilen12K Takipçiler

Sabitlenmiş Tweet

David Dohan@dmdohan·23 Tem

Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper: arxiv.org/abs/2207.10342

English

676

David Dohan retweetledi

OpenAI@OpenAI·19 Tem

We achieved gold medal-level performance 🥇on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM! Our model solved world-class math problems—at the level of top human contestants. A major milestone for AI and mathematics.

Alexander Wei@alexwei_

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

English

214

438

673.1K

David Dohan retweetledi

Nat McAleese@__nmca__·19 Tem

I feel this may be helpful to some of you today:

English

713

89K

David Dohan@dmdohan·19 Tem

Fun to watch prediction markets update on the news

English

1.8K

David Dohan@dmdohan·19 Tem

OpenAI achieved gold medal on 2025 International Math Olympiad (solving 5 of 6 problems)! Thinks for hours and writes proofs in natural language. We've come a long way from LLMs solving 50% of MATH dataset in 2022 Congrats @alexwei_ on spearheading a major milestone!

Alexander Wei@alexwei_

English

124

7.4K

David Dohan@dmdohan·1 Haz

How to code a side project in 2025: 1. May 31 - Write project spec 2. Procrastinate 6 months 3. Dec 31 - ask favorite AI to implement it

English

4.4K

David Dohan retweetledi

Noam Brown@polynoamial·27 Şub

Scaling pretraining and scaling thinking are two different dimensions of improvement. They are complementary, not in competition.

English

129.4K

David Dohan retweetledi

Noam Brown@polynoamial·22 Oca

This is on the scale of the Apollo Program and Manhattan Project when measured as a fraction of GDP. This kind of investment only happens when the science is carefully vetted and people believe it will succeed and be completely transformative. I agree it’s the right time.

OpenAI@OpenAI

Announcing The Stargate Project The Stargate Project is a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States. We will begin deploying $100 billion immediately. This infrastructure will secure American leadership in AI, create hundreds of thousands of American jobs, and generate massive economic benefit for the entire world. This project will not only support the re-industrialization of the United States but also provide a strategic capability to protect the national security of America and its allies. The initial equity funders in Stargate are SoftBank, OpenAI, Oracle, and MGX. SoftBank and OpenAI are the lead partners for Stargate, with SoftBank having financial responsibility and OpenAI having operational responsibility. Masayoshi Son will be the chairman. Arm, Microsoft, NVIDIA, Oracle, and OpenAI are the key initial technology partners. The buildout is currently underway, starting in Texas, and we are evaluating potential sites across the country for more campuses as we finalize definitive agreements. As part of Stargate, Oracle, NVIDIA, and OpenAI will closely collaborate to build and operate this computing system. This builds on a deep collaboration between OpenAI and NVIDIA going back to 2016 and a newer partnership between OpenAI and Oracle. This also builds on the existing OpenAI partnership with Microsoft. OpenAI will continue to increase its consumption of Azure as OpenAI continues its work with Microsoft with this additional compute to train leading models and deliver great products and services. All of us look forward to continuing to build and develop AI—and in particular AGI—for the benefit of all of humanity. We believe that this new step is critical on the path, and will enable creative people to figure out how to use AI to elevate humanity.

English

247

674

7.4K

917.8K

David Dohan@dmdohan·28 Ara

@recurrented x.com/dmdohan/status…

David Dohan@dmdohan

@mollyfmielke There's evidence for it: "In all cases, with exception of S9, they report having owned 1-of-3 toys widely sold by Fisher-Price between 1972 and 1989" Anecdotally, friend traces some # colors to license plate on family car. neurocritic.blogspot.com/2013/01/fisher… study: ncbi.nlm.nih.gov/pmc/articles/P…

QME

202

Raye@rayefull·28 Ara

any good research / stories on synesthesia

English

498

David Dohan@dmdohan·25 Ara

@SteveMoraco same no more joking on the internet allowed

English

229

steve@SteveMoraco·25 Ara

it's been almost a week and i'm still upset at how many people misread this tweet and think it means o3 had to think for 16 hours on each question

David Dohan@dmdohan

o3 @ 87.5% on ARC-AGI It was 16 hours at an increase rate of 3.5% an hour to "solved"

English

1.7K

David Dohan retweetledi

roon@tszzl·22 Ara

🚨SCANDAL 🚨 OpenAI trained on the train set for the Millenium Puzzles

English

1.6K

142.7K

David Dohan retweetledi

Steven Heidel@stevenheidel·21 Ara

these new captchas are getting way too difficult

English

77.3K

David Dohan retweetledi

roon@tszzl·22 Ara

o3 has literally made 0% progress on the Millennium eval it’s ai winter now

English

1.8K

191.7K

David Dohan@dmdohan·22 Ara

@cHHillee @polynoamial @tamaybes Gotta look for the NP problems of P vs NP: easy to check, hard to do. Not sure what these look like in math outside formal theorem proving

English

316

Horace He@cHHillee·22 Ara

@dmdohan @polynoamial @tamaybes I do think this one of the main obstacles for evals moving forward - “results being easy to check” is also the same condition for models being very amenable to current post-training techniques.

English

422

Tamay Besiroglu@tamaybes·22 Ara

I’m excited to announce the development of Tier 4, a new suite of math problems that go beyond the hardest problems in FrontierMath. o3 is remarkable, but there’s still a ways to go before any single AI system nears the collective prowess of the math community.

English

1.2K

214.1K

David Dohan@dmdohan·22 Ara

@polynoamial @tamaybes iiuc one of the constraints with FrontierMath is that the results are easy to check. Unless we do it with formal theorem proving, I’m not sure how to do that for unsolved problems Though maybe one tier should be unsolved hard to check ones too

English

3.2K

Noam Brown@polynoamial·22 Ara

@tamaybes Why not just evaluate the model on unsolved math problems?

English

420

61.4K

David Dohan retweetledi

Liam Fedus@LiamFedus·21 Ara

I have yet to find a well-defined task that cannot be optimized by these models. Eval improvement like ARC AGI showcase this dynamic

AshutoshShrivastava@ai_for_success

So we went from 0 to 87% in 5 years in ARC AGI score. There is no wall it seems. GPT-2 (2019): 0% GPT-3 (2020): 0% GPT-4 (2023): 2% GPT-4o (2024): 5% o1-preview (2024): 21% o1 high (2024): 32% o1 Pro (2024): ~50% o3 tuned low (2024): 76% o3 tuned high (2024): 87%

English

113

32.2K

David Dohan@dmdohan·21 Ara

still a ways to go on FrontierMath!

Nat McAleese@__nmca__

Lots of folks are posting quotes from Gowers/Tao about the hardest split of FrontierMath, but our 25% score is on the full set (which is also extremely hard, with old sota 2%, but not as hard as those quotes imply).

English

3.5K

David Dohan@dmdohan·21 Ara

@_xjdr @btc4me2 tbc it's a joke - literally meant it had been 16 hours since previous post & the o1->o3 jump is 32->87% x.com/dmdohan/status…

David Dohan@dmdohan

At this rate, how long til ARC-AGI is “solved”? For context: - gpt-4o @ 5% - Sonnet3.5 @ 14% - o1-preview @ 18% - o1 @ 32% - best scaffolded solution @ 54%

English

1.7K

xjdr@_xjdr·20 Ara

@btc4me2 this is just one of several insane implications of his statement

English

1.2K

xjdr@_xjdr·20 Ara

there are several layers to how wild this statement is

David Dohan@dmdohan

o3 @ 87.5% on ARC-AGI It was 16 hours at an increase rate of 3.5% an hour to "solved"

English

299

16.6K

David Dohan@dmdohan·21 Ara

Caveat on the Tao quote: that refers to the hardest "research" split of the dataset, while the 25% is across the entire dataset. x.com/Jsevillamol/st…

Jaime Sevilla@Jsevillamol

@GarrisonLovely To clear a possible misunderstanding: the quotes refer to questions in the highest tier of difficulty of FrontierMath. Not every question in the benchmark is as difficult as the ones Tao and Gowers reviewed.

English

7.2K

David Dohan@dmdohan·20 Ara

FrontierMath details: arxiv.org/html/2411.0487…

English

8.2K

David Dohan@dmdohan·20 Ara

imo the improvements on FrontierMath are even more impressive than ARG-AGI. Jump from 2% to 25% Terence Tao said the dataset should "resist AIs for several years at least" and "These are extremely challenging. I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages…”

Nat McAleese@__nmca__

Well, on FrontierMath 2024-11-26 o3 improves the state of the art from 2% to 25% accuracy. These are absurdly hard strongly held out math questions. And on ARC, the semi-private test set and public validation set scores are 87.5% (private) and 91.5% (public). (7/n)

English

886

153.4K

David Dohan@dmdohan·20 Ara

@charles_irl openai.com/index/early-ac…

QME

670

Charles 🎉 Frye@charles_irl·20 Ara

@dmdohan when can i do my "why do computers get hot?" eval

English

707

David Dohan@dmdohan·20 Ara

We are used to the cadence of big model releases: GPT2->3->4 took two years each time We’re in a different world now o1 was announced months ago, now already on next generation Expect faster improvement going forward: o1 is like gpt2 if we could jump to gpt4 ~immediately

English

191

48.5K

Keşfet

@alexwei_ @SteveMoraco @cHHillee @polynoamial @tamaybes @elonmusk @BarackObama @taylorswift13