David Dohan

571 posts

@dmdohan

reducing perplexity @openai | past: probabilistic programs, proteins, science & reasoning @ google brain 🧠

Joined August 2011
1.6K Following, 12.1K Followers
Pinned Tweet
David Dohan @dmdohan
Happy to release our work on Language Model Cascades. Read on to learn how we can unify existing methods for interacting models (scratchpad/chain of thought, verifiers, tool-use, …) in the language of probabilistic programming. paper: arxiv.org/abs/2207.10342
David Dohan retweeted
OpenAI @OpenAI
We achieved gold medal-level performance 🥇on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM! Our model solved world-class math problems—at the level of top human contestants. A major milestone for AI and mathematics.
Alexander Wei @alexwei_

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

David Dohan retweeted
Nat McAleese @__nmca__
I feel this may be helpful to some of you today:
David Dohan @dmdohan
Fun to watch prediction markets update on the news
David Dohan @dmdohan
OpenAI achieved a gold medal on the 2025 International Math Olympiad (solving 5 of 6 problems)! The model thinks for hours and writes proofs in natural language. We've come a long way from LLMs solving 50% of the MATH dataset in 2022. Congrats @alexwei_ on spearheading a major milestone!
Alexander Wei @alexwei_

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

David Dohan @dmdohan
How to code a side project in 2025:
1. May 31 - Write project spec
2. Procrastinate 6 months
3. Dec 31 - Ask favorite AI to implement it
David Dohan retweeted
Noam Brown @polynoamial
Scaling pretraining and scaling thinking are two different dimensions of improvement. They are complementary, not in competition.
David Dohan retweeted
Noam Brown @polynoamial
This is on the scale of the Apollo Program and Manhattan Project when measured as a fraction of GDP. This kind of investment only happens when the science is carefully vetted and people believe it will succeed and be completely transformative. I agree it’s the right time.
OpenAI @OpenAI

Announcing The Stargate Project

The Stargate Project is a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States. We will begin deploying $100 billion immediately. This infrastructure will secure American leadership in AI, create hundreds of thousands of American jobs, and generate massive economic benefit for the entire world. This project will not only support the re-industrialization of the United States but also provide a strategic capability to protect the national security of America and its allies.

The initial equity funders in Stargate are SoftBank, OpenAI, Oracle, and MGX. SoftBank and OpenAI are the lead partners for Stargate, with SoftBank having financial responsibility and OpenAI having operational responsibility. Masayoshi Son will be the chairman.

Arm, Microsoft, NVIDIA, Oracle, and OpenAI are the key initial technology partners. The buildout is currently underway, starting in Texas, and we are evaluating potential sites across the country for more campuses as we finalize definitive agreements.

As part of Stargate, Oracle, NVIDIA, and OpenAI will closely collaborate to build and operate this computing system. This builds on a deep collaboration between OpenAI and NVIDIA going back to 2016 and a newer partnership between OpenAI and Oracle.

This also builds on the existing OpenAI partnership with Microsoft. OpenAI will continue to increase its consumption of Azure as OpenAI continues its work with Microsoft with this additional compute to train leading models and deliver great products and services.

All of us look forward to continuing to build and develop AI—and in particular AGI—for the benefit of all of humanity. We believe that this new step is critical on the path, and will enable creative people to figure out how to use AI to elevate humanity.

Raye @rayefull
any good research / stories on synesthesia
David Dohan retweeted
roon @tszzl
🚨SCANDAL 🚨 OpenAI trained on the train set for the Millennium Puzzles
David Dohan retweeted
Steven Heidel @stevenheidel
these new captchas are getting way too difficult
David Dohan retweeted
roon @tszzl
o3 has literally made 0% progress on the Millennium eval it’s ai winter now
David Dohan @dmdohan
@cHHillee @polynoamial @tamaybes Gotta look for the NP problems of P vs NP: easy to check, hard to do. Not sure what these look like in math outside formal theorem proving
Horace He @cHHillee
@dmdohan @polynoamial @tamaybes I do think this is one of the main obstacles for evals moving forward - “results being easy to check” is also the same condition for models being very amenable to current post-training techniques.
Tamay Besiroglu @tamaybes
I’m excited to announce the development of Tier 4, a new suite of math problems that go beyond the hardest problems in FrontierMath. o3 is remarkable, but there’s still a ways to go before any single AI system nears the collective prowess of the math community.
David Dohan @dmdohan
@polynoamial @tamaybes iiuc one of the constraints with FrontierMath is that the results are easy to check. Unless we do it with formal theorem proving, I’m not sure how to do that for unsolved problems. Though maybe one tier should be unsolved, hard-to-check ones too.
Noam Brown @polynoamial
@tamaybes Why not just evaluate the model on unsolved math problems?
David Dohan retweeted
xjdr @_xjdr
@btc4me2 this is just one of several insane implications of his statement
David Dohan @dmdohan
Caveat on the Tao quote: that refers to the hardest "research" split of the dataset, while the 25% is across the entire dataset. x.com/Jsevillamol/st…
Jaime Sevilla @Jsevillamol

@GarrisonLovely To clear a possible misunderstanding: the quotes refer to questions in the highest tier of difficulty of FrontierMath. Not every question in the benchmark is as difficult as the ones Tao and Gowers reviewed.

David Dohan @dmdohan
imo the improvements on FrontierMath are even more impressive than ARC-AGI. Jump from 2% to 25%. Terence Tao said the dataset should "resist AIs for several years at least" and "These are extremely challenging. I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages…"
Nat McAleese @__nmca__

Well, on FrontierMath 2024-11-26 o3 improves the state of the art from 2% to 25% accuracy. These are absurdly hard, strongly held-out math questions. And on ARC, the semi-private test set and public validation set scores are 87.5% (private) and 91.5% (public). (7/n)

David Dohan @dmdohan
We are used to the cadence of big model releases: GPT2->3->4 took two years each time.
We’re in a different world now. o1 was announced months ago, and we’re already on the next generation.
Expect faster improvement going forward: o1 is like GPT2 if we could jump to GPT4 ~immediately.