Jonathan Lee

42 posts


@jon_lee0

research @GoogleDeepMind. co-developed gemini deep think. co-led model training for IMO 🥇 | prev: RL PhD at @StanfordAILab

Mountain View, CA · Joined July 2025
140 Following · 920 Followers
Pinned Tweet
Jonathan Lee
Jonathan Lee@jon_lee0·
I’m excited to share the news of Gemini Deep Think’s gold-medal level performance 🥇 at the International Math Olympiad! It has been an absolute blast building Deep Think this year and then scaling it to the IMO.
Google DeepMind@GoogleDeepMind

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵

11
9
105
24.6K
Yu Bai
Yu Bai@yubai01·
🧄GPT-5.4 is here. 🚀 If you have felt the step-change improvement in GPT-5.3-Codex, GPT-5.4 brings a similar but even bigger improvement to ChatGPT, the API, and Codex as a unified model. Super proud of what the @OpenAI team has achieved together!
OpenAI@OpenAI

GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.

9
4
197
12.2K
Jonathan Lee
Jonathan Lee@jon_lee0·
We ran our internal system Aletheia (Deep Think) on FirstProof’s research problems during the week they were released. Aletheia returned solutions to problems 2, 5, 7, 8, 9, and 10. We think there’s a pretty good chance they are correct, based on expert analysis.
Jonathan Lee tweet media
Thang Luong@lmthang

Thrilled to share: #Aletheia, our math research agent, just solved 6/10 notoriously hard FirstProof problems autonomously, the best result in the inaugural challenge! To me, this is even bigger than our historic IMO-gold achievement last year; these problems challenge even top mathematicians. We share our results transparently, see paper and full thoughts in the thread. 👇

4
10
134
10.7K
Jonathan Lee retweeted
Thang Luong
Thang Luong@lmthang·
Yes, we provided 3 things for AI-assisted math:
* Human-AI interaction (HAI) card (photo), inspired by model cards
* Full transcripts github.com/google-deepmin…
* A label for novelty-autonomy, inspired by SAE Levels of autonomy; see the #Aletheia paper arxiv.org/abs/2602.10177
Thang Luong tweet media
Daniel Litt@littmath

Really good question (note that DeepMind shared transcripts in their recent Aletheia paper, and I think this is clearly best practice). Hopefully OAI follows suit.

4
18
123
15K
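The tweet above mentions an HAI card but does not reproduce its schema; here is a minimal sketch of what such a disclosure card could look like as structured data. All field names and values are assumptions for illustration only, not the format from the Aletheia paper.

```python
from dataclasses import dataclass, field

@dataclass
class HAICard:
    """Hypothetical human-AI interaction disclosure card (illustrative fields, not the paper's schema)."""
    problem_id: str                    # which research problem the record covers
    model: str                         # model/agent that produced the attempt
    autonomy_label: str                # e.g. "autonomous" vs. "human-assisted"
    human_interventions: list[str] = field(default_factory=list)  # prompts, hints, corrections supplied by people
    transcript_url: str = ""           # link to the full interaction transcript

# Hypothetical example entry; the URL is a placeholder, not a real path.
card = HAICard(
    problem_id="firstproof-2",
    model="Aletheia (Deep Think)",
    autonomy_label="autonomous",
    transcript_url="https://github.com/google-deepmind/...",
)
```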
Jonathan Lee retweeted
Thang Luong
Thang Luong@lmthang·
Congrats to the whole Deep Think team from @GoogleDeepMind on this amazing milestone of the #DeepThink V2 launch! Such a great model that powers so many state-of-the-art results, from reasoning (ARC-AGI2) to deep knowledge (Humanity's Last Exam), multimodality (MMMU-Pro), coding (Codeforces), the math research agent #Aletheia, and scientific discovery (which we shared just yesterday)! Blog: blog.google/innovation-and…
It has been a privilege witnessing the relentless progress 🔥:
* ChatGPT -> Bard announcement (Mar 2023): 100 days
* Announcement of the IMO-gold achievement -> Deep Think v1 launch (Jul 2025): 10 days
* Announcement of the Aletheia agent & advancements in scientific research -> Deep Think v2 launch (Feb 2026): 1 day
More to come! Stay tuned!
Thang Luong tweet media
15
25
308
22.5K
Jonathan Lee retweeted
Yi Tay
Yi Tay@YiTayML·
Gemini 3 Deep Think is here! 😎 This model is not only super strong in math and coding (IMO gold and 3455 Codeforces ELO), it is also the gold standard in physics and chemistry olympiads. 😃 It also sets new records on ARC-AGI-2 and HLE. Proud to be a (core) member of the Deep Think team. 🦾😆 Feeling the AGI!
Yi Tay tweet media
10
26
331
15.8K
Jonathan Lee
Jonathan Lee@jon_lee0·
cool new model
Jonathan Lee tweet media
2
0
33
1.8K
Jonathan Lee
Jonathan Lee@jon_lee0·
Our latest versions of Deep Think are helping accelerate math research. Our new paper dives into examples of the agents semi-autonomously (and sometimes autonomously) contributing new knowledge.
Thang Luong@lmthang

Research-level mathematics draws on advanced techniques from a vast literature, with papers often spanning dozens of pages. While foundation models possess a large knowledge base from pretraining, their understanding of advanced subjects remains superficial due to data scarcity, and they are also prone to hallucinations. As such, in the first paper, "Towards Autonomous Mathematics Research", we built #Aletheia (the ancient Greek word for "truth"), a math research agent that can iteratively generate, verify, and revise solutions end-to-end in natural language. Link to the paper: github.com/google-deepmin… (to be on arXiv soon!) There are 3 main sources that power Aletheia ...

0
0
12
1.1K
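The quoted thread describes Aletheia only at a high level, as an agent that iteratively generates, verifies, and revises natural-language solutions end to end. Below is a minimal sketch of such a loop under those assumptions; the generate/verify/revise callables are hypothetical stand-ins for model calls, not DeepMind's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    is_correct: bool
    feedback: str

def generate_verify_revise(
    problem: str,
    generate: Callable[[str], str],            # drafts a natural-language solution
    verify: Callable[[str, str], Verdict],     # critiques the current draft
    revise: Callable[[str, str, str], str],    # repairs the draft given feedback
    max_rounds: int = 5,
) -> Optional[str]:
    """Iteratively draft, check, and repair a solution until it passes or the budget runs out."""
    solution = generate(problem)
    for _ in range(max_rounds):
        verdict = verify(problem, solution)
        if verdict.is_correct:
            return solution
        solution = revise(problem, solution, verdict.feedback)
    return None  # no verified solution within the round budget
```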
Jason Lee
Jason Lee@jasondeanlee·
Thank you gdm for the Gemini Ultra. Deep Think with Gemini 3 is surprisingly good and really fast compared to GPT Pro. Being 5 or 10x faster means faster iteration, which is more important than being smart but limited to one shot. With 10 prompts, I always get Deep Think >> 5.2 Pro.
4
2
109
16.7K
Jonathan Lee retweeted
Taelin
Taelin@VictorTaelin·
For those wondering, and as expected, Gemini 3 Deep Think solves the stack underflow bug that cost me a few days. Its answer is more decisive than Opus 4.5's, the only other public model to solve it (even Gemini 3 Pro fails). It even points to the exact location confidently. It takes forever though... I don't have harder tests for now; most of my benchmarks are saturated and I'm super busy with SupGen stuff, so that's all I have to say about this one
Taelin tweet media
34
21
787
52.1K
Jonathan Lee retweeted
Google Gemini
Google Gemini@GeminiApp·
Gemini 3 Deep Think is here. Deep Think is our most advanced reasoning mode that explores multiple hypotheses simultaneously to give you an even more sophisticated output.
407
750
6K
7.7M
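"Explores multiple hypotheses simultaneously" is not specified further in the announcement above. As a rough illustration only, here is a generic best-of-n sketch of parallel hypothesis exploration; the propose/score callables are hypothetical placeholders and this is not the actual Deep Think mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def explore_hypotheses(
    prompt: str,
    propose: Callable[[str], str],       # produces one candidate answer
    score: Callable[[str, str], float],  # rates a candidate answer for the prompt
    n: int = 8,
) -> str:
    """Sample n candidate answers in parallel and keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda _: propose(prompt), range(n)))
    return max(candidates, key=lambda c: score(prompt, c))
```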
Jonathan Lee retweeted
Quoc Le
Quoc Le@quocleix·
Gemini 3 Deep Think is next level. Deep Think was the engine behind our gold medal-level wins at IMO and ICPC, and now powers an even stronger version of Gemini 3. SOTA above SOTA. More to come soon!
Quoc Le tweet media
23
54
426
92.3K
Jonathan Lee retweeted
Thang Luong
Thang Luong@lmthang·
Continuing our IMO-gold journey, I’m delighted to share our #EMNLP2025 paper “Towards Robust Mathematical Reasoning”, which tells some of the key stories behind the success of our advanced Gemini #DeepThink at this year's IMO. Finding the right north-star metrics was critical for our IMO effort, and we did it with #IMOBench, a suite of advanced reasoning benchmarks for foundation models. More importantly, we encourage the community to go beyond short answers, and we show that automatic grading of long-form answers is promising! Read on to see our project page, paper, and datasets in the thread 🙂
Thang Luong tweet media
Thang Luong@lmthang

Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad! 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this effort and I am grateful to everyone in the team for such an amazing achievement! Blog post in the thread and more to share soon!

13
107
711
187.5K
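The automatic grading of long-form answers mentioned above is not detailed in this tweet. Here is a rough sketch of one rubric-style autograding setup, with a hypothetical judge callable and prompt text; it is an illustration of the general idea, not the IMOBench grader itself.

```python
from typing import Callable

GRADING_PROMPT = """You are grading a long-form olympiad solution.
Problem:
{problem}

Candidate solution:
{solution}

Award an integer score from 0 to 7, justify each deduction,
and finish with a final line of the form 'SCORE: <n>'."""

def autograde(problem: str, solution: str, judge: Callable[[str], str]) -> int:
    """Rubric-style sketch: ask a judge model for a 0-7 score and parse the final line."""
    reply = judge(GRADING_PROMPT.format(problem=problem, solution=solution))
    last_line = reply.strip().splitlines()[-1]
    return int(last_line.rsplit("SCORE:", 1)[-1])
```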
Jonathan Lee retweeted
Epoch AI
Epoch AI@EpochAIResearch·
We evaluated Gemini 2.5 Deep Think on FrontierMath. There is no API, so we ran it manually. The results: a new record! We also conducted a more holistic evaluation of its math capabilities. 🧵
Epoch AI tweet media
22
90
632
148.2K
Jonathan Lee retweeted
Annie Xie
Annie Xie@_anniexie·
Super excited to share Gemini Robotics 1.5!! Our high-level reasoning model Gemini Robotics-ER 1.5 is also publicly available now! The model is particularly strong at spatial and temporal reasoning, and can use thinking to improve its answers 🧠🤖
Annie Xie tweet media
Google DeepMind@GoogleDeepMind

We’re making robots more capable than ever in the physical world. 🤖 Gemini Robotics 1.5 is a levelled up agentic system that can reason better, plan ahead, use digital tools such as @Google Search, interact with humans and much more. Here’s how it works 🧵

1
2
10
1.5K
Jonathan Lee retweeted
Ted Xiao
Ted Xiao@xiao_ted·
📢The next milestone for intelligent general-purpose robots has arrived! Announcing Gemini Robotics 1.5, our flagship system which brings breakthroughs from frontier models to the physical world with two new SOTA generalists: the GR 1.5 VLA and GR 1.5 embodied reasoning model 🧵
Ted Xiao tweet media
6
36
187
26.3K
Jonathan Lee retweeted
Heng-Tze Cheng
Heng-Tze Cheng@HengTze·
I’m excited to announce that an advanced version of Gemini Deep Think achieved gold-medal level performance at the 2025 ICPC World Finals, one of the world’s most prestigious programming competitions! 🥇 Learn more in our blog post: bit.ly/46rvjLs
An inspiring moment for me personally was when our model solved a problem that no university team solved during the contest: a true moment of innovation.
With Gemini Deep Think achieving gold level across ICPC & IMO, I think we’re seeing a profound leap in generalization across coding, math, and reasoning capabilities to generate novel solutions to complex problems.
This is a huge milestone for us on an amazing journey. Really grateful and proud of our team for all the hard work and teamwork that made this breakthrough possible. Looking forward to continuing our research, helping people use Gemini to solve some of the hardest unsolved problems in the world!
11
44
328
40.7K
Jonathan Lee retweeted
Dan Hendrycks
Dan Hendrycks@hendrycks·
Few people are aware of how good Gemini Deep Think is. It's at the point where "Should I ask an expert to chew on this or Deep Think?" is often answered with Deep Think. GPT-5 Pro is more "intellectual yet idiot" while Deep Think has better taste. I've been repeating this frequently, so I'm tweeting it instead.
34
41
535
57.3K