Jonathan Lai
@_JLai
30 posts

Post training @GoogleDeepMind, Gemini Reasoning, training algorithms, RL, opinions are my own

Joined November 2012
198 Following · 568 Followers

Prateek Yadav @prateeky2806
Last week I joined @GoogleDeepMind after leaving Meta. There is so much going on, it's an exciting time. Looking forward to what I will end up doing.
21 replies · 8 reposts · 478 likes · 22.4K views

Jonathan Lai @_JLai
@denny_zhou Once the model's innovations are better than ours, we can finally end our 997 and go on vacation.
0 replies · 0 reposts · 1 like · 587 views

Denny Zhou @denny_zhou
Scaling laws hit a wall. Human innovation doesn’t.
39 replies · 25 reposts · 442 likes · 66.8K views

Jonathan Lai reposted
ARC Prize @arcprize
Gemini 3 models from @Google @GoogleDeepMind have made a significant 2X SOTA jump on ARC-AGI-2 (Semi-Private Eval):
Gemini 3 Pro: 31.11%, $0.81/task
Gemini 3 Deep Think (Preview): 45.14%, $77.16/task
191 replies · 607 reposts · 4.1K likes · 2.2M views

Jonathan Lai reposted
Google DeepMind @GoogleDeepMind
This is Gemini 3: our most intelligent model that helps you learn, build and plan anything. It comes with state-of-the-art reasoning capabilities and world-leading multimodal understanding, and enables new agentic coding experiences. 🧵
213 replies · 1.1K reposts · 6.5K likes · 1.7M views

Jonathan Lai @_JLai
@XingyouSong Bitter lesson strikes again! RLMs are such an elegant solution. Congratulations Richard and Yash!
0 replies · 0 reposts · 2 likes · 130 views

Jonathan Lai @_JLai
@yacineMTB It’s more like how a dog learns new tricks, i.e. trying random actions until the human gives it a treat. Humans can learn more efficiently by leveraging reasoning.
0 replies · 0 reposts · 3 likes · 59 views

kache @yacineMTB
so how does PPO work? is it basically just button mashing and then figuring out what works? so, like, exactly how a human learns?
11 replies · 0 reposts · 22 likes · 10.9K views

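For readers following this exchange: the "button mashing until you get a treat" intuition maps onto PPO's clipped surrogate objective, where sampled actions are reinforced in proportion to an advantage estimate. Below is a minimal, illustrative NumPy sketch of just the loss; the names (`ppo_clip_loss`, `old_logp`, `advantages`) are ours for illustration, not from any particular RL library, and a real trainer would wrap this in a full sampling and optimization loop.

```python
import numpy as np

# Minimal sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
# The advantage estimate plays the role of the "treat": positive advantages
# push the policy toward the sampled action, negative ones push away.

def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate loss for one batch of sampled actions.

    new_logp:   log-probs of the taken actions under the current policy
    old_logp:   log-probs under the policy that collected the data
    advantages: estimated advantage of each action
    """
    ratio = np.exp(new_logp - old_logp)              # importance weight
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    # Pessimistic (minimum) bound, negated because we minimize the loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy usage: three sampled actions, one rewarded, one penalized.
loss = ppo_clip_loss(
    new_logp=np.array([-1.0, -0.7, -2.0]),
    old_logp=np.array([-1.1, -0.9, -1.8]),
    advantages=np.array([0.0, 1.0, -0.5]),
)
print(loss)
```

The clipping is what keeps this from being pure button mashing: updates that would move the policy too far from the data-collecting policy are cut off, which is PPO's stability trick.
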
yi @agihippo
Pixel 10 XL or not?
1 reply · 0 reposts · 4 likes · 1.1K views

Jonathan Lai reposted
Yi Tay @YiTayML
Excited to share that I'll be hosting some of the world's best AI researchers and engineers for our @GoogleDeepMind Gemini event next week in Singapore 🇸🇬! Join @JeffDean, @quocleix, @benoitschilling, @melvinjohnsonp and @denny_zhou for a day of technical conversations, panels and talks about AI, reasoning and our mission to build a world-class AI frontier lab in Singapore.
If you're in town and would like to attend, please check the RSVP link below 👇. Note: attendance is subject to capacity constraints and you'll need to be approved to join.
20 replies · 40 reposts · 337 likes · 112.6K views

Jonathan Lai @_JLai
@denny_zhou @jamesjyan117153 Reasoning underpins AGI. I am grateful for the opportunity to contribute to reasoning in Gemini. RL for LLMs was an enormous breakthrough, and there are more fundamental algorithmic breakthroughs yet to come!
0 replies · 0 reposts · 16 likes · 1.1K views

Denny Zhou @denny_zhou
The technique of RL finetuning for reasoning was independently discovered by several labs. At Google DeepMind, credit goes to Jonathan Lai (@_JLai) and James An (@jamesjyan117153) on my team.
1 reply · 3 reposts · 82 likes · 20.2K views

Denny Zhou @denny_zhou
Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-…
Key points:
1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial insight is that transformer models can become nearly arbitrarily powerful by generating many intermediate tokens, without the need to scale the model size (arxiv.org/abs/2402.12875).
2. Pretrained models, even without any fine-tuning, are capable of reasoning. The challenge is that reasoning-based outputs often don’t appear at the top of the output distribution, so standard greedy decoding fails to surface them (arxiv.org/abs/2402.10200).
3. Prompting techniques (e.g., chain-of-thought prompting or "let’s think step by step") and supervised finetuning were commonly used to elicit reasoning. Now, RL finetuning has emerged as the most powerful method. This trick was independently discovered by several labs. At Google, credit goes to Jonathan Lai on my team. Based on our theory (see point 1), scaling RL should focus on generating long responses rather than on anything else.
4. LLM reasoning can be hugely improved by generating multiple responses and then aggregating them, rather than relying on a single response (arxiv.org/abs/2203.11171); a minimal sketch of this aggregation follows below.
48 replies · 482 reposts · 3.1K likes · 449.9K views

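Point 4 above is the self-consistency recipe of arxiv.org/abs/2203.11171: sample several reasoning paths at nonzero temperature and majority-vote on the final answers. A rough Python sketch follows; `sample_response` is a hypothetical stand-in for whatever sampling call your model exposes (here a toy stub so the sketch runs), and the last-line answer convention is ours, not the paper's.

```python
import random
from collections import Counter

def sample_response(prompt: str) -> str:
    # Toy stub standing in for a temperature > 0 LLM call: pretend the
    # model reasons in intermediate steps, then answers on the last line.
    answer = random.choice(["42", "42", "42", "41"])  # noisy, mostly right
    return f"step 1: think about {prompt!r}\nstep 2: compute\n{answer}"

def extract_answer(response: str) -> str:
    # Convention used by this sketch: the final line carries the answer.
    return response.strip().splitlines()[-1]

def self_consistency(prompt: str, k: int = 16) -> str:
    """Sample k reasoning paths and return the majority-vote answer."""
    answers = [extract_answer(sample_response(prompt)) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("what is 6 x 7?"))  # usually "42"
```

In practice the answer extraction is task-specific, and the vote only works when answers can be canonicalized to comparable strings.
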
Jonathan Lai reposted
Tu Vu @tuvllms
Excited to share that our paper on model merging at scale has been accepted to Transactions on Machine Learning Research (TMLR). Huge congrats to my intern @prateeky2806 and our awesome co-authors @_JLai, @alexandraxron, @manaalfar, @mohitban47, and @TsendeeMTS 🎉!!

Quoted: Prateek Yadav @prateeky2806
Ever wondered if model merging works at scale? Maybe the benefits wear off for bigger models? Maybe you considered using model merging for post-training of your large model but weren't sure if it generalizes well? cc: @GoogleAI @GoogleDeepMind @uncnlp 🧵👇
Excited to announce my internship work on large-scale model merging! We explore what happens when you combine larger and larger language models (up to 64B parameters!) and how different factors – model size, base model quality, merging methods, and # of experts – impact held-in performance and generalization. 📰: arxiv.org/abs/2410.03617

2 replies · 19 reposts · 90 likes · 9.6K views

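As background for the thread above: the simplest baseline in the model-merging literature is plain parameter averaging of expert checkpoints. Here is a toy sketch under the assumption that all experts share one architecture; the dict-of-arrays checkpoint format and the function name are ours for illustration, not the paper's method, which compares several more sophisticated merging schemes.

```python
import numpy as np

def merge_average(state_dicts):
    """Average k expert models parameter-wise into one merged model.

    Each state dict is a plain name -> array mapping; all experts must
    have identical parameter names and shapes.
    """
    merged = {}
    for name in state_dicts[0]:
        merged[name] = np.mean([sd[name] for sd in state_dicts], axis=0)
    return merged

# Usage with two tiny "experts" sharing one parameter tensor:
expert_a = {"layer.weight": np.array([1.0, 2.0])}
expert_b = {"layer.weight": np.array([3.0, 4.0])}
print(merge_average([expert_a, expert_b]))  # {'layer.weight': array([2., 3.])}
```

The paper's question is whether such merging keeps paying off as model size and the number of experts grow, which this baseline makes easy to state precisely.
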
Jonathan Lai @_JLai
@BorisMPower @demishassabis We actually published our first Thinking / Reasoning model before the o-series was announced: Math-Specialized 1.5 Pro.

Quoted: Oriol Vinyals @OriolVinyalsML
Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra.
As a math undergrad, I find our drastic results in mathematics particularly exciting! In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendrycks’ MATH benchmark without tool use (examples below 🧵).
Gemini 1.5 is widely available, try it out for free here: aistudio.google.com & read the full tech report here: goo.gle/GeminiV1-5

6 replies · 2 reposts · 131 likes · 5.8K views

Boris Power @BorisMPower
@demishassabis Yes, you may have been working on it for a very long time, but you should acknowledge that OpenAI o-series models beat you to it :)
24 replies · 2 reposts · 109 likes · 9.9K views

Demis Hassabis @demishassabis
We’ve been working on planning and thinking capabilities for our AI models since our AlphaGo days. When these models are given more time to think, responses improve. At I/O we introduced Gemini 2.5 Pro Deep Think, a new enhanced reasoning mode that makes 2.5 Pro even better!
73 replies · 135 reposts · 1.7K likes · 206.9K views

Jonathan Lai @_JLai
@Swarooprm7 Congrats Swaroop! Looking forward to seeing what you and MAI cook up!
0 replies · 0 reposts · 2 likes · 255 views

Swaroop Mishra @Swarooprm7
Excited to join Microsoft AI. Watch out for some cool stuff coming your way 😎
84 replies · 12 reposts · 1.4K likes · 90.6K views

Swaroop Mishra @Swarooprm7
After an amazing journey at Google—first with Google Brain, then Google DeepMind—I’ve made the tough decision to leave. Immensely grateful for the brilliant colleagues and lifelong friends I’ve made along the way. I'm excited to keep riding the AI wave, creating stories that will inspire the next generation!
83 replies · 21 reposts · 1.3K likes · 140.9K views

Jonathan Lai reposted
Tu Vu @tuvllms
🚨 New paper 🚨 Excited to share my first paper w/ my PhD students!! We find that advanced LLM capabilities conferred by instruction or alignment tuning (e.g., SFT, RLHF, DPO, GRPO) can be encoded into model diff vectors (à la task vectors) and transferred across model versions.
💡 You don’t necessarily need to fine-tune from scratch again for every new base model version. Instead, fine-tune once and add the diff vector to updated versions! ♻️♻️♻️ This can also offer a stronger and more computationally efficient starting point when further training is feasible.
📰: tinyurl.com/finetuning-tra… More 👇
14 replies · 93 reposts · 439 likes · 48K views

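The recipe described in the post above can be sketched in a few lines: compute the parameter-wise difference between a fine-tuned model and its base, then add that difference to an updated base version. This is an illustrative toy, assuming all three checkpoints share identical parameter names and shapes; the function names and the optional scaling knob are ours, and the paper's actual method may differ in details.

```python
import numpy as np

def diff_vector(finetuned, base):
    """Parameter-wise difference encoding what fine-tuning changed."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_diff(new_base, diff, scale=1.0):
    """Transfer the fine-tuning to an updated base model version."""
    return {k: new_base[k] + scale * diff[k] for k in new_base}

# Usage: fine-tune once on base v1, then reuse the diff on base v2.
base_v1 = {"w": np.array([0.0, 1.0])}
tuned_v1 = {"w": np.array([0.5, 1.5])}
base_v2 = {"w": np.array([0.2, 1.1])}
print(apply_diff(base_v2, diff_vector(tuned_v1, base_v1)))  # [0.7, 1.6]
```

This is the same arithmetic as task vectors, applied across model versions rather than across tasks; per the post, it can also serve as a warm start when further training is feasible.
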
Jonathan Lai reposted
Ankesh Anand @ankesh_anand
shoutout to the believers!
41 replies · 66 reposts · 1.9K likes · 201.4K views

Jonathan Lai reposted
Jonathan Lai @_JLai
A historic Elo margin on LMSYS, and it also crushed almost all reasoning and STEM benchmarks!! So proud of this team!!

Quoted: Arena.ai @arena
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆
Tested under codename "nebula" 🌌, Gemini 2.5 Pro ranked #1 🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn!
Massive congrats to @GoogleDeepMind for this incredible Arena milestone! 🙌 More highlights in thread 👇

0 replies · 1 repost · 4 likes · 358 views

Jonathan Lai reposted
Sundar Pichai @sundarpichai
1/ Gemini 2.5 is here, and it’s our most intelligent AI model ever. Our first 2.5 model, Gemini 2.5 Pro Experimental, is a state-of-the-art thinking model, leading in a wide range of benchmarks – with impressive improvements in reasoning and coding – and now #1 on @lmarena_ai by a significant margin.
With a model this intelligent, we wanted to get it to people as quickly as possible. Find it on Google AI Studio and in the @geminiapp for Gemini Advanced users now – and in Vertex in the coming weeks.
This is the start of a new era of thinking models – and we can’t wait to see where things go from here.
298 replies · 936 reposts · 6.9K likes · 863.5K views