Adam Sadovsky
@asadovsky

132 posts

CVP, AI at Microsoft AI (past: @GoogleDeepMind, @Google)

Joined December 2007
530 Following · 1.2K Followers
Dustin Tran @dustinvtran
Post-training at xAI: Over the past few months, our team of a dozen overhauled the RL recipe, using user preferences on real conversations and agentic reward models that grade with strong reasoning capabilities. We also scaled up RL an order of magnitude beyond the pretraining-like scale already used in Grok 4. Over multiple iterations, we learned a great deal about the core product, response quality, and style.

What I'm personally most proud of with Grok 4.1 is how well we nailed the "fast path": the default mode without reasoning. Most questions don't actually need a chain-of-thought; they just need a quick, high-quality answer. Turning reasoning off drops output tokens from ~2300 to ~850, and Grok 4.1 still ranks #2 on LMArena, ahead of every model that's leaning on reasoning.

I've been using 4.1 as my daily driver for the past few weeks. It just feels a lot better than what's available: less slop-like content, less generic templating of headers and emojis, fewer unnecessary guardrails. More personally, it's been three months since leaving Google, and I'm glad to contribute a new model that pushes RLHF further than ever. @melvinjohnsonp Taking back #1 :-)
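A quick back-of-the-envelope check on the fast-path numbers in the post (the two token counts come from the tweet; the derived percentages are just my own arithmetic):

```python
# Output-token reduction reported for Grok 4.1's non-reasoning "fast path".
# The ~2300 and ~850 figures are from the post; everything derived from
# them here is illustrative arithmetic, not xAI data.
with_reasoning = 2300  # approx. output tokens with chain-of-thought on
fast_path = 850        # approx. output tokens with reasoning off

reduction = 1 - fast_path / with_reasoning   # fraction of tokens saved
ratio = with_reasoning / fast_path           # how many times fewer tokens

print(f"output tokens cut by {reduction:.0%}")      # roughly 63% fewer
print(f"~{ratio:.1f}x fewer output tokens per answer")
```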
79 replies · 65 reposts · 1.3K likes · 219.6K views
Adam Sadovsky reposted
Mustafa Suleyman @mustafasuleyman
Meet our third @MicrosoftAI model: MAI-Image-1. #9 on LMArena, striking an impressive balance of generation speed and quality. Excited to keep refining + climbing the leaderboard from here! We're just getting started. microsoft.ai/news/introduci…
[2 images]
36 replies · 78 reposts · 507 likes · 146.1K views
Adam Sadovsky reposted
Nando de Freitas @NandoDF
This was an amazing week at @MicrosoftAI!! We released MAI 1 preview and a taste of MAI Voice. I'm super happy with this team: only about 100 people, and already shipping on @lmarena_ai in less than a year. Strong support. More soon. Thanks for the feedback!
14 replies · 9 reposts · 166 likes · 50K views
Adam Sadovsky @asadovsky
SOTA just got way cheaper
0 replies · 1 repost · 7 likes · 818 views
Adam Sadovsky reposted
Martin Baeuml @mbaeuml
Just shipped a few updates:
1. Gemini 2.5 Pro to try for free on gemini.google.com in the model drop-down. Advanced has higher limits.
2. Canvas with 2.5 Pro in Advanced.
Our best coding model yet. We had so much fun building demos internally, can't wait to see what y'all come up with!
Google Gemini @GeminiApp

Gemini 2.5 Pro is taking off 🚀🚀🚀 The team is sprinting, TPUs are running hot, and we want to get our most intelligent model into more people’s hands asap. Which is why we decided to roll out Gemini 2.5 Pro (experimental) to all Gemini users, beginning today. Try it at no cost at gemini.google.com

12 replies · 17 reposts · 392 likes · 55.9K views
Adam Sadovsky reposted
Bindu Reddy @bindureddy
WE HAVE A NEW BEST MODEL IN THE WORLD! GEMINI 2.5 IS #1 ON LIVEBENCH
[image]
106 replies · 163 reposts · 1.4K likes · 165.4K views
Adam Sadovsky reposted
Arena.ai @arena
BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn! Massive congrats to @GoogleDeepMind for this incredible Arena milestone! 🙌 More highlights in thread👇
[image]
Google DeepMind @GoogleDeepMind

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

73 replies · 401 reposts · 2.3K likes · 466.9K views
Adam Sadovsky reposted
Kyle Corbitt @corbtt
If you're fine-tuning LLMs, Gemma 3 is the new 👑 and it's not close. Gemma 3 trounces Qwen/Llama models at every size!
- Gemma 3 4B beats 7B/8B competition
- Gemma 3 27B matches 70B competition
Vision benchmarks coming soon!
[image]
19 replies · 54 reposts · 491 likes · 36.6K views
Adam Sadovsky @asadovsky
Wow, quite impressive for a 27B model!
Arena.ai @arena

🎉 Congrats to @GoogleDeepMind on Gemma-3-27B, the newest and one of the strongest open models in Arena!
💠 Top 10 overall, beating out many proprietary models with only 27B parameters
💠 2nd best open model, below only DeepSeek-R1
💠 128K context window
Check out their blog to learn more about Gemma 3. We can't wait to see where this goes next! 🔥👏

0 replies · 0 reposts · 52 likes · 2.6K views
Adam Sadovsky reposted
Subhash Choudhary @subhashchy
We replaced GPT-4o with Gemini 2.0 Flash for Bot9, reducing our costs by about 20× with no visible loss in accuracy. The change was made on a highly complex support agent that makes 32 tool calls. I was seriously not expecting this. At the application layer, it also made us one of the top 10 apps built with Gemini worldwide, and the only one from India on the list. Data source: OpenRouter.
[3 images]
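The ~20× figure above is mostly a price-per-token effect. A minimal sketch of how such a ratio arises, using hypothetical per-million-token prices and a hypothetical workload (the numbers below are illustrative placeholders, not actual GPT-4o or Gemini 2.0 Flash pricing or Bot9 data):

```python
def monthly_cost(calls: int, in_tokens: int, out_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Monthly cost in dollars; prices are dollars per million tokens."""
    return calls * (in_tokens * price_in + out_tokens * price_out) / 1e6

# Hypothetical workload: 100k agent calls/month, 4k input + 500 output tokens each.
old = monthly_cost(100_000, 4_000, 500, price_in=2.50, price_out=10.00)   # assumed premium tier
new = monthly_cost(100_000, 4_000, 500, price_in=0.125, price_out=0.50)   # assumed flash tier

print(f"${old:,.0f} -> ${new:,.0f} ({old / new:.0f}x cheaper)")  # 20x under these assumptions
```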
40 replies · 65 reposts · 1.2K likes · 129.7K views
Adam Sadovsky reposted
Farzad Mostashari @Farzad_MD
1/ After residency at Mass General Hospital, I reported to Atlanta to meet my fellow CDC Epidemic Intelligence Service Officers. I have never felt so intimidated by my peers. The best and the brightest: star clinicians who had served in disaster zones; MD/PhDs and MSF.
408 replies · 4.9K reposts · 18.3K likes · 2.6M views
Kyunghyun Cho @kchonyc
the lack of consistency in Gemini is driving me crazy. somehow i can only share a conversation if it was with Gemini Advanced 2.0 Flash, but not with any of the other models that are actually helpful and useful ... 🤦‍♂️ what kind of LLM hell is this???
[3 images]
3 replies · 0 reposts · 16 likes · 4.7K views
Adam Sadovsky reposted
Andrej Karpathy @karpathy
We have to take the LLMs to school. When you open any textbook, you'll see three major types of information:

1. Background information / exposition. The meat of the textbook that explains concepts. As you attend over it, your brain is training on that data. This is equivalent to pretraining, where the model is reading the internet and accumulating background knowledge.

2. Worked problems with solutions. These are concrete examples of how an expert solves problems. They are demonstrations to be imitated. This is equivalent to supervised finetuning, where the model is finetuning on "ideal responses" for an Assistant, written by humans.

3. Practice problems. These are prompts to the student, usually without the solution, but always with the final answer. There are usually many, many of these at the end of each chapter. They are prompting the student to learn by trial & error - they have to try a bunch of stuff to get to the right answer. This is equivalent to reinforcement learning.

We've subjected LLMs to a ton of 1 and 2, but 3 is a nascent, emerging frontier. When we're creating datasets for LLMs, it's no different from writing textbooks for them, with these 3 types of data. They have to read, and they have to practice.
[image]
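The three "textbook" data types above can be sketched as record shapes. A minimal illustration (the field names and the exact-match reward are my own toy choices, not any real training-data schema):

```python
# 1. Pretraining: raw exposition the model simply reads.
pretraining_example = {
    "text": "Gradient descent updates parameters in the direction of steepest descent...",
}

# 2. Supervised finetuning: a worked problem with an expert's full solution to imitate.
sft_example = {
    "prompt": "Solve 2x + 6 = 10 for x.",
    "ideal_response": "Subtract 6 from both sides: 2x = 4. Divide by 2: x = 2.",
}

# 3. Reinforcement learning: a practice problem with only the final answer;
#    the model must find its own solution path and is rewarded when it matches.
rl_example = {"prompt": "Solve 3x - 9 = 0 for x.", "final_answer": "x = 3"}

def reward(model_answer: str, example: dict) -> float:
    """Toy binary reward: 1.0 if the model's final answer matches, else 0.0."""
    return 1.0 if model_answer.strip() == example["final_answer"] else 0.0

print(reward("x = 3", rl_example))  # a matching final answer earns full reward
```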
386 replies · 1.8K reposts · 11.8K likes · 695K views
Adam Sadovsky reposted
Demis Hassabis @demishassabis
Our latest update to our Gemini 2.0 Flash Thinking model (available here: goo.gle/4jsCqZC) scores 73.3% on AIME (math) & 74.2% on GPQA Diamond (science) benchmarks. Thanks for all your feedback; this represents super-fast progress from our first release just this past Dec! The latest version also includes code execution, a 1M token context window & a reduced likelihood of thought-answer contradictions. We've been pioneering these types of planning systems for over a decade, starting with programs like AlphaGo, and it is exciting to see the powerful combination of these ideas with the most capable foundation models.
[image]
122 replies · 352 reposts · 2.6K likes · 695.3K views