Adam Sadovsky

132 posts

Adam Sadovsky

@asadovsky

CVP, AI at Microsoft AI (past: @GoogleDeepMind, @Google)

Katılım Aralık 2007

530 Takip Edilen1.2K Takipçiler

Adam Sadovsky@asadovsky·18 Kas

@dustinvtran Nice work!!

English

2.1K

Dustin Tran@dustinvtran·18 Kas

Post-training at xAI: Over the past few months, our team of a dozen overhauled the RL recipe using user preference on real conversations; and agentic reward models that grade using strong reasoning capabilities. We also scaled up RL an order of magnitude more than the existing pretraining-like scale in Grok 4. Over the mutiple iterations, we learned so much behind the core product, response quality, and style. What I’m personally most proud of with Grok 4.1 is how well we nailed the “fast path”—the default mode without reasoning. Most questions don’t actually need a chain-of-thought. They just need a quick, high-quality answer. Turning reasoning off drops output tokens from ~2300→850, and Grok 4.1 still ranks #2 in LMArena, ahead of every model that’s leaning on reasoning. I've been using 4.1 as my daily driver for the past few weeks. It just feels a lot better than what's available. Less slop-like content. Less generic templating of headers & emojis. Fewer unnecessary guardrails. More personally, it's been three months after leaving Google, and I'm glad to contribute a new model to push RLHF further than ever. @melvinjohnsonp Taking back #1 :-)

English

1.3K

219.6K

Adam Sadovsky retweetledi

Mustafa Suleyman@mustafasuleyman·13 Eki

Meet our third @MicrosoftAI model: MAI-Image-1 #9 on LMArena, striking an impressive balance of generation speed and quality Excited to keep refining + climbing the leaderboard from here! We're just getting started. microsoft.ai/news/introduci…

English

507

146.1K

Adam Sadovsky retweetledi

Nando de Freitas@NandoDF·30 Ağu

This was an amazing week at ⁦@MicrosoftAI⁩ !! We released MAI 1 preview and a taste of MAI Voice. I’m super happy with this team - only about 100 people and already shipping in ⁦@lmarena_ai⁩ in less than a year. Strong support. More soon. Thanks for feedback!

English

166

50K

Adam Sadovsky@asadovsky·28 Ağu

hello, world!

Arena.ai@arena

🚨Text Leaderboard Update: A new model provider, @MicrosoftAI has broken into the Top 15 this week! 💠MAI-1-preview by @MicrosoftAI debuts at #13. Congrats to the Microsoft AI team! As the Text Arena is one of the most competitive races, breaking into the Top 15 is no small feat. 💪

English

110

20.9K

Adam Sadovsky@asadovsky·6 Nis

Interesting

Xeophon@xeophon

Llama 4 on LMsys is a totally different style than Llama 4 elsewhere, even if you use the recommended system prompt. Tried various prompts myself META did not do a specific deployment / system prompt just for LMsys, did they? 👀

English

1.1K

Adam Sadovsky@asadovsky·4 Nis

SOTA just got way cheaper

English

818

Adam Sadovsky@asadovsky·2 Nis

Quite interesting to see how some models generalize dramatically better!

Mislav Balunović@mbalunovic

Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve non-trivial amount of points (24.4%). The speed of progress is really mind-blowing.

English

2.2K

Adam Sadovsky retweetledi

Martin Baeuml@mbaeuml·30 Mar

Just shipped a few updates 1. Gemini 2.5 Pro to try for free on gemini.google.com in the model drop down. Advanced has higher limits. 2. Canvas with 2.5 Pro in Advanced. Our best coding model yet. We had so much fun building demos internally, can't wait to see what y'all come up with!

Google Gemini@GeminiApp

Gemini 2.5 Pro is taking off 🚀🚀🚀 The team is sprinting, TPUs are running hot, and we want to get our most intelligent model into more people’s hands asap. Which is why we decided to roll out Gemini 2.5 Pro (experimental) to all Gemini users, beginning today. Try it at no cost at gemini.google.com

English

392

55.9K

Adam Sadovsky@asadovsky·29 Mar

Gemini 2.5 Pro is SOTA on pretty much everything

Silas Alberti@silasalberti

Wow we just ran Gemini 2.5 Pro on our evals and it got a new state of the art. Congrats to the Gemini team! Sharing preliminary results here and working on bringing it into Devin:

English

342

26.9K

Adam Sadovsky retweetledi

Bindu Reddy@bindureddy·26 Mar

WE HAVE A NEW BEST MODEL IN THE WORLD! GEMINI 2.5 IS #1 ON LIVEBENCH

English

106

163

1.4K

165.4K

Adam Sadovsky retweetledi

Arena.ai@arena·25 Mar

BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn! Massive congrats to @GoogleDeepMind for this incredible Arena milestone! 🙌 More highlights in thread👇

Google DeepMind@GoogleDeepMind

Think you know Gemini? 🤔 Think again. Meet Gemini 2.5: our most intelligent model 💡 The first release is Pro Experimental, which is state-of-the-art across many benchmarks - meaning it can handle complex problems and give more accurate responses. Try it now → goo.gle/4c2HKjf

English

401

2.3K

466.9K

Adam Sadovsky@asadovsky·25 Mar

cook or die

English

373

Adam Sadovsky retweetledi

Kyle Corbitt@corbtt·21 Mar

If you're fine-tuning LLMs, Gemma 3 is the new 👑 and it's not close. Gemma 3 trounces Qwen/Llama models at every size! - Gemma 3 4B beats 7B/8B competition - Gemma 3 27B matches 70B competiton Vision benchmarks coming soon!

English

491

36.6K

Adam Sadovsky@asadovsky·12 Mar

Wow, quite impressive for a 27B model!

Arena.ai@arena

🎉 Congrats to @GoogleDeepMind on Gemma-3-27B, the newest and one of the strongest open models in Arena! 💠 Top 10 overall - beating out many proprietary models with only 27B parameter 💠 2nd best open model only below DeepSeek-R1 💠 128K context window Check out their blog to learn more about Gemma 3. We can't wait to see where this goes next! 🔥👏

English

2.6K

Adam Sadovsky retweetledi

Subhash Choudhary@subhashchy·19 Şub

We replaced GPT-4o with Gemini-2.0 Flash for Bot9, reducing our costs by about 20× with no visible loss in accuracy. This change was implemented on a highly complex support agent that makes 32 tool calls. I was seriously not expecting this. At the application layer, it also made us one of the top 10 apps built with Gemini worldwide — and the only one from India in the list. - Data source : Openrouter.

English

1.2K

129.7K

Adam Sadovsky retweetledi

Farzad Mostashari@Farzad_MD·15 Şub

1/ After residency at Mass General Hospital, I reported to Atlanta to meet my fellow CDC Epidemic Intelligence Service Officers. I have never felt so intimidated by my peers The best and the brightest, they were star clinicians, had served in disaster zones; MD/PhDs and MSF.

English

408

4.9K

18.3K

2.6M

Adam Sadovsky@asadovsky·11 Şub

@kchonyc Fix is on the way. @mbaeuml @pkanez

English

170

Kyunghyun Cho@kchonyc·8 Şub

the lack of consistency in Gemini is driving me crazy. somehow i can only share the conversation if it were with Gemini Advanced 2.0 Flash but not with any other models that are actually helpful and useful ... 🤦‍♂️ what kind of LLM hell is this ???

English

4.7K

Adam Sadovsky retweetledi

Andrej Karpathy@karpathy·30 Oca

We have to take the LLMs to school. When you open any textbook, you'll see three major types of information: 1. Background information / exposition. The meat of the textbook that explains concepts. As you attend over it, your brain is training on that data. This is equivalent to pretraining, where the model is reading the internet and accumulating background knowledge. 2. Worked problems with solutions. These are concrete examples of how an expert solves problems. They are demonstrations to be imitated. This is equivalent to supervised finetuning, where the model is finetuning on "ideal responses" for an Assistant, written by humans. 3. Practice problems. These are prompts to the student, usually without the solution, but always with the final answer. There are usually many, many of these at the end of each chapter. They are prompting the student to learn by trial & error - they have to try a bunch of stuff to get to the right answer. This is equivalent to reinforcement learning. We've subjected LLMs to a ton of 1 and 2, but 3 is a nascent, emerging frontier. When we're creating datasets for LLMs, it's no different from writing textbooks for them, with these 3 types of data. They have to read, and they have to practice.

English

386

1.8K

11.8K

695K

Adam Sadovsky retweetledi

Raveesh 折図@raveeshbhalla·26 Oca

@deedydas @_sholtodouglas As @swyx’s Pareto frontier graph shows, Gemini is arguably the real story of the past few months

English

581

463.4K

Adam Sadovsky retweetledi

Demis Hassabis@demishassabis·22 Oca

Our latest update to our Gemini 2.0 Flash Thinking model (available here: goo.gle/4jsCqZC) scores 73.3% on AIME (math) & 74.2% on GPQA Diamond (science) benchmarks. Thanks for all your feedback, this represents super fast progress from our first release just this past Dec! Latest version also includes code execution, a 1M token content window & a reduced likelihood of thought-answer contradictions. We’ve been pioneering these types of planning systems for over a decade, starting with programs like AlphaGo, and it is exciting to see the powerful combination of these ideas with the most capable foundation models.

English

122

352

2.6K

695.3K

Keşfet

@dustinvtran @melvinjohnsonp @MicrosoftAI @lmarena_ai @GoogleDeepMind @kchonyc @mbaeuml @pkanez