
Chenxi Liu
245 posts

@chenxi116
Research scientist @Meta Superintelligence Labs. Previously @GoogleDeepMind Gemini post-trainer. Opinions my own.


🚨Breaking: @GoogleDeepMind’s latest Gemini-2.5-Pro is now ranked #1 across all LMArena leaderboards 🏆
Highlights:
- #1 in all text arenas (Coding, Style Control, Creative Writing, etc.)
- #1 on the Vision leaderboard with a ~70 pts lead!
- #1 on WebDev Arena, surpassing Claude for the first time
This is the first-ever sweep across text, vision, and WebDev by any model!🥇 Huge congrats to @GoogleDeepMind on this incredible breakthrough!

The Gemini team cooked hard with Gemini 2.5 Pro, it's an awesome model that continues to lead @lmarena_ai - huge congrats to the team! Try it for yourself in the @GeminiApp now. Can't wait for you all to see what else we've been cooking 👀

The release version of Llama 4 has been added to LMArena after it was found out they cheated, but you probably didn't see it because you have to scroll down to 32nd place, which is where it ranks

BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn! Massive congrats to @GoogleDeepMind for this incredible Arena milestone! 🙌 More highlights in thread👇

Ravens are re-signing LT Ronnie Stanley to a three-year, $60 million deal.

Our latest update to our Gemini 2.0 Flash Thinking model (available here: goo.gle/4jsCqZC) scores 73.3% on AIME (math) & 74.2% on GPQA Diamond (science) benchmarks. Thanks for all your feedback, this represents super fast progress from our first release just this past Dec! Latest version also includes code execution, a 1M token context window & a reduced likelihood of thought-answer contradictions. We’ve been pioneering these types of planning systems for over a decade, starting with programs like AlphaGo, and it is exciting to see the powerful combination of these ideas with the most capable foundation models.

Gemini-2.0-Flash-Thinking #1 across all categories!

It’s still an early version, but check out how the model handles a challenging puzzle involving both visual and textual clues: (2/3)


We’re kicking off the start of our Gemini 2.0 era with Gemini 2.0 Flash, which outperforms 1.5 Pro on key benchmarks at 2X speed (see chart below). I’m especially excited to see the fast progress on coding, with more to come. Developers can try an experimental version in AI Studio and Vertex AI today. It is also available to try in @GeminiApp on the web today, mobile coming soon.

What a way to celebrate one year of incredible Gemini progress -- #1🥇across the board on overall ranking, as well as on hard prompts, coding, math, instruction following, and more, including with style control on. Thanks to the hard work of everyone in the Gemini team and elsewhere at Google! 🎊

Woah, huge news again from Chatbot Arena🔥 @GoogleDeepMind’s just-released Gemini (Exp 1121) is back stronger (+20 points), tied #1🏅Overall with the latest GPT-4o-1120 in Arena!
Ranking gains since Gemini-Exp-1114:
- Overall: #3 → #1
- Overall (StyleCtrl): #5 → #2
- Hard Prompts (StyleCtrl): #3 → #1
- Coding: #3 → #1
- Vision: #1
- Math: #2 → #1
- Creative Writing: #2 → #1
Congrats again @GoogleDeepMind! The LLM race is on fire; progress is now measured in days! See more analysis below👇

The Gemini app, now available on iPhone. Download it now in the App Store → goo.gle/4hN1SZe



