Chenxi Liu

245 posts

@chenxi116

Research scientist @Meta Superintelligence Labs. Previously @GoogleDeepMind Gemini post-trainer. Opinions my own.

Joined August 2015
284 Following · 1.6K Followers
Chenxi Liu retweeted
Sundar Pichai @sundarpichai
Having a deep think...
856
946
30.3K
2.7M
Chenxi Liu @chenxi116
GIF
Arena.ai @arena

🚨Breaking: @GoogleDeepMind’s latest Gemini-2.5-Pro is now ranked #1 across all LMArena leaderboards 🏆

Highlights:
- #1 in all text arenas (Coding, Style Control, Creative Writing, etc)
- #1 on the Vision leaderboard with a ~70 pts lead!
- #1 on WebDev Arena, surpassing Claude for the first time

This is the first-ever sweep across text, vision, and WebDev by any model!🥇 Huge congrats to @GoogleDeepMind on this incredible breakthrough!

1
0
30
1.5K
Chenxi Liu @chenxi116
2.5 Flash 🚀 This IS the norm 👇
0
3
53
2.9K
Chenxi Liu @chenxi116
BOOM! You might have guessed pro thinking, but bet you didn't expect 2.5 :) Seriously, congratulations to everyone involved. Everything came together so beautifully (and so fast!)
Arena.ai @arena

BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆 Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer Query, and Multi-Turn! Massive congrats to @GoogleDeepMind for this incredible Arena milestone! 🙌 More highlights in thread👇

0
0
11
820
Chenxi Liu @chenxi116
TIL mendelssohn has a violin concerto *in d*??
0
0
0
322
Chenxi Liu @chenxi116
More to come, in several senses :)
Demis Hassabis @demishassabis

Our latest update to our Gemini 2.0 Flash Thinking model (available here: goo.gle/4jsCqZC) scores 73.3% on AIME (math) & 74.2% on GPQA Diamond (science) benchmarks. Thanks for all your feedback, this represents super fast progress from our first release just this past Dec! Latest version also includes code execution, a 1M token context window & a reduced likelihood of thought-answer contradictions. We’ve been pioneering these types of planning systems for over a decade, starting with programs like AlphaGo, and it is exciting to see the powerful combination of these ideas with the most capable foundation models.

4
5
170
19.6K
Chenxi Liu @chenxi116
what is vacation
1
0
2
772
Chenxi Liu retweeted
Logan Kilpatrick @OfficialLoganK
🤔
11
15
747
212.1K
Chenxi Liu @chenxi116
We've obviously trained this model for a little while, so today was just a normal day at work. But seeing the somewhat-familiar numbers, now not from raw internal docs but from the CEO's nicely formatted tweet, is an odd feeling unlike any other. Amazing stories. Insane teamwork.
Sundar Pichai @sundarpichai

We’re kicking off the start of our Gemini 2.0 era with Gemini 2.0 Flash, which outperforms 1.5 Pro on key benchmarks at 2X speed (see chart below). I’m especially excited to see the fast progress on coding, with more to come. Developers can try an experimental version in AI Studio and Vertex AI today. It is also available to try in @GeminiApp on the web today, mobile coming soon.

13
13
467
41.9K
Chenxi Liu @chenxi116
we ate a pudding, then carried on sprinting
Gemini-Exp-1121 and Gemini-Exp-1114 score much higher than Gemini-1.5-Pro-002
Arena.ai @arena

Woah, huge news again from Chatbot Arena🔥 @GoogleDeepMind’s just released Gemini (Exp 1121) is back stronger (+20 points), tied #1🏅Overall with the latest GPT-4o-1120 in Arena!

Ranking gains since Gemini-Exp-1114:
- Overall: #3 -> #1
- Overall (StyleCtrl): #5 -> #2
- Hard Prompts (StyleCtrl): #3 -> #1
- Coding: #3 -> #1
- Vision: #1
- Math: #2 -> #1
- Creative Writing: #2 -> #1

Congrats again @GoogleDeepMind! The LLM race is on fire - progress is now measured in days! See more analysis below👇

6
1
60
7.8K