Kefan XIAO

230 posts

@KevinKiao

weightlifting 🏋️ & AI - GDM, previous Anthropic, previous pretraining/data research of Gemini at Google Deepmind. Only represents my personal opinions.

Joined March 2022
389 Following · 569 Followers
Pinned Tweet
Kefan XIAO @KevinKiao
Gemini 3 is out! Personally, I'm super proud of its coding capabilities in developers’ daily usage. I have been dedicated to it with @pengchengyin for the last several months, partnering with @melvinjohnsonp's team to launch it!
Sundar Pichai @sundarpichai

Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting.  Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI.  Excited for you to try it!

13 replies · 6 reposts · 44 likes · 5.9K views
Andrew M. Dai @AndrewDai
After almost 12 years in Brain/DeepMind, I’ve finally decided to take the leap. My cofounders: @yinfeiy, Seth and I have kicked-off @ElorianAI. The first multimodal reasoning lab founded and led by former LLM pretraining, data and multimodal leads. youtu.be/YlvfNpOMeOY?si… (1/n)
82 replies · 71 reposts · 778 likes · 315.7K views
Kefan XIAO retweeted
Shashwat Goel @ShashwatGoel7
New Blogpost: How to game the METR plot 🚨

In 2025, a single graph changed AGI timelines, investments, research priorities, model quality assessments and much more. But if you squint harder, only 14 prompts shaped AI discourse over this year. That's all the data in the 1-4 hour horizon length regime that matters. 🕵️

What's more? A majority of these are about cybersecurity capture-the-flag contests and training a machine learning model.
> Post-train your model on CTF and ML codebases
> Profit 📈! Its METR horizon length will increase.

Exactly what OpenAI has been targeting in its Codex model releases... and is Anthropic underperforming in the 2-4hr range because it mostly consists of cybersecurity, which is dual-use for safety?

To be clear, I think it's an excellent idea to track horizon lengths instead of benchmark accuracy. But under the current modelling assumption that success probability is a logistic function of task length, SWAA+HCAST accuracy improvements alone might explain the exponential progress in horizon length. 🔎

In the blog, I show detailed evidence for why we need to stop overindexing on the METR plot. Share it with anyone you see making decisions based on where the latest model lands on the METR plot. shash42.substack.com/p/how-to-game-…
37 replies · 69 reposts · 764 likes · 206.3K views
Kefan XIAO retweeted
Logan Kilpatrick @OfficialLoganK
Introducing Nano Banana Pro 🍌 aka Gemini 3 Pro Image, our new SOTA image generation and editing model. It is all the things you loved about @NanoBanana, but with some wild new improvements. It is available right now for developers in the Gemini API and in the Gemini App!
118 replies · 99 reposts · 1.9K likes · 105K views
Kefan XIAO retweeted
Melvin Johnson @melvinjohnsonp
I’m especially proud of where we landed on coding and agentic use cases. The charts for Terminal-Bench 2.0, SWE-Bench and τ2-Bench show the incredible jump compared to Gemini 2.5, but using it to actually solve hard problems is the real win.
1 reply · 1 repost · 7 likes · 443 views
Melvin Johnson @melvinjohnsonp
Been waiting a long time to share this one. Meet Gemini 3 Pro. It’s our most intelligent multimodal model that’s deeply capable. x.com/sundarpichai/s…
Sundar Pichai @sundarpichai

Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting.  Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI.  Excited for you to try it!

11 replies · 17 reposts · 226 likes · 28.4K views
Jiao Sun @sunjiao123sun_
In the past two months, our small Webapp Coding team has been cooking hard to make Gemini great at WebDev, and we are thrilled to claim the 👑! Yes, we saw your enthusiasm (pelican riding a bike, game controller); please keep trying and sending your best WebDev prompts our way! We love them! Besides WebDev Arena, we also achieved #1 on Design Arena across categories: website gen, game gen, UI component gen, etc.! Website lovers, designers, we can’t wait to hear your feedback!
Google DeepMind @GoogleDeepMind

Our first release is Gemini 3 Pro, which is rolling out globally starting today. It significantly outperforms 2.5 Pro across the board: 🥇 Tops LMArena and WebDev @arena leaderboards 🧠 PhD-level reasoning on Humanity’s Last Exam 📋 Leads long-horizon planning on Vending-Bench 2

15 replies · 19 reposts · 264 likes · 153.5K views
Ankesh Anand @ankesh_anand
Gemini3 Pro is out, very exciting to be able to push the frontier with this one! There was never a dull day post-training this model, I hope the combination of a strong base model with sota reasoning is evident! This is obviously a big leap compared to 2.5 Pro, but I am excited about our research agenda more than ever. The models will continue to get smarter!
7 replies · 8 reposts · 98 likes · 19.5K views
Kefan XIAO @KevinKiao
And this has been great teamwork with many friends!
0 replies · 0 reposts · 1 like · 152 views
Kefan XIAO @KevinKiao
One behavior of Gemini 3 Pro that I really like is that it actively uses tools to explore and verify. It has been helping with my daily work! Please try it in your workflow and tell us how you feel!
1 reply · 0 reposts · 1 like · 179 views
Varun Mohan @_mohansolo
Excited to launch Google Antigravity, our next generation agentic IDE, now powered by Gemini 3!
337 replies · 508 reposts · 8.2K likes · 6.2M views