
Gemini 2.5 Pro #1 across ALL categories, tied #1 with Grok-3/GPT-4.5 for Hard Prompts and Coding, and edged out across all others to take the lead 🏇🏆
Dmitry (Dima) Lepikhin

@lepikhin
Gemini Pretraining co-lead




Today, we see the result: Muse Spark. I'll be honest, I was surprised how competitive this model is. Progress at OAI, Anthropic, and GDM has been continuous, built on compounding breakthroughs. But MSL's team seems to have taken a single binary leap, catching up on many fronts at once.


08 Champs ☘️


Mark Qiu, CEO of RoboSense, sat down with Bloomberg to discuss our first-ever quarterly profit and explain how digital LiDAR is transforming the industry. Watch the full interview 👇 #RoboSense #DigitalLiDAR #Bloomberg #Robotics




I painstakingly ran all 20 EsoLang-Bench hard problems through Claude webui. It solved 20/20 (100%). No specialized scaffolding, no expert prompting, no few-shot examples, it just solves them natively. This benchmark just suffocated the models with constrictive scaffolding.




Ohhh nooo not my private IP how dare someone use that to train an AI model, only Anthropic has the right to use everyone else's IP nooooo, this cannot stand!




We ran a randomized controlled trial to see if LLMs can help novices perform molecular biology in a wet lab. The results: LLMs may help in some aspects, but we found no significant increase at the core tasks end-to-end. That's lower than what experts predicted. Our findings 🧵