

Xinyang (Young) Geng
68 posts

@younggeng
Research scientist at Google DeepMind. Opinions are my own.




Gemini 2.5 Flash is described as being optimized for speed and scalability. Despite its lighter design, the community voted for its impressive performance on Hard Prompts, Coding, and Long Queries, matching the strength of its older sibling, Gemini 2.5 Pro, at #1 in these categories.

Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve a non-trivial score (24.4%). The speed of progress is really mind-blowing.






Unpopular opinion: benchmarks like these are moving the field in the wrong direction. No, I don't want an AI to be able to memorize (useless?) questions like "How many paired tendons are supported by a sesamoid bone?" in its weights. I want the "intern", as @karpathy is suggesting

two aidanbench updates:
> gemini-2.0-flash-thinking is now #2 (explanation for score change below)
> deepseek v3 is #22 (thoughts below)

We released Gemini 2.0 Flash Thinking today! ⚡️🤔 It's a small step towards improved reasoning via inference-time compute, built on top of our small and mighty 2.0 Flash!

This has been and will continue to be my recommendation for anyone in this position. Learn JAX and sign up for sites.research.google/trc/about/. It's one of the best things Google has ever done. You can do meaningful research for free, but the learning curve is steep. Strap in
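To give a sense of what that learning curve buys you, here is a minimal sketch of the core JAX idioms (`jax.grad` for automatic differentiation and `jax.jit` for XLA compilation) applied to a toy linear-regression loss. This is an illustrative example, not taken from the tweet; all names and values are made up.

```python
# Minimal JAX sketch: jit-compiled gradient descent on a toy MSE loss.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Mean squared error of a linear model.
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

# jax.grad differentiates loss w.r.t. its first argument (w);
# jax.jit compiles the whole update step with XLA.
step = jax.jit(lambda w, x, y: w - 0.1 * jax.grad(loss)(w, x, y))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (64, 3))
true_w = jnp.array([1.0, -2.0, 0.5])
y = x @ true_w  # noiseless targets, so w should recover true_w

w = jnp.zeros(3)
for _ in range(200):
    w = step(w, x, y)
```

The same functional style (pure functions transformed by `grad`, `jit`, `vmap`, `pmap`) is what makes JAX a good fit for TPU research on TRC, and also what makes the learning curve steep if you are coming from stateful frameworks.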