Martin Vechev
226 posts

@mvechev
Professor of Computer Science, ETH Zurich. Founder of INSAIT (https://t.co/bqKTA6e8X0). Works on Safe/Secure AI, LLMs, Quantum. Co-founder of 6 Deep-Tech start-ups.

You should delete your CLAUDE.md/AGENTS.md file. I have a study to prove it.

Full @GoogleAI Gemini 3 results are up on MathArena:
➡️ #1 on 2025 Final-Answer Competitions
➡️ #1 on Apex: 5.2% -> 23.4%, new SOTA
➡️ #1 on Visual Math: 79% -> 84%, new SOTA
➡️ #2 on Project Euler: 62%, huge jump compared to 2.5 Pro (15%)

We used DeepSeek OCR to extract every dataset from tables/charts across 500k+ AI arXiv papers for $1000 🚀
See which benchmarks are trending and discover datasets you didn't know existed.
Doing the same task with Mistral OCR would've cost $7500 👀

MathArena goes visual: We evaluated models such as GPT-5 on Math Kangaroo 2025, a recent contest for ages 6-19 where most tasks require visual reasoning. Models struggle the most with tasks for younger kids. For example, they get this task for 1st graders right only 3% of the time 🧵