
Arena.ai
2.9K posts

Arena.ai
@arena
Where AI meets the real world. Formerly LMArena. We measure and advance the frontier of AI through community-driven evaluation. We’re hiring → https://t.co/XBZCrseaWF


Grok 4.20 Beta Reasoning makes @xAI a top 5 lab in Vision Arena. Scoring 1240, this model ranks #11 across all Vision models today. Congrats to the @xAI team for this milestone!

Qwen 3.5 Max Preview has landed in top 10 for Arena Expert and top 15 for Text Arena. It shows particular strength in Math. Highlights: - #3 Math - #10 Expert - #15 Text Arena - Top 20 for Writing, Literature & Language, Life, Physical, & Social Science, Entertainment, Sports, & Media, and Medicine & Healthcare Congrats to the @Alibaba_Qwen team for this new milestone!



GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…










MAI-Image-2 debuts at #5 in the Image Arena! Highlights: - #5 in Text-to-Image overall - #5 for 3D Imaging & Modeling, Cartoon, Anime & Fantasy, Photorealistic & Cinematic Imagery, Art and Portraits - #6 for Product, Branding & Commercial Design Congrats to the @MicrosoftAI team on this milestone!




Qwen 3.5 Max Preview has landed in top 10 for Arena Expert and top 15 for Text Arena. It shows particular strength in Math. Highlights: - #3 Math - #10 Expert - #15 Text Arena - Top 20 for Writing, Literature & Language, Life, Physical, & Social Science, Entertainment, Sports, & Media, and Medicine & Healthcare Congrats to the @Alibaba_Qwen team for this new milestone!



Qwen 3.5 Max Preview has landed in top 10 for Arena Expert and top 15 for Text Arena. It shows particular strength in Math. Highlights: - #3 Math - #10 Expert - #15 Text Arena - Top 20 for Writing, Literature & Language, Life, Physical, & Social Science, Entertainment, Sports, & Media, and Medicine & Healthcare Congrats to the @Alibaba_Qwen team for this new milestone!






Introducing MiniMax-M2.7, our first model which deeply participated in its own evolution, with an 88% win-rate vs M2.5 - Production-Ready SWE: With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%), M2.7 reduced intervention-to-recovery time for online incidents to 3-min on certain occasions. - Advanced Agentic Abilities: Trained for Agent Teams and tool search tool, with 97% skill adherence across 40+ complex skills. M2.7 is on par with Sonnet 4.6 in OpenClaw. - Professional Workspace: SOTA in professional knowledge, supports multi-turn, high-fidelity Office file editing. MiniMax Agent: agent.minimax.io API: platform.minimax.io Token Plan: platform.minimax.io/subscribe/toke…

