Arena.ai

3.2K posts

@arena

Where AI meets the real world. Formerly LMArena. We measure and advance the frontier of AI through community-driven evaluation. We’re hiring → https://t.co/XBZCrseaWF

US · Joined March 2023
213 Following · 160.7K Followers
Pinned Tweet
Arena.ai @arena
LMArena is now Arena. A name that takes us back to our roots with a powerful mission: to measure and advance the frontier of AI for real-world use. We have grown from a small PhD research project to a platform powered by a global community of millions. This rebrand has been shaped by the people who use it. 👇 Take a look inside the rebrand.
Replies: 85 · Reposts: 104 · Likes: 1.1K · Views: 289.1K
Arena.ai @arena
New provider HiDream-01-Image by @HiDream_AI ranks #27 overall in the Text-to-Image Arena, making it the #4 open source model. Congrats to the @HiDream_AI team on the release!
Replies: 49 · Reposts: 11 · Likes: 144 · Views: 17.9K
Pi Huang @PiHuang05
@arena the deepseek account is wrong😭
Replies: 2 · Reposts: 0 · Likes: 3 · Views: 775
Arena.ai @arena
5 patterns in Text Arena's price–performance Pareto frontier since 2023:
1. GPT-4-level quality now costs ~500x less.
   - From a ~$50 blended price per million tokens in 2023 to ~$0.10 today.
2. The higher-price end has become both better and cheaper since 2023.
   - The leading Arena score has climbed ~170 points (1,330 → 1,500), while the price of the higher-end frontier models dropped from ~$50 to ~$20 per million tokens.
3. The low-cost end gained the most.
   - Under $0.20 per million tokens, the best available model went from ~1,000 Arena score in 2023 to ~1,440 today.
4. The gap between the low-cost end and top performance has nearly closed.
   - In 2023, sub-$0.20 models trailed the leader by ~350 Arena points. Today, ~60.
5. The cast has rotated quite a bit.
   - @OpenAI set the 2023–24 benchmark.
   - @AIatMeta strengthened the low-cost end in 2024.
   - @GoogleDeepMind drove the 2025 jump.
   - @AnthropicAI holds the peak in 2026.
   - @xAI and Chinese labs like @DeepSeekAI, @Zai_org, @Kimi_Moonshot, @XiaomiMiMo, and @Alibaba_Qwen are continuing to push the mid-price frontier.
Replies: 13 · Reposts: 40 · Likes: 367 · Views: 56.2K
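For readers unfamiliar with the term, a price–performance Pareto frontier is the set of models that no other model beats on both price and score. The sketch below is one minimal way to compute it and to check the arithmetic behind patterns 1, 2 and 4. It is not Arena's own method: the `Model` type, the `pareto_frontier` helper, the model names, the `mid-model` row, and the 0.15 price are all hypothetical, and the remaining numbers are only the rounded figures quoted in the post above.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # blended USD per million tokens
    score: float  # Arena score


def pareto_frontier(models):
    """Return the models that no other model beats on both price and score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to priciest; at equal price, try the higher score first.
    for m in sorted(models, key=lambda m: (m.price, -m.score)):
        if m.score > best_score:  # strictly better than everything cheaper
            frontier.append(m)
            best_score = m.score
    return frontier


# Hypothetical snapshot. The names, the mid-model row, and the 0.15 price are
# made up; 1500, 1440 and ~$20 echo the rounded figures in the post.
today = [
    Model("top-model", 20.00, 1500),
    Model("mid-model", 5.00, 1400),
    Model("budget-model", 0.15, 1440),
]

frontier_names = [m.name for m in pareto_frontier(today)]
print(frontier_names)  # ['budget-model', 'top-model']

# Arithmetic behind patterns 1, 2 and 4, using the post's rounded figures.
cost_ratio = round(50 / 0.10)  # GPT-4-level price drop: ~$50 -> ~$0.10
leader_gain = 1500 - 1330      # climb in the leading Arena score
gap_today = 1500 - 1440        # leader vs. best sub-$0.20 model today
print(cost_ratio, leader_gain, gap_today)  # 500 170 60
```

In this toy data `mid-model` drops off the frontier because `budget-model` is both cheaper and higher-scoring, which is the kind of shift pattern 3 describes.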
Arena.ai @arena
Dive into the details of the Text Arena Pareto frontier. Filter and sort by lab, license, input/output price and context length. arena.ai/leaderboard/te…
Replies: 3 · Reposts: 2 · Likes: 9 · Views: 6.3K
Arena.ai @arena
Asked Gemini 3.5 Flash to render the Petra Treasury. It built the entire stone canyon around it - something other frontier models didn't do. Gemini also added ambient sound, which wasn’t in the prompt either. Whether you want this agentic behavior depends on what you're trying to do, but it's a notable departure from how other frontier models behave on the same prompts. More side-by-side prompts with @GoogleDeepMind's latest release in the full video (link in thread) 👇
Quoting Arena.ai @arena:
Gemini 3.5 Flash has landed #9 for Text and Code Arena: Frontend.
Code Arena: Frontend evaluates models on agentic frontend coding tasks from real users building apps and websites (HTML and React). Scoring 1507, this is a significant +70 point improvement over Gemini-3 Flash.
Sub-category highlights:
- #7 Content Creation Tools
- #8 Gaming
- #8 Consumer Product
- #9 Data & Analytics
- #10 Reference-Based Design
In Text Arena: #9 overall. Gemini 3.5 Flash also moves the price–performance frontier as the new top Arena score in its price tier.
Congrats to the @GoogleDeepMind team on this launch! Click into the thread to see the rankings by each arena.
Replies: 8 · Reposts: 3 · Likes: 56 · Views: 14.8K
Arena.ai @arena
Code Arena: Frontend evaluates models on agentic frontend coding tasks from real users building apps and websites (HTML and React). Agents are an entirely different contest. More from Arena soon. Filter and dive into all the Code Arena: Frontend leaderboard details at: arena.ai/leaderboard/co…
Replies: 0 · Reposts: 2 · Likes: 19 · Views: 6.6K
Arena.ai @arena
A closer look at Gemini 3.5 Flash by @GoogleDeepMind.
In the Code Arena: Frontend we see sweeping gains, and a Flash model now surpasses the previous Pro variant.
- vs. 3 Flash: a +70 jump overall, with large improvements in every subcategory
- vs. 3.1 Pro: outperforms it in every category, with the largest gains in Consumer Product, Content Creation Tools, and Data & Analytics
- vs. 3.1 Pro: over 2x the output tokens per second
Congrats again to @GoogleDeepMind on these improvements!
Quoting Arena.ai @arena: the Gemini 3.5 Flash launch post ("Gemini 3.5 Flash has landed #9 for Text and Code Arena: Frontend."), quoted in full above.
Replies: 16 · Reposts: 48 · Likes: 412 · Views: 39.5K
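One way to get a feel for a +70 point gap is to convert it into an expected head-to-head preference rate. The sketch below assumes Arena scores behave like Elo ratings on the conventional logistic scale (base 10, 400 points). The posts here do not state the scoring formula, so treat the scale and the hypothetical `expected_win_rate` helper as assumptions, not as Arena's published method.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Elo-style probability that model A is preferred over model B.

    Assumes a base-10 logistic curve with a 400-point scale (an assumption,
    not something the posts confirm).
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


flash_35 = 1507          # Code Arena: Frontend score quoted in the post
flash_3 = flash_35 - 70  # implied by the quoted +70 improvement

win_rate = expected_win_rate(flash_35, flash_3)
print(f"{win_rate:.3f}")
```

Under that assumption, a 70-point lead corresponds to the newer model being preferred in roughly 60% of direct comparisons, so the gap is meaningful but far from a sweep.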
Arena.ai @arena
Gemini 3.5 Flash’s pricing shifts the Pareto frontier in Text. 8 models from @GoogleDeepMind dominate the Text Arena Pareto curve, where only 4 labs are represented as top performers in their price tiers.
Replies: 3 · Reposts: 7 · Likes: 51 · Views: 10.2K
Arena.ai @arena
Gemini 3.5 Flash has landed #9 for Text and Code Arena: Frontend.
Code Arena: Frontend evaluates models on agentic frontend coding tasks from real users building apps and websites (HTML and React). Scoring 1507, this is a significant +70 point improvement over Gemini-3 Flash.
Sub-category highlights:
- #7 Content Creation Tools
- #8 Gaming
- #8 Consumer Product
- #9 Data & Analytics
- #10 Reference-Based Design
In Text Arena: #9 overall. Gemini 3.5 Flash also moves the price–performance frontier as the new top Arena score in its price tier.
Congrats to the @GoogleDeepMind team on this launch! Click into the thread to see the rankings by each arena.
Quoting Google DeepMind @GoogleDeepMind:
Introducing Gemini 3.5: our newest family of models combining frontier intelligence with real-world action. The first release is 3.5 Flash, our strongest model yet for agents and coding 🧵
Replies: 35 · Reposts: 61 · Likes: 660 · Views: 199.3K
Arena.ai retweeted
Qwen @Alibaba_Qwen
🚀🚀 Qwen3.7 Preview lands on Arena! Here come Qwen3.7-Max-Preview & Qwen3.7-Plus-Preview. Alibaba now #6 lab in Text, #5 in Vision. ⚡️⚡️ Can't wait to release Qwen3.7 series models! Stay tuned! @arena
Quoting Arena.ai @arena:
Qwen3.7 Preview by @Alibaba_Qwen lands on Arena for Text and Vision.
In Text Arena, Qwen3.7 Max Preview ranks #13 overall. Alibaba is now the #6 lab in this arena.
- #7 Math
- #9 Expert
- #9 Software & IT
- #10 Coding
In Vision Arena: Qwen3.7 Plus Preview ranks #16 overall, making Alibaba the #5 lab.
Congrats to the @Alibaba_Qwen team on the latest progress!
Replies: 198 · Reposts: 378 · Likes: 3.4K · Views: 616.3K
Arena.ai @arena
In the Expert Arena, Qwen3.7 Max Preview ranks #9 when it comes to expert-only prompts.
Replies: 1 · Reposts: 3 · Likes: 52 · Views: 16.2K
Arena.ai @arena
Qwen3.7 Preview by @Alibaba_Qwen lands on Arena for Text and Vision.
In Text Arena, Qwen3.7 Max Preview ranks #13 overall. Alibaba is now the #6 lab in this arena.
- #7 Math
- #9 Expert
- #9 Software & IT
- #10 Coding
In Vision Arena: Qwen3.7 Plus Preview ranks #16 overall, making Alibaba the #5 lab.
Congrats to the @Alibaba_Qwen team on the latest progress!
Replies: 42 · Reposts: 59 · Likes: 554 · Views: 426K