Arena.ai (@arena) - Twitter Profile

Pinned Tweet

Arena.ai@arena·28 Jan

LMArena is now Arena. A name that takes us back to our roots with a powerful mission: to measure and advance the frontier of AI for real-world use. We have grown from a small PhD research project to a platform powered by a global community of millions. This rebrand has been shaped by the people who use it. 👇 Take a look inside the rebrand.

Arena.ai@arena·4d

See the Text-to-Image Arena leaderboard details at: arena.ai/leaderboard/te…

Arena.ai@arena·4d

New provider HiDream-01-Image by @HiDream_AI ranks #27 overall in the Text-to-Image Arena, making it the #4 open source model. Congrats to the @HiDream_AI team on the release!

Arena.ai@arena·4d

@PiHuang05 Thanks for the catch, it should be: @DeepSeek_AI

Pi Huang@PiHuang05·4d

@arena the deepseek account is wrong😭

Arena.ai@arena·4d

5 patterns in Text Arena's price–performance Pareto frontier since 2023:
1. GPT-4-level quality is now ~500x lower cost.
   - From a ~$50 blended price per million tokens in 2023 to ~$0.10 today.
2. The higher-price end is both better and lower-priced since 2023.
   - The leading Arena score has climbed ~170 points (1,330 → 1,500). While the price of the higher-end frontier models dropped from ~$50 to ~$20 per million tokens.
3. The low-cost end gained the most.
   - Under $0.20 per million tokens, the best available model went from ~1,000 Arena score in 2023 to ~1,440 today.
4. The low-cost/top performance gap has nearly closed.
   - In 2023, sub-$0.20 models trailed the leader by ~350 Arena points. Today, ~60.
5. The cast has rotated quite a bit.
   - @OpenAI set the 2023–24 benchmark.
   - @AIatMeta strengthened the low-cost end in 2024.
   - @GoogleDeepMind drove the 2025 jump.
   - @AnthropicAI holds the peak in 2026.
   - @xAI and Chinese labs like @DeepSeekAI, @Zai_org, @Kimi_Moonshot, @XiaomiMiMo, and @Alibaba_Qwen are continuing to push the mid-price frontier.
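A price–performance Pareto frontier like the one described above can be sketched in a few lines. This is only an illustrative sketch, not Arena's actual method: the `Model` type, the `pareto_frontier` helper, and every model name and row below are hypothetical placeholders. Only the rounded figures in the final arithmetic check (~$50 → ~$0.10, 1,330 → 1,500, ~1,440) come from the post itself. A model is kept on the frontier when no model at the same or lower price has a score at least as high.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str     # hypothetical label, not a real leaderboard entry
    price: float  # blended USD per million tokens
    score: float  # Arena score


def pareto_frontier(models):
    """Keep each model that beats every model priced at or below it."""
    frontier = []
    best_score = float("-inf")
    # Sort by ascending price; at equal price, highest score first.
    for m in sorted(models, key=lambda m: (m.price, -m.score)):
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Hypothetical placeholder rows for illustration only.
models = [
    Model("budget-a", 0.10, 1440),
    Model("budget-b", 0.15, 1400),
    Model("mid-a", 2.00, 1470),
    Model("mid-b", 3.00, 1450),
    Model("top-a", 20.00, 1500),
]
frontier = pareto_frontier(models)
frontier_names = [m.name for m in frontier]

# Arithmetic behind the rounded figures quoted in the post.
cost_ratio = 50 / 0.10     # roughly 500x lower cost
leader_gain = 1500 - 1330  # points gained by the leading score
gap_today = 1500 - 1440    # points between the leader and the sub-$0.20 best
```

With these placeholder rows, `budget-b` and `mid-b` drop out because a cheaper model already scores higher, which is the sense in which a new release "moves" the frontier for its price tier.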

Arena.ai@arena·4d

Dive into the details of the Text Arena Pareto frontier. Filter and sort by lab, license, input/output price and context length. arena.ai/leaderboard/te…

Arena.ai@arena·4d

Watch a walkthrough of the Pareto frontier on Arena: youtube.com/watch?v=G8WSWU…

Arena.ai@arena·5d

Watch the full video with more comparisons of @GoogleDeepMind's latest Gemini 3.5 Flash on YouTube: youtube.com/watch?v=BScuyW…

Arena.ai@arena·5d

Asked Gemini 3.5 Flash to render the Petra Treasury. It built the entire stone canyon around it - something other frontier models didn't do. Gemini also added ambient sound, which wasn’t in the prompt either. Whether you want this agentic behavior depends on what you're trying to do, but it's a notable departure from how other frontier models behave on the same prompts. More side-by-side prompts with @GoogleDeepMind's latest release in the full video (link in thread) 👇

Arena.ai@arena·6d

Code Arena: Frontend evaluates models on agentic frontend coding tasks from real users building apps and websites (HTML and React). Agents are an entirely different contest. More from Arena soon. Filter and dive into all the Code Arena: Frontend leaderboard details at: arena.ai/leaderboard/co…

Arena.ai@arena·6d

A closer look at Gemini 3.5 Flash by @GoogleDeepMind
In the Code Arena: Frontend we see sweeping gains, and a Flash model now surpasses the previous Pro variant.
- vs. 3 Flash, a +70 jump overall, large improvements in every subcategory
- vs. 3.1 Pro, outperforms it in every category with largest gains in Consumer Product, Content Creation Tools, and Data & Analytics.
- vs. 3.1 Pro, demonstrates speed with over 2x output tokens per second
Congrats again to @GoogleDeepMind on these improvements!

Arena.ai@arena·6d

Dive into Gemini 3.5 Flash across all the leaderboards at: arena.ai/leaderboard

Arena.ai@arena·6d

Gemini 3.5 Flash’s pricing shifts the Pareto frontier in Text. 8 models from @GoogleDeepMind dominate the Text Arena Pareto curve where only 4 labs are represented for top performance in their price tiers.

Arena.ai@arena·6d

Gemini 3.5 Flash has landed #9 for Text and Code Arena: Frontend.
Code Arena: Frontend evaluates models on agentic frontend coding tasks from real users building apps and websites (HTML and React). Scoring 1507, this is a significant +70 point improvement over Gemini-3 Flash.
Sub-category highlights:
- #7 Content Creation Tools
- #8 Gaming
- #8 Consumer Product
- #9 Data & Analytics
- #10 Reference-Based Design
In Text Arena: #9 overall. Gemini 3.5 Flash also moves the price–performance frontier as the new top Arena score in its price tier.
Congrats to the @GoogleDeepMind team on this launch! Click into the thread to see the rankings by each arena.

Google DeepMind@GoogleDeepMind

Introducing Gemini 3.5: our newest family of models combining frontier intelligence with real-world action. The first release is 3.5 Flash, our strongest model yet for agents and coding 🧵

Arena.ai retweeted

Qwen@Alibaba_Qwen·18 May

🚀🚀 Qwen3.7 Preview lands on Arena! Here come Qwen3.7-Max-Preview & Qwen3.7-Plus-Preview. Alibaba now #6 lab in Text, #5 in Vision. ⚡️⚡️ Can't wait to release Qwen3.7 series models! Stay tuned! @arena

Arena.ai retweeted

Qwen@Alibaba_Qwen·18 May

🚀🚀

Arena.ai@arena

In the Vision Arena, Qwen3.7 Plus Preview makes @Alibaba_Qwen the #5 lab, ranking #16 overall.

Arena.ai@arena·18 May

See more leaderboard details across modalities at: arena.ai/leaderboard

Arena.ai@arena·18 May

In the Expert Arena, Qwen3.7 Max Preview ranks #9 when it comes to expert-only prompts.

Arena.ai@arena·18 May

Qwen3.7 Preview by @Alibaba_Qwen lands on Arena for Text and Vision.
In Text Arena, Qwen3.7 Max Preview ranks #13 overall. Alibaba is now the #6 lab in this arena.
- #7 Math
- #9 Expert
- #9 Software & IT
- #10 Coding
In Vision Arena: Qwen3.7 Plus Preview ranks #16 overall, making Alibaba the #5 lab.
Congrats to the @Alibaba_Qwen team on the latest progress!
