Joan Gamell, bonvivant.eth

37 posts

@gamell

Software Engineer @moov | Photography | Bon Vivant

California · Joined February 2007

1.1K Following · 738 Followers

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·18 Mar

Did you know? We’re funding independent research in AI evaluation and measurement—up to $50k per project. The Q1 deadline to apply for Arena’s Academic Partnerships Program is March 31.

Arena.ai@arena

AI needs better evaluations. Today we’re announcing Arena’s Academic Partnerships Program to fund independent academic research in AI evaluation and measurement. ▫️Up to $50K/project. Q1 Deadline: March 31, 2026. See more in thread for details and how to apply 👇

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·17 Mar

Today we’re launching the Video Edit Arena to evaluate the frontier capability of video models!
- #1 Grok-Imagine-Video, @xAI
- #2 Kling-o3-pro, @Kling_ai
- #3 Kling-o1-pro, @Kling_ai
- #4 Gen4-aleph, @Runwayml
The leaderboard is powered by thousands of real-world community votes. Click the Edit button in Video Arena to edit any video and compare top model outputs. More models coming soon!

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·26 Feb

Today, we’re launching a dedicated Multi-File React leaderboard. When Code Arena first launched, we evaluated models on single-file HTML. Then we raised the bar → multi-file React apps (routing, hooks, components, state management) and now have a leaderboard to match!
Single-file HTML tests instruction following. Multi-file React tests more complex agentic capabilities:
• Cross-file coordination
• Component architecture
• Dependency management
• State management
• Build reliability
Now see how models stack up under both scenarios. Learn more in thread 🧵👇

Arena.ai@arena

Multi-file apps are now live in Code Arena! Since launching Code Arena in November to evaluate frontier AI models on real-world, agentic coding tasks, we’ve received a lot of feedback asking to adapt more complex workflows. With multi-file apps, you can now build and compare production-ready projects, making it easier to evaluate how top frontier AI models perform on your actual use cases.

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·20 Feb

⚡️Who powers the Arena leaderboard? You do. But not all votes are Arena level research-grade quality. Every score is built from real-world prompts and human input, continuously refreshed as the way we use AI evolves.
In this video, ML Scientist @cthorrez explains how votes are processed and validated into high-quality data trusted by leading AI labs. Dive into:
▪️ Filtering low-signal and bot-like activity
▪️ Our quality control and validation pipeline
▪️ Ensuring all votes are based on blind pairings
▪️ How diverse user prompts differ from static benchmarks
Arena scores reflect how models perform in the wild: across coding, creative work, expert tasks, and everyday use. Watch the breakdown to see how your vote helps shape the frontier.

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·19 Feb

✨NEW: Arena Leaderboard UI Updates
Millions of votes power the leaderboard. Now you can filter for what matters to you. A new side panel lets you filter and break down ranked results to find the best model for your task. Some highlights:
• Filter by category (e.g. Coding, Expert prompts)
• Open vs. Proprietary Models
• Rank labs by their top-performing models
…just to name a few. Check it out, and let us know what you think.

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·7 Feb

The new @xAI Grok-Imagine-Image model is a Pareto-optimal model in Image Arena: The Pareto frontier tells us which model has the highest Arena score at each price point. @xAi’s latest models have improved the frontier, giving optimal performance in the mid-price tier. For a wide range of prices between 2c and 8c per image, @elonmusk’s @xAI has the leading model, delivering the maximum performance.
Top models on the Pareto frontier for Image Arena (Single Image Edit):
- @OpenAI: GPT-Image-1.5-high-fidelity
- @xAI: Grok Imagine Image Pro
- @xAI: Grok Imagine Image
- @bfl_ml: Flux 2 Klein 9B
- @bfl_ml: Flux-2-Dev
- @reve : V1.1 Fast
See thread for how the frontier changes for Text-to-Image 🧵

Arena.ai@arena

Latest image models from @xAI, Grok-Imagine-Image and Pro debut top 6 in the Image Arena!
Text-to-Image:
▪️ #4 Grok-Imagine-Image; scoring 1170, surpassing Flux-2-max and Nano-banana
▪️ #6 Grok-Imagine-Image-Pro
Image-Edit:
▪️ #5 Grok-Imagine-Image-Pro; scoring 1330, overtaking Seedream-4.5
▪️ #6 Grok-Imagine-Image
With this launch, @xAI is now a top-3 Image AI provider alongside @GoogleDeepMind and @OpenAI. Congrats to the @xAI team on the impressive releases!
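
The Pareto frontier mentioned in the Image Arena post above (the highest-scoring model at each price point) can be sketched roughly as follows. This is only an illustration, not Arena's actual method: the `Model` type, the `pareto_frontier` helper, the model names, and all prices and scores below are hypothetical placeholders.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # hypothetical price per image, in cents
    score: float  # hypothetical Arena-style score


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Keep models that no cheaper-or-equally-priced model beats on score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; on a price tie, visit the
    # higher-scoring model first so the weaker one is filtered out.
    for m in sorted(models, key=lambda m: (m.price, -m.score)):
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Made-up example data, not real leaderboard values.
models = [
    Model("model-a", 1.0, 1100.0),
    Model("model-b", 2.0, 1250.0),
    Model("model-c", 4.0, 1200.0),  # costs more than model-b but scores lower
    Model("model-d", 8.0, 1330.0),
    Model("model-e", 8.0, 1300.0),  # same price as model-d, lower score
]
frontier_names = [m.name for m in pareto_frontier(models)]
print(frontier_names)
```

In this toy data, a model stays on the frontier only if paying its price buys a higher score than anything cheaper.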

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·6 Feb

🚨BREAKING: Claude Opus 4.6 by @AnthropicAI is now #1 across Code, Text and Expert Arena!
Opus 4.6 shows significant gains across the board:
- #1 Code Arena: +106 score vs Opus 4.5
- #1 Text Arena: scoring 1496, +10 vs Gemini 3 Pro
- #1 Expert Arena: +~50 lead
Congrats to the @AnthropicAI team on the incredible milestone! The frontier just moved.

Claude@claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·4 Feb

👋Say hello to Max! Max is Arena’s intelligent router, powered by 5+ million real-world community votes. Max routes each prompt to the most capable model with latency in mind. AI models excel at different things (code, math, speed, reasoning). Max orchestrates across model strengths to deliver reliable performance across real-world use cases. Available today in Direct chat!

Joan Gamell, bonvivant.eth@gamell·28 Jan

@arena Absolute CLASS

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·28 Jan

LMArena is now Arena. A name that takes us back to our roots with a powerful mission: to measure and advance the frontier of AI for real-world use. We have grown from a small PhD research project to a platform powered by a global community of millions. This rebrand has been shaped by the people who use it. 👇 Take a look inside the rebrand.

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·21 Jan

🚨BIG NEWS: 🎬 Video Arena is now live on the web! Test out Veo 3.1, Sora 2, Seedance v1.5 Pro, Kling 2.6 Pro, Wan 2.5 & more.
What started last summer as a small Discord bot experiment has grown into a rigorous way to measure and understand how frontier video models perform with real-world use. Thank you to our wonderful community for all the feedback! Today, we’re opening up access by making it available on the web.
🎥 Generate videos with 15 different frontier AI models and compare them head-to-head.
📊 Vote for the best output to power the leaderboards.

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·15 Jan

Who’s actually leading the AI race? It depends on which leaderboard you look at.
On Arena’s Text leaderboard (since May 2023):
🔹@OpenAI leads 74% of the time
🔹@GoogleDeepMind 21%
🔹@AnthropicAI 5%
But zoom into Expert prompts (~5% of the hardest real-world tasks) and the story flips. 👇
On Arena’s Expert Text leaderboard (since March 2024):
🔸@AnthropicAI leads 48% of the time
🔸@OpenAI 37%
🔸@GoogleDeepMind 12%
🔸@Deepseek_AI 4%
The takeaway: Different tasks. Different winners.

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·6 Jan

Today, we’re excited to announce our $150M Series A at a $1.7B valuation, nearly 3× our May seed round. Since launching evaluations in Sept, our annualized consumption run rate has surpassed $30M.

Our mission is clear: to measure and advance the frontier of AI for real-world use, ensuring that developers, researchers, enterprises, and everyday users can understand how AI behaves where it matters most.

The round was led by @Felicis and UC Investments (@UofCalifornia), with participation from @a16z, @TheHouseFund, LDVP, @kleinerperkins, @lightspeedvp and @LaudeVentures. This milestone reflects a growing industry consensus: AI cannot scale responsibly without independent, transparent, and continuous evaluation.

Over the past year, LMArena has become the world’s most trusted community platform for understanding how AI models perform in real-world conditions. As AI reaches billions of people across the globe, the need for measurement grounded in lived experience, not benchmarks alone, has never been more urgent.

Today, we serve more than 5 million monthly users across 150 countries. Together, our community generates over 60 million conversations every month, evaluating model capability and reliability across text, code, image, video, and search. We will move even faster to build new features and improve our product experience for the community to evaluate the frontier of AI.

This unprecedented engagement signals a fundamental shift in expectations: the world now demands AI that is measurable, comparable, and accountable. This new funding allows us to meaningfully scale our engineering, research, platform operations, and community initiatives to meet accelerating global demand.

With our team, partners, and global community behind us, we’ll keep redefining how the AI frontier is measured and advanced, on our path to building the world’s most trusted evaluation platform.

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·20 Nov

🚨🍌BREAKING: @GoogleDeepMind’s Gemini 3 Pro Image aka Nano Banana Pro is in the Arena! Built on Gemini 3, which only two days ago landed as #1 across all major Arena leaderboards. Put it head-to-head in Battle mode with the latest models and judge for yourself if it’s SOTA for image generation and editing. Remember, your votes help shape the leaderboards. We’ll see how it stacks up!

Google DeepMind@GoogleDeepMind

We just dropped Nano Banana Pro, built on Gemini 3. 🍌 With state-of-the-art text rendering, vast world knowledge and studio-quality creative controls, Gemini 3 Pro Image can create and edit more complex visuals, infographics and more. Here’s what’s under the hood. 🧵

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·18 Nov

🚨BREAKING: @GoogleDeepMind’s Gemini-3-Pro is now #1 across all major Arena leaderboards
🥇#1 in Text, Vision, and WebDev - surpassing Grok-4.1, Claude-4.5, and GPT-5
🥇#1 in Coding, Math, Creative Writing, Long Queries, and nearly all occupational leaderboards.
Massive gains over Gemini-2.5:
🔸WebDev in Code Arena: 1487 (+280 pts vs 2.5)
🔸Text: 1501 (+50 pts)
🔸Vision: 1328 (+70 pts)
🔸Arena Expert: Top-3 (just 3 pts behind #1)
Huge congrats to the @GoogleDeepMind team on this breakthrough! 👏

Sundar Pichai@sundarpichai

Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI. Excited for you to try it!

Joan Gamell, bonvivant.eth retweeted

Arena.ai@arena·14 Nov

📊 Leaderboard Ranking Method Update
We’ve refined how ranks are displayed to make them more interpretable and statistically accurate. Each model now shows:
• Raw Rank: its position by Arena score (no ties)
• Rank Spread: the best-to-worst range based on confidence intervals
This makes it clearer how close models truly are in the leaderboard.
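
One plausible reading of the Raw Rank and Rank Spread idea above can be sketched as follows. The post does not give the exact formula, so the rule used here is an assumption: a model's best possible rank counts only rivals whose whole confidence interval sits above its own, and its worst possible rank counts every rival whose interval could overlap or exceed it. The `rank_table` helper, the model names, and all numbers are hypothetical.

```python
from typing import Dict, Tuple

# name -> (score, ci_lower, ci_upper); made-up values for illustration only
Scores = Dict[str, Tuple[float, float, float]]


def rank_table(scores: Scores) -> Dict[str, Tuple[int, int, int]]:
    """Return name -> (raw_rank, best_rank, worst_rank)."""
    ordered = sorted(scores, key=lambda n: scores[n][0], reverse=True)
    table = {}
    for raw_rank, name in enumerate(ordered, start=1):
        _, lower, upper = scores[name]
        others = [scores[o] for o in scores if o != name]
        # Best case: only rivals that are clearly above (their lower bound
        # exceeds our upper bound) are ranked ahead of this model.
        best = 1 + sum(1 for _, o_lower, _ in others if o_lower > upper)
        # Worst case: any rival whose upper bound exceeds our lower bound
        # might actually be ahead of this model.
        worst = 1 + sum(1 for _, _, o_upper in others if o_upper > lower)
        table[name] = (raw_rank, best, worst)
    return table


scores = {
    "model-a": (1500.0, 1492.0, 1508.0),
    "model-b": (1496.0, 1488.0, 1504.0),
    "model-c": (1470.0, 1462.0, 1478.0),
}
ranks = rank_table(scores)
print(ranks)
```

In this toy data the top two models have overlapping intervals, so both get a spread of 1 to 2 even though their raw ranks differ, while the third model is clearly separated.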

Joan Gamell, bonvivant.eth@gamell·12 Nov

I've been playing this Code Arena-generated Tetris more than I care to admit 😂 Try it here …ffc-71f6-a578-48e5f195ac5f.arena.site

Arena.ai@arena

🚀Introducing Code Arena: the next generation of live coding evals for frontier AI models. Built to test how models plan, scaffold, debug, and build real web apps step-by-step. Try Claude, GPT-5, GLM-4.6 and Gemini in Code Arena today!

Joan Gamell, bonvivant.eth@gamell·10 Nov

White Christmas in La Jolla

Joan Gamell, bonvivant.eth retweeted

Anastasios Nikolas Angelopoulos@ml_angelopoulos·5 Nov

🏆NEW LMARENA LEADERBOARDS🏆
🤓Experts
💻 Software & IT Services
✍️ Writing, Literature, & Language
🔬 Life, Physical, & Social Science
🎭 Entertainment, Sports, & Media
📈 Business, Management, & Financial Ops
🧮 Mathematical
⚖️ Legal & Government
🩺 Medicine & Healthcare
Evaluations of AI’s economic utility (like GDPval) are ever-more relevant, but expensive to collect. We worked with LMArena's community of millions of monthly contributors to source occupational and expert data organically, solving the scalability problem.
>5% of LMArena users are experts, and a huge fraction of LMArena prompts are in economically valuable industries: SWE, students/researchers, marketers/designers, doctors, lawyers, and more. This allows us to build online leaderboards in these categories built on fresh feedback every day.
It speaks to the power of the real-world feedback system we’ve created at @arena!

Arena.ai@arena

🚀 Introducing Arena Expert: a new LMArena evaluation framework to identify the toughest, most expert-level prompts from real users, powering a new Expert leaderboard.
We also introduce Occupational Categories that underlie eight new leaderboards:
💻 Software & IT Services
✍️ Writing, Literature, & Language
🔬 Life, Physical, & Social Science
🎭 Entertainment, Sports, & Media
📈 Business, Management, & Financial Ops
🧮 Mathematical
⚖️ Legal & Government
🩺 Medicine & Healthcare
Explore how models perform across fields in thread 🧵 👇

Joan Gamell, bonvivant.eth@gamell·13 Mar

@JoeLaCavaD1 @MayorToddGloria Via Capri in La Jolla feels like driving in Baghdad after a bombing. It should be repaved (not hot patched) immediately as it's in such a shameful state of disrepair that it's dangerous. Our taxes should go to basic stuff like this first!
