Scale AI

2.3K posts

@scale_AI

making AI work

Joined July 2016
483 Following, 75.5K Followers
Pinned Tweet
Scale AI @scale_AI
The future runs on proof. 😤
1 reply, 6 reposts, 43 likes, 6.1K views
Scale AI retweeted
Philip de Guzman @PhilipofGuzman
The humans stay. That’s the idea behind @scale_ai's new brand campaign. 10 years of building AI has taught us something: the most important decisions belong to humans. The AI that works in decisions of consequence keeps humans at the center. Going live in SF and NYC. Where to next? 👀
2 replies, 9 reposts, 48 likes, 6.1K views
Scale AI @scale_AI
🚨 JUST IN: Scale AI milestone incoming. Stay tuned.
3 replies, 3 reposts, 49 likes, 6.8K views
Scale AI @scale_AI
This month we turn 10. The hard work started in 2016, and it hasn’t stopped. Shortcuts are for losers. Winners welcome. scale.com/careers
7 replies, 25 reposts, 110 likes, 41.9K views
Scale AI retweeted
Scale Labs @ScaleAILabs
Today we’re releasing Refactoring, the final leaderboard of our SWE Atlas suite. This new leaderboard is the ultimate test of an agent's ability to restructure code without breaking the system. Claude Opus 4.7 with Claude Code takes the top spot🥇
40 replies, 52 reposts, 675 likes, 104.9K views
Scale AI @scale_AI
Proud to share that @CDAODoW has expanded its enterprise agreement with Scale AI, raising the ceiling from $100M to $500M. This expansion reflects our continued commitment to accelerating the adoption of AI capabilities across the Pentagon to help America stay prepared, resilient, and strong. scale.com/blog/Scale-ai-…
3 replies, 5 reposts, 39 likes, 3.6K views
Scale AI retweeted
Jason Droege @jdroege
AI pretenders vs. AI contenders. It's those who still haven’t realized reliability is the product vs. those who can deliver reliability and outcomes. That's what the enterprise AI race comes down to. Here's a note I sent the Scale team this week.
x.com/i/article/2052…
4 replies, 11 reposts, 45 likes, 18.6K views
Scale AI retweeted
Scale Labs @ScaleAILabs
We recently built HiL-Bench, the first benchmark to test a critical question: do AI agents know what they’re missing and when to ask? Frontier models perform well with perfect specs. But remove a few key details, and they confidently guess and ship plausible wrong answers. We just added GPT-5.5, Opus 4.7, and Kimi K2.6 to the leaderboard. Here’s what we’re seeing ⬇️🧵
31 replies, 67 reposts, 655 likes, 79.2K views
Scale AI @scale_AI
Scale AI has acquired ICG Solutions, a defense technology firm specializing in real-time streaming data analytics. This is another step forward in how we support the U.S. defense and intelligence community with AI systems built to serve America’s most important national security missions. scale.com/blog/scale-acq…
5 replies, 4 reposts, 30 likes, 7.7K views
Scale AI @scale_AI
Key takeaway for model builders: capability and judgment are orthogonal axes. Scaling SWE-Bench alone won't close this. Current post-training doesn’t penalize an agent for confidently solving the wrong problem. Ask-F1 is the first verifiable signal that does, and it transfers across domains. The goal isn't full autonomy. It's selective escalation: agents that know what they don't know.
2 replies, 0 reposts, 8 likes, 1.9K views
Scale AI @scale_AI
New @ScaleAILabs Research: Your AI agent just gave you an answer, but did it actually solve the problem, get lucky, or just sound right? Today’s benchmarks can’t tell. We built HiL-Bench (Human-in-Loop Benchmark) to test a critical skill: does your agent know what it’s missing and when to ask for clarification? 🧵
5 replies, 13 reposts, 69 likes, 8.7K views
Scale AI retweeted
Scale Labs @ScaleAILabs
Big research update: we have 6 papers accepted at ICLR 2026! 🎉 We’re pushing the frontier of eval-driven RL, rubric-based rewards, and agentic capabilities. Over the next week, we’ll be sharing insights from our accepted papers. Here’s a preview into the work we'll be presenting in Brazil 🧵
3 replies, 6 reposts, 91 likes, 16.6K views
Scale AI @scale_AI
Breaking: @AIatMeta just released Muse Spark, now live across @ScaleAILabs leaderboards. Here’s how it stacks up:
Tied for 🥇 on SWE-Bench Pro
Tied for 🥇 on HLE
Tied for 🥇 on MCP Atlas
Tied for 🥇 on PR Bench - Legal
Tied for 🥈 on SWE Atlas Test Writing
🥈 on PR Bench - Finance
🥉 on SWE Atlas QnA
7 replies, 20 reposts, 154 likes, 21.5K views