Scale AI

2.3K posts

@scale_AI

making AI work

Joined July 2016
482 Following · 75K Followers

Scale AI @scale_AI
Scale AI has acquired ICG Solutions, a defense technology firm specializing in real-time streaming data analytics. This is another step forward in how we support the U.S. defense and intelligence community with AI systems built to serve America’s most important national security missions. scale.com/blog/scale-acq…

Scale AI @scale_AI
Key takeaway for model builders: capability and judgment are orthogonal axes. Scaling SWE-Bench performance alone won't close this gap. Current post-training doesn’t penalize an agent for confidently solving the wrong problem. Ask-F1 is the first verifiable signal that does, and it transfers across domains. The goal isn't full autonomy. It's selective escalation: agents that know what they don't know.

Scale AI @scale_AI
New @ScaleAILabs Research: Your AI agent just gave you an answer, but did it actually solve the problem, get lucky, or just sound right? Today’s benchmarks can’t tell. We built HiL-Bench (Human-in-Loop Benchmark) to test a critical skill: does your agent know what it’s missing and when to ask for clarification? 🧵

Scale AI retweeted
Scale Labs @ScaleAILabs
Big research update: we have 6 papers accepted at ICLR 2026! 🎉 We’re pushing the frontier of eval-driven RL, rubric-based rewards, and agentic capabilities. Over the next week, we’ll be sharing insights from our accepted papers. Here’s a preview of the work we'll be presenting in Brazil 🧵

Scale AI @scale_AI
Breaking: @AIatMeta just released Muse Spark — now live across @ScaleAILabs leaderboards. Here’s how it stacks up:
Tied for 🥇 on SWE-Bench Pro
Tied for 🥇 on HLE
Tied for 🥇 on MCP Atlas
Tied for 🥇 on PR Bench - Legal
Tied for 🥈 on SWE Atlas Test Writing
🥈 on PR Bench - Finance
🥉 on SWE Atlas QnA

Scale AI retweeted
Bing Liu @vbingliu
Excited to share we have 3 papers from @scale_AI accepted to the ACL 2026 Main Conference! 🎉 These works span reasoning, multimodal, and agentic evaluations, pushing the frontier of how we measure and improve AI systems. 👇

🔧 Agentic Rubrics as Contextual Verifiers for SWE Agents
arxiv.org/abs/2601.04171
A new execution-free verification signal for SWE agents: an expert agent explores the codebase to generate context-grounded rubric checklists, outperforming strong baselines on SWE-Bench tasks. We found that rubric scores are consistent with ground-truth tests while also flagging issues that tests fail to capture, providing a richer signal than execution-based verification alone.

📄 PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
arxiv.org/abs/2511.11562
We introduce a large-scale public benchmark for professional reasoning, featuring 1,100 expert-authored tasks across the Legal and Finance domains, contributed by 182 qualified professionals spanning 114 countries. Top models score only 0.37-0.39 on the hard subsets. Common failure modes include inaccurate judgments, a lack of process transparency, and incomplete reasoning, highlighting critical gaps for reliable professional adoption.

🎙️ Audio MultiChallenge
A benchmark of 452 natural multi-turn spoken conversations that evaluates end-to-end audio systems on memory, instruction retention, voice editing, and audio-cue reasoning. Even the best frontier models pass only ~55% of challenges and fail most often on the new axes, with Self Coherence degrading over longer audio context. This reflects a fundamental difficulty in tracking edits, audio cues, and long-range context in natural spoken dialogue.

Scale AI @scale_AI
We’re proud to support the @DeptofWar Golden Dome for America. Agentic AI capabilities turn data into insights at the speed threats demand – powering decision advantage when it matters most.

Scale AI @scale_AI
You’re only as competitive as your ability to capture and scale your organization’s knowledge. Dialect is the missing link between AI capability and outcomes – bringing your experts’ judgment into systems your teams can trust and build on.

Scale AI @scale_AI
Proud to announce a new collaboration with @BAESystemsInc to bring AI capabilities to the @DeptofWar's most capable platforms and systems. 🤝

Scale AI retweeted
Scale Labs @ScaleAILabs
We're launching Voice Showdown — the first arena for voice AI built entirely on real spoken prompts and human preference. Most voice benchmarks use synthetic speech and are English-only. We send models raw audio from a global user base speaking 60+ languages. Here’s what we’ve learned so far ↓