Scale AI

2.3K posts

@scale_AI

making AI work

Joined July 2016

484 Following · 76.6K Followers

Scale AI reposted

Scale Labs@ScaleAILabs·4d

We appreciate the community's feedback on SWE-Bench Pro. Much of it maps to changes already underway in v1.1, which we've been building for a while. Keeping evals current with frontier models is hard, and we're always iterating. SWE-Bench Pro Verified coming soon. 👀

175

36.3K

Scale AI@scale_AI·4 Jul

Celebrating 250 years and building reliable, mission-ready AI for what's next.

11.2K

Scale AI reposted

Jason Droege@jdroege·16 Jun

Introducing the 6% Report. Everyone is talking about enterprise AI adoption, yet very few are producing real outcomes. Our new research shows only 6% of organizations have successfully deployed AI at scale and achieved measurable business value. Here's what we found they’re doing differently: scale.com/six-percent

15.5K

Scale AI@scale_AI·12 Jun

We spoke with over 50 healthcare professionals to understand what they need from AI. The conversations came down to one thing: trust. Our latest healthcare findings: labs.scale.com/blog/healthcar…

Scale AI reposted

Scale Labs@ScaleAILabs·1 Jun

Today we're releasing HiL-Dynamics, the first open-source tool that measures how production agents actually collaborate with humans under uncertainty. Not just whether they got the answer. Now you can measure exactly when your agent asks for help, when it makes assumptions, and when it'll confidently ship the wrong answer. Our findings 🧵

7.6K

Scale AI@scale_AI·26 May

To understand our story, you have to go back to the beginning. It started with self-driving cars. Ten years later, it's the architecture underneath AI that actually works, across frontier labs, enterprises, governments, and mission-critical systems around the world.

8.9K

Scale AI reposted

Philip de Guzman@PhilipofGuzman·20 May

The humans stay. That’s the idea behind @scale_ai's new brand campaign. 10 years of building AI has taught us something: the most important decisions belong to humans. The AI that works in decisions of consequence keeps humans at the center. Going live in SF and NYC. Where to next? 👀

10.5K

Scale AI@scale_AI·18 May

The future runs on proof. 😤

8.9K

Scale AI@scale_AI·14 May

It's our birthday. 🎂 scale.com/blog/ten-years…

4.4K

Scale AI@scale_AI·14 May

🚨 JUST IN: Scale AI milestone incoming. Stay tuned.

8.9K

Scale AI@scale_AI·14 May

This month we turn 10. The hard work started in 2016, and it hasn’t stopped. Shortcuts are for losers. Winners welcome. scale.com/careers

122

70.1K

Scale AI reposted

Scale Labs@ScaleAILabs·7 May

Today we’re releasing Refactoring, the final leaderboard of our SWE Atlas suite. This new leaderboard is the ultimate test of an agent's ability to restructure code without breaking the system. Claude Opus 4.7 with Claude Code takes the top spot🥇

677

107.2K

Scale AI@scale_AI·7 May

Proud to share @CDAODoW has expanded its enterprise agreement with Scale AI, raising the ceiling from $100M to $500M. This expansion reflects our continued commitment to accelerating the adoption of AI capabilities across the Pentagon to help America stay prepared, resilient, and strong. scale.com/blog/Scale-ai-…

4.7K

Scale AI reposted

Jason Droege@jdroege·6 May

AI pretenders vs. AI contenders. It's those who still haven’t realized reliability is the product vs. those who can deliver reliability and outcomes. That's what the enterprise AI race comes down to. Here's a note I sent the Scale team this week.

Jason Droege@jdroege

x.com/i/article/2052…

20K

Scale AI reposted

Scale Labs@ScaleAILabs·4 May

We recently built HiL-Bench, the first benchmark to test a critical question: do AI agents know what they’re missing and when to ask? Frontier models perform well with perfect specs. But remove a few key details, and they confidently guess and ship plausible wrong answers. We just added GPT-5.5, Opus 4.7, and Kimi K2.6 to the leaderboard. Here’s what we’re seeing ⬇️🧵

652

80.3K

Scale AI@scale_AI·21 Apr

Scale AI has acquired ICG Solutions, a defense technology firm specializing in real-time streaming data analytics. This is another step forward in how we support the U.S. defense and intelligence community with AI systems built to serve America’s most important national security missions. scale.com/blog/scale-acq…

8.4K

Scale AI@scale_AI·20 Apr

Paper: static.scale.com/uploads/67a153… Data: huggingface.co/datasets/Scale… Leaderboard: labs.scale.com/leaderboard/hil Code & Harness: github.com/hilbenchauthor…

2.1K

Scale AI@scale_AI·20 Apr

Key takeaway for model builders: capability and judgment are orthogonal axes. Scaling SWE-Bench alone won't close this. Current post-training doesn’t penalize an agent for confidently solving the wrong problem. Ask-F1 is the first verifiable signal that does, and it transfers across domains. The goal isn't full autonomy. It's selective escalation: agents that know what they don't know.

2.4K

Scale AI@scale_AI·20 Apr

New @ScaleAILabs Research: Your AI agent just gave you an answer, but did it actually solve the problem, get lucky, or just sound right? Today’s benchmarks can’t tell. We built HiL-Bench (Human-in-Loop Benchmark) to test a critical skill: does your agent know what it’s missing and when to ask for clarification? 🧵

9.5K
