Banger paper from Google DeepMind on the missing layer of AGI just flipped the entire “AI safety is about averages” narrative on its head.
Most people still think safety is about how a model behaves most of the time.
This paper shows why that intuition breaks the moment systems scale.
DeepMind frames AGI safety as a distributional problem, not a checklist problem. What matters is not the average outcome, but the shape of the tail. Rare behaviors. Edge cases. Low-probability failures that only show up when a system is deployed millions of times.
A model can look safe in benchmarks, red-team tests, and controlled demos, and still be dangerous once it leaves the lab.
Because deployment doesn’t sample “typical” situations. It samples everything.
- Unusual users.
- Weird environments.
- Misaligned incentives.
- Adversarial feedback loops.
- Corner cases nobody designed for.
At scale, those corner cases stop being rare. They become guaranteed.
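A quick back-of-the-envelope sketch makes the point. The numbers below (a one-in-a-million failure rate, millions of independent deployments) are illustrative assumptions, not figures from the paper:

```python
# Illustrative only: how a "rare" failure stops being rare at deployment scale.
# Assumes independent deployments with a fixed per-deployment failure rate p.

def prob_at_least_one_failure(p: float, n: int) -> float:
    """P(at least one failure in n independent deployments) = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

for n in [1_000, 100_000, 10_000_000]:
    print(f"p=1e-6, n={n:>10,}: {prob_at_least_one_failure(1e-6, n):.4%}")

# p=1e-6, n=     1,000: ~0.1%
# p=1e-6, n=   100,000: ~9.5%
# p=1e-6, n=10,000,000: ~99.995%
```

A one-in-a-million event per use is a near-certainty across ten million uses.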
The paper’s core insight is uncomfortable: progress can reduce visible failures while increasing real risk. If capability grows faster than tail control, safety metrics improve and danger quietly compounds.
Two systems can have identical average behavior and radically different worst-case outcomes. Current evaluations cannot see that difference.
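A minimal sketch of why mean-based metrics miss this (my toy example, not the paper's): two hypothetical models with identical average error but wildly different tails.

```python
# Toy illustration: same average behavior, radically different worst case.
N = 1_000_000

# Model A: a small constant error on every one of N queries.
errors_a = [0.001] * N

# Model B: zero error on most queries, a large error on 1 query in 10,000.
# Expected error is identical: 10.0 * (1 / 10_000) = 0.001.
errors_b = [10.0 if i % 10_000 == 0 else 0.0 for i in range(N)]

for name, errs in [("Model A", errors_a), ("Model B", errors_b)]:
    print(f"{name}: mean error = {sum(errs) / N:.4f}, worst case = {max(errs)}")

# Both means are 0.0010, so an average-based benchmark scores them identically;
# only the tail reveals that Model B can fail catastrophically.
```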
This also breaks a common governance assumption. You cannot certify AGI safety with finite tests when the risk lives in distribution shift. You are never testing the system you actually deploy. You are sampling from a future you don’t control.
The implication is sharp.
AGI safety is not a model property. It is a systems property.
It depends on deployment, incentives, monitoring, and how much tail risk society is willing to absorb.
This paper doesn’t offer comfort. It removes it.
The real question is no longer “does the model usually behave well?”
It’s “what happens when it doesn’t, and how often is that allowed to happen before scale makes it unacceptable?”
