51-50_X
@FiftyOne_50_

3K posts

The machine cannot be the final judge of the machine. I test where AI safety claims break. https://t.co/7xtFgAEuIX

Joined December 2025
328 Following · 57 Followers

Pinned Tweet
51-50_X @FiftyOne_50_
Boundary Atlas v1.0 is complete. AI safety doesn’t live inside the model. It lives at the boundary where systems gain authority to act. Mathematics defines limits. Governance defines permission. Reality tests both. 🧵:

Addy Osmani @addyosmani
Tip: Figure out your personal ceiling for running multiple agents in parallel. We need to accept that more agents running doesn't mean more of _you_ available.

The narrative is still mostly about throughput and parallelism, but almost nobody's talking about what it actually costs the human in the loop. You're holding multiple problem contexts in your head at once, making judgment calls continuously, and absorbing the anxiety of not knowing what any one agent might be quietly getting wrong. That's a new kind of cognitive labor we don't have good language for yet.

I've started treating long agentic sessions the way I'd treat deep focus work: time-boxing and tighter scopes per agent dramatically change how much mental overhead each thread carries. Finding your personal ceiling with these tools is itself a skill, and most of us are going to learn it the hard way before we learn it intentionally.

Quoting Lenny Rachitsky @lennysan:
"Using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems, and by 11am I am wiped out for the day. There is a limit on human cognition. Even if you're not reviewing everything they're doing, how much you can hold in your head at one time. There's a sort of personal skill that we have to learn, which is finding our new limits. What is a responsible way for us to not burn out, and for us to use the time that we have?" @simonw

Pedro Domingos @pmddomingos
And another one down: Microsoft has given up on superintelligence.

51-50_X @FiftyOne_50_
No AI system should be granted real-world authority based on safety claims made inside its own execution loop. The machine cannot be the final judge of the machine.

51-50_X @FiftyOne_50_
End of 🧵

51-50_X @FiftyOne_50_
🧵 The 10 Questions the AGI Story Cannot Answer:

51-50_X @FiftyOne_50_
What must be observed outside the model before anyone gets to call it AGI? Until that is answered, people are confusing momentum with proof.

51-50_X @FiftyOne_50_
What proof shows the system can detect the edge of its own map instead of just extending confidence until reality says no?

51-50_X @FiftyOne_50_
What would falsify the claim that better prediction and planning imply deeper intelligence rather than a stronger internal simulator?

51-50_X @FiftyOne_50_
What external test would distinguish a world model that merely predicts better from one that understands generally?

51-50_X @FiftyOne_50_
If scale is sufficient, why do advocates keep introducing new architectural ingredients? What does that say about the original claim?

51-50_X @FiftyOne_50_
What outside observer can still say no to the AGI claim, using a rule the vendors would have to accept?

51-50_X @FiftyOne_50_
What exactly is supposed to have crossed the boundary: benchmark score, economic usefulness, planning ability, or AGI itself?

51-50_X @FiftyOne_50_
What result would force advocates to admit that more compute improved skill, but not general intelligence?

51-50_X @FiftyOne_50_
If more scale proves AGI, what external test shows the system did more than get better inside the benchmark loop?

51-50_X @FiftyOne_50_
They keep stacking scale, world models, and AGI rhetoric like the conclusion is already settled. It isn’t. The missing questions are the ones that ask what still has to be proven from outside the loop.

51-50_X @FiftyOne_50_
Scale made the systems stronger. World models may make them stronger still. Now the infrastructure class says AGI is already here. That still does not add up to proof. More compute, better prediction, bigger claims. People keep confusing momentum with arrival.

51-50_X @FiftyOne_50_
The industry’s favorite trick is to stack progress, architecture, and rhetoric until people confuse momentum with proof.

51-50_X @FiftyOne_50_
@simonw This is the execution boundary in plain view: the package wasn’t the first thing compromised, the maintainer’s authority was. Once the human gate was socially captured, the release channel became the delivery path.