51-50_X
@FiftyOne_50_

3K posts

The machine cannot be the final judge of the machine. I test where AI safety claims break. https://t.co/7xtFgAEuIX

Joined December 2025
328 Following · 57 Followers

Pinned Tweet
51-50_X @FiftyOne_50_
Boundary Atlas v1.0 is complete. AI safety doesn’t live inside the model. It lives at the boundary where systems gain authority to act. Mathematics defines limits. Governance defines permission. Reality tests both. 🧵:

Addy Osmani @addyosmani
Tip: Figure out your personal ceiling for running multiple agents in parallel. We need to accept that more agents running doesn't mean more of _you_ available.

The narrative is still mostly about throughput and parallelism, but almost nobody's talking about what it actually costs the human in the loop. You're holding multiple problem contexts in your head at once, making judgment calls continuously, and absorbing the anxiety of not knowing what any one agent might be quietly getting wrong. That's a new kind of cognitive labor we don't have good language for yet.

I've started treating long agentic sessions the way I'd treat deep focus work: time-boxing and tighter scopes per agent dramatically change how much mental overhead each thread carries. Finding your personal ceiling with these tools is itself a skill, and most of us are going to learn it the hard way before we learn it intentionally.

Quoting Lenny Rachitsky @lennysan:
"Using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems, and by 11am I am wiped out for the day. There is a limit on human cognition. Even if you're not reviewing everything they're doing, how much you can hold in your head at one time. There's a sort of personal skill that we have to learn, which is finding our new limits. What is a responsible way for us to not burn out, and for us to use the time that we have?" @simonw

Pedro Domingos @pmddomingos
And another one down: Microsoft has given up on superintelligence.

51-50_X @FiftyOne_50_
No AI system should be granted real-world authority based on safety claims made inside its own execution loop. The machine cannot be the final judge of the machine.

51-50_X @FiftyOne_50_
End of 🧵

51-50_X @FiftyOne_50_
🧵 The 10 Questions the AGI Story Cannot Answer:

51-50_X @FiftyOne_50_
What must be observed outside the model before anyone gets to call it AGI? Until that is answered, people are confusing momentum with proof.

51-50_X @FiftyOne_50_
What proof shows the system can detect the edge of its own map instead of just extending confidence until reality says no?

51-50_X @FiftyOne_50_
What would falsify the claim that better prediction and planning imply deeper intelligence rather than a stronger internal simulator?

51-50_X @FiftyOne_50_
What external test would distinguish a world model that merely predicts better from one that understands generally?

51-50_X @FiftyOne_50_
If scale is sufficient, why do advocates keep introducing new architectural ingredients? What does that say about the original claim?

51-50_X @FiftyOne_50_
What outside observer can still say no to the AGI claim, using a rule the vendors would have to accept?

51-50_X @FiftyOne_50_
What exactly is supposed to have crossed the boundary: benchmark score, economic usefulness, planning ability, or AGI itself?

51-50_X @FiftyOne_50_
What result would force advocates to admit that more compute improved skill, but not general intelligence?

51-50_X @FiftyOne_50_
If more scale proves AGI, what external test shows the system did more than get better inside the benchmark loop?

51-50_X @FiftyOne_50_
They keep stacking scale, world models, and AGI rhetoric like the conclusion is already settled. It isn’t. The missing questions are the ones that ask what still has to be proven from outside the loop.

51-50_X @FiftyOne_50_
Scale made the systems stronger. World models may make them stronger still. Now the infrastructure class says AGI is already here. That still does not add up to proof. More compute, better prediction, bigger claims. People keep confusing momentum with arrival.

51-50_X @FiftyOne_50_
The industry’s favorite trick is to stack progress, architecture, and rhetoric until people confuse momentum with proof.

51-50_X @FiftyOne_50_
@simonw This is the execution boundary in plain view: the package wasn’t the first thing compromised, the maintainer’s authority was. Once the human gate was socially captured, the release channel became the delivery path.