
Nathan Labenz
@labenz
AI Scout, building text-2-video @Waymark, host of The Cognitive Revolution podcast




80,000 Hours the book was published by Penguin today. It's our definitive explanation of how to use your career to try to force the world onto a better track. The result of 14 years honing our ideas. See what's in there and order: 80000hours.org/book/






New paper: research agenda for secret loyalties. Imagine a frontier model that has been trained to covertly advance a specific actor's interests (a nation-state, a CEO, an adversary). @joemkwon argues this is an urgent, neglected, and addressable problem. 🧵



Today Dario admits that Anthropic only planned for 10x growth but got hit with 80x instead. Internally called a “success disaster”. Their compute is effectively off by a factor of 8x or more. Now do the outages, rate limits, nerfed performance make sense? We need more compute!



Does Learning Require Feeling? @camhberg argues that "alignment" and "welfare" might be fundamentally inseparable problems. And the correlation between learning, broadly defined, and the intensity of my own conscious experience def supports this idea! x.com/labenz/status/…

New marathon conversation with @labenz on AI consciousness and model welfare. We discuss Mythos + Opus 4.7 model cards, Anthropic's introspection and emotion work, and a sketch of the empirical research program I'm pursuing at Reciprocal that doesn't just Goodhart welfare evals. youtu.be/Rudcashm62I

"They give the model an impossible task, and you can watch *desperation* rise, until it decides 'I'm gonna cheat' – immediately this vector falls and *guilt* & *relief* start spiking." @camhberg surveys the latest AI Consciousness & Welfare research. More compelling all the time!

It’s only scheming if it comes from the scheming region of the human brain; otherwise, it’s just sparkling machination.






I think this will seem normal in 5 years


