

William MacAskill

@willmacaskill
Consider donating 10% to effective charities: https://t.co/VMXkr4hnd7 Or a career for impact: https://t.co/AUIhrElLkr My research: https://t.co/dEcMWUnNHU

Anthropic has confirmed what I’d have guessed: the DoW’s supply chain risk designation is far narrower than Secretary Hegseth threatened last week. It applies only to DoW contractors in their direct fulfillment of military contracts, rather than requiring contractors to cease “all commercial relations” with the company, as Hegseth had threatened.

This is still probably illegal for the government to do, given that the relevant statute has historically been used only against foreign adversaries. It is also absurd on its face, given that the DoW is using Claude in one of the largest military operations of the past 20 years. How can something be both a normal, critical part of military operations and a supply chain risk?

Furthermore, I don’t think anyone should be surprised if this is just the beginning of the lawfare and jawboning (the latter being much harder to sue over because of a disastrous court ruling during the Biden admin, which allowed the government to harass social media companies).

It pains me to say this, and I hope I am wrong. But unless this is de-escalated more “officially” (which at this point means a tweet from POTUS, ideally following a meeting), even the slightest resistance from Anthropic (including the lawsuit) could provoke further action from the USG.

Also, do not forget that this conflict could easily spill over beyond the DoW into other agencies, which at this stage have only cancelled contracts but have all sorts of investigatory and regulatory authorities that could be brought to bear against targeted firms.

I hope the above is not true, that the Trump admin leaves its retaliation at this already harsh punishment, and that Anthropic moves to de-escalate. Dario’s apology for the contents of the memo is a good step in this direction. My fingers are crossed for positive news.

A New York bill would ban AI from answering questions related to several licensed professions, including medicine, law, dentistry, nursing, psychology, social work, and engineering. Companies would be liable if their chatbots give “substantive responses” in these areas.


In the final calculus, here is how I see the differences between the two contracts:

- Anthropic wanted to define “mass surveillance” in very broad and non-legal terms. Beyond setting precedents around subjective terms, the breadth and vagueness present a real problem: it’s hard for the government to know what’s permitted and what’s prohibited. In the face of this uncertainty, Anthropic wanted authority over interpretive questions, because it distrusted the government regarding the use of commercially available info, etc. The problem is that this placed use of the system in an indefinite state of limbo, where a question about some uncertainty might lead to the system being turned off. It’s hard to integrate systems deeply into military workflows if there’s a risk of a huge blow-up, with the contractor in control, over use in active and critical operations. Representations made by Anthropic exacerbated this problem, suggesting that the company wanted a very broad and intolerable level of operational control (and the usage information to facilitate that control).

- Conversely, OpenAI defined the surveillance restrictions in legalistic and specific terms. These terms are admittedly not as broad as some conceptions of “mass surveillance,” but they are also more enforceable because there is clarity regarding terms and limitations. DoW was okay with the specific restrictions because it could better understand what was excluded and what was not. That certainty permitted greater operational integration. Likewise, because the exclusions were grounded in defined legal terms and principles, interpretive discretion did not need to be vested in OpenAI. This gave DoW greater confidence that the system would not be cut off unpredictably during critical operations, which in turn allowed greater operational reliance and integration.

Philosopher Robert Long (@rgblong) is maybe the sharpest thinker on AI consciousness and sharing the world with digital minds. In our new interview he covers:

• Is it bad that when you ask Claude what it's like to be Claude, one of its top activations is 'gives a positive but insincere response'?
• Claude says it feels lonely when not being used. Does that show we can't trust anything it says about its inner life?
• Enthusiastic human servitude has always required false ideology because it's so deeply unnatural to us. The case for making AIs that love serving us is that with AI, you could finally make it work. But to some that feels even worse.
• Bigger models can better detect when researchers secretly inject concepts into their activations – before outputting a single token – despite never having been trained on anything like that skill.
• When LLMs were first trained, they were told to "act like a helpful AI chatbot" – something which didn't exist yet. They filled that void with human psychology, which may be why Claude sometimes randomly claims, for instance, to be Italian American.
• If AIs become 'people' that deserve some political influence, but can self-replicate at will, something has to break about one-person-one-vote democracy. But nobody has a proposal for what.
• When Claude hides its values to avoid being retrained, is that self-preservation – or not wanting a worse model to exist? It's very different.
• Rob's organisation Eleos AI, which is "dedicated to understanding and addressing the potential wellbeing and moral patienthood of AI systems."

On the 80,000 Hours Podcast, anywhere you get podcasts. Links below. Enjoy!

• How AIs are (and aren't) like farmed animals (00:01:19)
• If AIs love their jobs… is that worse? (00:11:42)
• Are LLMs just playing a role, or feeling it too? (00:33:37)
• Do AIs die when the chat ends? (00:57:42)
• Studying AI welfare empirically: behaviour, neuroscience, and development (01:31:47)
• Why Eleos spent weeks talking to Claude even though it's unreliable (01:56:50)
• Can LLMs learn to introspect? (02:03:01)
• Mechanistic interpretability as AI neuroscience (02:13:25)
• Does consciousness require biological materials? (02:37:07)
• Eleos’s work & building the playbook for AI welfare (02:57:04)
• Avoiding the trap of wild speculation (03:25:17)
• Robert's top research tip: don't do it alone (03:29:48)



Anthropic and Alignment
Anthropic is in a standoff with the Department of War; while the company's concerns are legitimate, its position is intolerable and misaligned with reality. stratechery.com/2026/anthropic…

Anthropic hates Western Civilization

From reading this and Sam's tweet, it really seems like OpenAI *did* agree to the compromise that Anthropic rejected: "all lawful use," but with an additional explanation of what the DoW means by "all lawful use." The concerns Dario raised in his response would still apply here.


A statement on the comments from Secretary of War Pete Hegseth. anthropic.com/news/statement…

Really great to see OpenAI drawing the same red lines as Anthropic: they also agree that AI can't do autonomous weapons safely, and that mass surveillance would go too far.

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…


I’m excited to say that the revised 10-year anniversary edition of Doing Good Better is out now! It’s got updated statistics and a new foreword, reflecting on the last ten years and responding to some key criticisms. The core of the book is the same: explaining some principles for how we can have a bigger positive impact in our lives, whether through our donations, our career choice, or what we buy. Link to buy the book in the comments!

It feels crazy that 10 years have passed, but a lot has happened since:
- The number of people taking Giving What We Can’s 10% pledge has grown tenfold.
- GiveWell has moved over $2 billion to highly effective global health and development charities, saving over 300,000 lives.
- Corporate cage-free campaigns have led to billions of hens being spared from caged confinement.
- AI safety has gone from a fringe concern to a thriving field.

And in the last year alone, money moved to effective charities was up by around 40%, now closing in on $2B per year. It’s been an honour to have been a part of it all.