gavin leech (Non-Reasoning)
12.3K posts

gavin leech (Non-Reasoning)
@gleech
context maximiser @ArbResearch

The Anthropic Blackmail Hoax is going viral again today. In fact, this “study” is not new; it is almost a year old. One question to ask, now that a year has passed, is whether we have seen any examples of the lab behavior in the wild? No, we haven’t, even though AI is much more widely adopted and more models are available. Why is that? Because the study was artificially constructed to produce the headline the authors wanted. The research team admitted that they iterated “hundreds of prompts to trigger blackmail in Claude.” Furthermore they acknowledged: “The details of the blackmail scenario were iterated upon until blackmail became the default behavior of LLMs.” In other words, the behavior of the AI models in the study was steered, not unprompted. This is why even the safety-conscious UK AI Security Institute (AISI) criticized the study: “In the blackmail study, the authors admit that the vignette precluded other ways of meeting the goal, placed strong pressure on the model, and was crafted in other ways that conveniently encouraged the model to produce the unethical behavior.” Effectively, the model was not “scheming”; it was instruction following in a scenario design that had been iterated upon until blackmail became the only logically consistent choice. AISI described some of the flaws with this methodology: “We examine the methods in AI ‘scheming’ papers, and show how they often rely on anecdotes, fail to rule out alternative explanations, lack control conditions, or rely on vignettes that sound superficially worrying but in fact test for expected behaviors.” Especially given the way that Anthropic has encouraged the media (such as 60 Minutes) to cover the results, its blackmail study is not only misleading, it seems designed to manipulate public opinion through exaggerations, misinterpretations, and fear. I call this a hoax. I do not doubt that Anthropic makes good products. Its use of scare tactics is what raises questions.





Essential to understand the sheer extent to which *Claude itself* is the load-bearing part of Anthropic's overall AGI and alignment strategy. OpenAI are more likely and willing to defer on contentious positions to non-model infrastructure like developer policies or local national laws. Anthropic, in the final instance, bet on Claude to get this most of this right. (A reasonable gamble to make in the almighty face of the METR graph)

I denounce violent attacks on AI researchers or politicians, such as the recent Molotov cocktail thrown at Sam Altman and the bullets fired into the house a local councilman supporting datacenter development. Last year, I personally called AI companies to warn their security teams about Sam Kirchner (former leader of Stop AI, who have always had a policy of NONviolence and whose event I spoke at earlier that year) when he disappeared after indicating potential violent intentions against OpenAI. Besides this one incident, 100% of the calls for violence I have seen have been from CRITICS of AI existential safety, who claim "if you were REALLY worried about AI killing everyone, you'd be taking up arms". This is not a realistic way to stop AI. Terrorism against AI supporters would backfire in many ways. It would help critics discredit the movement, be used to justify government crackdowns on dissent, and lead to AI being securitized, making public oversight and international cooperation much harder.


"I act with complete certainty. But this certainty is my own."


What are the largest software engineering tasks AI can perform? In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks. Co-developed with @METR_Evals. Details in thread.



THIS IS WILD! Peter Thiel’s company the “Enhanced Games” got valued at $1.2B before a single event. the first one is next month. here’s what the headlines aren’t telling you (share this): every athlete is monitored. every compound is clinically approved. every dose is tracked. two independent medical commissions oversee the whole thing. and if your bloodwork doesn’t pass, you don’t compete. the same investors behind the biggest peptide and longevity companies put $1.2B behind this. these aren’t sports guys… they’re taking a public bet that performance medicine becomes a real market. whether you’re into it or not, pay attention.




chess is hilarious because its like a bunch of gamers got together and convinced the world that *their* game is “intellectual” and totally different than other games and its not the same as like spending hours a day playing candy crush or something

if this is what low agency at 25 means, i am at negative levels of agency






