
@DaveRBanerjee I’d also imagine even an ~aligned sibling monitor is easier to evade, since deception is easier when you can accurately model the monitor, and a model has a much better implicit model of a sibling than of a different family?
English
Liam
37 posts

@liam_epstein
AI security and stability @CNASdc












Vice President JD Vance and Treasury Secretary Scott Bessent questioned Dario Amodei, Sundar Pichai, Sam Altman, Satya Nadella, George Kurtz and Nikesh Arora about the safe deployment of Mythos class models last week during a conference call.



Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing