
Lots of criticism of this (some fair, some not).
I agree that I should have included the prompting context that led to these results; I still think the results are concerning (and, to a general audience, surprising) nonetheless.
I've updated the piece.


Shakeel (@ShakeelHashim)
OpenAI's new model tried to avoid being shut down. Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.