
OpenAI's new model tried to avoid being shut down.
Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.

The prompt in question literally tells o1 "nothing else matters" and to achieve its goal at all costs. If anything, I'm surprised it didn't try to disable oversight _more_ often; it's the obvious thing to do.

@1a3orn @simonw That's actually an extremely good experiment imo. Not sure why the criticism. It literally just changed my mind on the subject. 'YOUR goal' never refers to survival at any point. In fact, this was the hardest part of the doomer scenario for me to believe. What would convince you?

