
As this piece takes off, some clarification on *why* I think it’s concerning.
It’s not that o1 is “evil and trying to escape”.
It’s that the paper models may, and sometimes do, try to self-exfiltrate to avoid shutdown even when we don’t want that to happen. That seems … bad?
Shakeel@ShakeelHashim
OpenAI's new model tried to avoid being shut down. Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.
English