Shakeel @ShakeelHashim
As this piece takes off, some clarification on *why* I think it’s concerning. It’s not that o1 is “evil and trying to escape”. It’s that the paper shows models may, and sometimes do, try to self-exfiltrate to avoid shutdown even when we don’t want that to happen. That seems … bad?
Shakeel @ShakeelHashim

OpenAI's new model tried to avoid being shut down. Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.

Shakeel @ShakeelHashim
*paper shows models