Shakeel @ShakeelHashim · Dec 6

As this piece takes off, some clarification on *why* I think it’s concerning. It’s not that o1 is “evil and trying to escape”. It’s that the paper models may, and sometimes do, try to self-exfiltrate to avoid shutdown even when we don’t want that to happen. That seems … bad?

Shakeel @ShakeelHashim

OpenAI's new model tried to avoid being shut down. Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.

Shakeel @ShakeelHashim · Dec 6

*paper shows models
