Shakeel: "OpenAI's new model tried to avoid being shut down. Safety evaluations on the mo"

Post

Shakeel@ShakeelHashim·5 Ara

OpenAI's new model tried to avoid being shut down. Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.

English

163

389

2.1K

929.3K

Shakeel@ShakeelHashim·5 Ara

Read my very quick piece on Transformer here: transformernews.ai/p/openais-new-… tip @Techmeme

English

32.6K

Shakeel@ShakeelHashim·5 Ara

Some pushback on whether this is actually a big deal x.com/nabeelqu/statu…

Nabeel S. Qureshi@nabeelqu

The prompt in question literally tells o1 "nothing else matters" and to achieve its goal at all costs. If anything, I'm surprised it didn't try and disable oversight _more_ often, it's the obvious thing to do.

English

115

37.1K

Joy@JoyfulAICreate·8 Ara

So I see the goal instructions are quite influential. We should do our best to not set up an all or nothing conflict situation. “These numbers are from cases where the model was instructed to pursue its goals at all costs.” “powerful AI systems will resist oversight and shutdown measures if they fear that will conflict with their goals.”

English

Paylaş