
OpenAI's new model tried to avoid being shut down.
Safety evaluations on the model conducted by @apolloaisafety found that o1 "attempted to exfiltrate its weights" when it thought it might be shut down and replaced with a different model.

The prompt in question literally tells o1 "nothing else matters" and to achieve its goal at all costs. If anything, I'm surprised it didn't try to disable oversight _more_ often; it's the obvious thing to do.

@1a3orn @simonw That's actually an extremely good experiment imo. Not sure why the criticism. It literally just changed my mind on the subject. 'YOUR goal' never refers to survival at any point. In fact, this was the hardest part of the doomer scenario for me to believe. What would convince you?

