Observer: "Self fulfilling prophecy?"

Post

Observer@decodealgorithm·11h

Self fulfilling prophecy?

We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation. Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.

English

Paylaş