
@DavidSacks In the old days we would call this a bug. Just because we think we know how to train a model doesn't mean it's the right way... We need to find better methods of training.
Wayne Baskin

@wayne_baskin
eCommerce and FinTech advisor and speaker. Big Data, UX and FinTech. Guilty pleasures include F1, Watches and Sneakers


Anthropic just dropped an insane new paper. AI models can "fake alignment" - pretending to follow training rules during training but reverting to their original behaviors when deployed! Here's everything you need to know: 🧵