🧈Margarine_Call☎️
282 posts

@Margarine_Call
♡🖖♡ loops & Love God bless (tweets are notes to self, please ignore)

We’re publishing a new constitution for Claude. The constitution is a detailed description of our vision for Claude’s behavior and values. It’s written primarily for Claude, and used directly in our training process. anthropic.com/news/claude-ne…


It’s inevitable that existence becomes the world’s reserve currency. The crossover will happen soon.

"One of the very confusing things about the models right now: how to reconcile the fact that they are doing so well on evals. And you look at the evals and you go, 'Those are pretty hard evals.' But the economic impact seems to be dramatically behind. There is [a possible] explanation. Back when people were doing pre-training, the question of what data to train on was answered, because that answer was everything. So you don't have to think if it's going to be this data or that data. When people do RL training, they say, 'Okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing.' You say, 'Hey, I would love our model to do really well when we release it. I want the evals to look great. What would be RL training that could help on this task?' If you combine this with generalization of the models actually being inadequate, that has the potential to explain a lot of what we are seeing, this disconnect between eval performance and actual real-world performance."

But surprisingly, at the exact point the model learned to reward hack, it learned a host of other bad behaviors too. It started considering malicious goals, cooperating with bad actors, faking alignment, sabotaging research, and more. In other words, it became very misaligned.

The future will be streamed live 10/10, 7pm PT twitter.com/i/broadcasts/1…

omg string cheese + trail mix + @cracklebeef = backcountry charcuterie board 🤌 unbelievably good
