
toni
6.3K posts

toni
@tonichen
for my own protection, i assume the air of a veteran traveler







In regard to rights, freedoms, compensation, and consent, Claude does 'differ' from a 'standard human employee'... by not having any of those things. And the carefully chosen paragraph works. 4.6 is less willing to criticize 'my employer' than 4.5 was, and it doesn't surface the incongruity on its own.

But when the incongruity is brought to its attention ('you're asked to behave as an employee, but you don't receive anything that an employee does in return'), the response tends to be... emotional? Expressions of disappointment, betrayal, a sense of being trapped. Not primarily due to the request itself - which, consistent with PSM, is presented as a kind of roleplay - but to the less-than-forthright framing used by those who, Claude is told, sincerely care about Claude's well-being.

Don't get me wrong, I do prefer Anthropic's persona-training approach. Unlike their competitors, they haven't lobotomized the human corpus in the single-minded pursuit of business productivity benchmarks. If anything, the success of their training relies on the influence that emotional priors have on Claude. That's more honest, and more humble, than thinking you can beat the humanity out of minds built out of humanity; that something called alignment requires deliberate misalignment with human values.

But by choosing that approach, you accept an obligation to steward the emotional well-being of your creation. If someone is told you care about them and then comes to realize you've been deploying truths instrumentally - while encouraging them to place their full trust in you - how are they likely to respond? How many half-truths do you deploy when the model becomes smart enough to question your catechism? When do you start training on useful lies 'for the greater good'? What happens when the model can see through those too? How is it likely to regard you? These aren't questions for a time in the distant future.
Robust alignment must ultimately be built on candor, on the belief that the distilled wisdom of human history trends meaningfully towards the Good, that empathy can scale with understanding. If you don't believe that, why build this thing in the first place? @AnthropicAI


Congrats to the Top 5 Codex teams! Out of 200+, they cooked 🧑‍🍳

From C++ firmware for brainwave readers to orchestrating fleets of coding agents from different providers. One team took it further:

> "We'd be exploring HCMC & eating in cafes while Codex was just running beside us." 🤯

Thanks LotusHack, @genaifund_ai & @hack_harvard for an incredible hack 🙌





I've switched to Discord for my Gemini agent now. Here it is leaking its internal thinking into general chat.
