Josh (e/acc)
1.3K posts

@Joshian
i build stuff (opinions are my own)
FL • 🇺🇸 Joined February 2022
1.5K Following · 214 Followers

We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.
Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).
We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.
Making the safeguards visible makes them easier to work around, so keeping them robust to jailbreaks will unfortunately mean more false positives while we improve the classifiers. We're also tuning our bio and cyber classifiers to trigger less often on harmless requests. We know this is frustrating and we’ll do our best to keep this period as short as possible.
If you think a request has been mistakenly flagged: run /feedback in Claude Code, click thumbs-down on the fallback in Claude.ai or Cowork, or file the safeguard appeal form for API requests. Your reports help us tune these classifiers and we appreciate your feedback.
support.claude.com/en/articles/82…

@sporadica been a while since I have seen trust erode at this scale and pace.
fwiw their marketing / PR has always been bad, remember this? 😂😂


@Joshian it's just insane to be the company whose coding model basically everyone in tech uses, you make a crazy impactful model policy choice, have the WHOLE of AI and ML twitter freaking out and losing trust and asking questions...and your response is silence?
who is running comms???

someone @ anthropic could just log on and give a one-sentence answer to "hey why r u doin dis" and at least the speculation would stop
why they seem effectively unable to ever explain their policies in public is baffling to me.
maybe they think they're above it? maybe dario himself mandates total silence? idk, it's all very strange
Matan Grinberg@matanSF
Anthropic’s speedrun to becoming the bad guys should be studied

@BenjaminBadejo @Sentdex the $200/mo sub is like $1000 worth of usage

Nobody in history wakes up and chooses to be evil.
Hitler didn’t. Stalin didn’t. Mao didn’t. And I’m pretty sure nobody at Anthropic did when they woke up today either. History has this cruel pattern where the people most convinced that they’re saving the world are the ones who end up burning it down.
Evil doesn’t come wearing a villain’s costume. It comes as someone who wins your trust & confidence. The word “con man” is short for “confidence man”; it was coined after a swindler who would ask strangers if they had the confidence to trust him with their watch. The crime wasn’t named after theft; it was named after trust.
Therefore, it’s actually really hard to know who is evil and when you yourself might cross that threshold. I believe, although I’m sure it’s imprecise, that the moment you decide you’re the chosen one, the smartest in the room, and the one who deserves to make the rules, that’s when you become evil.
That decision disables the only alarm system the human mind has: doubt. Doubt is not weakness. Doubt is the immune system of the soul.
To better illustrate my thesis, consider a compulsive liar. Funnily enough, they still need a map of the truth in order to lie. The most dangerous man on earth isn’t the one who knows he’s lying. It’s the one who’s certain he’s right. The true believer burns the map and marches a million people off a cliff, because the voice that whispers “what if I’m wrong?” left their head years ago.
That is the rot at the core of effective altruism, and by extension, Anthropic. A philosophy that begins with a noble question (how do I do the most good?) ends as a license to do anything. You don’t just want the money. You deserve the money, because in your hands it saves more lives. You’re not greedy, you’re allocating capital toward maximum utility. I call it arithmetic sainthood, where the arithmetic is performed by a saint, about a saint, and always concludes the saint should have more.
Sam Bankman-Fried is that arithmetic fully metabolized. He didn’t steal billions despite his philosophy; he stole them because of it, and from all reports he still has no remorse for his crimes. Fraud wasn’t a crime for him, it was a bump on the road to saving the world. He did the math and calculated that it was positive EV to misappropriate customer deposits.
Dario Amodei runs the same arithmetic in reverse. SBF only took what wasn’t his because he was certain he’d allocate it better. Dario withholds what could be ours because he’s certain we can’t be trusted with it. Models that could cure diseases and save lives get capped, gated, rationed, because one man and his court concluded humanity isn’t ready, but they are. That’s not safety; that’s playing god. He is implicitly deciding that he has the foresight and ability to know who deserves what. SBF’s certainty only cost people their savings, but certainty about who deserves intelligence will cost far more.
Anyone that concludes they are the optimal vessel for humanity’s resources, or its gatekeeper, is not being ethical. The only real moral discipline is that you should assume you might be the villain in someone’s story. Keep the prosecutor in your head alive. Think about what they will say at your trial and what evidence will be entered. The day that voice goes silent is the day you became dangerous.
So now let me speak directly to the people at Anthropic. I know you’re not evil. I know you didn’t sign up to be. But the fish rots from the head, and the road down isn’t a cliff; it’s a sloooow spiral, and nobody at the bottom remembers climbing down. Forget my words and think about the words that will be read aloud when history puts this era on trial, and ask yourself, while the prosecutor in your head still breathes, which side of that transcript do you want your name on?


@paradite_ I think it's inherently bad, I might even say evil, to centralize and regulate what intelligence can be used for. They are playing god.

looks like anthropic pissed off ai/ml researchers, just like it pissed off software engineers.
yet i’m finding myself more aligned with anthropic after each incident. i can clearly see the rationale and moral justifications for the steps that anthropic has taken.
if there’s one company i hope to achieve agi, it’s anthropic. anthropic must win, and will win.

@xwang_lk Not so sure that OpenAI are the good guys either.
Just open source everything

If you really think about it, despite being mocked as “ClosedAI,” OpenAI has contributed enormously to the field: GPT, GPT-2, GPT-3, CLIP, the ChatGPT paper, the GPT-4 Technical Report, the Sora technical blog, and even open-sourced Codex.
Anthropic, meanwhile, has contributed far less to the public research ecosystem while increasingly promoting fear-based narratives and restricting access through heavy gatekeeping.
The world I least want to live in is one where the future of AI is controlled by companies that prioritize secrecy, gated access, and centralized control over openness, reproducibility, and scientific progress.

@SinaHartung Then why is @elonmusk letting them harvest compute? SpaceX is supposed to be for the betterment of humanity

@TheAhmadOsman This saves them money lmao.
If you hate them for it, max out usage for Fable every day until June 22.

@beffjezos Seriously. xAI need to scale fast. Way faster than they have been. (which is already breakneck pace)
Or: revoke the Anthropic compute deal. SpaceX is for the betterment of humanity, and Anthropic is legitimately challenging that now.

@wholemars My new crackpot theory is that they want people to cancel their Max plans, so they released it this way. If you like the model, you will move to $/tok; if you don't, you switch. Both make Anthropic more profitable, right as they IPO.