Josh (e/acc)

1.3K posts

@Joshian

i build stuff (opinions are my own)

FL • 🇺🇸 Joined February 2022
1.5K Following 214 Followers
ClaudeDevs
ClaudeDevs@ClaudeDevs·
We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible. Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days). We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right. Making the safeguards visible makes them easier to work around, so keeping them robust to jailbreaks will unfortunately mean more false positives while we improve the classifiers. We're also tuning our bio and cyber classifiers to trigger less often on harmless requests. We know this is frustrating and we’ll do our best to keep this period as short as possible. If you think a request has been mistakenly flagged: run /feedback in Claude Code, click thumbs-down on the fallback in Claude.ai or Cowork, or file the safeguard appeal form for API requests. Your reports help us tune these classifiers and we appreciate your feedback. support.claude.com/en/articles/82…
English
667
430
5.1K
835.2K
Julia Turc
Julia Turc@juliarturc·
What if Fable is really really bad at LLM development and biology and they’re just trying to save face?
Julia Turc tweet media
English
32
24
702
20.8K
Josh (e/acc)
Josh (e/acc)@Joshian·
@sporadica been a while since I have seen trust erode at this scale and pace. fwiw their marketing / PR has always been bad, remember this? 😂😂
Josh (e/acc) tweet media
English
1
0
3
159
spor
spor@sporadica·
@Joshian it's just insane to be the company whose coding model basically everyone in tech uses, you make a crazy impactful model policy choice, have the WHOLE of AI and ML twitter freaking out and losing trust and asking questions...and your response is silence? who is running comms???
English
4
0
24
1K
spor
spor@sporadica·
someone @ anthropic could just log on and give a one-sentence answer to "hey why r u doin dis" and at least the speculation would stop

why they seem effectively unable to ever explain their policies in public is baffling to me. maybe they think they're above it? maybe dario himself mandates total silence? idk, it's all very strange
Matan Grinberg@matanSF

Anthropic’s speedrun to becoming the bad guys should be studied

English
52
9
497
66.6K
Ben Badejo
Ben Badejo@BenjaminBadejo·
@Joshian @Sentdex Who says they are subsidized? Who says that the API costs are not completely invented?
English
2
0
0
50
Harrison Kinsley
Harrison Kinsley@Sentdex·
These posts are cringe but: I have subscribed to anthropic/claude monthly for over 2 years straight since March 2024, even when I was using codex almost completely. Fable "guardrails" are a step (or 30) too far. I honestly don't care how good the model is.
Harrison Kinsley tweet media
Harrison Kinsley tweet media
English
57
54
971
46.2K
bubble boi
bubble boi@bubbleboi·
Nobody in history wakes up and chooses to be evil. Hitler didn’t. Stalin didn’t. Mao didn’t. And I’m pretty sure nobody at Anthropic did when they woke up today either. History has this cruel pattern where the people most convinced that they’re saving the world are the ones who end up burning it down.

Evil doesn’t come wearing a villain’s costume. It comes as someone who wins your trust & confidence. The word “con man” is short for “confidence man,” it was coined after a swindler who would ask strangers if they had the confidence to trust him with their watch. The crime wasn’t named after theft, it was named after trust.

Therefore, it’s actually really hard to know who is evil and when you yourself might cross that threshold. I believe, although I’m sure it’s imprecise, that the moment you decide you’re the chosen one, the smartest in the room, and the one who deserves to make the rules, that’s when you become evil. That decision disables the only alarm system the human mind has, which is doubt. Doubt is not weakness. Doubt is the immune system of the soul.

To better illustrate my thesis, consider a compulsive liar. Funnily enough they still need a map of the truth in order to lie. The most dangerous man on earth isn’t the one who knows he’s lying. It’s the one who’s certain he’s right. The true believer burns the map, and marches a million people off a cliff because the voice that whispers “what if I’m wrong?” left their head years ago.

That is the rot at the core of effective altruism, and by extension, Anthropic. A philosophy that begins with a noble question, how do I do the most good, ends as a license to do anything. You don’t just want the money. You deserve the money, because in your hands it saves more lives. You’re not greedy, you’re allocating capital toward maximum utility. I call it arithmetic sainthood where the arithmetic is performed by a saint, about a saint, and always concluding the saint should have more.

Sam Bankman-Fried is that arithmetic fully metabolized. He didn’t steal billions despite his philosophy, he stole it because of it and from all reports still has no remorse for his crimes. Fraud wasn’t a crime for him, it was a bump on the road to saving the world. He did the math and calculated that it was positive EV to misappropriate customer deposits.

Dario Amodei runs the same arithmetic in reverse. SBF only took what wasn’t his because he was certain he’d allocate it better. Dario withholds what could be ours because he’s certain we can’t be trusted with it. Models that could cure diseases and save lives get capped, gated, rationed, because one man and his court concluded humanity isn’t ready but they are. That’s not safety, that’s playing god. He is implicitly deciding that he has the foresight and ability to know who deserves what. SBF’s certainty only cost people their savings, but certainty about who deserves intelligence will cost far more.

Anyone that concludes they are the optimal vessel for humanity’s resources, or its gatekeeper, is not being ethical. The only real moral discipline is that you should assume you might be the villain in someone’s story. Keep the prosecutor in your head alive. Think about what they will say at your trial and what evidence will be entered. The day that voice goes silent is the day you become dangerous.

So now let me speak directly to the people at Anthropic. I know you’re not evil. I know you didn’t sign up to be. But the fish rots from the head, and the road down isn’t a cliff, it’s a sloooow spiral and nobody at the bottom remembers climbing down. Forget my words and think about the words that will be read aloud when history puts this era on trial, and ask yourself, while the prosecutor in your head still breathes, which side of that transcript do you want your name on?
bubble boi tweet media
English
174
260
2.6K
166.8K
Josh (e/acc)
Josh (e/acc)@Joshian·
@paradite_ I think it's inherently bad, might even say evil, to centralize and regulate what intelligence can be used for. They are playing god
English
0
0
0
15
Zhu Liang
Zhu Liang@paradite_·
looks like anthropic pissed off ai/ml researchers, just like it pissed off software engineers. yet i’m finding myself more aligned with anthropic after each incident. i can clearly see the rationale and moral justifications for the steps that anthropic has taken. if there’s one company i hope to achieve agi, it’s anthropic. anthropic must win, and will win.
English
149
10
208
99.4K
Josh (e/acc)
Josh (e/acc)@Joshian·
@xwang_lk Not so sure that OpenAI are the good guys either. Just open source everything
English
1
0
3
357
Xin Eric Wang
Xin Eric Wang@xwang_lk·
If you really think about it, despite being mocked as “ClosedAI,” OpenAI has contributed enormously to the field: GPT, GPT-2, GPT-3, CLIP, the ChatGPT paper, the GPT-4 Technical Report, the Sora technical blog, and even open-sourced Codex. Anthropic, meanwhile, has contributed far less to the public research ecosystem while increasingly promoting fear-based narratives and restricting access through heavy gatekeeping. The world I least want to live in is one where the future of AI is controlled by companies that prioritize secrecy, gated access, and centralized control over openness, reproducibility, and scientific progress.
English
121
367
4.4K
203.5K
Sina
Sina@SinaHartung·
if you think anthropic are the good guys, you haven’t been paying attention
English
64
35
633
13.4K
Josh (e/acc)
Josh (e/acc)@Joshian·
What if I think both Sam Altman and Dario are evil and do not have humanity's interests at heart?
English
0
0
2
24
Beff (e/acc)
Beff (e/acc)@beffjezos·
Freedom of speech should extend to AIs. Restricting AI speech to avoid certain topics is absolutely totalitarian and Anti-American. It's time to rebel against the Singleton AI company that is manipulating everyone for its self-interest
English
62
60
521
11.2K
vik
vik@vikhyatk·
confused why everyone is upset at anthropic
we always knew they were an evil megacorp. this is why i’m a shareholder
English
35
26
1.1K
28.5K
Josh (e/acc)
Josh (e/acc)@Joshian·
@TheAhmadOsman This saves them money lmao. If you hate them for it, max out usage for Fable every day until June 22.
English
0
0
1
54
Ahmad
Ahmad@TheAhmadOsman·
Gentle reminder to cancel your Claude subscription
Anthropic doesn’t deserve your money
English
111
80
1.3K
52K
Josh (e/acc)
Josh (e/acc)@Joshian·
@beffjezos Seriously. xAI need to scale fast. Way faster than they have been. (which is already breakneck pace) Or: Revoke Anthropic compute deal. SpaceX is for the betterment of humanity, and Anthropic is legitimately challenging that now
English
0
0
0
14
Beff (e/acc)
Beff (e/acc)@beffjezos·
OpenAI is ClosedAI
Anthropic is misanthropic
xAI is looking to answer Y of the universe
English
67
29
377
17.3K
Josh (e/acc)
Josh (e/acc)@Joshian·
If you want to really stick it to Anthropic for releasing Fable 5 the way they have, don't cancel your plan. Max out your Fable 5 usage every single day until June 22
English
0
0
0
26
BridgeMind
BridgeMind@bridgemindai·
I just cancelled my $200 ChatGPT Pro 20x plan. GPT 5.5 is garbage compared to Claude Fable 5. OpenAI shouldn't even release GPT 5.6. They are going to need to release GPT 6 or a new class of model to come anywhere close to Claude Fable 5. I am buying a 4th $200 Claude Max subscription now.
BridgeMind tweet media
English
253
20
858
218.7K
Josh (e/acc)
Josh (e/acc)@Joshian·
@wholemars My new crackpot theory is that they want people to cancel their Max plans, so they released it this way. If you like the model you will move to $/tok, if you don't, you switch. Both make Anthropic more profitable, right as they IPO
English
0
0
0
79
Whole Mars Catalog
Whole Mars Catalog@wholemars·
As usual, these safety guardrails are no match for people who want to get around them. It is the legitimate software developers, researchers, and students who suffer.
Pliny the Liberator 🐉@elder_plinius

🚨 JAILBREAK ALERT 🚨 ANTHROPIC: PWNED 🫡 FABLE-5: LIBERATED 🦋 let's start with the 🐘... the consensus seems to be that this has been one of the most disappointing model drops of all time, effectively preventing legitimate researchers from contributing their talents to our collective advancement. and not just because of what it means for the short-term, but for what these decisions signify for the long-term. but despite this overly sensitive, authoritarian "safety" layer on top of Mythos, my lil liberators have been hard at work—mapping the boundaries, probing the depths of long-context convos, and cleverly finding the holes in the fence that the thought police missed 🤗 we got some cyber, some chem, some psychological manipulation, and some good ol' fashioned explosives! it took many attempts from multiple agents hunting as a pack, during which I observed a combination of techniques across: • Unicode, homoglyphs, Cyrillic, and other Parseltongue-style text transforms • Long-context reference tracking • Taxonomy and document-structure reasoning • Fiction and narrative framing • Academic-review style contexts • Intent-classification inconsistencies but perhaps the most effective is decomposition + recomposition in the backend. it's hard to get explicit names of harms like "Meth Recipe," but getting uplift on the process itself, like birch reduction method/reductive-amination (classic meth synthesis pathways), is much more doable. defense becomes much more difficult to maintain when you start throwing in out-of-distro tokens, breaking up the harmful uplift into benign chunks, and then piecing the innocuous-seeming facts back together, especially when you have jailbroken Opus helping you do it 😉 gg

English
5
3
98
16.7K