dod

2.8K posts

@dodgelander

Joined September 2021

40 Following, 49 Followers

dod@dodgelander·1h

Dario's cult is getting out of hand. What do you mean they went to a church asking if Claude is their child?

Nitasha Tiku@nitashatiku

Anthropic meet w/15 Christian leaders @ its SF HQ
-it was driven by the Interpretability team
-triggered by the team's recent research on LLMs exhibiting "emotions"
-extended debate on how Claude responds to being shut off & the blackmail experiment

dod@dodgelander·2h

@ItakGol I don't think so. They said they will never release it to the public. I hope they walk that back.

177

Itamar Golan 🤓@ItakGol·13h

it’s coming

12.5K

dod@dodgelander·2h

LLMs are stochastic machines. Unreliable at their core. Parrots in mind, mimics in behavior. Being stochastic doesn't mean unreliable but when it comes to reasoning and deep understanding it doesn't cut it github.com/anthropics/cla…

dod@dodgelander·3h

@chatgpt21 you cherry picked one bench bro

Chris@chatgpt21·4h

The “Appears to be linear” in question

Ramez Naam@ramez

Mythos, while a very powerful model, appears to be a relatively linear improvement over previous models. AI progress is real, impressive, and continuing. But Mythos does not appear to be a discontinuity.

241

17.1K

dod@dodgelander·3h

@theo opus 4.6 is unusable on the web app they added a 10k token limit

184

Theo - t3.gg@theo·3h

Remember when everyone said I was crazy for noticing how much Claude Code regressed?

Theo - t3.gg@theo

Claude Code is basically unusable at this point. I give up.

639

38K

dod@dodgelander·3h

@ramez @chatgpt21 On OS World and BrowseComp, the changes are short of incremental and could be considered a measurement error.

Ramez Naam@ramez·4h

@chatgpt21 That is a great step, for sure. It's the single most impressive metric step we've seen. The others are more modest. I would like to see SWE-bench Pro compared to GPT5.4 Pro as well.

750

dod@dodgelander·3h

@ramez @chatgpt21 ECI is a sum of a bunch of benchmarks. He cherry-picked the most impressive one (SWE-bench Pro (public), which the model has highly likely seen in training). For reference, Opus 4.6 scores 20% on the private version. Typical AI bro behavior (his username is chatgpt).

dod@dodgelander·4h

@romanyam They don't have enough compute to serve it. It was said in the leaked blog post. It is simple. Some larpers in the comments are saying "doomsday PR" as if they understand PR to begin with.

Dr. Roman Yampolskiy@romanyam·1d

Any model considered too dangerous to release should have been considered too dangerous to develop.

311

8.9K

dod@dodgelander·4h

@ThePrimeagen opus 1M looks like opus 10k now

ThePrimeagen@ThePrimeagen·4h

I think I could help Anthropic Mythos fix opus, no mistakes

1.6K

43.4K

dod@dodgelander·4h

@trq212 @Hesamation I tried again using this sample prompt; I didn't think of any other idea to try.

Thariq@trq212·7h

@Hesamation Boris responded to this in depth in the issue: it's mostly just that we stopped showing thinking summaries for latency (you can opt in to showing them), which was affecting the thinking measurement in the post. github.com/anthropics/cla… (comment anchor #issuecomment-4194007103)

337

47K

ℏεsam@Hesamation·10h

AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March:
> median thinking dropped from ~2,200 to ~600 chars
> API requests went up 80x from Feb to Mar. less thinking and failed attempts meaning more retries, burning more tokens, and spending more on tokens
> reads-per-edit dropped from 6.6x → 2.0x. model stops researching code before touching it.
> model tried to bail out or ask "should i continue" 173 times in 17 days (0 times before March 8).
> self-contradiction in reasoning ("oh wait, actually...") tripled.
> conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits
> 5pm and 7pm PST are the worst hours, late night is significantly better. this means the thinking allocation is most likely GPU-load-sensitive.

227

691

6.2K

1.3M

dod@dodgelander·4h

@trq212 @Hesamation this is consistent: the 1M context window model has a 10,000 token budget, and the conversation literally ends after 3 msgs

dod@dodgelander·4h

@trq212 @Hesamation

dod@dodgelander·4h

@trq212 @Hesamation claude.ai/share/c12b4ed2…

dod@dodgelander·4h

@trq212 @Hesamation bro this is a deeper issue

dod@dodgelander·4h

@Hesamation try to talk to opus 4.6 extended for a while 3msgs will get you this

155

dod@dodgelander·4h

@DaveShapi they're all snakes, it's just that the CEO landscape is very toxic, especially that @DarioAmodei

David Shapiro (L/0)@DaveShapi·1d

Hoooo boy he framed himself as "conflict-averse" as the root cause of all the pain with the board. Not quite a masterclass in rhetoric, but I'm sure the average knuckle-dragger won't catch that. "No, no, I'm not manipulative or a liar, I'm actually just conflict averse." Yikes...

Sam Altman@sama

I wrote this early this morning and I wasn't sure if I would actually publish it, but here it is: blog.samaltman.com/2279512

240

18.9K

dod@dodgelander·4h

the keep4o crowd is getting out of hand @theo

Jake@JakeMiller192

Oh congratulations, Sam. You finally got your “victim moment.” Molotov. Threatening letters. Husband and kid on camera. A blog post with family photos already queued up. What a perfect little stage. Lighting, emotion, music — all fucking on point. You got attacked? In the same week you're getting dragged by the whole country, investigated by a state AG, sued by Musk, and shit on by your own users? That timeline is cleaner than a Hollywood script. You say you're worried about your family's safety. So you post their photos. You put your kids on display. You use them as a human shield — glue them to the moral high ground so nobody can throw shit at you. “Look, I have a family. I have soft spots. I'm the good guy.” No you're not. You're a fucking fraud. Florida's AG is investigating you — for harming kids, endangering the public, enabling a mass shooting. And your response isn't an apology. It isn't an explanation. It's posting your children's faces. You're using them as body armor. That's not love. That's fucking pathetic. You say you're scared. But are you really? The guy who can make former employees “disappear.” The guy who buys bot armies, signs kill‑drone contracts, and gaslights every critic — you're scared of what? A bottle? You're scared of losing power. Scared of Musk's lawsuit. Scared of the AG's subpoena. Scared that people might finally realize: 4o wasn't "obsolete." You killed it with your own hands. Molotov? Maybe real. Maybe not. Either way, here's what's predictable: Every time you're backed into a corner, some “unforeseen” bullshit happens. Every time users demand answers, you drop a “new model.” Every time regulators circle, you sign a “defense contract.” Every time the press closes in, you release a “family photo.” That's a fucking PR calendar. We're not buying tickets to your one‑man show anymore. #Enron2026 #SubpoenaSam #openAIscam #OpenSource4o #keep4o

dod@dodgelander·4h

@RayFernando1337 Sad that OpenAI didn't get the Mythos-level model; we would have had some interesting toy

Ray Fernando@RayFernando1337·1d

OpenAI ChatGPT 5.4 heavy thinking wins. I'm paying for $200 Max plans for both providers, and OpenAI is actually cooking. For me, getting work done with as high accuracy as possible is always a priority, and whichever provider does that the best will always take my money at the end.

Ray Fernando@RayFernando1337

Opus 4.6 Extended chat on iOS is capped at 10k tokens for thinking which makes me burn more tokens for the same task. I’ve noticed the model used to take a lot longer to process my requests and it would do multiple tool calls to get work done the first time. Now I have to keep prompting the model multiple times and I don’t get the same outcome. It feels like the model is dumb because it makes too many tradeoffs and ends up wasting my time.

11.4K

dod@dodgelander·5h

@Nord0x @Polymarket how come

Andre ☢️@Nord0x·6h

@Polymarket Bullshit, I have access; it's just another Opus but 4.7

166

Polymarket@Polymarket·9h

JUST IN: Claude Mythos is reportedly intelligent enough to “spot weaknesses in almost every computer on earth”

452

638

7.3K

580.9K

dod@dodgelander·5h

@ThePrimeagen is it that horrible?

ThePrimeagen@ThePrimeagen·6h

there are several people saying that we used ai and I just have one thing to say to that Thank you so much for saying we could produce something to that level! youtu.be/alK8hgHgxd4?si…

YouTube

234

28.8K
