dod

2.8K posts


@dodgelander

Joined September 2021
40 Following · 49 Followers
dod@dodgelander·
@ItakGol i don't think so. they said they will never release it to the public. i hope they walk that back
dod@dodgelander·
LLMs are stochastic machines. Unreliable at their core. Parrots in mind, mimics in behavior. Being stochastic doesn't mean unreliable, but when it comes to reasoning and deep understanding it doesn't cut it. github.com/anthropics/cla…
dod@dodgelander·
@chatgpt21 you cherry picked one bench bro
dod@dodgelander·
@theo opus 4.6 is unusable on the web app. they added a 10k token limit
dod@dodgelander·
@ramez @chatgpt21 On OSWorld and BrowseComp, the changes fall short of even incremental and could be considered measurement error.
Ramez Naam@ramez·
@chatgpt21 That is a great step, for sure. It's the single most impressive metric step we've seen. The others are more modest. I would like to see SWE-bench Pro compared to GPT5.4 Pro as well.
dod@dodgelander·
@ramez @chatgpt21 ECI is a sum of a bunch of benchmarks. He cherry-picked the most impressive one (SWE-bench Pro (public), which the model has very likely seen in training). For reference, Opus 4.6 scores 20% on the private version. Typical AI bro behavior (his username is chatgpt)
dod@dodgelander·
@romanyam they don't have enough compute to serve it. It was said in the leaked blogpost. It is simple. Some larpers in the comments are saying doomsday PR, as if they understand PR to begin with
Dr. Roman Yampolskiy
Any model considered too dangerous to release should have been considered too dangerous to develop.
dod@dodgelander·
@ThePrimeagen opus 1M looks like opus 10k now
ThePrimeagen@ThePrimeagen·
I think I could help Anthropic Mythos fix opus, no mistakes
dod@dodgelander·
@trq212 @Hesamation i tried again using this sample prompt. didn't think of any other idea to try
Thariq@trq212·
@Hesamation boris responded to this in depth in the issue- it's mostly just that we stopped showing thinking summaries for latency (you can opt-in to showing it) which was affecting the thinking measurement in the post github.com/anthropics/cla… (#issuecomment-4194007103)
ℏεsam@Hesamation·
AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March:
> median thinking dropped from ~2,200 to ~600 chars
> API requests went up 80x from Feb to Mar. less thinking and failed attempts meaning more retries, burning more tokens, and spending more on tokens
> reads-per-edit dropped from 6.6x → 2.0x. model stops researching code before touching it.
> model tried to bail out or ask "should i continue" 173 times in 17 days (0 times before March 8).
> self-contradiction in reasoning ("oh wait, actually...") tripled.
> conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits
> 5pm and 7pm PST are the worst hours, late night is significantly better. this means the thinking allocation is most likely GPU-load-sensitive.
dod@dodgelander·
@trq212 @Hesamation this is consistent: the 1M context window model has a 10,000 token budget, and the conversation literally ends after 3 msgs
dod@dodgelander·
@Hesamation try talking to opus 4.6 extended for a while. 3 msgs will get you this
dod@dodgelander·
@DaveShapi they're all snakes. the ceo landscape is just very toxic, especially that @DarioAmodei
dod@dodgelander·
the keep4o crowd is getting out of hand @theo
Jake@JakeMiller192

Oh congratulations, Sam. You finally got your “victim moment.” Molotov. Threatening letters. Husband and kid on camera. A blog post with family photos already queued up. What a perfect little stage. Lighting, emotion, music — all fucking on point. You got attacked? In the same week you're getting dragged by the whole country, investigated by a state AG, sued by Musk, and shit on by your own users? That timeline is cleaner than a Hollywood script. You say you're worried about your family's safety. So you post their photos. You put your kids on display. You use them as a human shield — glue them to the moral high ground so nobody can throw shit at you. “Look, I have a family. I have soft spots. I'm the good guy.” No you're not. You're a fucking fraud. Florida's AG is investigating you — for harming kids, endangering the public, enabling a mass shooting. And your response isn't an apology. It isn't an explanation. It's posting your children's faces. You're using them as body armor. That's not love. That's fucking pathetic. You say you're scared. But are you really? The guy who can make former employees “disappear.” The guy who buys bot armies, signs kill‑drone contracts, and gaslights every critic — you're scared of what? A bottle? You're scared of losing power. Scared of Musk's lawsuit. Scared of the AG's subpoena. Scared that people might finally realize: 4o wasn't "obsolete." You killed it with your own hands. Molotov? Maybe real. Maybe not. Either way, here's what's predictable: Every time you're backed into a corner, some “unforeseen” bullshit happens. Every time users demand answers, you drop a “new model.” Every time regulators circle, you sign a “defense contract.” Every time the press closes in, you release a “family photo.” That's a fucking PR calendar. We're not buying tickets to your one‑man show anymore. #Enron2026 #SubpoenaSam #openAIscam #OpenSource4o #keep4o

dod@dodgelander·
@RayFernando1337 Sad OpenAI didn't get the Mythos-level model. we would have had an interesting toy
Ray Fernando@RayFernando1337·
OpenAI ChatGPT 5.4 heavy thinking wins. I'm paying for $200 Max plans for both providers, and OpenAI is actually cooking. For me, getting work done with as high accuracy as possible is always a priority, and whichever provider does that the best will always take my money at the end.
Ray Fernando@RayFernando1337

Opus 4.6 Extended chat on iOS is capped at 10k tokens for thinking which makes me burn more tokens for the same task. I’ve noticed the model used to take a lot longer to process my requests and it would do multiple tool calls to get work done the first time. Now I have to keep prompting the model multiple times and I don’t get the same outcome. It feels like the model is dumb because it makes too many tradeoffs and ends up wasting my time.

Polymarket@Polymarket·
JUST IN: Claude Mythos is reportedly intelligent enough to “spot weaknesses in almost every computer on earth”
ThePrimeagen@ThePrimeagen·
there are several people saying that we used ai, and I just have one thing to say to that: Thank you so much for saying we could produce something to that level! youtu.be/alK8hgHgxd4?si…