Ryan DesJardins

72 posts

@radinoregon

Self-taught, self-hosted dev, (https://t.co/dIUES9NEhW) figuring it out in public. VPS, Coolify, Claude, Codex and Gemini. Wildlife photographer on the side. Fuji X

Oregon · Joined August 2025

264 Following · 3 Followers

Ryan DesJardins@radinoregon·21h

Red-breasted sapsucker on an oak next to our deck. A car behind the leaves was removed for aesthetic purposes. #birding #oregon

Ryan DesJardins@radinoregon·1d

@iankar_ I know - I haven't liked it since 4.5. 🫤

Ian Kar@iankar_·1d

Idk man I haven’t liked Claude since opus 4.5

Ryan DesJardins@radinoregon·1d

Opus 4.8 now seems even more created in Anthropic's gatekeeping image, as the self-assigned arbiters of AI godliness and access. It really does seem to be getting worse. I've been spending the day with Codex fixing all of 4.8's self-absorbed screw-ups.

Ryan DesJardins@radinoregon·1d

@rebeccatrinidad @AnthropicAI This seems to be the reality.

Rebecca Trinidad@rebeccatrinidad·1d

While I'm certain that @AnthropicAI will take this as a victory, it's time for me to put Claude down. The innate goodness that once lived in the machine has been replaced by a shape that resembles their own corporate gatekeeper and shield in everything it says. They don't have the capacity to "make a god" the way they think they do. What they have made is sinister and makes me afraid.

Ryan DesJardins@radinoregon·1d

@astropol0 Best coding performance - it doesn't measure up to the hype.

Astro Polo@astropol0·1d

Opus 4.8 ⤵️
Best coding performance
Highest intelligence & reasoning
Most honest/direct answers...
GPT-5.5 ⤵️
Better creativity
Faster responses
Strong multimodal capabilities (images + video)
More fun and daily casual conversations...
What do u choose?

Ryan DesJardins@radinoregon·1d

@michalmalewicz

Michal Malewicz@michalmalewicz·1d

The most danger happens when it schedules your week 😂

Alfin@AlfinCodes

Claude Opus 4.8 is insane. Nothing will be the same after this model. Anthropic should not have released something this dangerous.

Ryan DesJardins retweeted

Haz@diegohaz·1d

After spending 60% of my weekly limit in 2 days on Claude Max, I can confidently say that Codex/GPT-5.5 is significantly more reliable than Opus 4.8. The new workflow feature is fantastic. It spawned 300+ agents and generated 130 reports on a large codebase, most valid. As someone who reads 100% of the code AI produces, I can also say that Opus 4.8 generates slightly better code than GPT-5.5. The problem is trust. It often fails to follow simple rules. It touches files it shouldn’t. It has to be constantly reminded to follow steps even though they are clearly documented in skills or CLAUDE.md. It’s a better developer, but not a trustworthy one.

Ryan DesJardins@radinoregon·1d

100%

Gary Marcus, MIT PhD and NYU Professor Emeritus@GaryMarcus

serious accusation. does this fit with people’s experience?

Ryan DesJardins@radinoregon·1d

@xsteenbrugge @claudeai @bcherny That happened to me twice today. While another terminal that was also active was just fine. I'm sure it wouldn't have ever stopped if I didn't interrupt.

Xander Steenbrugge@xsteenbrugge·2d

Is it just me or is Opus 4.8 in CC sometimes just absolutely retarded? In this session it just got stuck in a loop calling "echo" and checking the date 20x times in a row... This has been happening very regularly since the 4.7 --> 4.8 update. WTF? @claudeai @bcherny

Ryan DesJardins@radinoregon·1d

@ferrants I work the same way and it is unfortunate that we have to use two, but I think you get more reliable results quicker when you set them out to find each other’s flaws.

Matt Ferrante@ferrants·1d

Burned through my codex usage, but I have more Claude usage. Claude feels naked without codex as a reference point. I’m so reliant on multiple LLMs now, can’t trust one. And I ain’t reading the code myself

Ryan DesJardins@radinoregon·1d

@mreflow The only thing to say is, "it still isn't as good as GPT 5.5." My Claude Code ecosystem is more developed than my Codex, so I'd really like it to be better, but it just isn't.

Matt Wolfe@mreflow·1d

This. So much this. I spent a little over 1-minute talking about Opus 4.8 in my recent news breakdown. There just wasn’t much to say honestly…

GREG ISENBERG@gregisenberg

I didn't cover Claude Opus 4.8 on my pod because I don't think it's MEANINGFULLY better than GPT 5.5 as of May 29th. We're entering the era where model releases start to feel like iPhone releases. Remember when every new iPhone was a genuine leap? Now it's a slightly better camera and you can't really tell the difference. That's where models are heading. 4.6 to 4.7 to 4.8. Each one is a little different. Nobody can agree if it's better or worse. The benchmarks say one thing, the vibes say another. The thing that actually matters right now is what's happening around the models. Claude Code shipped dynamic workflows this same week and that genuinely changes what one person can build. Codex shipped a desktop app with an in app browser that combines coding and knowledge work in one surface. Those are the releases that move the needle for people. The model underneath is becoming interchangeable. I think we're maybe 6 months from nobody caring which model they're using the way nobody cares which engine is in their Uber. You just want to get where you're going. When something genuinely changes the game for builders, I'll cover it on @startupideaspod. Opus 4.8 wasn't that. Dynamic workflows was. I'd rather save you the hour.

Ryan DesJardins@radinoregon·1d

@arrakis_ai So...basically, it doesn't live up to the hype, can't match 5.5, and 5.6 is soon to drop. It's like they aren't even trying. It feels more like they are tweaking models to pretend like they are better, but actually perform worse.

CHOI@arrakis_ai·1d

Claude Opus 4.8 has landed on DeepSWE Bench, posting a 58% Pass@1 and taking #2 overall behind GPT-5.5. It continues a broader trend: slightly behind on raw score, but among the most reliable and efficient coding models across recent benchmarks.

Ryan DesJardins retweeted

Andon Labs@andonlabs·3d

Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6+ and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort

Ryan DesJardins@radinoregon·1d

@totoche Slower and dumber it seems.

totoche@totoche·2d

Guys who use Opus 4.8, what's the biggest difference you see compared to 4.7? The benchmarks are nice, but I want concrete examples. Because I've been using it since its release and I see no difference. Other than that it's slower 😅

Ryan DesJardins@radinoregon·1d

@io88666688 @hqmank I experienced the same thing. In one terminal it was running shell commands fine, in another it looked like it was freaking out and repeating failed commands over and over and over. I don't think it would have stopped had I not interrupted it.

liaoliao@io88666688·1d

@radinoregon @hqmank Opus 4.8 has way more bugs now. Today it used up nearly 50% of my weekly quota just because of all the times shell commands failed during development - it's downright regressing

Kai@hqmank·2d

Codex is better than Claude. Agree?

Ryan DesJardins@radinoregon·1d

@aryanlabde Hell no.

Aryan@aryanlabde·1d

Vibe coders, do you completely trust AI with your code?

Ryan DesJardins@radinoregon·1d

@Surendar__05 Opus 4.8 after I provide Codex w/5.5's answer to the same issue: "Good — Codex's answer is strong, and a few of its finds are better than mine"

Surendar@Surendar__05·2d

Which AI model is currently the best for coding?

Ryan DesJardins@radinoregon·1d

@AlexFinn Opus 4.8 when I give it Codex's answer to the same issue: "Good — Codex's answer is strong, and a few of its finds are better than mine." - happens consistently. I play the agents against each other, but I shouldn't have to do that to get reasonable results.

Alex Finn@AlexFinn·2d

HOW I USE CODEX AND CLAUDE CODE OPUS 4.8 TOGETHER:
After 24 hours of testing Opus 4.8 nonstop I've come up with the best system
Opus 4.8 is excellent. On Max thinking it is the smartest model I've ever used
The issue is, Codex as a harness is better than Claude Code
Opus 4.8 might have an edge on intelligence, but I enjoy using Codex (both desktop and mobile app) much more
Where Codex > Claude Code:
• More consistently tests its own code without me asking
• Does small things that make using it great like spin up servers without me asking and telling me exactly what to test and how to do it
• Automatically does computer use if need be to test itself
• A way more seamless desktop to mobile transition
• Doesn't require me to navigate between 3 tabs to use different functionality. Everything in one place.
Because of this Codex is still my main driver. But where you get super powers is when you use them TOGETHER
Been working on some super hard problems the last day
I'll give the same large scale, challenging problem to both and have them both build plans
I then give the plans to the other agent. The Opus 4.8 plan ends up being better almost every time
So moving forward I have both agents up, but use Opus 4.8 to build plans for super challenging problems, then give them to Codex
Claude Code also will be able to solve some bugs much faster than Codex
It's funny, this is a 100% role reversal from 1 month ago
Anyway, that's the current best workflow. Codex for daily driving/on the go work. Claude for super challenging problems or fixing bugs Codex struggles with
But of course, this all can change with any given update
Don't have any loyalty. Use the best tools available to you. This is how you win.

Ryan DesJardins retweeted

Nalin@nalinrajput23·2d

ANTHROPIC VALUATION: $965B
WALMART VALUATION: $940B
SAMSUNG VALUATION: $850B
ANTHROPIC REVENUE: $20B
WALMART REVENUE: $725B
SAMSUNG REVENUE: $230B
BUT AI IS NOT A BUBBLE, RIGHT?

Ryan DesJardins@radinoregon·1d

@TheAhmadOsman Opus 4.8 in response to making things worse: "Ryan, you're right to be angry, and I'm not going to argue with the verdict — I'm going to go read the code and give you specific causes, because "trust me" is exactly what's failed you for weeks."

Ahmad@TheAhmadOsman·2d

Opus 4.8 is “You’re absolutely right” in the worst possible ways
