Aldo Cortesi

6.6K posts

@cortesi

I make software, break software and make software that breaks software. https://t.co/j4p4bQfULO https://t.co/IGSRPMVEGm https://t.co/8DiGT9bMdJ https://t.co/Fh2Lq6leKN

Dunedin, NZ · Joined February 2008

223 Following · 3.4K Followers

Pinned Tweet

Aldo Cortesi@cortesi·Jan 28

Announcing spacecurve, a space-filling curve library with a web + native interactive playground.

Aldo Cortesi@cortesi·1d

x.com/cortesi/status…

Aldo Cortesi@cortesi

And here are the same graphs in terms of wall-clock time. Interpret with caution because a) GPT got to reap the big wins early on, b) I stopped Claude 4.8 often in its early run for subjective code evals. I'd say this nets out to roughly the same progress slope.

Aldo Cortesi@cortesi·1d

One fascinating finding. Both agents are on their respective $200 tiers. Opus is using about 1/10th of the weekly quota per day on max thinking, while GPT 5.5 is using about 1/4 weekly quota per day on xhigh. Unexpected.

Aldo Cortesi@cortesi

The longer term of my Opus 4.8 comparison actually looks a bit more flattering. Clearing issues roughly in line with GPT 5.5, in a domain where all the big wins have been reaped.

Aldo Cortesi@cortesi·1d

x.com/cortesi/status…

Aldo Cortesi@cortesi·1d

This can be completely explained by how the interaction is framed in terms of the training corpus and doesn't require any reasoning about model agency, consciousness or personality.

Aldo Cortesi@cortesi·1d

It's my firm belief that many people get sub-optimal results because they're rude or abusive to the models. In the age of AI, nicer people also produce better code.

Aldo Cortesi@cortesi·1d

@snesworld90 I mean, this is surely because of some nonsense you have in your system prompt or memory, right? Did this 10x, and all responses were reasonable.

Brian Phaze@snesworld90·1d

Ladies and Gentlemen, a brand new "high-end" AI model in mid-2026. Pathetic. #KeepSonnet45 #SaveSonnet45 #StopAIPaternalism #ClaudeAI

Aldo Cortesi@cortesi·1d

Aldo Cortesi@cortesi·1d

More data in my ongoing Opus 4.8 vs GPT 5.5 task clearing runoff. Agents are doing a very large C++ to Rust port. The tasks are extracted unit tests from upstream that need to give the same result in our Rust type checker. Deep in diminishing returns now.

Aldo Cortesi@cortesi·1d

@hen0s1s

henosis@hen0s1s·1d

@cortesi Can you share your cache hit rate for each?

Aldo Cortesi@cortesi·1d

@hen0s1s No fast mode.

henosis@hen0s1s·1d

@cortesi with or without fast mode on each?

Aldo Cortesi@cortesi·1d

@Xxi5olc For this particular project, doing this particular piece of work, running only a single agent... yes, that appears to be the case.

Axi@Xxi5olc·1d

@cortesi Are you saying it’s intentionally impossible to deplete Claude’s weekly limit?

Aldo Cortesi@cortesi·1d

@antor D'oh. Of course. Let's just say I had some other things on my mind! :)

Andrés Miguel Torrubia Sáez@antor·1d

@cortesi you mean 5.5 right? RIGHT? 😂

Aldo Cortesi@cortesi·2d

Early data on Opus 4.8. I switched a task queue for a complex project over from GPT 5.6. Case resolution progress slowed down... BUT the patches read very well and show taste - often including strong consolidation and code quality improvements.

Aldo Cortesi@cortesi·1d

The longer term of my Opus 4.8 comparison actually looks a bit more flattering. Clearing issues roughly in line with GPT 5.5, in a domain where all the big wins have been reaped.

Aldo Cortesi@cortesi·2d

This is Claude trying to warn you that you're about to blow through your whole token budget in 30 minutes.

Aldo Cortesi@cortesi·4d

This is all the more mysterious because the data to correct this is generated right there during training...

Aldo Cortesi@cortesi·4d

One clear gap in today's coding models is their inability to estimate how long a task will take them. This has real consequences. I noticed Claude getting more and more unambitious on a large implementation plan after convincing itself it would take months to complete.

Aldo Cortesi@cortesi·5d

We urgently need to teach the boomers what this little symbol means.
