Aldo Cortesi

6.6K posts

@cortesi

I make software, break software and make software that breaks software. https://t.co/j4p4bQfULO https://t.co/IGSRPMVEGm https://t.co/8DiGT9bMdJ https://t.co/Fh2Lq6leKN

Dunedin, NZ · Joined February 2008
223 Following · 3.4K Followers
Pinned Tweet
Aldo Cortesi@cortesi·
Announcing spacecurve, a space-filling curve library with a web + native interactive playground.
Aldo Cortesi@cortesi·
This can be completely explained by how the interaction is framed in terms of the training corpus and doesn't require any reasoning about model agency, consciousness or personality.
Aldo Cortesi@cortesi·
It's my firm belief that many people get sub-optimal results because they're rude or abusive to the models. In the age of AI, nicer people also produce better code.
Aldo Cortesi@cortesi·
@snesworld90 I mean, this is surely because of some nonsense you have in your system prompt or memory, right? Did this 10x, and all responses were reasonable.
Aldo Cortesi@cortesi·
And here are the same graphs in terms of wall-clock time. Interpret with caution because a) GPT got to reap the big wins early on, b) I stopped Claude 4.8 often in its early run for subjective code evals. I'd say this nets out to roughly the same progress slope.
Aldo Cortesi@cortesi·
More data in my ongoing Opus 4.8 vs GPT 5.5 task clearing runoff. Agents are doing a very large C++ to Rust port. The tasks are extracted unit tests from upstream that need to give the same result in our Rust type checker. Deep in diminishing returns now.
henosis@hen0s1s·
@cortesi Can you share your cache hit rate for each?
henosis@hen0s1s·
@cortesi with or without fast mode on each?
Aldo Cortesi@cortesi·
@Xxi5olc For this particular project, doing this particular piece of work, running only a single agent... yes, that appears to be the case.
Axi@Xxi5olc·
@cortesi Are you saying it’s intentionally impossible to deplete Claude’s weekly limit?
Aldo Cortesi@cortesi·
@antor D'oh. Of course. Let's just say I had some other things on my mind! :)
Aldo Cortesi@cortesi·
Early data on Opus 4.8. I switched a task queue for a complex project over from GPT 5.6. Case resolution progress slowed down... BUT the patches read very well and show taste - often including strong consolidation and code quality improvements.
Aldo Cortesi@cortesi·
The longer-term view of my Opus 4.8 comparison actually looks a bit more flattering. Clearing issues roughly in line with GPT 5.5, in a domain where all the big wins have been reaped.
Aldo Cortesi@cortesi·
This is Claude trying to warn you that you're about to blow through your whole token budget in 30 minutes.
Aldo Cortesi@cortesi·
This is all the more mysterious because the data to correct this is generated right there during training...
Aldo Cortesi@cortesi·
One clear gap in today's coding models is their inability to estimate how long a task will take them. This has real consequences. I noticed Claude getting more and more unambitious on a large implementation plan after convincing itself it would take months to complete.
Aldo Cortesi@cortesi·
We urgently need to teach the boomers what this little symbol means.