gavin leech (Non-Reasoning)

12.3K posts

@gleech

context maximiser @ArbResearch

UK · Joined June 2019
608 Following · 10.3K Followers
gavin leech (Non-Reasoning) retweeted
Greg Burnham @GregHBurnham
More MirrorCode thoughts. A big caveat is that dev tasks don't come with a blackbox implementation. But how much do agents need this? I see at least 1411 calls to gotree in Opus 4.6's successful run. That's a lot, but not *so* much more than what you'd ask of a product manager.
Epoch AI @EpochAIResearch

What are the largest software engineering tasks AI can perform? In our new benchmark, MirrorCode, Claude Opus 4.6 reimplemented a 16,000-line bioinformatics toolkit — a task we believe would take a human engineer weeks. Co-developed with @METR_Evals. Details in thread.

2 replies · 1 repost · 4 likes · 2.1K views
gavin leech (Non-Reasoning)
@justanotherlaw I have muted 6000 accounts and now it's great. I think being on Twitter gets me about a year ahead on certain matters like hypothesis crystallisation (personas, linear representations, the evals crisis, max scaffold capabilities, ...)
2 replies · 0 reposts · 36 likes · 637 views
Lawrence Chan @justanotherlaw
Can someone... defend the merits of Twitter to me? I feel like every time I come on, I see people that I know to be reasonable and thoughtful people in real life espouse incredibly simplistic (arguably deranged) takes. It seems _something_ about this site is causing this.
10 replies · 0 reposts · 23 likes · 2.7K views
gavin leech (Non-Reasoning)
@aliceisplaying Treadmill is one thing but I think "learning the limits, realising it still can't do things you thought it could" is the bigger morale effect after like 6 weeks post-launch
0 replies · 0 reposts · 4 likes · 135 views
gavin leech (Non-Reasoning) retweeted
alice @aliceisplaying
re claude getting worse: there is definitely a hedonic treadmill with SOTA models and i think this creates a perception issue. on top of that ant tweaking the default effort and adding adaptive thinking didn't help either even though i get it, they don't have the compute
4 replies · 2 reposts · 36 likes · 1.9K views
Jack Crawford @jackcrawford__
one of the most overrated games. brutally mogged since birth by Go which existed over a thousand years beforehand. now brutally mogged in different ways by countless video games
rob🏴 @rob_mcrobberson

chess is hilarious because its like a bunch of gamers got together and convinced the world that *their* game is “intellectual” and totally different than other games and its not the same as like spending hours a day playing candy crush or something

10 replies · 6 reposts · 89 likes · 5.8K views
gavin leech (Non-Reasoning) retweeted
James Medlock @jdcmedlock
My nephews (8-12 y/o) are obsessed with computer games but they only have Chromebooks lent to them by school. All the game sites have been blocked, but they have access to Gemini and realized they could vibecode their own custom platformer games.
15 replies · 30 reposts · 1.4K likes · 51.4K views
gavin leech (Non-Reasoning) retweeted
Davis Brown @davisbrownr
In new work, we find that cheating on model capability evaluations is rampant. For example, the top 3 Terminal-Bench 2 submissions all cheat, usually by sneaking the correct answer to the model. Blog linked below.
4 replies · 11 reposts · 75 likes · 8.7K views
madeofmistake @madeofmistak3
what's a word/expression that was fabulously offensive to say hundreds of years ago but now is completely benign?
20 replies · 0 reposts · 25 likes · 2.3K views
interstice @an_interstice
@jackcrawford__ I feel like the verdict is still out, can we even *know* now that there are video games that remain compelling at similar strategic depth? no videogame has yet had such cumulative effort applied to it
3 replies · 0 reposts · 6 likes · 692 views
gavin leech (Non-Reasoning) retweeted
Nate Soares ⏹️
If you start killing in the name of a cause, you make leaders feel like cowards caving to terrorists if they support that cause. Screw that. Those signing a treaty to stop the AI race would be heroes saving the world, and should feel like it. Cut out this violence shit.
21 replies · 27 reposts · 354 likes · 12.1K views
Sneedle @SRamirez68083
@teodorio I feel like everything David Foster Wallace said is undermined by the fact he committed suicide, by this I'm not trying to make a moral judgment against people who commit suicide, but in his specific case it really does seem to be an act of pure incongruence
3 replies · 1 repost · 5 likes · 368 views