tom cunningham

598 posts

@testingham

Economics & AI @ @METR_Evals (ex-openai) https://t.co/FZobuYjdOc

San Francisco, CA · Joined March 2009
3K Following · 9.6K Followers
Pinned Tweet
tom cunningham
tom cunningham@testingham·
AI agents are autonomously doing R&D, now what? @slimshetty_ and I give a formalization of the low-hanging-fruit metaphor & draw some implications: 1. Agents can make autonomous contributions without being full human substitutes. 2. You switch from agent-labor to human-labor as expenditure grows. 3. You can calibrate an agent's value by its human-equivalent time. (1/n)
[tweet media]
8 replies · 23 reposts · 125 likes · 36.7K views
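A toy numeric sketch of point 2 (my own functional forms and prices, not the paper's model): if returns to agent labor diminish much faster than returns to human labor, the optimal budget share on agents falls as total expenditure grows, so spending switches from agent-labor to human-labor.

```python
# Toy illustration (made-up numbers, not the paper's model) of the
# low-hanging-fruit switch: agents are cheap but hit diminishing
# returns fast, so their optimal budget share falls as spending grows.
import numpy as np

W_AGENT, W_HUMAN = 1.0, 20.0        # hypothetical prices per unit of labor
BETA_AGENT, BETA_HUMAN = 0.3, 0.8   # curvature: agents exhaust easy wins quickly

def output(budget: float, agent_share: float) -> float:
    """Research output from splitting `budget` between agent and human labor."""
    a = agent_share * budget / W_AGENT          # units of agent labor bought
    h = (1 - agent_share) * budget / W_HUMAN    # units of human labor bought
    return a**BETA_AGENT + h**BETA_HUMAN        # additively separable toy production

for budget in [10, 100, 1_000, 10_000]:
    shares = np.linspace(0, 1, 10_001)
    best = shares[np.argmax([output(budget, s) for s in shares])]
    print(f"budget={budget:>6}: optimal agent share = {best:.2f}")
```

With these made-up parameters, the grid search puts most of a 10-unit budget on agents and almost none of a 10,000-unit budget, matching the switch in point 2.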
tom cunningham
tom cunningham@testingham·
I think there are two things going on here: (1) subtle differences in value-measurement; (2) subtle differences in wording which elicit different human responses. Given that these are fairly delicate concepts, I think the distinction between different definitions of value-uplift is likely swamped by the second point: people have a hard time introspecting, and so slight variations in wording evoke different system-1 responses.
1 reply · 0 reposts · 1 like · 37 views
Toby Ord
Toby Ord@tobyordoxford·
@joel_bkr @whitfill_parker @testingham Yeah, that's interesting. I really would have expected the green and blue to be pretty similar (e.g. if it would take 2 non-AI months to deliver work equally valuable to 1 month with AI, it seems likely you are producing 2x as much value with AI per month).
1 reply · 0 reposts · 1 like · 77 views
Joel Becker
Joel Becker@joel_bkr·
new research from me @METR_Evals: technical workers claim that today's AI impacts the value of their work to an extraordinary degree (& growing over time). of course, self-reports plausibly overestimate. the magnitudes nonetheless strike me as remarkable. x.com/METR_Evals/sta…
METR@METR_Evals

We surveyed 349 technical researchers, engineers, and managers (in February–April 2026) about how they use AI tools at work. On average, participants self-report that AI use made their work 1.6–2.1x more valuable, and that this multiplier will grow over time.

1 reply · 6 reposts · 72 likes · 16.1K views
tom cunningham
tom cunningham@testingham·
@kjw_chiu @alexolegimas OK that's a reasonable claim! though even that I think is not obviously true (e.g. a couple of recent papers estimated fairly small effects from AlphaFold).
1 reply · 0 reposts · 2 likes · 44 views
Ken Chiu
Ken Chiu@kjw_chiu·
@testingham @alexolegimas I think Alex is defining "benefits" in a relative, rather than absolute way. "In science" would likely be a much smaller denominator than "in the broader economy".
1 reply · 0 reposts · 2 likes · 45 views
Alex Imas
Alex Imas@alexolegimas·
Great paper discussed by Cheryl below on the role of technology in science. Finds that, in the case of computer technology:
-- Diffusion of gains from tech was much faster in science than in the general economy (i.e., productivity, growth numbers).
-- Computers allowed new research paths for scientists, paths that were previously bottlenecked by computational limits rather than ideas.
In the case of AI, I think the first point will hold: we will see the benefits of AI faster in science than in the broader economy. This is already the case, e.g., AlphaFold. But I think we'll see something quite different in the second case. Computers were a general-purpose tech, but intelligence was still fairly domain-specific. AI is a general-purpose *intelligence*. This allows scientists to access ideas and methods in domains potentially quite distant from their own. Research questions they might be interested in but didn't have the full understanding to pursue, or weren't trained in the methods, are now wide open. My own prediction is that we will see ambitious scientists tackle a much broader set of questions, and broader collaboration across groups. The sort of specialization that emerged as fields matured will start flattening again; we'll see many more generalists.
Alex Imas tweet media
Cheryl Wu@cherylwoooo

Innovation unlocks new possibilities and new directions of research. I realized there’s a really cool Econ paper by Pedro and Franco that resonates with my reflection on AI creating new jobs. They found that when computers were adopted by schools, researchers tended to revisit concepts from the past that were computationally too heavy for humans to do. One example Pedro gave us was weather prediction—the theory was there, but computers really helped make it happen. “The evidence is consistent with computers unlocking research paths bottlenecked by computation rather than by ideas.” Pedro presented at our Econ history lunch at @YaleEconomics. It was really impressive! Slightly outdated version of their paper: papers.ssrn.com/sol3/papers.cf…

15 replies · 37 reposts · 196 likes · 28.2K views
tom cunningham
tom cunningham@testingham·
@alexolegimas science hasn't been short of fiascos either! just to check -- you think the aggregate economic impact of AI as of today (including direct welfare from chatbot use) has been smaller than the impact on science?
1 reply · 0 reposts · 4 likes · 187 views
Alex Imas
Alex Imas@alexolegimas·
@testingham If there's one thing the social media fiasco has hopefully taught us, it's not to confuse adoption with welfare.
1 reply · 0 reposts · 8 likes · 492 views
Herbie Bradley
Herbie Bradley@herbiebradley·
IMO there are significant transaction-cost reductions & efficiencies which happen downstream of many people having relatively high confidence that governments will not see their messages. And beyond that, I think it's one of the most strongly anti-centralization technologies: e2e means governments like the UK are pre-emptively discouraged from passing more censorious laws, and it makes it harder for other governments to implement social-credit-style systems.
1 reply · 0 reposts · 0 likes · 43 views
Chris Painter
Chris Painter@ChrisPainterYup·
Talking about this today with people with more domain expertise than myself has made me feel like I was underestimating the implications of AI “burning through the old growth forest of cyber vulnerabilities” and the downsides of building a cyber-AI-proof world. In particular, it seems like our government’s ability to stop terrorists today might depend on it having backdoors and vulnerabilities that bad actors don’t. If the whole world becomes very-hardened-by-default, does this mean terrorists become much more effective than they were pre-AI? Also, lots of software exists at a point on a tradeoff curve between ease-of-use/access and security. If we have to harden everything, will the world generally become more restrictive, e.g. will you need a background check to enter a hospital? I don’t know the answers and am very much a novice here, just trying to take the implications of a world hardened against AI vulnerability detection seriously.
Chris Painter@ChrisPainterYup

I basically agree with your description of how we’ll view this phase in retrospect in the context of cybersecurity. I assume that it’s possible to burn through and fix all of the cyber vulnerabilities, and that we’ll come out the other side safer. I’d be curious if there’s anyone very sophisticated in the domain who disagrees with that; seems like a topic they must have studied for a long time?

5 replies · 0 reposts · 30 likes · 6.9K views
tom cunningham
tom cunningham@testingham·
@herbiebradley @ChrisPainterYup What are the benefits? This is relative to iMessage, WhatsApp, and Signal chats encrypted in transit, where the platform provider can see them, and so the US government can exert pressure to see them?
1 reply · 0 reposts · 1 like · 60 views
Herbie Bradley
Herbie Bradley@herbiebradley·
@ChrisPainterYup this reminds me of the end-to-end encryption arguments, and though e2e has probably made it slightly harder to find terrorists, the 1st- and 2nd-order benefits to society have been enormous
2 replies · 1 repost · 3 likes · 303 views
tom cunningham
tom cunningham@testingham·
@whitfill_parker and I have a new post formalizing the “Cadillac tasks” argument: If you estimate the productivity effect of AI by seeing how long it would've taken you to do your post-AI work in the pre-AI world, then you'll likely get a big overestimate. (🧵)
tom cunningham tweet media
4 replies · 23 reposts · 95 likes · 11.9K views
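A worked toy example of the overestimate (my own numbers, illustrating my reading of the argument rather than the post's exact model): with AI you take on "Cadillac" work that was never worth its pre-AI time cost, so pricing post-AI output at pre-AI hours inflates the multiplier.

```python
# Each task: (value, hours pre-AI, hours with AI). Task B is Cadillac
# polish: hypothetically never worth doing at its pre-AI time cost.
tasks = {"A": (10.0, 5.0, 1.0), "B": (1.5, 8.0, 1.0)}
WAGE = 1.0  # do a task iff value >= wage * hours it takes

def chosen(hours_idx):
    """Tasks worth doing when time costs sit at column `hours_idx`."""
    return [t for t in tasks.values() if t[0] >= WAGE * t[hours_idx]]

pre, post = chosen(1), chosen(2)  # pre-AI: only A; with AI: A and B

rate_pre  = sum(t[0] for t in pre)  / sum(t[1] for t in pre)   # 2.0 value/hr
rate_post = sum(t[0] for t in post) / sum(t[2] for t in post)  # 5.75 value/hr
true_uplift  = rate_post / rate_pre                               # ~2.9x
naive_uplift = sum(t[1] for t in post) / sum(t[2] for t in post)  # 6.5x

print(f"true value uplift:        {true_uplift:.1f}x")
print(f"naive time-based uplift:  {naive_uplift:.1f}x")
```

The naive "how long would this have taken me pre-AI" estimate (6.5x) more than doubles the true value uplift (~2.9x), because it prices task B at 8 hours that nobody would actually have spent.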
tom cunningham
tom cunningham@testingham·
@whitfill_parker (and on #1, my guess would be that AI is causing more substitution within O*NET tasks than between tasks, which makes me think doing CES adjustments on O*NET tasks is insufficient)
0 replies · 0 reposts · 3 likes · 325 views
tom cunningham
tom cunningham@testingham·
Two more notes:
1. You can use a CES function to get lower and upper bounds on productivity impacts, given time-shares. But this will only work if you've cut tasks at the right granularity. E.g. if you use O*NET tasks you need to be sure that AI isn't differentially changing the relative time-price for specific tasks within each O*NET task.
2. There are many, many other difficulties in measuring productivity from AI. Possibly equally important is people using AI for work that actually isn't valuable at all, in retrospect, i.e. it's a net negative (lemon tasks?).
1 reply · 0 reposts · 5 likes · 363 views
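A minimal sketch of note 1's bounding exercise, under my assumption (not necessarily the post's exact formulas) that the strong-complements limit gives an Amdahl-style harmonic-mean lower bound and the perfect-substitutes limit an arithmetic-mean upper bound, given time shares and per-task speedups:

```python
# Hypothetical time shares s_i and per-task AI speedups u_i.
shares   = [0.5, 0.3, 0.2]
speedups = [1.0, 2.0, 10.0]

lower = 1 / sum(s / u for s, u in zip(shares, speedups))  # complements (Amdahl-style)
upper = sum(s * u for s, u in zip(shares, speedups))      # substitutes

print(f"aggregate uplift bounds: [{lower:.2f}x, {upper:.2f}x]")  # [1.49x, 3.10x]
```

The granularity caveat bites exactly here: if one measured "task" secretly bundles a 1x and a 10x sub-task, the shares and speedups you feed in no longer pin down the bounds.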
tom cunningham retweeted
Parker Whitfill
Parker Whitfill@whitfill_parker·
New post on the difference between 3 notions of productivity gain from AI (AKA uplift):
1. Uplift on old tasks (AI-speedup on tasks you do in an avg 2022 day)
2. Uplift on new tasks (AI-speedup on tasks you do in an avg 2026 day)
3. Uplift in value (AI increasing your goals accomplished)
Parker Whitfill tweet media
5 replies · 27 reposts · 123 likes · 28.7K views
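A toy calculation (my own numbers, and a simple time-weighted-average aggregator that is itself a modeling choice) just to make the three definitions concrete, and to show they can come apart:

```python
# Hypothetical task mixes (hours) and per-task AI speedups.
old_day = {"write code": 4, "review": 4}         # avg 2022 day
new_day = {"write code": 1, "steer agents": 7}   # avg 2026 day
speedup = {"write code": 4.0, "review": 1.5, "steer agents": 2.0}

def time_weighted_uplift(day):
    return sum(h * speedup[t] for t, h in day.items()) / sum(day.values())

uplift_old   = time_weighted_uplift(old_day)  # 2.75x on the 2022 task mix
uplift_new   = time_weighted_uplift(new_day)  # 2.25x on the 2026 task mix
uplift_value = 30.0 / 20.0                    # hypothetical goals/day, 2026 vs 2022
print(uplift_old, uplift_new, uplift_value)   # three different numbers
```

All three can reasonably be called "uplift", and here they come out 2.75x, 2.25x, and 1.5x.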
tom cunningham retweeted
Håvard Ihle
Håvard Ihle@htihle·
A benchmark is a sensor! Each one has a window of capabilities where it can actually distinguish models. The sensitivity curve tells you how precisely it measures the underlying capability. Model based on the Epoch Capability Index @EpochAIResearch, see thread for blog link.
Håvard Ihle tweet media
4 replies · 13 reposts · 79 likes · 13.5K views
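An item-response-theory-style toy model (my framing of the sensor metaphor, not necessarily the model behind the plot): expected score is a logistic function of latent capability, and its slope is the sensitivity curve, which peaks in the window where the benchmark can actually distinguish models.

```python
import numpy as np

def expected_score(capability, difficulty=0.0, width=1.0):
    """Expected benchmark score for a model with the given latent capability."""
    return 1 / (1 + np.exp(-(capability - difficulty) / width))

caps = np.linspace(-5, 5, 11)
scores = expected_score(caps)
sensitivity = np.gradient(scores, caps)  # score change per unit of capability
for c, s, g in zip(caps, scores, sensitivity):
    print(f"capability={c:+.1f}  score={s:.2f}  sensitivity={g:.2f}")
```

Far from the benchmark's difficulty the score saturates at 0 or 1 and the sensitivity drops to ~0: the sensor reads pegged, and the benchmark stops distinguishing models.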
tom cunningham
tom cunningham@testingham·
@orgRem Most technical treatments define RSI as something stronger than just positive spillovers: either fully autonomous improvement, or exponential growth in outputs with constant inputs.
0 replies · 0 reposts · 3 likes · 223 views
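A toy simulation of the distinction (my own dynamics, not taken from any particular treatment): let capability feed back into its own growth rate. Full feedback gives exponential growth in outputs with constant inputs; weaker positive spillovers give only polynomial growth.

```python
# dC/dt = k * C**lam: lam = 1 is full self-improvement feedback,
# lam < 1 is mere positive spillovers. Euler steps, made-up constants.
def simulate(lam, k=0.05, c0=1.0, steps=200, dt=1.0):
    c = c0
    for _ in range(steps):
        c += k * c**lam * dt
    return c

print(f"lam=1.0 (full feedback):    C = {simulate(1.0):>9,.0f}")  # exponential blow-up
print(f"lam=0.5 (spillovers only):  C = {simulate(0.5):>9,.0f}")  # ~quadratic growth
```

With these constants the full-feedback run ends near C ≈ 17,000 while the spillovers-only run ends near C ≈ 36, which is the sense in which the technical definitions are "stronger".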
Rem Koning
Rem Koning@orgRem·
Are there other clean examples of recursive self improvement in technology? E.g., I am assuming some of Moore's law is that better computers helped us build better computers. Anything else?
Jack Clark@jackclarkSF

I've spent the past few weeks reading 100s of public data sources about AI development. I now believe that recursive self-improvement has a 60% chance of happening by the end of 2028. In other words, AI systems might soon be capable of building themselves.

9 replies · 0 reposts · 8 likes · 3.2K views
Lawrence Chan
Lawrence Chan@justanotherlaw·
In 2022, I joined what was then ARC Evals. Last Friday, I wrapped up at @METR_Evals. METR has done some of the most important work in AI; I'm grateful to @BethMayBarnes and others for letting me be part of it. I'll be taking time to write, reflect, and think. More to come soon!
6 replies · 0 reposts · 134 likes · 5.1K views
tom cunningham
tom cunningham@testingham·
I don't think it's the case that AI R&D is bottlenecked on experiment compute right now -- the AI is too dumb to make much progress. Right now we have perhaps a 50-50 allocation between researcher salaries and experiment compute. If we could get researchers for free then (1) we'd definitely get a pretty significant speedup; (2) it's plausible that the AI researchers would be smart enough to not need to run so many experiments.
1 reply · 0 reposts · 0 likes · 33 views
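A back-of-the-envelope version of point 1 (my arithmetic on the stated 50-50 split, with a made-up Cobb-Douglas output elasticity): if researcher labor became free, the salary half of the budget could be reallocated to double experiment compute, giving a speedup even before counting any extra or smarter researcher effort.

```python
# Hypothetical R&D output O = L**a * C**(1-a). Start from the 50-50
# cost split; if labor becomes free, the whole budget buys compute.
# Labor input is held fixed, so this is a lower bound on the speedup.
BUDGET, A = 100.0, 0.5  # made-up budget and labor elasticity

def output(labor, compute, a=A):
    return labor**a * compute**(1 - a)

before = output(BUDGET / 2, BUDGET / 2)  # salaries 50, compute 50
after  = output(BUDGET / 2, BUDGET)      # same labor input, 2x compute
print(f"speedup from the compute channel alone: {after / before:.2f}x")  # 1.41x
```

That 2^0.5 ≈ 1.41x is just the reallocation channel; the "pretty significant speedup" in point 1 would come on top of it, from the free (and possibly smarter) researcher labor.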
Jack Clark
Jack Clark@jackclarkSF·
I've spent the past few weeks reading 100s of public data sources about AI development. I now believe that recursive self-improvement has a 60% chance of happening by the end of 2028. In other words, AI systems might soon be capable of building themselves.
289 replies · 495 reposts · 3.5K likes · 1.6M views
tom cunningham retweeted
Ben Snodin
Ben Snodin@bsnodin·
1/ New blog post where I try to figure out what makes tasks easy vs hard for AI agents, using @METR_Evals time horizon data. Short version: I didn't find much. My study has flaws, but still, this makes me think it's hard to find very simple descriptions of the capabilities spike
2 replies · 7 reposts · 43 likes · 4.7K views