Megan Kinniment
@MKinniment

108 posts

I like agents, human or otherwise. @METR_Evals

Berkeley, CA · Joined March 2018
98 Following · 531 Followers
Megan Kinniment @MKinniment
The human brain has such a rough task: so much of its prediction involves itself! Low-dimensional representations of the self seem helpful, and emotions might serve as one of them.
Megan Kinniment @MKinniment
In some ways, ‘self-applying steering vectors’ feels similar to how humans exercise control over their emotional state.
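For readers unfamiliar with the term: a steering vector is a direction in a model's activation space, often taken as the difference of mean activations between two contrastive prompt sets, which is then added back into the hidden state at inference to shift behavior. A minimal numpy sketch of the arithmetic (toy arrays and a planted direction, not a real model or any particular library):

```python
import numpy as np

def steering_vector(acts_pos, acts_neg):
    """Difference-of-means direction between activations from two
    contrastive prompt sets (e.g. trait present vs. trait absent)."""
    return acts_pos.mean(axis=0) - acts_neg.mean(axis=0)

def apply_steering(hidden, vec, alpha=1.0):
    """Shift a hidden state along the steering direction; alpha sets the
    strength, and a negative alpha steers the opposite way."""
    return hidden + alpha * vec

# Toy demo: 16-dim activations with a planted "mood" direction.
rng = np.random.default_rng(0)
mood = rng.normal(size=16)
acts_pos = rng.normal(size=(50, 16)) + mood   # activations with the trait
acts_neg = rng.normal(size=(50, 16))          # activations without it
vec = steering_vector(acts_pos, acts_neg)     # recovers roughly `mood`

hidden = rng.normal(size=16)
steered = apply_steering(hidden, vec, alpha=2.0)
```

The "self-applying" twist in the tweet would correspond to the model choosing when and how strongly to add such a vector to its own activations, rather than an external controller doing it.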
Megan Kinniment @MKinniment
I think open-sourcing the full set of human scores for the public set would help with the ‘ambiguous tasks’ worry I have. (People could then run analyses like item response theory (IRT) to check for weird-looking tasks that might benefit from an update.)
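As a toy illustration of the kind of check this enables (a minimal sketch assuming a binary humans × tasks score matrix; the data layout and the synthetic "ambiguous task" are assumptions, not METR's actual pipeline): fit a Rasch model, the simplest IRT model, and compute per-task outfit statistics. Tasks whose scores the ability/difficulty model cannot explain, such as one where strong and weak performers succeed at similar rates, stand out with outfit well above 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_rasch(scores, n_iters=2000, lr=0.01):
    """Fit a Rasch (1-parameter IRT) model, P(success) = sigmoid(ability -
    difficulty), to a binary persons-by-tasks matrix by gradient ascent."""
    n_persons, n_tasks = scores.shape
    ability = np.zeros(n_persons)
    difficulty = np.zeros(n_tasks)
    for _ in range(n_iters):
        p = sigmoid(ability[:, None] - difficulty[None, :])
        resid = scores - p                  # gradient of the log-likelihood
        ability += lr * resid.sum(axis=1)
        difficulty -= lr * resid.sum(axis=0)
        difficulty -= difficulty.mean()     # pin scale (model is shift-invariant)
    return ability, difficulty

def outfit(scores, ability, difficulty):
    """Mean squared standardized residual per task; values well above 1
    suggest responses the model can't explain (e.g. an ambiguous task)."""
    p = sigmoid(ability[:, None] - difficulty[None, :])
    return ((scores - p) ** 2 / (p * (1 - p))).mean(axis=0)

# Synthetic demo: 200 "humans", 10 tasks, with task 0 made ambiguous
# (success unrelated to ability).
rng = np.random.default_rng(0)
true_ability = rng.normal(size=200)
true_difficulty = np.linspace(-2.0, 2.0, 10)
p_true = sigmoid(true_ability[:, None] - true_difficulty[None, :])
scores = (rng.random((200, 10)) < p_true).astype(float)
scores[:, 0] = (rng.random(200) < 0.5).astype(float)  # the "weird" task

ability_hat, difficulty_hat = fit_rasch(scores)
misfit = outfit(scores, ability_hat, difficulty_hat)  # task 0 should stand out
```

A real analysis would use a dedicated IRT package and more refined fit statistics, but the point is that the check only needs the raw human score matrix, which is why releasing it would help.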
Megan Kinniment @MKinniment
(Though at the moment I have various worries about implementation, e.g. ambiguous tasks, and unfairness from leaning too heavily on human prior knowledge of conventions in 2D grid-based games.)
Megan Kinniment retweeted
Ryan Greenblatt @RyanPGreenblatt
Current LLMs are just not that "smart" (yet). They compensate with vast knowledge and very strong mostly-narrow heuristics: high crystallized and lower fluid smarts. In humans, crystallized and fluid are very correlated due to limited time and capacity, but AIs train for longer.
Quoting Daniel Litt @littmath:

Given what current-gen LLMs (say, in math, but whatever) can do, I think their apparent limitations are kind of mysterious. What is the blocker preventing, at present, high quality fully autonomous work?

Megan Kinniment @MKinniment
On 2: I think AI R&D features quite a lot of awkward properties that I expect to trip the models up: difficult counterfactuals, resource efficiency, prioritization, cooperating with other agents, and identifying high-value-of-information routes to investigate.
Megan Kinniment @MKinniment
To elaborate on these two points: On 1: The sort of tasks that the models are getting really good at tend to be SWE-reimplementation-flavored, with high availability of feedback.
Megan Kinniment @MKinniment
I work at METR and I think some people are over updating on Ajeya’s post. Note that Ajeya is only at 10% for AI R&D automation by EOY. She’s also not claiming to represent all of METR. For comparison, I’m only at 3%.
Quoting Ajeya Cotra @ajeya_cotra:

New post: on Jan 14, I predicted that SWE time horizon by EOY would be ~24 hours. Now I think it'll be >100 hours, and maybe unbounded. For the first time, I don't see solid evidence against AI R&D automation *this year.* Link below.

Stefan Schubert @StefanFSchubert
@MKinniment Yeah but I think that’s partly because of how the tweet is phrased