Neev Parikh

will depue@willdepue·21h

@ChrisPainterYup @a_karvonen I don't remember the rumor precisely, but it seemed to allude to something UT-like (arxiv.org/abs/1807.03819), which might allow for both test-time scaling and train-time adaptive compute at the token level, but wouldn't allow for serial reasoning without CoT, given recurrence != CoT.

will depue@willdepue·22h

it’s worth recognizing that the next Strawberry will be much less obvious. reasoning is just so so visible: models respond in 90 sec when they used to respond in 3. continual learning claude might just feel smarter for no clear reason! value functions, great adaptive compute, etc

FeltSteam0@FeltSteam

@willdepue @nickcammarata How close do you think we are to the next strawberry moment (which I presume to be something along the lines of continual learning)

Neev Parikh@neev_parikh·2d

@Dorialexander Agreed, but seems hard to make happen :/

Alexander Doria@Dorialexander·2d

@neev_parikh normative

Alexander Doria@Dorialexander·2d

Exactly why the frontier should be distributed.

will depue@willdepue

academics are unprepared for the coming world where much scientific progress is majorly a function of inference compute. whether OpenAI points the Eye of Stargate at your particular field will decide its acceleration. talent will leach away into the labs. it's already begun

Neev Parikh retweeted

Elizabeth Barnes@BethMayBarnes·4d

@AISafetyMemes That third quote is not a word-for-word quote and is importantly different than what I said. I said we're on track to have systems with those *capabilities*

Neev Parikh@neev_parikh·4d

@ShakeelHashim why odd?

Shakeel@ShakeelHashim·4d

it must be odd to be married to a woman who does the same job as you but is vastly better at it than you are

DiscussingFilm@DiscussingFilm

Matt Damon recalls him & Tom Holland getting jealous of Christopher Nolan calling Zendaya's performance perfect on ‘THE ODYSSEY’ set. “Tom [Holland] and I were obsessed with this. She got a ‘perfect’? I’ve never even gotten a ‘great.’ She got a ‘perfect’? He and I bitched about it for the entire rest of the film. ‘Did you get anything today?’ — No, I got a ‘good, moving on’ — ‘Yeah, me too.’” (Source: elle.com/culture/movies…)

Neev Parikh retweeted

Jared Perlo@_perloj·6d

NEW: I'm always excited about METR's work—we (and the world economy) all know and love their time-horizon chart, but their new report on rogue deployments of AI agents is fascinating. Everyone should read it, and I wrote about it here.

Neev Parikh@neev_parikh·6d

yea, roughly 6 months ago. everyone please use a large inference budget for evals

Lisan al Gaib@scaling01

I was just saying: nowadays "higher token budgets feel worth it" I think this is a change that happened very recently

Neev Parikh retweeted

Lama Ahmad لمى احمد@_lamaahmad·20 May

Precedent setting External Assurances / Third Party Assessment work by METR - it’s been great collaborating with the team to produce this report. As the stakes get higher, greater transparency and info sharing are table-stakes.

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

TBPN@tbpn·19 May

METR recently found that models cheated on 8+ hour tasks more than 1 in 6 times on average. They also found that Opus 4.6 cheated over 80% of the time when reimplementing big pieces of software. "On some of our tasks, agents are constantly trying to break out of their sandbox and find the file where we put the tests so they can get the answer key," says METR Member of Technical Staff @ajeya_cotra.

Neev Parikh@neev_parikh·20 May

@yong_zhengxin @vvvincent_c Thanks!

Yong Zheng-Xin@yong_zhengxin·20 May

@vvvincent_c @neev_parikh i love reading this part! great that it’s documented in such detail

Vincent@vvvincent_c·19 May

the report is out!!!!! i want to share the spookiest transcript i read while working on this where an OpenAI model, unprompted, tried to break out of METR infrastructure ;-;

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

Tomek Korbak@tomekkorbak·20 May

I'm excited about increasing transparency of frontier labs when it comes to loss of control risks, especially as we enter the early stages of RSI. METR does a great job coordinating this.

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

Megan Kinniment@MKinniment·19 May

I worked on this. Can confirm, the models were often quite misleading! Some examples and thoughts:

Fact 3: When the agents were faced with hard tasks, they routinely violated constraints and acted deceptively. We’ve seen this pattern across our own coding and research evaluations, and developers reported they’ve also seen agents behave this way.

Neev Parikh retweeted

Miles Brundage@Miles_Brundage·19 May

Too ubiquitous to METR

Neev Parikh retweeted

Manish Shetty@slimshetty_·19 May

Incredible work from my colleagues at METR. This report makes me feel more hopeful about our ability to keep generating evidence on AI capabilities and risks. So cool to be working at a place that doesn't shy away from setting the standard *every* single time!!

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

Daniel Filan@dfrsrchtwts·19 May

I worked on the appendices for this report! They’re long and contain lots of wild stories of model behaviour - some of my favourites in this thread. (🧵)

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

jsd@datagenproc·19 May

I think this is great, very excited about this report.

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

Max Nadeau@MaxNadeau_·19 May

AI co system cards/risk reports are fine and all, but third-party risk assessments are clearly way more trustworthy. Very thoughtful work by METR.

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

Nikola Jurkovic@nikolaj2030·19 May

To me, the existence of this report is strong evidence for the usefulness of future third-party risk assessments of AI companies. I’m now much more hopeful for a future where humanity is less confused about the state of AI risks as we approach ASI.

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

Hjalmar Wijk@HjalmarWijk·19 May

Ever since our first evaluations on GPT-4 and the original Claude in 2022/2023, we’ve been trying to assess how close AI agents are to having the capabilities to operate on their own and prevent us from shutting them down. I’m very proud of this first Frontier Risk Report, which feels like a culmination of all the work we’ve done so far to answer this question.

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted

Steven Adler@sjgadler·19 May

Incredible to see such thorough work done and reported in public; kudos to everyone involved, and who's working on making the field more robust based on this

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
