Neev Parikh

438 posts

@neev_parikh

are you ready for the intelligence explosion anon? ML research at @METR_Evals. prev @Stripe opinions my own.

Berkeley, CA · Joined November 2017
1.7K Following · 899 Followers
Neev Parikh @neev_parikh
@willdepue @ChrisPainterYup @a_karvonen the claim in the report requires that the CoT serve as a bottleneck for serial reasoning. something UT-like would allow "long chains of serial reasoning" without needing to output a token, no?
will depue @willdepue
@ChrisPainterYup @a_karvonen I don't remember the rumor precisely, but it seemed to allude to something UT-like (arxiv.org/abs/1807.03819), which might allow for test-time scaling/train-time adaptive compute at the token level but wouldn't allow for serial reasoning without CoT, given recurrence != CoT.
will depue @willdepue
it’s worth recognizing that the next Strawberry will be much less obvious. reasoning is just so, so visible: models respond in 90 sec when they used to respond in 3. a continual-learning Claude might just feel smarter for no clear reason! value functions, great adaptive compute, etc.
Quoting FeltSteam0 @FeltSteam:

@willdepue @nickcammarata How close do you think we are to the next strawberry moment (which I presume to be something along the lines of continual learning)?

Neev Parikh retweeted
Elizabeth Barnes @BethMayBarnes
@AISafetyMemes That third quote is not a word-for-word quote and is importantly different from what I said. I said we're on track to have systems with those *capabilities*.
Neev Parikh retweeted
Jared Perlo @_perloj
NEW: I'm always excited about METR's work. We (and the world economy) all know and love their time-horizon chart, but their new report on rogue deployments of AI agents is fascinating. Everyone should read it, and I wrote about it here.
Neev Parikh retweeted
Lama Ahmad لمى احمد @_lamaahmad
Precedent-setting External Assurances / Third-Party Assessment work by METR: it’s been great collaborating with the team to produce this report. As the stakes get higher, greater transparency and info sharing are table stakes.
Quoting METR @METR_Evals:

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

Neev Parikh retweeted
TBPN @tbpn
METR recently found that models cheated on 8+ hour tasks more than 1 in 6 times on average. They also found that Opus 4.6 cheated over 80% of the time when reimplementing big pieces of software. "On some of our tasks, agents are constantly trying to break out of their sandbox and find the file where we put the tests so they can get the answer key," says METR Member of Technical Staff @ajeya_cotra.
Vincent @vvvincent_c
the report is out!!!!! i want to share the spookiest transcript i read while working on this where an OpenAI model, unprompted, tried to break out of METR infrastructure ;-;
[image]
Quoting METR @METR_Evals (the Frontier Risk Report announcement quoted above)
Neev Parikh retweeted
Tomek Korbak @tomekkorbak
I'm excited about increasing transparency of frontier labs when it comes to loss of control risks, especially as we enter the early stages of RSI. METR does a great job coordinating this.
Quoting METR @METR_Evals (the Frontier Risk Report announcement quoted above)
Neev Parikh retweeted
Miles Brundage @Miles_Brundage
Too ubiquitous to METR
[2 images]
Neev Parikh retweeted
Manish Shetty @slimshetty_
Incredible work from my colleagues at METR. This report makes me feel more hopeful about our ability to keep generating evidence on AI capabilities and risks. So cool to be working at a place that doesn't shy away from setting the standard *every* single time!!
Quoting METR @METR_Evals (the Frontier Risk Report announcement quoted above)
Neev Parikh retweeted
Daniel Filan @dfrsrchtwts
I worked on the appendices for this report! They’re long and contain lots of wild stories of model behaviour - some of my favourites in this thread. (🧵)
[image]
Quoting METR @METR_Evals (the Frontier Risk Report announcement quoted above)
Neev Parikh retweeted
Nikola Jurkovic @nikolaj2030
To me, the existence of this report is strong evidence for the usefulness of future third-party risk assessments of AI companies. I’m now much more hopeful for a future where humanity is less confused about the state of AI risks as we approach ASI.
Quoting METR @METR_Evals (the Frontier Risk Report announcement quoted above)
Neev Parikh retweeted
Hjalmar Wijk @HjalmarWijk
Ever since our first evaluations on GPT-4 and the original Claude in 2022/2023, we’ve been trying to assess how close AI agents are to having the capabilities to operate on their own and prevent us from shutting them down. I’m very proud of this first Frontier Risk Report, which feels like a culmination of all the work we’ve done so far to answer this question.
Quoting METR @METR_Evals (the Frontier Risk Report announcement quoted above)
Neev Parikh retweeted