Rishi Mehta

281 posts

@rishicomplex

Solve i̶n̶t̶e̶l̶l̶i̶g̶e̶n̶c̶e̶ ̶ coding, use it to solve everything else | Research @AnthropicAI | Past: RL @GoogleDeepmind: AlphaProof co-lead, Gemini.

San Francisco, CA · Joined July 2009
335 Following · 3.5K Followers
Andrej Karpathy @karpathy
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Replies: 7.9K · Reposts: 11.1K · Likes: 149K · Views: 27.1M
Rishi Mehta @rishicomplex
what's up with the X algorithm it's just cycling the same 10 posts for me
Replies: 0 · Reposts: 0 · Likes: 1 · Views: 218
Rishi Mehta retweeted
Nat McAleese @__nmca__
at long last we have built and chosen not to release the zero-day machine from the classic sci-fi tale “please do not release the zero-day machine”
Replies: 17 · Reposts: 153 · Likes: 3K · Views: 127.4K
Rishi Mehta @rishicomplex
@sideboared @FakePsyho In the case of humans, per the quote in the paper it appears they can reset the action count
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 32
Flip Fox 🦊 @sideboared
@rishicomplex @FakePsyho You can read the models' play logs on the site. They can and do hit reset when they feel they need to. Don't know if it affects their score differently from any other move though.
Replies: 1 · Reposts: 0 · Likes: 1 · Views: 34
Psyho @FakePsyho
AI (or any human) will never get 100% in ARC-AGI-3
Let me introduce you to the worst game mechanic you can find in a puzzle game: fog of war
At the start, if you go right instead of bottom, you're wasting many moves. Your score on this level literally depends on a coin flip!
Replies: 67 · Reposts: 24 · Likes: 533 · Views: 75.4K
Rishi Mehta @rishicomplex
@andreasorob @fchollet In the case of the human participants, from the quote in the paper it appears they can reset the action count in the middle of a game, which the AI can't do
Replies: 0 · Reposts: 0 · Likes: 2 · Views: 120
Andreas Robinson @andreasorob
@rishicomplex @fchollet Yes, the AI is also allowed to reset the level (neither can reset the game): "Competition Mode... Only Level Resets are permitted..." github.com/arcprize/arc-a…
Replies: 1 · Reposts: 0 · Likes: 2 · Views: 131
Rishi Mehta @rishicomplex
@fchollet according to your paper: "Participants were limited to a single attempt per environment and could not revisit previously completed levels. However, they were allowed to reset the current level at any time. In some cases, participants reset levels after reaching a solution in order to improve efficiency, though this typically increased total interaction time." So humans could play around with the task a bunch, and then just reset the game when they figured it out to get the optimal trajectory? Is AI allowed to do this?
François Chollet @fchollet

ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time. We've done extensive human testing that shows 100% of these environments are solvable by humans, upon first contact, with no prior training and no instructions. Meanwhile, all frontier AI reasoning models do under 1% at this time.

Replies: 1 · Reposts: 1 · Likes: 25 · Views: 2.6K
Rishi Mehta @rishicomplex
@RyanPGreenblatt Possibly not because it looks like they cheated by giving humans infinite retries x.com/i/status/20373…
Rishi Mehta @rishicomplex

@fchollet according to your paper: "Participants were limited to a single attempt per environment and could not revisit previously completed levels. However, they were allowed to reset the current level at any time. In some cases, participants reset levels after reaching a solution in order to improve efficiency, though this typically increased total interaction time." So humans could play around with the task a bunch, and then just reset the game when they figured it out to get the optimal trajectory? Is AI allowed to do this?

Replies: 1 · Reposts: 0 · Likes: 5 · Views: 697
Ryan Greenblatt @RyanPGreenblatt
I wish they published the performance for each human baseliner rather than just the performance of the second best human run on each task. My current guess is that the median human baseliner would score around ~15% on the metric but we can't check because the data isn't public!
ARC Prize @arcprize

Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know, ARC-AGI-3 tests how they learn

Replies: 10 · Reposts: 5 · Likes: 165 · Views: 17.9K
Max Schwarzer @max_a_schwarzer
I've decided to leave OpenAI. I'm incredibly proud of all the work I've been part of here, from helping create the reasoning paradigm with @MillionInt, scaling up test-time compute with @polynoamial, working on RL algorithms with my fellow strawberries, shipping o1-preview (which started life as one of my derisking runs), to post-training o1 and o3 with @ericmitchellai, @yanndubs and many others. I'm most proud of having led the post-training team here for the last year -- the team has done incredible work and shipped some really smart models, including GPT-5, 5.1, 5.2, and 5.3-Codex.
OpenAI has genuinely some of the most talented researchers I have ever met, and I have learned more than I could have imagined knowing since I joined as a new grad. I want to thank @markchen90 @FidjiSimo @sama @merettm for all their support over my time here, and too many collaborators to name for the insights, ideas, and just plain fun we have had working together.
After leading post-training for a year, though, I'm longing to start fresh and return to IC research work. I've been thinking about going back to technical research for quite some time, and I genuinely believe my colleagues and team here are set up to succeed going forward without me.
I'm personally very excited for my next chapter -- I'm proud to be joining @AnthropicAI to get back into the weeds in RL research, and I'm looking forward to supporting my friends there at this important time. Many of the people I most trust and respect have joined Anthropic over the last couple of years, and I'm excited to work with them again. I have also been very impressed with Anthropic's talent, research taste and values, and I'm excited to be part of what the company does next!
Replies: 605 · Reposts: 1.2K · Likes: 21.1K · Views: 3.2M
xPosed @delam25
@rishicomplex Ironic that your last post expressed concern about China gaining an edge
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 109