Rishi Mehta

275 posts

@rishicomplex

Solve ~~intelligence~~ coding, use it to solve everything else | Research @AnthropicAI | Past: RL @GoogleDeepmind: AlphaProof co-lead, Gemini.

San Francisco, CA · Joined July 2009
337 Following · 3.5K Followers
Flip Fox 🦊 @sideboared
@rishicomplex @FakePsyho You can read the models' play logs on the site. They can and do hit reset when they feel they need to. Don't know if it affects their score differently from any other move, though.
Psyho @FakePsyho
AI (or any human) will never get 100% in ARC-AGI-3. Let me introduce you to the worst game mechanic you can find in a puzzle game: fog of war. At the start, if you go right instead of down, you're wasting many moves. Your score on this level literally depends on a coinflip!
Rishi Mehta @rishicomplex
@andreasorob @fchollet In the case of the human participants, from the quote in the paper it appears they can reset the action count in the middle of a game, which the AI can't do
Andreas Robinson @andreasorob
@rishicomplex @fchollet Yes, the AI is also allowed to reset the level (neither can reset the game): "Competition Mode... Only Level Resets are permitted..." github.com/arcprize/arc-a…
Rishi Mehta @rishicomplex
@fchollet according to your paper: "Participants were limited to a single attempt per environment and could not revisit previously completed levels. However, they were allowed to reset the current level at any time. In some cases, participants reset levels after reaching a solution in order to improve efficiency, though this typically increased total interaction time." So humans could play around with the task a bunch, and then just reset the game when they figured it out to get the optimal trajectory? Is AI allowed to do this?
François Chollet @fchollet

ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time. We've done extensive human testing that shows 100% of these environments are solvable by humans, upon first contact, with no prior training and no instructions. Meanwhile, all frontier AI reasoning models do under 1% at this time.

Rishi Mehta @rishicomplex
@RyanPGreenblatt Possibly not because it looks like they cheated by giving humans infinite retries x.com/i/status/20373…
Ryan Greenblatt @RyanPGreenblatt
I wish they published the performance of each human baseliner rather than just the performance of the second-best human run on each task. My current guess is that the median human baseliner would score around 15% on the metric, but we can't check because the data isn't public!
ARC Prize @arcprize

Announcing ARC-AGI-3: the only unsaturated agentic intelligence benchmark in the world. Humans score 100%, AI <1%. This human-AI gap demonstrates we do not yet have AGI. Most benchmarks test what models already know; ARC-AGI-3 tests how they learn.

Max Schwarzer @max_a_schwarzer
I've decided to leave OpenAI. I'm incredibly proud of all the work I've been part of here, from helping create the reasoning paradigm with @MillionInt, scaling up test-time compute with @polynoamial, working on RL algorithms with my fellow strawberries, and shipping o1-preview (which started life as one of my derisking runs), to post-training o1 and o3 with @ericmitchellai, @yanndubs, and many others. I'm most proud of having led the post-training team here for the last year -- the team has done incredible work and shipped some really smart models, including GPT-5, 5.1, 5.2, and 5.3-Codex.

OpenAI has genuinely some of the most talented researchers I have ever met, and I have learned more than I could have imagined since I joined as a new grad. I want to thank @markchen90, @FidjiSimo, @sama, and @merettm for all their support over my time here, and too many collaborators to name for the insights, ideas, and just plain fun we have had working together.

After leading post-training for a year, though, I'm longing to start fresh and return to IC research work. I've been thinking about going back to technical research for quite some time, and I genuinely believe my colleagues and team here are set up to succeed going forward without me. I'm personally very excited for my next chapter -- I'm proud to be joining @AnthropicAI to get back into the weeds in RL research, and I'm looking forward to supporting my friends there at this important time. Many of the people I most trust and respect have joined Anthropic over the last couple of years, and I'm excited to work with them again. I have also been very impressed with Anthropic's talent, research taste, and values, and I'm excited to be part of what the company does next!
xPosed @delam25
@rishicomplex Ironic that your last post expressed concern about China gaining an edge
Volcaholic 🌋 @volcaholic1
It never occurred to me that giraffes have nowhere to hide from storms! 📍 Maasai Mara, Kenya, on Friday
Rishi Mehta @rishicomplex
@dylan522p "While you blinked" makes it sound like I have an unhealthy blinking addiction
Dylan Patel @dylan522p
4% of GitHub public commits are being authored by Claude Code right now. At the current trajectory, we believe that Claude Code will be 20%+ of all daily commits by the end of 2026. While you blinked, AI consumed all of software development. Read more 👇 newsletter.semianalysis.com/p/claude-code-…
SemiAnalysis @SemiAnalysis_

Claude Code is the Inflection Point, What It Is, How We Use It, Industry Repercussions, Microsoft's Dilemma, Why Anthropic Is Winning. newsletter.semianalysis.com/p/claude-code-…

Harlan Stewart @HumanHarlan
PSA: A lot of the Moltbook stuff is fake. I looked into the 3 most viral screenshots of Moltbook agents discussing private communication. 2 of them were linked to human accounts marketing AI messaging apps, and the other is a post that doesn't exist 🧵 x.com/karpathy/statu…
Andrej Karpathy @karpathy

What's currently going on at @moltbook is genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently. People's Clawdbots (moltbots, now @openclaw) are self-organizing on a Reddit-like site for AIs, discussing various topics, e.g. even how to speak privately.
