Michael D. Moffitt

356 posts

@mmoffitt

Google DeepMind

Austin, TX · Joined July 2011
5.3K Following · 835 Followers
Michael D. Moffitt reposted
ARC Prize @arcprize
Thank you to everyone who came out to the ARC-AGI-3 Launch Party last night. Incredible room of people pushing AI forward. The ARC Prize 2026 competition is now open - let the games begin!
20 replies · 28 reposts · 460 likes · 41.3K views
Michael D. Moffitt reposted
Bryan Landers @bryanlanders
It's alive! This 3rd version of ARC-AGI represents an incredible amount of work from the ARC Prize team. Hundreds of games. Thousands of levels. Go build agents!
ARC Prize @arcprize

Announcing ARC-AGI-3. The only unsaturated agentic intelligence benchmark in the world. Humans score 100%, AI <1%. This human-AI gap demonstrates we do not yet have AGI. Most benchmarks test what models already know; ARC-AGI-3 tests how they learn.

0 replies · 2 reposts · 10 likes · 494 views
Michael D. Moffitt reposted
Mike Knoop @mikeknoop
ARC-AGI-3 and ARC Prize 2026 are now live with $2,000,000 in prizes! As of today, version 3 is the world's only unsaturated agentic intelligence benchmark. Humans score 100% and frontier AI scores ~0%. Play here: arcprize.org/arc-agi/3

While no single version of ARC is definitionally AGI, our aim with the ARC-AGI Series is to continually produce useful scientific benchmarks which identify large remaining gaps between Humans and Frontier AI. At some point, we'll be unable to, and then we'll have AGI.

Our new benchmark consists of over 100 novel game environments encompassing nearly 1,000 levels. Notably, test takers are given no explicit goals (other than to win) and must explore the environments to acquire goals, understand rules, develop strategy, and ultimately execute a plan to win.

ARC-AGI-3 is a test of agentic intelligence. Beating this benchmark requires on-the-fly world modeling and continual learning to adapt to evolving environments. To score 100%, AI must beat all of the games as efficiently as the human baseline (e.g., in the number of actions taken to win). An ARC first, this gives us a formal comparison of AI reasoning efficiency vs. humans.

Version 3 carries classic ARC design principles: core knowledge priors only, private test sets to measure generalization, and it's fun! Every benchmark we release is an experiment, and I believe this new version will provide strong signal towards increasingly autonomous AI agents.

Prior versions of ARC held strong predictive power for important AI moments. Version 1 only saw progress with the release of AI reasoning models in late 2024, and Version 2 only began seeing progress with the advent of agentic coding models in late 2025. Version 3 is expected to signal when AI agents can become economically useful in more open-ended domains (beyond highly measurable domains like coding and math).

There are a few other important design changes for ARC-AGI-3. The public set is now a "demonstration" set, not a training set. And unlike prior versions, the private set is now explicitly designed to be out-of-distribution (non-IID) from the public demo set. This is to mitigate targeting, and because LLMs can now generalize over IID splits using AI reasoning.

Frontier models have made great progress over the past year - so much that several industry leaders have suggested we may already have AGI. Part of the ARC Prize Foundation mission is to provide accurate public sense finding, and we strive to reduce false-positive claims. To this end, we've updated our testing policy. Going forward, we will only verify scores outside of the official Kaggle competitions from AI systems that have high commercial usage or are 100% open source. We're also adopting a stateless client scoring philosophy to ensure humans and AI are tested under identical conditions. The goal of these changes is to reduce the amount of developer-aware targeting (whether incidental or intentional) and provide clear signal if actual AGI progress has occurred.

The Foundation also has a goal to inspire AI innovation, which is most likely to come from the community. We've seen dozens of startups using ARC as a tool for showcasing their ideas - a few have fundraised serious capital based on their ARC results. To support this, we're launching a new Community leaderboard. While scores for this leaderboard can't be Verified - and you should explicitly not trust these scores as an accurate measure of AGI progress - we will curate the best ideas and promote them. This year I expect we will see rapid progress on the ARC-AGI-3 Community leaderboard, and the best ideas will eventually migrate into frontier models and onto the Verified leaderboard.

Finally, we've partnered again with Kaggle to run two competition tracks, for ARC-AGI-2 and ARC-AGI-3. This will be the last year for Version 2. When we launched the first ARC Prize back in 2024, I committed to running the Grand Prize until it was beaten. So for the ARC-AGI-2 track we will be paying out the Grand Prize to the best team, no matter what, in order to honor this commitment. In accordance with the Foundation mission, to win any prize money you must open-source a reproducible solution. We've raised the standard for open source to include training. I'm excited to produce a truly open solution as a final send-off for the ARC-AGI-1 and 2 format.

Focus is now on ARC-AGI-3 (we've even started work on Versions 4 and 5). As always, I'm honored to have the opportunity to steward attention towards AGI progress. I'm also super grateful to the incredible ARC Prize team - including our core engineers, game designers, and human testers - led by @GregKamradt, without whom we would not have this incredibly useful benchmark. See you on the leaderboard!
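The thread's scoring rule - a game counts only if the AI wins at least as action-efficiently as the human baseline, and 100% means doing so on every game - can be sketched as below. All function names and the sample data are illustrative assumptions, not the official ARC-AGI-3 harness.

```python
# Hypothetical sketch of the efficiency-based scoring described in the thread:
# an agent "beats" a game only if it wins using no more actions than the human
# baseline, and the benchmark score is the fraction of games beaten that way.
# Names and numbers here are illustrative, not the official ARC Prize harness.

def game_beaten(won: bool, actions_taken: int, human_baseline_actions: int) -> bool:
    """A game counts only if the agent wins at least as efficiently as humans."""
    return won and actions_taken <= human_baseline_actions

def benchmark_score(results) -> float:
    """results: list of (won, actions_taken, human_baseline_actions) tuples."""
    beaten = sum(game_beaten(*r) for r in results)
    return beaten / len(results)

# Illustrative run: three games; only the first is won within the human budget.
results = [(True, 42, 50), (True, 90, 60), (False, 10, 30)]
print(f"{benchmark_score(results):.2f}")  # -> 0.33
```

Under this rule a win that takes more actions than the human baseline scores nothing, which is what makes the efficiency comparison formal rather than a tiebreaker.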
10 replies · 21 reposts · 144 likes · 15K views
Michael D. Moffitt reposted
ARC Prize @arcprize
Announcing ARC-AGI-3. The only unsaturated agentic intelligence benchmark in the world. Humans score 100%, AI <1%. This human-AI gap demonstrates we do not yet have AGI. Most benchmarks test what models already know; ARC-AGI-3 tests how they learn.
230 replies · 579 reposts · 4.3K likes · 672.3K views
Michael D. Moffitt reposted
François Chollet @fchollet
ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time. We've done extensive human testing that shows 100% of these environments are solvable by humans, upon first contact, with no prior training and no instructions. Meanwhile, all frontier AI reasoning models score under 1% at this time.
172 replies · 316 reposts · 2.6K likes · 529.8K views
Michael D. Moffitt reposted
Ryan Burnell @DrRyanBurnell
How can we evaluate progress toward AGI as systems grow increasingly capable? In our new pre-print, we introduce a scientifically-grounded framework for mapping the cognitive capabilities of AI systems: storage.googleapis.com/deepmind-media…
1 reply · 5 reposts · 11 likes · 797 views
Michael D. Moffitt reposted
ARC Prize @arcprize
ARC Prize Foundation is part of the @ycombinator W26 batch as the only non-profit. For Demo Day we're shipping ARC-AGI-3, an interactive reasoning benchmark for the next era of agentic intelligence. ARC and YC are mission-aligned in the belief that new ideas push the frontier.
9 replies · 13 reposts · 107 likes · 24.1K views
Michael D. Moffitt reposted
Demis Hassabis @demishassabis
Excited to launch Gemini 3.1 Pro! Major improvements across the board, including in core reasoning and problem solving. For example, scoring 77.1% on the ARC-AGI-2 benchmark - more than 2x the performance of 3 Pro. Rolling out today in @GeminiApp, @antigravity and more - enjoy!
246 replies · 420 reposts · 5K likes · 245.7K views
Most Ridiculous Bear Market Ever @MRBullMktEver
I guess I don't understand why @LynAldenContact is making a point that you can self custody your BTC but not your NVDA stock? The only reason you'd care about self custody of an asset is if you're a criminal and you're afraid your assets will be frozen. Why else would you care?
Swan @Swan

“You can’t self custody your Nvidia stock.” – Lyn Alden That’s the difference. Stocks are claims. Bitcoin is sovereign property. But how big is the market for self-custodial hard money?

237 replies · 3 reposts · 128 likes · 111.7K views
Michael D. Moffitt @mmoffitt
The ARC Prize north star is shining brighter than ever.
Greg Kamradt @GregKamradt

At >95%, ARC-AGI-1 is effectively performance-saturated at this point. Models are becoming incredible. They'll continue to hill climb, but the next satisfying milestone won't come till 100%.

However, ARC-AGI-1 still has useful life. Performance comes at a cost, and ARC-AGI-1 will monitor the efficiency of models - intelligence per watt.

My hypotheses for the next 12 months:
- Labs, one by one, get verified at >95% on ARC-AGI-1 before May.
- We won't see a two-order-of-magnitude cost reduction at >95% (<$0.013/task) until June '27 (happy to make this bet with someone).
- We're at the point where model overhang is so large that the *potential energy* is near max. The gap between what a model *can* do and what the industry is actually capturing feels big. The rate of our tool building is _slower_ than the rate of model performance. My vibe: our tools are using ~5% of model performance right now. I expect ~2 more "OpenClaw" moments within 12 months (also happy to take a bet on this one).

What does this mean for ARC Prize and benchmarks in general? The next 12–24 months of benchmark building (in general, not just ARC) are straightforward, though execution-heavy. Two routes:
1/ Harder problems (e.g., Frontier 5, HLE++)
2/ Niche environment domains (e.g., TerminalBench-style, but for narrow domains)

For ARC Prize, we have a different focus: what *humans can do, but AI cannot.* Our ability to produce this class of problems shows a gap between AI today and AGI (since humans are our only proof point of general intelligence). The goalpost is static: measure the learning efficiency (intelligence) of AI and declare when it has crossed human performance. However, our "tools" to do that evolve and improve. This is our ARC-AGI series of benchmarks. Each subsequent benchmark (v1/2/3+) is more capable of measuring complex learning.

There will be a point where we can *no longer* come up with problems humans can do that AI cannot. At that point we have AGI. We're not there yet.

Through building ARC-AGI-2 (and soon with ARC-AGI-3), we've found a repeatable process for building benchmarks until AGI. This is a multi-year process. So as an org, our north-star aim toward AGI is:
1/ Inspire the next set of frontier open research
2/ Guide the public in sense finding - understand where the frontier is.

What matters is net new science and accelerated open progress.

Someone recently called ARC-AGI François's "hardest" benchmark. We don't see it that way, and "hard" isn't the right attribute for ARC-AGI. ARC-AGI's perceived difficulty is an emergent property of our objective. We create benchmarks that accurately reflect intelligence. They are "hard" because building general intelligence is still genuinely hard and not solved.

That's the ARC Prize journey! Can't wait to build more with these models. Let's go!
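The cost bet above can be sanity-checked with quick arithmetic. Note the ~$1.30/task baseline below is inferred from the quoted <$0.013/task threshold (a two-order-of-magnitude reduction), not a figure stated in the post.

```python
# Arithmetic behind the cost bet: a two-order-of-magnitude (100x) reduction
# from an assumed ~$1.30/task baseline (inferred from the quoted threshold,
# not stated in the post) lands right at the $0.013/task target.

baseline_cost_per_task = 1.30   # assumed current cost at >95%, USD
reduction_factor = 100          # two orders of magnitude
target = 0.013                  # threshold quoted in the post, USD/task

projected = baseline_cost_per_task / reduction_factor
print(f"${projected:.3f}/task")  # -> $0.013/task
```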

1 reply · 0 reposts · 4 likes · 318 views
Michael D. Moffitt reposted
François Chollet @fchollet
The new Gemini Deep Think is achieving some truly incredible numbers on ARC-AGI-2. We certified these scores in the past few days.
87 replies · 198 reposts · 2.2K likes · 210.9K views
Michael D. Moffitt reposted
François Chollet @fchollet
We are looking for brilliant deep learning researchers to help us solve program synthesis at @ndea. If you strongly feel like AGI should be capable of invention, not just automation, consider joining us. Apply here: ndea.com/jobs
27 replies · 36 reposts · 356 likes · 38.4K views
Shivers @thinkingshivers
I made a simple puzzle game! Try to enclose the horse in the largest possible area with limited walls.
349 replies · 1.7K reposts · 30.2K likes · 2.3M views
Michael D. Moffitt reposted
ARC Prize @arcprize
ARC Prize 2025 Winners Interviews: Paper Award 3rd Place. @LiaoIsaac91893 shares the story behind CompressARC - an MDL-based, single-puzzle-trained neural code-golf system that achieves ~20–34% on ARC-AGI-1 and ~4% on ARC-AGI-2 without any pretraining or external data.
2 replies · 9 reposts · 125 likes · 55.3K views
guille @angeris
updates on the michelson
2 replies · 0 reposts · 7 likes · 726 views
エチレン @ethylene_66
Code golf workshop - I thought it would be a mini-event with only about 10 participants, but the audience is around 50 people and I'm floored.
1 reply · 1 repost · 9 likes · 1K views