keeth retweeted

ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. We'll consider ARC-AGI-3 beaten when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time.
We've done extensive human testing that shows 100% of these environments are solvable by humans, upon first contact, with no prior training and no instructions.
Meanwhile, all frontier AI reasoning models score under 1% at this time.