
Have AI capabilities accelerated? On 3 out of the 4 AI capability metrics we investigated, we found strong evidence of acceleration, around when reasoning models emerged.
Alexander Barry

@AlexBarry4
Independent Statistical Consultant, https://t.co/fKwPM5dSGc Substack: https://t.co/760E1w9ol7

I think this guess for the 80% time horizon for Mythos Preview is probably somewhat too high, but I'm not confident. I originally guessed 2.5 hours (based on a quick-and-dirty extrapolation using the gap from Opus 4 to 4.6), but based on this I've updated to 3.5 hours.

GPT 5.5 (xhigh) scores 84.9% on WeirdML, taking the lead over GPT 5.5 (high) at 83.9%. Even (xhigh) uses no more than about 15k output tokens. Thanks to @METR_Evals for the increased support that made this run possible. Opus 4.7 (max) is coming soon, with more things in the pipeline.

I used Anthropic's internal ECI values from the Opus 4.7 model card to predict the METR Time Horizon values these models would receive. This predicts a 50% TH of 40 hours for Mythos and 19 hours for Opus 4.7. The corresponding 80% THs are 5.5 and 2.5 hours.
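The post doesn't say exactly how the ECI scores were mapped to time horizons. One simple way to do it, assumed here rather than taken from the thread, is a log-linear regression: fit ln(horizon) as a linear function of ECI on models that have both an ECI score and a measured METR horizon, then plug in a new model's ECI. The function names below are hypothetical, and no real ECI or horizon values are included; the same fit would be run separately for the 50% and 80% horizons.

```python
import math


def fit_log_linear(eci, horizons_hours):
    """Least-squares fit of ln(horizon) = a + b * eci.

    Assumption (not from the original post): log time horizon is linear in ECI.
    Returns the intercept a and slope b.
    """
    if len(eci) != len(horizons_hours) or len(eci) < 2:
        raise ValueError("need at least two paired (ECI, horizon) points")
    ys = [math.log(h) for h in horizons_hours]  # work in log space
    n = len(eci)
    x_mean = sum(eci) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in eci)
    if sxx == 0:
        raise ValueError("ECI values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(eci, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def predict_horizon(a, b, eci):
    """Map an ECI score to a predicted time horizon in hours."""
    return math.exp(a + b * eci)
```

Fitting in log space keeps predictions positive and matches the roughly exponential growth in time horizons that METR reports. The weakest part of any such mapping is extrapolating beyond the ECI range of the models used for calibration.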