James @_jrhm_ · 22 Apr
@AlexBarry4 When fit to actual score instead of TH, Mythos is at 92%. Wild to think Mythos 2 could score perfectly on METR's task suite.
Alexander Barry @AlexBarry4 · 22 Apr
I used Anthropic's internal ECI values from the Opus 4.7 model card to predict the METR Time Horizon values they would receive. This predicts Mythos will have a 50% TH of 40 hours, and Opus 4.7 19 hours. 80% THs are 5.5 and 2.5 hours respectively.