James

@_jrhm_

Joined March 2023
389 Following · 4 Followers
James @_jrhm_ ·
@AlexBarry4 When fit to actual score instead of TH, Mythos is at 92%. Wild to think Mythos 2 could score perfectly on METR's task suite.
Alexander Barry @AlexBarry4 ·
I used Anthropic's internal ECI values from the Opus 4.7 model card to predict the METR Time Horizon values they would receive. This predicts Mythos will have a 50% TH of 40 hours, and Opus 4.7 19 hours. 80% THs are 5.5 and 2.5 hours respectively.