
@pigeon__s Yeah I will probably update it when the next gen sonnet comes out. I prefer to skip a few generations as it's expensive & time consuming to re-score the whole leaderboard.
English
Sam Paech
1.2K posts

@sam_paech
Evals @liquidai Maintainer of EQ-Bench https://t.co/Jy56OlHrP5 https://t.co/oRApPQwvWS























Sonnet-4.6 takes top place on all my evals: EQ-Bench, Creative writing, Longform writing & Judgemark. Opus 4.6 within margin of error. GLM-5 and Qwen3.5-397B nipping at their heels.









