Max Niederman
354 posts

@MaxNiederman
Head of Quality @ Mechanize | 19M


Note that when you look into these sorts of claims, the “subsidy” is simply that it is not taxed as much as they think it should be. You can argue the status quo is wrong, but you can’t really call it a subsidy, now can you?





new @METR_Evals research note from @whitfill_parker, @cherylwoooo, nate rush, and me. (chiefly parker!) we find that *half* of SWE-bench Verified solutions from Sonnet 3.5-to-4.5 generation AIs *which are graded as passing* are rejected by project maintainers.


"Emergent Misalignment is Easy, Narrow Misalignment is Hard" by Anna Soligo (@anna_soligo), Edward Turner, Senthooran Rajamanoharan (@sen_r), Neel Nanda (@NeelNanda5)


decided to run 5.3 codex on xhigh as well, it's 90%... rip IBench, survived 3 months.