the mog community really did that
not gonna farm clout for the community making mogging what it is today but if you were there, you know
anyways guys i’m gonna go get drunk happy saturday
In my experience, GPT, Gemini, and DeepSeek v4 are the strongest in theoretical physics. This benchmark suggests the same.
CritPt evaluates language models on solving unpublished, frontier-level physics problems that require genuine research-scale reasoning. The benchmark comprises 71 challenges (70 test challenges and one example), created by over 50 active physics researchers across 30 institutions and spanning 11 physics subfields.