pomterre
244 posts

Can't stop thinking about how Claude Code is in LAST PLACE on TerminalBench for harnesses using Opus 4.6. There are TEN separate harnesses that use Opus better than Claude Code.

Check out this SVG created with GPT-5.4. Amazing if true. Remember @QuiverAI from waaay back (a week ago)?

A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year.

Codex is back and operating at normal latency for users globally. The outage affected a significant portion of our users who were in the US or routed to US clusters for processing. The team is making immediate improvements to help ensure Codex can run 24/7 in the future.

Meet GPT-5.1-Codex-Max, our latest frontier agentic coding model, available in Codex starting today. It’s faster, more capable, and more token-efficient, and it can work persistently on long tasks with built-in compaction abilities.