Pinned Tweet

@GoodfireAI Mathematics is not just a human invention; it can emerge spontaneously whenever a system tries to compress the world and understand its patterns as efficiently as possible.
simobis
161 posts

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting what developers experience in their day-to-day work.

The standard GPT-5.5 reproduced the proof 👇 chatgpt.com/share/6a0e9e04… You don't need to wait for OpenAI's internal model!

iris-alpha