Alex
1.3K posts


DeepSeek V4 is a huge step up from DeepSeek 3: it outperforms Opus 4.6 and GPT-5.4 on SWE-bench Verified and sets a new record on Codeforces. Still needs to be tested against Opus 4.7 and GPT-5.5 tho, to see if real-world usage holds up to the promise. Big release! SOTA open source model!

We tried a new thing with NVIDIA to roll out Codex across a whole company and it was awesome to see it work. Let us know if you'd like to do it at your company!

GPT-5.5-Pro on ARC-AGI (Verified)
ARC-AGI-2:
- Max: 82.2%, $10.76
- High: 84.6%, $10.51
GPT-5.5-Pro performs on par with GPT-5.5 on ARC-AGI for +1 OOM cost


OpenAI releases their SWE scores while noting that Anthropic reported signs of memorization on a subset of problems 😮


For no reason in particular, I made my first crypto challenge. I will pay $1,000 to whoever solves it first. Winner is whoever gets the answer into my DMs first.

Introducing GPT-5.5
A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

One thing I gotta hand to @openai is the way they've built the Codex team w/ @embirico and @thsottiaux.. relentless focus on taste and craft despite being caught flat-footed by cc.. and we can see the results.. the vibes are immaculate. The products are great. And there's a lot to look forward to as a user/dev. Like Steve said.. A players attract A players.



Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.










