We've entered into an agreement to join OpenAI as part of the Codex team.
I'm incredibly proud of the work we've done so far, incredibly grateful to everyone that's supported us, and incredibly excited to keep building tools that make programming feel different.
@ericmitchellai @atulit_gaur Hi Eric, catastrophic things happen when I ask GPT-5.4 Pro to generate an image - it spams the imagen tool and I end up hitting the rate limit
@ericmitchellai @nicdunz 5.4 constantly does this with me. It seems like using the web tool makes it behave as if the prior message was the first one in the conversation. I therefore explicitly ask it not to use the web tool, as it will mess up your work. It also makes it reason about the task.
@ericmitchellai I don't love sharing links / logs since they often involve non-public docs but here's an example prompt -
"Given what you know of me, what should I read right now?"
It then suggested reading docs *from* the (ChatGPT) Project folder itself, not public reading material.
The most common failure mode I've observed with GPT-5.4 is misunderstanding the intent behind the prompt (but then doing a good job at what it thought the task was).
I'm not sure if this is a regression or not, but it stands out by contrast w/ the task execution
I made a nonfiction writing benchmark to evaluate this and, as much as I love the writing improvements in GPT-5.4, it confirmed my observation that by default GPT-5.4 is excessively verbose. I have tamed it with custom instructions but here's a heat map showing its strengths and weaknesses relative to Opus and Sonnet.
gpt 5.4 has improved in conversation. content wise the answers are rich too based on some soft questions / day to day life stuff i asked. opus 4.6 however is still much more enjoyable to talk to. gpt 5.4 just has some slop patterns still.
ChatGPT 5.4 is really frustrating to work with. Awful hallucinations and laziness. It's hard to tell if it's "smarter" when it's so difficult to steer.