Nidan
1.6K posts

@Nidan_zero
Making nothing happen via twitter / Electronics engineer.
Brisbane, Queensland
Joined March 2012
176 Following, 279 Followers

@chatgpt21 To be fair, Mythos will also get RL releases... if it ever gets released

GPT-5.5 and 5.5 Pro were released yesterday, and this chart makes me even more bullish on OpenAI’s pace.
Mythos Preview looks genuinely strong, especially on SWE-bench Pro and Humanity’s Last Exam. But it’s still a limited research preview, and may not be out for another couple of months. GPT-5.5 Pro is here in ChatGPT right now.
And outside Mythos’s biggest wins, most of these gaps are razor thin: Terminal-Bench is +0.7 for GPT-5.5, GPQA is +1.0 for Mythos, OSWorld is +0.9, CyberGym is +1.3. Noise, basically.
The row that actually jumps out to me, as someone who cares a lot about computer use, is BrowseComp. GPT-5.5 Thinking is close, but GPT-5.5 Pro pushes to 90.1% - ahead of Mythos at 86.9%.
By the time Mythos-class models are broadly accessible, OpenAI may have already moved the frontier again. GPT-5.5 Pro is a strong signal for where this race is headed.
Considering this is reportedly a new pretrain, I expect we’ll see continued RL gains stacked on top as post-training for this model scales. We’re likely looking at the floor of what this model can do, not the ceiling. I’d expect big gains with 5.6 and 5.7 through GPT 6.


@mattshumer_ 4.7 will one-shot well. The issue is that on large codebases it goes off the rails, doesn’t follow plans, and refactors code it should not be touching. It will also stop early and not finish all tasks until prompted.

@VictorTaelin They are one-shotting. In large codebases it goes off the rails and doesn’t follow instructions. One-shotting gives good results.

@webdevcody My experience is it can one-shot, but inside a codebase it duplicates and changes everything

@AnthropicAI I cannot believe how poorly 4.7 does every time. I switched to 4.6 and had no problems for 6 hours. Opened another window, forgot to change to 4.6, and 4.7 went off the rails immediately. 20x more code, and in the wrong direction. Crazy @bcherny

Opus 4.7 lost its magic.
- doesn’t listen
- doesn’t understand context
- does things outside the plan
- replicates code
- stops early
- half-does everything
@AnthropicAI @DarioAmodei @bcherny

@bcherny the new model is bad. It continues to misinterpret everything. Doesn’t listen to context. I’m getting sick of seeing both these messages.
It’s becoming unworkable because it’s gone from knowing the right ways to doing the dumbest things. I can’t express how bad it has become because of this. Sad, as I was working on a demonstration to show my company how we should change the workflow. Now this model has left it dead in the water. I’m going to have to see if I can roll back to 4.6.


Nidan retweeted

@johnennis Maybe I found the issue. I have to stop using Opus for a while, it’s driving me crazy


@johnennis It’s also not finishing the list of tasks. It always leaves some out. @bcherny

@johnennis Yes, having that issue too. 2 massive parallel systems were created today :(

@bcherny 4.7 seems to be lazy: it doesn’t follow instructions and takes the liberty of changing other things when not asked. I spent a lot of time undoing the mess, and it also keeps asking me to agree before continuing on to each phase rather than continuing by itself. Now I have to keep telling it to continue on a multiphase project. Really unfortunate

@AiBattle_ This is the benchmark that worries me the most about this new model. Huge regression

@ASVPxdrizzle @SawyerMerritt Starlink is about the same size, but has better aero

@SawyerMerritt Size comparison to Starlink for commercial planes?
I read somewhere that one of the major airlines didn’t want Starlink because Wi-Fi housing typically adds too much drag to the plane (Starlink is low profile). Would be interesting to see if they go for Amazon’s offering instead

NEWS: Amazon has unveiled its Amazon Leo Aviation Antenna.
"This will deliver reliable internet connectivity to airline passengers and crew with up to 1 Gbps download and 400 Mbps upload speeds. The low-profile antenna has no moving parts, reducing maintenance, and can be installed in one day."
• 58 inches long
• 30 inches wide
• 2.6 inches high


