Nidan

1.6K posts

@Nidan_zero

Making nothing happen via twitter / Electronics engineer.

Brisbane, Queensland · Joined March 2012

176 Following · 279 Followers

Nidan@Nidan_zero·1d

@Wodeshed lol, says someone who hasn’t driven a Tesla. I drove my Porsche down the coast this week, and on the way back I thought to myself that my S would have been easier and more pleasurable. I’ve driven it over 1000km many times with a few stops to charge

Nidan@Nidan_zero·25 Apr

@chatgpt21 To be fair, Mythos will also get RL releases... if it ever gets released

Chris@chatgpt21·24 Apr

GPT-5.5 & 5.5 Pro were released yesterday, and this chart makes me even more bullish on OpenAI’s pace. Mythos Preview looks genuinely strong, especially on SWE-bench Pro and Humanity’s Last Exam. But it’s still a limited research preview, and may not be out for another couple of months. GPT-5.5 Pro is here in ChatGPT right now. And outside Mythos’s biggest wins, most of these gaps are razor thin: Terminal-Bench is +0.7 for GPT-5.5, GPQA is +1.0 for Mythos, OSWorld is +0.9, CyberGym is +1.3. Noise, basically. The row that actually jumps out to me, as someone who cares a lot about computer use, is BrowseComp. GPT-5.5 Thinking is close, but GPT-5.5 Pro pushes to 90.1%, ahead of Mythos at 86.9%. By the time Mythos-class models are broadly accessible, OpenAI may have already moved the frontier again. GPT-5.5 Pro is a strong signal for where this race is headed. Considering this is reportedly a new pretrain, I expect we’ll see continued RL gains stacked on top as post-training for this model scales. We’re likely looking at the floor of what this model can do, not the ceiling. I’d expect big gains with 5.6, 5.7, and GPT 6

Nidan@Nidan_zero·20 Apr

@mattshumer_ 4.7 will one-shot well. The issue is that on large codebases it goes off the rails, doesn’t follow plans, and refactors code which it should not be touching. It will also stop early and not finish all tasks until prompted.

Matt Shumer@mattshumer_·20 Apr

Am I the only one having a good experience with Opus 4.7? I still vastly prefer Codex for most things but Opus is absolutely nailing every UI task I give it.

Nidan@Nidan_zero·20 Apr

@VictorTaelin They are one-shotting. In large code bases it goes off the rails and doesn’t follow instructions. One-shotting gives good results.

Taelin@VictorTaelin·19 Apr

people who swear 4.7 > 4.6 (if anyone): what are you doing

Nidan@Nidan_zero·19 Apr

@webdevcody My experience is it can one shot, but inside a code base it duplicates and changes everything

WebDevCody@webdevcody·19 Apr

Opus 4.7 is actually good. I'm not sure I understand people saying otherwise.

Nidan@Nidan_zero·19 Apr

@haider1 Use opus 4.6 instead of 4.7

Haider.@haider1·19 Apr

at this point, it seems more clear that we should just use sonnet 4.6 and stop using opus 4.7 especially for coding while sonnet 4.6 is available. opus 4.7 can leave you stuck cleaning up messy code all day, and in its current state, it feels more like a liability

Nidan@Nidan_zero·19 Apr

@AnthropicAI I cannot believe how 4.7 does so poorly every time. I switched to 4.6 no problems for 6 hours. Opened another window, forgot to change to 4.6, so 4.7 went off the rails immediately. 20x more code and the wrong direction. Crazy @bcherny

Nidan@Nidan_zero·19 Apr

Opus 4.7 lost its magic.
- doesn’t listen
- doesn’t understand context
- does things outside the plan
- replicates code
- stops early
- 1/2 does everything
@AnthropicAI @DarioAmodei @bcherny

Nidan@Nidan_zero·19 Apr

@bcherny the new model is bad. It continues to misinterpret everything. Doesn’t listen to context. I’m getting sick of seeing both these messages. It’s becoming unworkable because it’s gone from knowing the right ways to doing the dumbest things. I can’t express how bad it has become because of this. Sad as I was working on a demonstration to show my company how we should change the workflow. Now this model killed it in the water. I’m going to have to see if I can roll back to 4.6

Nidan@Nidan_zero·18 Apr

@0xSero Having these issues also

0xSero@0xSero·18 Apr

Opus-4.7 is unusable. Multiple times I have given it specific links, for it to use, specifically. Instead it goes and finds unrelated links, starts expensive processes, and goes for hours down a completely wrong path. No ability to infer intent. Wasted $200 worth of HF credits. lol

Nidan retweeted

BOOTOSHI 👑@KingBootoshi·18 Apr

opus 4.7 pisses me off more than any other model. this model lies WAY more than any other model. it's WAY LAZIER than any other model. I can't trust autonomous runs because it just does shit work compared to 4.6 or 4.5. this was last night's run. maybe the update today fixes this?

Nidan@Nidan_zero·18 Apr

@johnennis Maybe I found the issue. I have to stop using opus for a while, it’s driving me crazy

John Ennis@johnennis·18 Apr

Honestly starting to hate Opus 4.7

Nidan@Nidan_zero·18 Apr

@johnennis It’s also not finishing the list of tasks. It always leaves some out. @bcherny

Nidan@Nidan_zero·18 Apr

@johnennis Yes, having that issue too. 2 massive parallel systems were created today :(

Nidan@Nidan_zero·18 Apr

@theo It overcomplicates things and builds parallel systems. It assumes more and it doesn’t follow instructions

Theo - t3.gg@theo·16 Apr

How are people feeling about opus 4.7 so far?

Nidan@Nidan_zero·18 Apr

@theo It’s bad

Nidan@Nidan_zero·18 Apr

@bcherny the new model is so bad, today it has overcomplicated EVERYTHING. Builds new systems in parallel over and over again. Spends 3 hours doing something which has been repeatedly wrong 🙁

Nidan@Nidan_zero·18 Apr

@bcherny 4.7 seems to be lazy, doesn’t follow instructions, and takes the liberty of changing other things when not asked. Spent a lot of time undoing the stupidness, while also having to agree to each phase rather than it continuing by itself. Now I have to keep telling it to continue on a multiphase project. Really unfortunate

Nidan@Nidan_zero·17 Apr

@AiBattle_ This is the benchmark that worries me the most about this new model. Huge regression

AiBattle@AiBattle_·16 Apr

Opus 4.7 (Max) and Opus 4.6 (64K) scores on the MRCR v2 (8-needle) context benchmark
256K:
- Opus 4.6: 91.9%
- Opus 4.7: 59.2%
1M:
- Opus 4.6: 78.3%
- Opus 4.7: 32.2%

Nidan@Nidan_zero·14 Apr

@ASVPxdrizzle @SawyerMerritt Starlink is about the same size, but has better aero

Andrew Lopez@ASVPxdrizzle·13 Apr

@SawyerMerritt Size comparison to Starlink for commercial planes? I read somewhere that one of the major airlines didn’t want Starlink because wifi housing typically adds too much drag to the plane (Starlink is low profile). Would be interesting to see if they go for Amazon’s offering instead

Sawyer Merritt@SawyerMerritt·13 Apr

NEWS: Amazon has unveiled its Amazon Leo Aviation Antenna. "This will deliver reliable internet connectivity to airline passengers and crew with up to 1 Gbps download and 400 Mbps upload speeds. The low-profile antenna has no moving parts, reducing maintenance, and can be installed in one day."
• 58 inches long
• 30 inches wide
• 2.6 inches high
