Nidan

1.6K posts

@Nidan_zero

Making nothing happen via Twitter / Electronics engineer.

Brisbane, Queensland · Joined March 2012
176 Following · 279 Followers

Nidan @Nidan_zero
@Wodeshed lol, says someone who hasn’t driven a Tesla. I drove my Porsche down the coast this week, and on the way back I thought that my Model S would have been easier and more pleasurable. I’ve driven it over 1,000 km many times, with a few stops to charge.

Nidan @Nidan_zero
@chatgpt21 To be fair, Mythos will also get RL releases… if it ever gets released.

Chris @chatgpt21
GPT-5.5 and 5.5 Pro were released yesterday, and this chart makes me even more bullish on OpenAI’s pace. Mythos Preview looks genuinely strong, especially on SWE-bench Pro and Humanity’s Last Exam. But it’s still a limited research preview and may not be out for another couple of months. GPT-5.5 Pro is here in ChatGPT right now. And outside Mythos’s biggest wins, most of these gaps are razor-thin: Terminal-Bench is +0.7 for GPT-5.5, GPQA is +1.0 for Mythos, OSWorld is +0.9, CyberGym is +1.3. Noise, basically. The row that actually jumps out to me, as someone who cares a lot about computer use, is BrowseComp. GPT-5.5 Thinking is close, but GPT-5.5 Pro pushes to 90.1%, ahead of Mythos at 86.9%. By the time Mythos-class models are broadly accessible, OpenAI may have already moved the frontier again. GPT-5.5 Pro is a strong signal of where this race is headed. Considering this is reportedly a new pretrain, I expect we’ll see continued RL gains stacked on top as post-training for this model scales. We’re likely looking at the floor of what this model can do, not the ceiling. I’d expect big gains with 5.6, 5.7, and GPT-6.
[Image]

Nidan @Nidan_zero
@mattshumer_ 4.7 will one-shot well. The issue is that on large codebases it goes off the rails, doesn’t follow plans, and refactors code it shouldn’t be touching. It also stops early and doesn’t finish all tasks until prompted.

Matt Shumer @mattshumer_
Am I the only one having a good experience with Opus 4.7? I still vastly prefer Codex for most things but Opus is absolutely nailing every UI task I give it.

Nidan @Nidan_zero
@VictorTaelin They are one-shotting. In large codebases it goes off the rails and doesn’t follow instructions. One-shotting gives good results.

Taelin @VictorTaelin
people who swear 4.7 > 4.6 (if anyone): what are you doing

Nidan @Nidan_zero
@webdevcody My experience is that it can one-shot, but inside a codebase it duplicates and changes everything.

WebDevCody @webdevcody
Opus 4.7 is actually good. I'm not sure I understand people saying otherwise.

Nidan @Nidan_zero
@haider1 Use Opus 4.6 instead of 4.7.

Haider. @haider1
At this point, it seems clearer that we should just use Sonnet 4.6 and stop using Opus 4.7, especially for coding, while Sonnet 4.6 is available. Opus 4.7 can leave you stuck cleaning up messy code all day, and in its current state it feels more like a liability.

Nidan @Nidan_zero
@AnthropicAI I can’t believe how poorly 4.7 does every time. I switched to 4.6 and had no problems for 6 hours. Then I opened another window, forgot to switch it to 4.6, and 4.7 went off the rails immediately: 20x more code, in the wrong direction. Crazy. @bcherny

Nidan @Nidan_zero
Opus 4.7 has lost its magic:
- doesn’t listen
- doesn’t understand context
- does things outside the plan
- replicates code
- stops early
- half-does everything
@AnthropicAI @DarioAmodei @bcherny

Nidan @Nidan_zero
@bcherny The new model is bad. It keeps misinterpreting everything and doesn’t listen to context. I’m getting sick of seeing both of these messages. It’s becoming unworkable because it has gone from knowing the right way to do things to doing the dumbest things. I can’t express how bad it has become. It’s sad, because I was working on a demonstration to show my company how we should change our workflow, and now this model has left it dead in the water. I’m going to have to see if I can roll back to 4.6.
[2 images]

Nidan @Nidan_zero
@0xSero Having these issues too.

0xSero @0xSero
Opus 4.7 is unusable. Multiple times I have given it specific links to use. Instead, it goes and finds unrelated links, starts expensive processes, and goes down a completely wrong path for hours. No ability to infer intent. Wasted $200 worth of HF credits. lol

Nidan retweeted
BOOTOSHI 👑 @KingBootoshi
Opus 4.7 pisses me off more than any other model. This model lies WAY more than any other model. It’s WAY LAZIER than any other model. I can’t trust autonomous runs because it just does shit work compared to 4.6 or 4.5. This was last night’s run. Maybe the update today fixes this?

Nidan @Nidan_zero
@johnennis Maybe I found the issue. I have to stop using Opus for a while; it’s driving me crazy.
[Image]

John Ennis @johnennis
Honestly starting to hate Opus 4.7
[Image]

Nidan @Nidan_zero
@johnennis It’s also not finishing the list of tasks. It always leaves some out. @bcherny

Nidan @Nidan_zero
@johnennis Yes, I’m having that issue too. It created two massive parallel systems today :(

Nidan @Nidan_zero
@theo It overcomplicates things and builds parallel systems. It makes more assumptions and doesn’t follow instructions.

Theo - t3.gg @theo
How are people feeling about opus 4.7 so far?

Nidan @Nidan_zero
@theo It’s bad

Nidan @Nidan_zero
@bcherny The new model is so bad. Today it overcomplicated EVERYTHING. It builds new systems in parallel over and over again and spends 3 hours doing something that has repeatedly been wrong 🙁

Nidan @Nidan_zero
@bcherny 4.7 seems lazy, doesn’t follow instructions, and takes the liberty of changing other things when not asked. I spent a lot of time undoing the stupidity, while also having to approve continuing through multiple phases rather than it continuing by itself. Now I have to keep telling it to continue on a multiphase project. Really unfortunate.

Nidan @Nidan_zero
@AiBattle_ This is the benchmark that worries me the most about this new model. Huge regression

AiBattle @AiBattle_
Opus 4.7 (Max) and Opus 4.6 (64K) scores on the MRCR v2 (8-needle) context benchmark:
256K:
- Opus 4.6: 91.9%
- Opus 4.7: 59.2%
1M:
- Opus 4.6: 78.3%
- Opus 4.7: 32.2%
[Image]

Andrew Lopez @ASVPxdrizzle
@SawyerMerritt How does the size compare to Starlink for commercial planes? I read somewhere that one of the major airlines didn’t want Starlink because Wi-Fi housing typically adds too much drag to the plane (Starlink is low-profile). It would be interesting to see if they go for Amazon’s offering instead.

Sawyer Merritt @SawyerMerritt
NEWS: Amazon has unveiled its Amazon Leo Aviation Antenna.
"This will deliver reliable internet connectivity to airline passengers and crew with up to 1 Gbps download and 400 Mbps upload speeds. The low-profile antenna has no moving parts, reducing maintenance, and can be installed in one day."
• 58 inches long
• 30 inches wide
• 2.6 inches high
[3 images]