Niko E.
@nefthy
4.8K posts · Joined April 2010 · 511 Following · 38 Followers
Niko E.@nefthy·
@vaaselene Not for coding, but for: - extracting JSON from text - world-knowledge inquiries - project brainstorming - OCR
0
0
0
6
Selene@vaaselene·
is anyone using Gemini?
853
11
1K
132.1K
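The "extracting JSON from text" use case mentioned above can be sketched in a few lines. This is a minimal, illustrative helper (not from any tweet in the thread): it strips Markdown code fences, then scans for the first balanced `{...}` block and parses it. It deliberately ignores edge cases like braces inside JSON strings.

```python
import json
import re

def extract_json(text: str):
    """Pull the first JSON object out of a model's free-form reply.

    Models often wrap JSON in prose or ```json fences; this removes
    the fences, then looks for a balanced {...} span and parses it.
    Naive about braces inside string literals — a sketch, not a parser.
    """
    # Drop Markdown code fences if present.
    text = re.sub(r"```(?:json)?", "", text)
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None

reply = 'Sure! Here is the data:\n```json\n{"name": "Ada", "age": 36}\n```'
print(extract_json(reply))  # {'name': 'Ada', 'age': 36}
```

For anything load-bearing you would want a real parser or the provider's structured-output mode, but for ad-hoc extraction this pattern is usually enough.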
Niko E.@nefthy·
@kr0der you are missing out on the amazing speed of low. And it's good. Very reasonable default.
0
0
0
37
Anthony Kroeger@kr0der·
i started using GPT 5.5 High w/o fast mode and it feels just as fast as 5.4 High w/ fast mode. also i'm sticking with High rather than Low because i don't like swapping reasoning constantly. high is just the perfect tradeoff between intelligence, speed, and cost
11
2
83
5.3K
BlueG@A_Damn_Love_520·
@victori84819871 @nicomuellerAT @thsottiaux Honestly, lowering the thinking level kinda makes 5.5 feel dumb 😂 The main issue is that even with 5.4 Fast on xhigh, it never burned through my weekly limit this quickly.
2
0
1
68
Tibo@thsottiaux·
Oh no, not you GPT-5.5
[tweet media]
94
18
1.5K
123.6K
EnPassantGuy@enpassantguy·
@gdb @romainhuet Not as well as I had hoped tbh. I can’t tell a difference in my development job between 5.5 and 5.4. Sadly been using Opus 4.7 as a result, which has been really good.
1
0
1
346
Niko E.@nefthy·
@VraserX On a few comparisons I did, it is meaningfully faster than 5.4, but only by 20% or so. I'm testing low as my new default effort level and it is very good so far.
0
0
0
108
VraserX e/acc@VraserX·
GPT-5.5 feels insane so far. What I love most is that it is blazing fast, but still clearly smarter than GPT-5.4. In my own tests, it feels around 2x to 5x faster depending on the task. What’s your experience so far?
42
7
376
158.2K
Stefan Streichsbier@s_streichsbier·
@bygregorr yep, makes the API cost difference reasonable. You end up paying roughly the same for a better and faster result.
1
0
12
2.5K
Niko E.@nefthy·
@IceSolst One of the benefits of reviews is that you have two people who have seen each piece of code. If you don't need that, I envy the simplicity and scale you are dealing with.
0
0
0
28
Niko E.@nefthy·
@llmdevguy @buildwithparas I agree, yet everybody keeps using high and xhigh, because much effort must be much good or so, and then they complain about the model over-engineering.
0
0
1
42
Mateusz Mirkowski@llmdevguy·
🤓After testing GPT-5.5 more, I came to the conclusion that there is no point in using 5.3 codex or 5.4 anymore. Tomorrow I will post an article about how to use GPT-5.5 effectively. Yes, you can use 5.5 on the plus plan and it won't burn limits in minutes. Spoiler alert: low thinking is the winner here.
58
23
811
56.4K
Sam Altman@sama·
feels like a good time to seriously rethink how operating systems and user interfaces are designed (also the internet; there should be a protocol that is equally usable by people and agents)
1.8K
785
12.5K
1.5M
Niko E.@nefthy·
@vivoplt GPT-5.4/5 is more methodical and better at following instructions through tasks with many steps. Token efficiency is a nice bonus.
0
0
0
30
Vivo@vivoplt·
Are people switching from Claude Code to Codex just because of token efficiency, or is there more to it?
181
3
409
73.6K
Ihor Vorotnov 🇺🇦@ihorvorotnov·
@kentcdodds @peer_rich Maybe works in the US (still not sure the math works out though), paying ahead of schedule always makes sense - it shortens the loan lifetime and overpayments even more. Any extra $ after the main payment goes into reducing the loan principal. You can also refinance after first 5y and … 1/2
2
0
2
1.3K
Matt Pocock@mattpocockuk·
Of course, this is a design decision that can be argued both ways. If it didn't try to make it more AFK, it would lose the feeling of being 'auto mode' because folks would still need to keep checking on it. I just happen to prefer the side of the coin facing the table
6
0
58
11K
Matt Pocock@mattpocockuk·
I figured out what this was. Turns out Auto Mode doesn't just handle permissions. It also injects instructions into the system prompt to make it more AFK. This is dumb, it shouldn't do that - it's messing with all my skills. I guess that's the cost of not owning the whole flow.
Matt Pocock@mattpocockuk
Starting to notice that even with /grill-me, Opus 4.7 w/ Claude Code jumps straight to implementation 😡 Just WAIT until we're aligned, silly harness
91
25
827
108.1K
Niko E.@nefthy·
@davis7 Tbh, 5.4 is pretty usable at medium for many tasks, and it does not over-engineer as much as high/xhigh do.
0
0
0
82
Ben Davis@davis7·
This is one of many reasons I hate the "5.5" name so much. Every single 5-series model has needed at least medium, usually high reasoning to function because the "base" model was pretty bad. Most of its power came from the insane reasoning stuff they baked in.

Should you use GPT-5.5 on low reasoning for everything? No. But it's pretty damn close, like 90% of the time, and I'm gonna keep pushing this hard b/c the natural tendency of everyone (self included) is to use the "best" tool always. Always try low first; if it can't do it, then bump from there.

We now have a good base model, so reasoning often becomes an anti-pattern as the model gets too much time to overthink itself into verbose, complex solutions when it knew how to just do the thing u wanted.

Again, this is not a ".1" bump, this is an entirely new foundation and I'd argue a new series of models. I know they can't call it GPT-6 because if that model doesn't end up being everything the mythos is hyped up to be and more, the entire fucking economy dies, but it's definitely the beginning of GPT-6, much like what 4.5 was last year, just way better.

I think that's the reasoning behind the ".5" name - it's halfway to the new series - which would be fine if we didn't just do 5, 5.1, 5.2, 5.3, 5.4, so it feels like another increment. It's not.

Another thing is speed, this thing is fast as hell with low reasoning and feels amazing. Highly recommend trying it out with pi, feels so good.
Nathan Spencer@NateSpencerWx
@davis7 For everything?
26
11
381
43.5K
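The "always try low first, if it can't do it then bump from there" workflow above can be sketched as a simple escalation loop. Everything here is illustrative: `ask_model` and `is_good_enough` are hypothetical placeholders for your actual model call and acceptance check, and the effort names mirror the ones discussed in the thread.

```python
# Reasoning-effort levels, cheapest first (names as used in the thread).
EFFORT_LEVELS = ["low", "medium", "high", "xhigh"]

def ask_with_escalation(prompt, ask_model, is_good_enough):
    """Try the cheapest reasoning effort first; escalate only on failure.

    ask_model(prompt, effort) -> answer and is_good_enough(answer) -> bool
    are caller-supplied stand-ins for a real model call and check.
    Returns the (effort, answer) pair that first passed the check, or the
    highest-effort attempt if none did.
    """
    for effort in EFFORT_LEVELS:
        answer = ask_model(prompt, effort)
        if is_good_enough(answer):
            return effort, answer
    return EFFORT_LEVELS[-1], answer

# Toy stand-in: pretend only "high" and above produce a usable answer.
def fake_model(prompt, effort):
    return "ok" if effort in ("high", "xhigh") else "meh"

print(ask_with_escalation("refactor this", fake_model, lambda a: a == "ok"))
# ('high', 'ok')
```

The design point is that escalation makes the cheap/fast path the default, so you only pay for high reasoning on the tasks that demonstrably need it.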
jon allie@jonallie·
I think I must be lacking imagination, but I can't think of a single thing I'd want to do with openclaw. Admittedly, I've not looked deeply into what it can do, but none of the use cases I've seen have seemed compelling. I don't want to message my lights on telegram; I'm happy to compose my own emails; and I don't want a schedule that is so challenging that navigating my calendar requires machine intelligence. I'm bullish on agents for coding, but even in that narrow domain they make enough mistakes that I don't want a similar thing driving life decisions for me or having unfettered access to my data. The enthusiasm around them makes me wonder what I've missed.
48
0
76
9.8K
Niko E.@nefthy·
@mehulmpt I wonder how Opus 4.7 and GPT 5.5 medium would do in your benchmark.
0
0
1
18
Mehul Mohan@mehulmpt·
Closing note: All these models sucked compared to what I could have done with the same prompt. It would have taken me probably a full day, however, without LLMs. There were a LOT of wins that could have been achieved by these models, but with an open-ended task, no single model was brave enough to implement bold changes (including Opus and GPT in their SOTA harnesses). LLMs are very good at instruction following, but they need a powerful master (YOU) to operate them. This made me realise how important it is for a developer to operate these tools for maximum efficiency. I feel it is very important to be a developer today, to build taste and opinions, probably more important than ever. LLMs are very powerful machines in the right hands, but you can't keep your eyes closed when operating them.
5
1
48
6.4K
Mehul Mohan@mehulmpt·
I did one open-ended task in a real codebase with:
> DeepSeek v4 Pro
> Kimi K2.6
> Opus 4.7
> GPT 5.5
I asked all of them to optimise a small code base as much as they can. It's a custom application that I use myself for managing my business inbox. Here are the results 👇
[tweet media]
18
2
266
37.2K
Niko E.@nefthy·
@paraddox @kelvinfichter The approach is only reasonable when you can't do a full refactor in one go. With agents those cases get rare. Keep BC to the edge of your codebase - to the parts that interface with other systems and services. And then only if necessary. BC is technical debt.
0
0
0
34
Ddox@paraddox·
@kelvinfichter Honestly, in brownfield projects that are already in production, that is not a bad approach. In greenfield projects, just tell it in the AGENTS.md that you don't need backwards compatibility or fallbacks. That solved it for me.
6
1
125
11.6K
smartcontracts.eth@kelvinfichter·
> Codex: You're totally right, I went ahead and built that new script instead, I left the old script that I built 30 seconds ago as a legacy/compatibility layer in case any users might be using it
49
134
5.1K
209.3K
Niko E.@nefthy·
@kelvinfichter It helps to spell out what should and what shouldn't be kept backwards compatible in the AGENTS.md
0
0
0
171
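A minimal sketch of the kind of AGENTS.md note being described - spelling out where backwards compatibility matters and where it doesn't. The section name and the "public HTTP API" example are illustrative, not a spec; adapt them to whatever actually sits at the edge of your codebase:

```markdown
## Backwards compatibility

- Do NOT add compatibility shims, legacy wrappers, or fallback code paths.
- Breaking internal APIs is fine; refactor the call sites instead of
  preserving old signatures.
- Only the public HTTP API (the part other services depend on) must stay
  backwards compatible. Everything else can change freely.
```

Being explicit on both sides - what may break and what must not - is what stops an agent from hedging with "legacy" copies of code it wrote thirty seconds ago.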
Niko E.@nefthy·
@sdmat123 @deredleritt3r I just genuinely can't see the things that would benefit from pro in my day-to-day work. It seems to me there is a thin band of tasks that gpt-5.5 can't do but pro can. And it's hard to judge whether a task is in that band or on the unsolvable side.
1
0
1
67
sdmat@sdmat123·
With 5.4 there was a very clear-cut difference in rigor/thoroughness and reliability that made it a win for a wide range of tasks. @deredleritt3r has written on this as well as doing interesting benchmarking for legal use cases: x.com/deredleritt3r/… My view is that 5.5 pro is useful and worth it if you can get mileage out of it for work, but the gap with the base model is narrower, and there is a ton of scope for prompting to improve performance of 5.5 in a way there wasn't with 5.4. Another consideration is that Codex is getting to be a fantastic harness for general use, and that's only available with the base model (pro is web-only on sub).
prinz@deredleritt3r
Added to prinzbench:
- GPT-5.5 Pro (Extended)
- GPT-5.5 Thinking (Heavy)
- Opus 4.7
- Meta Muse Spark

Overall impressions from testing the models:

1. GPT-5.5 Pro scored slightly (3 points) better than GPT-5.4 Pro, including a solid improvement in Legal Research (by 4 points) and a slight decrease in Search (by 1 point). Overall score: 82/99. As noted elsewhere, this model is *significantly* faster than GPT-5.4 Pro; a question that took GPT-5.4 Pro ~30 minutes to answer takes GPT-5.5 Pro ~8 minutes. It's a good model! We have now reached the point where I am surprised if it does not answer a question correctly.

2. GPT-5.5 Thinking (Heavy) is the star of the show, scoring a full 5 points higher than GPT-5.4 (xhigh) and a full 6 points higher than GPT-5.4 Thinking (Heavy). A big jump in Legal Research (+6 points vs. GPT-5.4 (xhigh)) is once again offset here by a slight decrease in Search (-1 point vs. GPT-5.4 (xhigh)). Overall score: 74/99. As with Pro, this model is *significantly* faster than GPT-5.4 Thinking (Heavy); a question that took GPT-5.4 ~8-10 minutes to answer takes GPT-5.5 Thinking ~2 minutes.

3. Opus 4.7 started off really well, and I even thought at one point that it might match the performance of Gemini 3 Pro, but... it trailed off in the end. Overall score: 25/99. This is a significantly better performance than that achieved by any other Anthropic model on my benchmark to date (e.g., 6 points higher than Opus 4.6), but Opus 4.7 still significantly trails many other models released over the past 6 months. On the bright side, the model's Search score (4/24) is significantly better than the usual 1/24 or 0/24 that I typically get from Anthropic models. Some further improvement in search capabilities might unlock performance approximately equivalent to that of Gemini 3 Pro for this model.

4. Meta Muse Spark achieved a very unspectacular score of 31/99. Not quite as good as Gemini 3, not quite as good as Kimi K-2.5 Thinking. This model is nothing to write home about.

More details in the link below. Please see footnote 1 in particular, which talks about my participation in OpenAI's early access program for GPT-5.5.
1
0
4
633
sdmat@sdmat123·
Thoughts on GPT 5.5 after a couple of days of use:
- A big step up in fundamental capabilities and a step down in post-training polish, a little like going from working with an experienced colleague to a prodigy a couple of years into their career
- Mixed feelings on 5.5 pro: the speed is amazing and results are good, but it lacks the rigor and hyper-autistic attention to detail that made 5.4 pro exceptional for hard tasks
- At a base level 5.5 is a great model to work with, better personality and style than 5.4 together with superior common sense and general understanding. Big model smell.
- Performance ceiling is sky-high but you need to put in significant work to approach it due to the limited post-training
- This often manifests as a counterintuitive split where the model will explain the perfect approach for X when asked but won't proactively think it through when X comes up in the course of a task
- Otherwise complex instruction following and metacognition are dramatically better
- It's worth revisiting prompt engineering concepts that advanced post-training rendered irrelevant, and making explicit the process and allocation of effort for hard tasks
- Self-supervision also works well, e.g. managing well-scoped subagents
Fully expect 5.6 in a month or two to round out the post-training and deliver autopilot on hard tasks. Overall: fantastic!
16
18
390
36.4K