
Niko E.








> but if there’s a third strike, I’m switching back to 5.4.

I was wrong, let me explain why. GPT-5.5 is not a minor version bump of the same model. GPT-5.5 is based on a new, fully retrained base model. Yes, OpenAI is bad at naming and versioning. We know. But that changes how I think about the regressions.

Some things got worse. Some behaviors are sharper than they should be. Some workflows need new guardrails. But switching back to 5.4 would only postpone the learning curve. The real work is understanding the new model: where it overreaches, where it needs tighter instructions, where old workflows break, and where it is genuinely better.

Because 5.6 or 6 will not be “5.4, but improved.” It will build on this new foundation. So I’d rather learn the new operating model now than cling to the old one until it disappears.

Ok how tf do I review this PR?



Realizing a 30 year mortgage doesn’t actually mean 30 years:
- 1 extra monthly payment per year can cut 5 years off
- 2 extra payments per year can cut about 8 years off
- 3 extra payments per year can cut about 11 years off
Bank won't tell you this
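If you want to sanity-check those numbers yourself, here is a minimal amortization sketch in Python. The $300k balance and 6.5% rate are illustrative assumptions, not figures from the post; the exact years saved depend on your own balance, rate, and when the extra payments land.

```python
# Sketch: how extra monthly payments per year shorten a 30-year mortgage.
# Assumed example loan: $300,000 at 6.5% APR (not from the original post).

def months_to_payoff(principal, annual_rate, years=30, extra_payments_per_year=0):
    r = annual_rate / 12                              # monthly interest rate
    n = years * 12
    payment = principal * r / (1 - (1 + r) ** -n)     # standard amortization payment
    balance = principal
    month = 0
    while balance > 0 and month < n * 2:              # hard stop as a safety net
        month += 1
        interest = balance * r
        balance += interest - payment
        # spread the extra payments evenly across each year
        if extra_payments_per_year and month % (12 // extra_payments_per_year) == 0:
            balance -= payment
    return month

if __name__ == "__main__":
    for extra in range(4):
        m = months_to_payoff(300_000, 0.065, extra_payments_per_year=extra)
        print(f"{extra} extra payment(s)/year -> paid off in ~{m / 12:.1f} years")
```

Run it with your own numbers to see how the payoff timeline moves; with the assumed loan above, one extra payment a year lands in the same ballpark as the claim in the post.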








Starting to notice that even with /grill-me, Opus 4.7 w/ Claude Code jumps straight to implementation 😡 Just WAIT until we're aligned, silly harness

@davis7 For everything?







Added to prinzbench:
- GPT-5.5 Pro (Extended)
- GPT-5.5 Thinking (Heavy)
- Opus 4.7
- Meta Muse Spark

Overall impressions from testing the models:

1. GPT-5.5 Pro scored slightly (3 points) better than GPT-5.4 Pro, including a solid improvement in Legal Research (+4 points) and a slight decrease in Search (-1 point). Overall score: 82/99. As noted elsewhere, this model is *significantly* faster than GPT-5.4 Pro; a question that took GPT-5.4 Pro ~30 minutes to answer takes GPT-5.5 Pro ~8 minutes. It's a good model! We have now reached the point where I am surprised if it does not answer a question correctly.

2. GPT-5.5 Thinking (Heavy) is the star of the show, scoring a full 5 points higher than GPT-5.4 (xhigh) and a full 6 points higher than GPT-5.4 Thinking (Heavy). A big jump in Legal Research (+6 points vs. GPT-5.4 (xhigh)) is once again offset by a slight decrease in Search (-1 point vs. GPT-5.4 (xhigh)). Overall score: 74/99. As with Pro, this model is *significantly* faster than GPT-5.4 Thinking (Heavy); a question that took GPT-5.4 ~8-10 minutes to answer takes GPT-5.5 Thinking ~2 minutes.

3. Opus 4.7 started off really well, and I even thought at one point that it might match the performance of Gemini 3 Pro, but... it trailed off in the end. Overall score: 25/99. This is a significantly better performance than any other Anthropic model has achieved on my benchmark to date (e.g., 6 points higher than Opus 4.6), but Opus 4.7 still significantly trails many other models released over the past 6 months. On the bright side, the model's Search score (4/24) is well above the 1/24 or 0/24 I typically get from Anthropic models. Some further improvement in search capabilities might unlock performance approximately equivalent to Gemini 3 Pro for this model.

4. Meta Muse Spark achieved a very unspectacular score of 31/99. Not quite as good as Gemini 3, not quite as good as Kimi K-2.5 Thinking. This model is nothing to write home about.

More details in the link below. Please see footnote 1 in particular, which talks about my participation in OpenAI's early access program for GPT-5.5.










