heruvim

93 posts

@__maicl

Typescript, Python, Golang (+AWS)

London, England · Joined May 2019
52 Following · 104 Followers
heruvim
heruvim@__maicl·
@VictorTaelin @Vinibi90 It's very interesting though, because most people in my timeline fanboy one of the two, and all conversions I've seen have been from Claude to GPT. I guess, after seeing Opus < GPT output a couple of times, I gave up and never thought to tell it "don't be lazy", assuming it was a skill issue
1
0
5
696
Taelin
Taelin@VictorTaelin·
@Vinibi90 that's not true. I said 5.4 was better initially, then flipped my position once and never flipped it again
1
0
25
3.7K
Taelin
Taelin@VictorTaelin·
My final thoughts on Opus 4.6: why this model is so good, why I underestimated it, and why I'm so obsessed with Mythos.

When I first tested GPT 5.4 vs Opus 4.6 - both launched at roughly the same time - I was initially convinced that GPT 5.4 was vastly superior, because it did better on my logical tests. That's still true: given the same prompt, by default, GPT will be more competent and careful and produce a more reliable output, while Opus will give you a half-assed, buggy solution and call it a day.

Now, here's what I failed to realize: Opus's bad outputs are not because it is dumb. They're because it is a lazy cheater. And you can tell, because if you just go ahead and tell it, "you did X in a lazy way, do it the right way now", and show that you're serious, it will proceed to do a flawless job. That doesn't happen with dumber models. And the more I work with Opus, the more I realize that, if you just keep pushing it, its intelligence ceiling is much, much higher than it seems. It IS there, you just need to be patient and push it. GPT, on the other hand, when it fails, has already done its best, so pushing it further will give you no added results.

That is also one of the reasons benchmarks lie. When Claude and GPT score the same on a given benchmark, it is likely that Claude is actually smarter, because it puts in less effort. Now, consider that for a moment, and remember that Mythos is outperforming GPT 5.4 *Pro* on benchmarks. How insane is that? Remember that Sonnet 3.5 lagged behind on benchmarks, yet everyone knew it was superior to 4o. I think it is this effect at play: for whatever reason, Claude-series models "try less hard" on the first shot. Because of that, even if Spud gets close to Mythos on benchmarks (which I predict will be the case), I suppose Mythos will still be superior.

This also leads me to wonder whether Anthropic actually has a real lead over OpenAI that will only get larger. I could totally see a timeline where Anthropic's models become so good that OpenAI simply fails to catch up as the recursive improvement unfolds. Just my silly thoughts though, what do I know. As always, I could be wrong, and I hope I am!!
154
60
1.5K
176.5K
heruvim
heruvim@__maicl·
@VictorTaelin So, the trick is to babysit Opus and then it's equal (or better) than 5.4?
0
0
0
370
Taelin
Taelin@VictorTaelin·
also a last note before I sleep: I indeed misjudged opus so much initially, and I enjoy it so much right now. I think my first impression was bad because it stops working too early even when things are still in a broken state, while gpt is way more cautious, so it feels like it is dumb. but it is not dumb, at all. the work it has done on this session was truly incredible. all models are really good right now but I think opus is becoming my default choice, perhaps with gpt 5.4 + gemini 3.1 for reviewing its commits 🤔
Taelin@VictorTaelin

more of the same - after begging opus in every way possible to optimize Bend's checker ("make it fast, fix quadratic blowups, think hard pls"), there was zero improvement, so I decided to babysit it. I was giving the instructions; it was doing the boring work. I asked it to measure stuff, break timings down, and dissect the code exactly how I would.

2 hours later: the checker is now ~10x faster.

So, as of March 2026 - and I don't like this - automated research with AI *still* sucks, but a human domain expert using it to empower himself can achieve great things. Below is the summary of this chat! Good night

6
1
120
13.7K
heruvim
heruvim@__maicl·
@effectfully "Simply telling the agent what to grep for", I don't get it. Do you tell it the files you'd like it to look into or the lines of the files too or the function names that are relevant? That could be easy if your prompt is very targeted, but not for investigative ones, right?
1
0
1
338
effectfully
effectfully@effectfully·
The biggest factor causing context pollution I've observed so far is the agent grepping for overly general strings like "function". Moreover, the agent realizes that this causes context pollution, so it adds stuff like `head -n 200` to read only the first 200 lines - making it all even worse, because now you get context pollution *plus* relevant results beyond the first 200 lines not being processed at all, at 4x token usage. Simply telling the agent what to grep for when starting on the task, so the context is properly initialized, has basically solved this problem.
effectfully@effectfully

I swear vibe coders are only capable of keeping up with AI's bullshit because they have no idea what's going on. Told GPT-5.4 to narrow its `sed` windows so it stops wasting context. It stopped using `sed` altogether and started `cat`ing whole files or using `awk` with a large window. input_tokens jumped by 78%. I asked "what the fuck?" -- it said that rules around the use of a tool make it not willing to use the tool, because it's now more complicated and it doesn't wanna risk accidentally violating the rule. Imagine dealing with this shit if you're code-illiterate.

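A minimal sketch of the two habits described above - seeding the grep yourself and keeping read windows narrow - against a throwaway fixture (all file names and symbols below are invented for illustration):

```shell
#!/bin/sh
# Throwaway fixture: 50 TypeScript-ish files, two functions each.
# (Every name here is made up; this is not from the thread.)
dir=$(mktemp -d)
for i in $(seq 1 50); do
  printf 'function helper%s() {}\nfunction parseConfig() {}\n' "$i" > "$dir/f$i.ts"
done

# Overly general grep: 100 matches, mostly noise the agent must wade through.
grep -rn "function" "$dir" | wc -l     # → 100

# Targeted grep, seeded by the human at task start: only the relevant symbol.
grep -rn "parseConfig" "$dir" | wc -l  # → 50

# Narrow read window: print just the lines you need instead of cat'ing files.
sed -n '1,1p' "$dir/f1.ts"             # → function helper1() {}

rm -rf "$dir"
```

The point is the ratio, not the exact counts: a human who knows the codebase can cut the agent's initial search space by an order of magnitude with one targeted query.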
7
3
147
14.3K
heruvim
heruvim@__maicl·
De-escalation skills are a must in corporate
0
0
0
6
heruvim
heruvim@__maicl·
@IterIntellectus Agreed. But how do you move out of the cities when that's still where the young, high-intellect people are? Remote work solves the money issue, but similar people to build community and family with, away from the cities, might be scarce
1
0
3
1.1K
heruvim
heruvim@__maicl·
1. Like the post (assume true)
2. Check notifications for community notes
3. Mute/block the account that misinformed you
4. Enjoy more truth in your timeline
0
0
0
14
adi
adi@adonis_singh·
drop everything and watch this
adi tweet media
8
3
134
7.8K
heruvim
heruvim@__maicl·
I asked the LLM why it gave me code that fails in some edge cases. The response: "Because it is consistent with current codebase patterns." We are being roasted by bots now…
0
0
1
23
heruvim
heruvim@__maicl·
@DrJSchulte @cursor_ai Can we no longer include the git diff for the working directory or specific commits? It seems only diff with main is allowed now, which is completely useless for me
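For reference, the three diff scopes the tweet distinguishes have plain-git equivalents; a throwaway-repo sketch (file names and commit messages invented, and this says nothing about what Cursor's UI exposes):

```shell
#!/bin/sh
# Made-up fixture repo to show the three diff scopes mentioned above.
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email "t@example.com"; git config user.name "t"
git commit -q --allow-empty -m "init"
echo "v1" > notes.txt && git add notes.txt && git commit -q -m "add notes"
echo "v2" > notes.txt               # uncommitted working-directory edit

git diff --name-only                # working-directory diff → notes.txt
git diff --name-only HEAD^ HEAD     # a specific commit's diff → notes.txt
# git diff main...HEAD              # diff against main (needs a main branch)
```

All three are cheap at the command line even when a tool stops surfacing them.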
0
0
0
45
Dr. Jonathan Schulte
Dr. Jonathan Schulte@DrJSchulte·
@cursor_ai Is the active tab still used as context? And can I still not see which rules are used in the context?
2
0
2
2K
Cursor
Cursor@cursor_ai·
We've shipped several quality-of-life improvements to Cursor! Details below...
96
84
1.8K
494.7K
heruvim
heruvim@__maicl·
No one holds more power over you than your upstairs neighbor
0
0
0
24
heruvim
heruvim@__maicl·
Started responding in meetings with: "Excellent question, you are absolutely right to..."
0
0
0
21
heruvim
heruvim@__maicl·
"That's nothing. When I was your age I had to write HR reviews without AI. Go play now"
0
0
0
27
heruvim
heruvim@__maicl·
It seems it never really "goes without saying,..."
0
0
0
18
heruvim
heruvim@__maicl·
Nothing comes close btw
0
0
0
15
heruvim
heruvim@__maicl·
Tried ChatGPT 5 Pro on a difficult task. It's smarter than 99% of the population, superintelligence is imminent
1
0
0
16
heruvim
heruvim@__maicl·
Gemini 2.5 Pro just adds unnecessary complexity to everything. The overthinking model
0
0
0
17
heruvim
heruvim@__maicl·
How many lives is ChatGPT saving daily by starting with: Excellent question ✅️ ...
0
0
0
7
heruvim
heruvim@__maicl·
A year ago I needed a thinking model for most cases. Now, non-thinking ChatGPT-5 is usually enough. Progress is fast
0
0
0
23
heruvim
heruvim@__maicl·
ChatGPT 5 thinking model feels eerily smarter than all else
0
0
0
29
heruvim
heruvim@__maicl·
Payday reminds you of how fast time passes
0
0
0
9