heruvim

93 posts

@__maicl

Typescript, Python, Golang (+AWS)

London, England · Joined May 2019
52 Following · 104 Followers
heruvim
heruvim@__maicl·
@VictorTaelin @Vinibi90 It's very interesting though, because most people in my timeline fanboy one of the two, and all conversions I've seen have been from Claude to GPT. I guess, after seeing Opus < GPT output a couple of times, I gave up and never thought to tell it "don't be lazy", assuming it was a skill issue
1
0
5
696
Taelin
Taelin@VictorTaelin·
@Vinibi90 that's not true. I said 5.4 was better initially, then flipped my position once and never flipped it again
1
0
25
3.7K
Taelin
Taelin@VictorTaelin·
My final thoughts on Opus 4.6: why this model is so good, why I underestimated it, and why I'm so obsessed with Mythos.

When I first tested GPT 5.4 vs Opus 4.6 - both launched at roughly the same time - I was initially convinced that GPT 5.4 was vastly superior, because it did better on my logical tests. That's still true: given the same prompt, by default, GPT will be more competent and careful and produce a more reliable output, while Opus will give you a half-assed, buggy solution and call it a day.

Now, here's what I failed to realize: Opus's bad outputs are not because it is dumb. They're because it is a lazy cheater. And you can tell, because if you just go ahead and tell it, "you did X in a lazy way, do it the right way now", and show that you're serious, it will proceed to do a flawless job. That doesn't happen with dumber models. And the more I work with Opus, the more I realize that, if you just keep pushing it, its intelligence ceiling is much, much higher than it seems. It IS there, you just need to be patient and push it. GPT, on the other hand, when it fails, has already done its best, so pushing it further will give you no added results.

That is also one of the reasons benchmarks lie. When Claude and GPT score the same on a given benchmark, it is likely that Claude is actually smarter, because it puts in less effort. Now, consider that for a moment, and remember that Mythos is outperforming GPT 5.4 *Pro* on benchmarks. How insane is that? Remember that Sonnet 3.5 lagged behind on benchmarks, yet everyone knew it was superior to 4o. I think it is this effect at play: for whatever reason, Claude-series models "try less hard" on the first shot. Because of that, even if Spud gets close to Mythos on benchmarks (which I predict will be the case), I suppose Mythos will still be superior.

This also leads me to wonder whether Anthropic actually has a real lead over OpenAI that will only get larger. I could totally see a timeline where Anthropic's models become so good that OpenAI simply fails to catch up as the recursive improvement unfolds. Just my silly thoughts though, what do I know. As always, I could be wrong, and I hope I am!!
154
60
1.5K
176.5K
heruvim
heruvim@__maicl·
@VictorTaelin So, the trick is to babysit Opus and then it's equal (or better) than 5.4?
0
0
0
370
Taelin
Taelin@VictorTaelin·
also a last note before I sleep: I indeed misjudged opus so much initially, and I enjoy it so much right now. I think my first impression was bad because it stops working too early even when things are still in a broken state, while gpt is way more cautious, so it feels like it is dumb. but it is not dumb, at all. the work it has done on this session was truly incredible. all models are really good right now but I think opus is becoming my default choice, perhaps with gpt 5.4 + gemini 3.1 for reviewing its commits 🤔
Taelin@VictorTaelin

more of the same - after begging opus in every way possible to optimize Bend's checker ("make it fast, fix quadratic blowups, think hard pls"), there was zero improvement, so I decided to babysit it. I was giving the instructions; it was doing the boring work. I asked it to measure stuff, break timings down, and dissect the code exactly how I would.

2 hours later: the checker is now ~10x faster.

So, as of March 2026 - and I don't like this - automated research with AI *still* sucks, but a human domain expert using it to empower himself can achieve great things. Below is the summary of this chat! Good night

6
1
120
13.7K
heruvim
heruvim@__maicl·
@effectfully "Simply telling the agent what to grep for", I don't get it. Do you tell it the files you'd like it to look into or the lines of the files too or the function names that are relevant? That could be easy if your prompt is very targeted, but not for investigative ones, right?
1
0
1
338
effectfully
effectfully@effectfully·
The biggest factor causing context pollution I've observed so far is the agent grepping for overly general strings like "function". Moreover, the agent realizes that this causes context pollution, so it adds stuff like `head -n 200` to read only the first 200 lines - making it all even worse, because now you get context pollution *plus* relevant results beyond the first 200 lines not being processed at all, at 4x token usage. Simply telling the agent what to grep for when starting on the task, so the context is properly initialized, has basically solved this problem.
effectfully@effectfully

I swear vibe coders are only capable of keeping up with AI's bullshit because they have no idea what's going on. Told GPT-5.4 to narrow its `sed` windows so it stops wasting context. It stopped using `sed` altogether and started `cat`ing whole files or using `awk` with a large window. input_tokens jumped by 78%. I asked "what the fuck?" -- it said that rules around the use of a tool make it not willing to use the tool, because it's now more complicated and it doesn't wanna risk accidentally violating the rule. Imagine dealing with this shit if you're code-illiterate.

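A minimal sketch of the two habits described above - seeding the grep yourself and keeping read windows narrow - against a throwaway fixture (all file names and symbols below are invented for illustration):

```shell
#!/bin/sh
# Throwaway fixture: 50 TypeScript-ish files, two functions each.
# (Every name here is made up; this is not from the thread.)
dir=$(mktemp -d)
for i in $(seq 1 50); do
  printf 'function helper%s() {}\nfunction parseConfig() {}\n' "$i" > "$dir/f$i.ts"
done

# Overly general grep: 100 matches, mostly noise the agent must wade through.
grep -rn "function" "$dir" | wc -l     # → 100

# Targeted grep, seeded by the human at task start: only the relevant symbol.
grep -rn "parseConfig" "$dir" | wc -l  # → 50

# Narrow read window: print just the lines you need instead of cat'ing files.
sed -n '1,1p' "$dir/f1.ts"             # → function helper1() {}

rm -rf "$dir"
```

The point is the ratio, not the exact counts: a human who knows the codebase can cut the agent's initial search space by an order of magnitude with one targeted query.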
7
3
147
14.3K
heruvim
heruvim@__maicl·
De-escalation skills are a must in corporate
0
0
0
6
heruvim
heruvim@__maicl·
@IterIntellectus Agreed. But how do you move out of the cities when that's still where the young, high-intellect people are? Remote work solves the money issue, but similar people to build community and family with, away from the cities, might be scarce
1
0
3
1.1K
heruvim
heruvim@__maicl·
1. Like the post (assume true)
2. Check notifications for community notes
3. Mute/block the account that misinformed you
4. Enjoy more truth in your timeline
0
0
0
14
adi
adi@adonis_singh·
drop everything and watch this
adi tweet media
8
3
134
7.8K
heruvim
heruvim@__maicl·
I asked the LLM why it gave me code that fails in some edge cases. The response: "Because it is consistent with current codebase patterns." We are being roasted by bots now…
0
0
1
23
heruvim
heruvim@__maicl·
@DrJSchulte @cursor_ai Can we no longer include the git diff for the working directory or specific commits? It seems only diff with main is allowed now, which is completely useless for me
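For reference, the three diff scopes the tweet distinguishes have plain-git equivalents; a throwaway-repo sketch (file names and commit messages invented, and this says nothing about what Cursor's UI exposes):

```shell
#!/bin/sh
# Made-up fixture repo to show the three diff scopes mentioned above.
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email "t@example.com"; git config user.name "t"
git commit -q --allow-empty -m "init"
echo "v1" > notes.txt && git add notes.txt && git commit -q -m "add notes"
echo "v2" > notes.txt               # uncommitted working-directory edit

git diff --name-only                # working-directory diff → notes.txt
git diff --name-only HEAD^ HEAD     # a specific commit's diff → notes.txt
# git diff main...HEAD              # diff against main (needs a main branch)
```

All three are cheap at the command line even when a tool stops surfacing them.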
0
0
0
45
Dr. Jonathan Schulte
Dr. Jonathan Schulte@DrJSchulte·
@cursor_ai Is the active tab still used as context? And can I still not see which rules are used in the context?
2
0
2
2K
Cursor
Cursor@cursor_ai·
We've shipped several quality-of-life improvements to Cursor! Details below...
96
84
1.8K
494.7K
heruvim
heruvim@__maicl·
No one holds more power over you than your upstairs neighbor
0
0
0
24
heruvim
heruvim@__maicl·
Started responding in meetings with: "Excellent question, you are absolutely right to..."
0
0
0
21
heruvim
heruvim@__maicl·
"That's nothing. When I was your age I had to write HR reviews without AI. Go play now"
0
0
0
27
heruvim
heruvim@__maicl·
It seems it never really "goes without saying,..."
0
0
0
18
heruvim
heruvim@__maicl·
Nothing comes close btw
0
0
0
15
heruvim
heruvim@__maicl·
Tried ChatGPT 5 Pro on a difficult task. It's smarter than 99% of the population, superintelligence is imminent
1
0
0
16
heruvim
heruvim@__maicl·
Gemini 2.5 Pro just adds unnecessary complexity to everything. The overthinking model
0
0
0
17
heruvim
heruvim@__maicl·
How many lives is ChatGPT saving daily by starting with: Excellent question ✅️ ...
0
0
0
7
heruvim
heruvim@__maicl·
A year ago I needed a thinking model for most cases. Now, non-thinking ChatGPT-5 is usually enough. Progress is fast
0
0
0
23
heruvim
heruvim@__maicl·
ChatGPT 5 thinking model feels eerily smarter than all else
0
0
0
29
heruvim
heruvim@__maicl·
Payday reminds you of how fast time passes
0
0
0
9