massa

79 posts


@tycnio

Formal Proofs For Verifiability

Joined November 2025
41 Following · 4 Followers
Robin Ebers | AI Coach for Founders
when I say Claude Opus 4.6 is behind, that's what I mean this model is truly the laziest of the bunch Codex would never
[image]
18 replies · 0 reposts · 38 likes · 3.3K views
massa
massa@tycnio·
claude is severely misaligned
0 replies · 0 reposts · 0 likes · 0 views
Taelin
Taelin@VictorTaelin·
Seems like there is a bug on Anthropic's API, affecting all models, where the request will stream the final visible token quickly, then just hang for ~5 more seconds before sending message_stop and actually closing. Repro: gist.github.com/VictorTaelin/6…
3 replies · 2 reposts · 44 likes · 5.8K views
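The hang Taelin describes can be checked by timestamping every server-sent event as it arrives and measuring the gap between the last content delta and `message_stop`. A minimal sketch, assuming the event names from Anthropic's documented stream (`message_start`, `content_block_delta`, `message_stop`) and using a simulated trace in place of a live request:

```python
import re

# Each SSE frame starts with a line like "event: <name>".
# To reproduce the reported hang, timestamp each frame as it
# arrives and inspect the gap before message_stop.
EVENT_RE = re.compile(r"^event:\s*(\S+)")

def parse_event(line: str):
    """Return the SSE event name from a raw frame line, or None."""
    m = EVENT_RE.match(line)
    return m.group(1) if m else None

def gap_before_stop(timed_events):
    """timed_events: list of (seconds, event_name) pairs.
    Returns seconds between the last content_block_delta and
    message_stop, or None if either event is missing."""
    last_delta = stop = None
    for t, name in timed_events:
        if name == "content_block_delta":
            last_delta = t
        elif name == "message_stop":
            stop = t
    if last_delta is None or stop is None:
        return None
    return stop - last_delta

# Simulated trace matching the reported behaviour: tokens stream
# quickly, then the connection hangs ~5s before message_stop.
trace = [(0.00, "message_start"),
         (0.10, "content_block_delta"),
         (0.15, "content_block_delta"),
         (5.20, "message_stop")]
print(gap_before_stop(trace))  # ~5.05 seconds of dead air
```

Pointing the same timestamping at a real streamed response would show whether the dead air sits between the final visible token and `message_stop`, which is what the linked gist reproduces.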
massa
massa@tycnio·
opus 4.6 is fucked github.com/anthropics/cla…
[3 images]
0 replies · 0 reposts · 0 likes · 189 views
massa
massa@tycnio·
The image speaks for itself. For quality: Codex & ChatGPT > Claude Code. I asked each to review plans: "which is likely to get me to the goal?" And the goal is an extremely high-end interface... not buildable by the regular LLM training distribution. See for yourself.
massa tweet media
1 reply · 0 reposts · 0 likes · 37 views
massa
massa@tycnio·
@gabriel1 yeh that does not actually work.
0 replies · 0 reposts · 0 likes · 14 views
gabriel
gabriel@gabriel1·
i find great success with standalone "cleanup prompts" to my prs. you can stuff every rule you got into agents.md, but beautiful code is secondary until it's the only focus for example: "simplify all code so it's extremely easy to consume, remove not strictly necessary code"
23 replies · 5 reposts · 359 likes · 19.8K views
massa
massa@tycnio·
@JaredOfAI @trq212 thats a YOU problem id do the same at 106 . i dont blame him
0 replies · 0 reposts · 0 likes · 76 views
Jared.W
Jared.W@JaredOfAI·
@trq212 My Claude Code 1M Opus 4.6 refused to create more sub-agents in our conversation after creating ~120 sub-agents with 64% of the context window used. Is this a feature or a bug?
2 replies · 0 reposts · 1 like · 2.4K views
Thariq
Thariq@trq212·
we need a better word than vibe coding man, Claude can create the most beautiful things
Thariq tweet media
279 replies · 193 reposts · 4.9K likes · 279K views
massa
massa@tycnio·
@trq212 Why is it okay, though, that they ruin the codebase most of the time?
0 replies · 0 reposts · 0 likes · 48 views
Jorge Manrubia
Jorge Manrubia@jorgemanru·
My very subjective perception from the last few weeks:
- Claude was ahead of Codex.
- Codex suddenly became as good as Claude, sometimes better and faster.
- Overnight, Claude is substantially ahead of Codex again, in both speed and performance.
27 replies · 1 repost · 105 likes · 19.9K views
Tyler
Tyler@rezoundous·
Claude was getting stuck on this problem for 1 hour, and Codex just 1 shotted it. I could've sworn it would've been the other way round not too long ago.
33 replies · 2 reposts · 66 likes · 3.6K views
massa
massa@tycnio·
How is it that no one is addressing the escalating situation involving Codex and Claude? They are actively poisoning codebases & wasting your time. Insane. I let them code today on their own for 10 minutes & they fucked up an entire week's work. Thank god for git.
[image]
0 replies · 0 reposts · 0 likes · 23 views
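The git recovery massa is thankful for comes down to one command. A minimal sketch in a throwaway repo (all paths here are illustrative), assuming the last good state was committed before letting the agents loose:

```shell
# Sketch: recovering from an agent's bad edits, assuming the last
# good state was committed. Everything happens in a temp repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
echo "good code" > app.txt
git add app.txt
git commit -qm "last known good state"
echo "agent slop" > app.txt       # the agent trashes the file
git reset -q --hard HEAD          # discard all uncommitted damage
cat app.txt                       # back to "good code"
```

`git reset --hard` throws away everything uncommitted, so it only saves the week's work if that work was committed first; `git stash` is the gentler option when you are not sure what you want to keep.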
massa retweeted
Deep Thrill
Deep Thrill@DeeperThrill·
Codex wastes so much time and effort and creates so much unnecessary code for "migrations" and "backwards compatibility" and "regression testing" and "exception catching" when it's unnecessary and adds a lot of code debt. I tried using only codex with 5.4 for a new feature because so many people were posting on X about how "real coders use Codex not Claude!" and I just find it full of slop, bloat, useless "safety" checks, loose typing, and just bad coding practices. I don't know, maybe gpt 5.4 can write a CUDA kernel better than Opus 4.6 or something, but really I find codex nigh unusable. Claude Opus just gives me only what I ask for, and it's much better at using cli tools.
12 replies · 3 reposts · 38 likes · 5.7K views
massa
massa@tycnio·
@big_duca these agents claude code and chatgpt suck
0 replies · 0 reposts · 0 likes · 9 views
Duca
Duca@big_duca·
We have AGI (for coding). And yet so much software is still so damn buggy. (including my own startup) Why?
98 replies · 4 reposts · 62 likes · 9.9K views
massa
massa@tycnio·
@thsottiaux ok but gpt 5.4 fails all my benchmarks. pls `git reset` first
0 replies · 0 reposts · 4 likes · 81 views
Tibo
Tibo@thsottiaux·
Working at OpenAI is fun because questioning everything and taking risks is part of the culture. Within Codex, the team asks itself how we could make it an order of magnitude better every few months and then sets most things aside to go and do it across the entire stack. Some examples were the Codex App and our first deployment of Cerebras inference with WebSockets. We are now well under way on the next bet and it’s making even our best engineers nervous as it’s at the edge of what’s possible today.
243 replies · 75 reposts · 2.5K likes · 301.2K views
massa
massa@tycnio·
@iannuttall the model's been off since it came out, it's a regression
0 replies · 0 reposts · 0 likes · 8 views
Ian Nuttall
Ian Nuttall@iannuttall·
anybody else felt like gpt 5.4 has been a bit "off" today? just feel like I am fighting with it on things it had no issue with before like building my chrome extensions for local vs prod, working with conductor ports, etc
38 replies · 0 reposts · 29 likes · 11.2K views
Tibo
Tibo@thsottiaux·
@DavidOndrej1 Smoking incredible code I would walk a mile for code from GPT-5.4
24 replies · 5 reposts · 502 likes · 19.7K views
David Ondrej
David Ondrej@DavidOndrej1·
GPT 5.4 *is not* better than Opus 4.6 i have no idea what people are smoking
179 replies · 15 reposts · 928 likes · 134.5K views
massa retweeted
Ivan Davila
Ivan Davila@ivangdavila·
@NickADobos Also with subagents: - Codex: yeah, I’ll spawn agents in batches and let you know when everything’s done - Me (40 min later): did they finish? - Codex: you’re right, I should have spawned the agents. I will do it now
1 reply · 1 repost · 8 likes · 829 views