massa

79 posts


@tycnio

Formal Proofs For Verifiability

Joined November 2025
41 Following · 4 Followers
Robin Ebers | AI Coach for Founders
when I say Claude Opus 4.6 is behind, that's what I mean this model is truly the laziest of the bunch Codex would never
[image]
18 replies · 0 reposts · 38 likes · 3.3K views
massa
massa@tycnio·
claude is severely misaligned
0 replies · 0 reposts · 0 likes · 0 views
Taelin
Taelin@VictorTaelin·
Seems like there is a bug on Anthropic's API, affecting all models, where the request will stream the final visible token quickly, then just hang for ~5 more seconds before sending message_stop and actually closing. Repro: gist.github.com/VictorTaelin/6…
3 replies · 2 reposts · 44 likes · 5.8K views
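The hang Taelin describes can be checked by timestamping every server-sent event as it arrives and measuring the gap between the last content delta and `message_stop`. A minimal sketch, assuming the event names from Anthropic's documented stream (`message_start`, `content_block_delta`, `message_stop`) and using a simulated trace in place of a live request:

```python
import re

# Each SSE frame starts with a line like "event: <name>".
# To reproduce the reported hang, timestamp each frame as it
# arrives and inspect the gap before message_stop.
EVENT_RE = re.compile(r"^event:\s*(\S+)")

def parse_event(line: str):
    """Return the SSE event name from a raw frame line, or None."""
    m = EVENT_RE.match(line)
    return m.group(1) if m else None

def gap_before_stop(timed_events):
    """timed_events: list of (seconds, event_name) pairs.
    Returns seconds between the last content_block_delta and
    message_stop, or None if either event is missing."""
    last_delta = stop = None
    for t, name in timed_events:
        if name == "content_block_delta":
            last_delta = t
        elif name == "message_stop":
            stop = t
    if last_delta is None or stop is None:
        return None
    return stop - last_delta

# Simulated trace matching the reported behaviour: tokens stream
# quickly, then the connection hangs ~5s before message_stop.
trace = [(0.00, "message_start"),
         (0.10, "content_block_delta"),
         (0.15, "content_block_delta"),
         (5.20, "message_stop")]
print(gap_before_stop(trace))  # ~5.05 seconds of dead air
```

Pointing the same timestamping at a real streamed response would show whether the dead air sits between the final visible token and `message_stop`, which is what the linked gist reproduces.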
massa
massa@tycnio·
opus 4.6 is fucked github.com/anthropics/cla…
[3 images]
0 replies · 0 reposts · 0 likes · 189 views
massa
massa@tycnio·
The image speaks for itself. For quality: Codex & ChatGPT > Claude Code. I asked each to review plans: "which is likely to get me to the goal?" And the goal is an extremely high-end interface... not buildable by the regular LLM training distribution. See for yourself.
massa tweet media
1 reply · 0 reposts · 0 likes · 37 views
massa
massa@tycnio·
@gabriel1 yeh that does not actually work.
0 replies · 0 reposts · 0 likes · 14 views
gabriel
gabriel@gabriel1·
i find great success with standalone "cleanup prompts" to my prs. you can stuff every rule you got into agents.md, but beautiful code is secondary until it's the only focus for example: "simplify all code so it's extremely easy to consume, remove not strictly necessary code"
23 replies · 5 reposts · 359 likes · 19.8K views
massa
massa@tycnio·
@JaredOfAI @trq212 thats a YOU problem id do the same at 106 . i dont blame him
0 replies · 0 reposts · 0 likes · 76 views
Jared.W
Jared.W@JaredOfAI·
@trq212 My Claude Code 1M Opus 4.6 refused to create more sub-agents in our conversation after creating ~120 sub-agents with 64% of the context window used. Is this a feature or a bug?
2 replies · 0 reposts · 1 like · 2.4K views
Thariq
Thariq@trq212·
we need a better word than vibe coding man, Claude can create the most beautiful things
Thariq tweet media
279 replies · 193 reposts · 4.9K likes · 279K views
massa
massa@tycnio·
@trq212 Why is it okay, though, that they ruin the codebase most of the time?
0 replies · 0 reposts · 0 likes · 48 views
Jorge Manrubia
Jorge Manrubia@jorgemanru·
My very subjective perception from the last few weeks:
- Claude was ahead of Codex.
- Codex suddenly became as good as Claude, sometimes better and faster.
- Overnight, Claude is substantially ahead of Codex again, in both speed and performance.
27 replies · 1 repost · 105 likes · 19.9K views
Tyler
Tyler@rezoundous·
Claude was getting stuck on this problem for 1 hour, and Codex just 1 shotted it. I could've sworn it would've been the other way round not too long ago.
33 replies · 2 reposts · 66 likes · 3.6K views
massa
massa@tycnio·
How is it that no one is addressing the escalating situation involving Codex and Claude? They are actively poisoning codebases & wasting your time. Insane. I let them code today on their own for 10 minutes & they fucked up an entire week's work. Thank god for git.
[image]
0 replies · 0 reposts · 0 likes · 23 views
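The git recovery massa is thankful for comes down to one command. A minimal sketch in a throwaway repo (all paths here are illustrative), assuming the last good state was committed before letting the agents loose:

```shell
# Sketch: recovering from an agent's bad edits, assuming the last
# good state was committed. Everything happens in a temp repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
echo "good code" > app.txt
git add app.txt
git commit -qm "last known good state"
echo "agent slop" > app.txt       # the agent trashes the file
git reset -q --hard HEAD          # discard all uncommitted damage
cat app.txt                       # back to "good code"
```

`git reset --hard` throws away everything uncommitted, so it only saves the week's work if that work was committed first; `git stash` is the gentler option when you are not sure what you want to keep.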
massa retweeted
Deep Thrill
Deep Thrill@DeeperThrill·
Codex wastes so much time and effort and creates so much unnecessary code for "migrations" and "backwards compatibility" and "regression testing" and "exception catching" when it's unnecessary and adds a lot of code debt. I tried using only codex with 5.4 for a new feature because so many people were posting on X about how "real coders use Codex not Claude!" and I just find it full of slop, bloat, useless "safety" checks, loose typing, and just bad coding practices. I don't know, maybe gpt 5.4 can write a CUDA kernel better than Opus 4.6 or something, but really I find codex nigh unusable. Claude Opus just gives me only what I ask for, and it's much better at using cli tools.
12 replies · 3 reposts · 38 likes · 5.7K views
massa
massa@tycnio·
@big_duca these agents claude code and chatgpt suck
0 replies · 0 reposts · 0 likes · 9 views
Duca
Duca@big_duca·
We have AGI (for coding). And yet so much software is still so damn buggy. (including my own startup) Why?
98 replies · 4 reposts · 62 likes · 9.9K views
massa
massa@tycnio·
@thsottiaux ok but gpt 5.4 fails all my benchmarks. pls `git reset` first
0 replies · 0 reposts · 4 likes · 81 views
Tibo
Tibo@thsottiaux·
Working at OpenAI is fun because questioning everything and taking risks is part of the culture. Within Codex, the team asks itself how we could make it an order of magnitude better every few months and then sets most things aside to go and do it across the entire stack. Some examples were the Codex App and our first deployment of Cerebras inference with WebSockets. We are now well under way on the next bet and it’s making even our best engineers nervous as it’s at the edge of what’s possible today.
243 replies · 75 reposts · 2.5K likes · 301.2K views
massa
massa@tycnio·
@iannuttall the model's been off since it came out, it's a regression
0 replies · 0 reposts · 0 likes · 8 views
Ian Nuttall
Ian Nuttall@iannuttall·
anybody else felt like gpt 5.4 has been a bit "off" today? just feel like I am fighting with it on things it had no issue with before like building my chrome extensions for local vs prod, working with conductor ports, etc
38 replies · 0 reposts · 29 likes · 11.2K views
Tibo
Tibo@thsottiaux·
@DavidOndrej1 Smoking incredible code I would walk a mile for code from GPT-5.4
24 replies · 5 reposts · 502 likes · 19.7K views
David Ondrej
David Ondrej@DavidOndrej1·
GPT 5.4 *is not* better than Opus 4.6 i have no idea what people are smoking
179 replies · 15 reposts · 928 likes · 134.5K views
massa retweeted
Ivan Davila
Ivan Davila@ivangdavila·
@NickADobos Also with subagents: - Codex: yeah, I’ll spawn agents in batches and let you know when everything’s done - Me (40 min later): did they finish? - Codex: you’re right, I should have spawned the agents. I will do it now
1 reply · 1 repost · 8 likes · 829 views