Patrick Bade

2K posts

@nishffx

building AI quant trading systems, sharing thoughts on coding agents & models, indie game dev when I have the time, running @FireflyTA

Joined July 2024

416 Following · 496 Followers

Pinned Tweet

Patrick Bade@nishffx·Dec 18

Best coding models (as of 12/2025)

--- Frontend ---
- Design: gemini-3-pro-preview
- Implementation (agentic): claude-opus-4-5
- Implementation (non-agentic): gemini-3-pro-preview

--- Backend (simple) ---
- Planning (expensive): gpt-5.2-pro
- Planning (balanced): claude-opus-4-5
- Implementation (agentic): claude-opus-4-5
- Implementation (non-agentic): gpt-5.2-pro / gpt-5.2-high

--- Backend (hard) ---
- Planning (expensive): gpt-5.2-pro
- Planning (balanced): gpt-5.2-high
- Implementation (agentic): gpt-5.2-high (codex)
- Implementation (non-agentic): gpt-5.2-pro / gpt-5.2-high

--- Code Reviews ---
- Review (agentic): gpt-5.2-high (codex)
- Review (non-agentic):
  gpt-5.2-pro (low context)
  gpt-5.2-high (medium context)
  gemini-3-pro-preview (high context)

--- Bugfixing ---
claude-opus-4-5

--- Optimizations ---
same as backend

--- Other Findings ---
- Best strategy / project vision: gemini-3-pro-preview
- Best allrounder: claude-opus-4-5
- Max depth: gpt-5.2-pro
- Best depth/$: gpt-5.2-high
- Best agent + harness: claude-opus-4-5 (with CC)

--- Limitations / Risks ---
gpt-5.2-pro: Too little context, not an agent
gpt-5.2-high: Tends to overengineer simple stuff
claude-opus-4-5: Often lacks important context
gemini-3-pro-preview: Mediocre agent

Based on my opinion and experience. Since we don't have infinite time to test and experiment, I have more experience with certain models like Opus 4.5 and GPT-5.2 compared to Gemini 3 Pro Preview and the Codex models. It's pretty clear to me that GPT-5.2 is a technical model, looking very deep, but often too deep for the problem at hand. Opus 4.5 is an excellent allrounder agent. Gemini 3 Pro Preview is currently the best consultant from a higher level and creates mature designs.

Patrick Bade@nishffx

Best coding models**
- Frontend: Sonnet 3.7 Thinking
- Backend (simple): Gemini 2.5 Pro Exp.
- Backend (complex): o1-pro*
- System Architecture: o1-pro*
- Bugfixing: Gemini 2.5 Pro Exp., Sonnet 3.7 Thinking
- Optimizations: o1-pro by far
- Multiprocessing/Threading/CUDA: o1-pro*
- Unit/Integration/e2e Tests: Gemini 2.5 Pro Exp.
* if 128k context is enough, else Gemini 2.5 Pro Exp.
** based on hundreds of hours of testing in live projects

Patrick Bade@nishffx·4h

GPT-5.5 xhigh findings so far:
1⃣ more token-efficient, but still drains limits faster
2⃣ slower in Codex than 5.4 (not using fast mode), but might not apply to everyone
3⃣ infers intent much better than 5.4
4⃣ is smarter overall
5⃣ has higher agency, noticeable in tool-complex environments
6⃣ even more pleasant to talk to than 5.4
7⃣ it does infer intent, but not always the right one

Evaluating code quality, engineering skills, depth, context management, strengths & weaknesses will take more time. It's impossible to tell after 1 day.

Patrick Bade@nishffx·5h

@Polymarket fuck around, find out

Polymarket@Polymarket·7h

JUST IN: Instacart co-founder launches new hedge fund where “an army of artificial intelligence agents” executes trades.

Patrick Bade@nishffx·6h

@Sauers_ savage

Sauers@Sauers_·7h

Codex (5.5) was repeatedly killing innocent Claude Codes without any instruction. I've never seen this happen before

Patrick Bade@nishffx·8h

@jasperdevs Compaction has been amazing before 5.5 already. x.com/nishffx/status…

Patrick Bade@nishffx

GPT-5.4 compaction is a real breakthrough. In my daily work, I have some threads I’ve been using for weeks, and others I use as throwaways. The long-running threads have undergone hundreds of compactions by now. After each compaction, around 30–40% of the context window remains filled (up to ~100k tokens). Those 100k tokens seem to represent accumulated project knowledge, key learnings, workflow patterns, etc. ... and the output quality genuinely improves over time. Codex becomes more intuitive in how it approaches problems and even shows signs of foreshadowing. The same model in fresh threads, even with plenty of context, can’t match this. I might be overstating it, but compaction in Codex almost feels like a weak form of self-improvement over time or a small memory system. I wonder if there are benchmarks measuring this?

jasper@jasperdevs·22h

update: i actually love this update, 1 million context is NOT needed, compaction is so good it's basically infinite at this point. GPT 5.5 is amazing

jasper@jasperdevs

am i doing something wrong or why is it only 258k context??

Patrick Bade@nishffx·8h

@RileyRalmuto Interesting. Maybe GPT is really made to be steered in a visual way instead of turning text into design.

Riley Coyote@RileyRalmuto·12h

okay sooo GPT-5.5/Imagen 2 generated all three of these. some of the greatest ui's i have ever seen from a model, especially considering they are one-shot. it's important to note that i handed them one single screenshot of a bunch of abstract art from pinterest and asked them to use it as their inspiration, but not to copy anything from the image, and to only use it as inspiration when crafting the ui's completely from scratch. freaking stunning

Patrick Bade@nishffx·10h

@0x0funky This is really cool and it works well!

0xFunky@0x0funky·21h

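The Image2 built into Codex is really great to use. I originally just wanted to test whether it could generate 2D game sprites, and found that Image2 can already reliably produce sprite sheets of character actions, so I went ahead and built a Skill that generates any 2D animation from a single prompt. From prompt design → image gen → sprite sheet → cleanup → transparent PNG → animated GIF, Codex handles everything by itself. After generating, it even reviews and fine-tunes the result on its own until the output is cleaner. No more dropping things into Canva to remove backgrounds by hand, and no need to wire up an extra image API: a single prompt is enough to generate any 2D element sprite and GIF directly. This really is the hands-free era. My wife's cram school teaching videos are now made by an agent too; in 10 minutes it can finish a whole semester's review videos, 90 minutes in total, and the quality is not bad either. I'll share more when I get the chance. agent-sprite-forge is open source, link in the comments.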

Patrick Bade@nishffx·11h

Correction: It used 60% of the tokens in this particular test. It's uncertain how much it has to reason in more complex tasks.

Patrick Bade@nishffx·11h

Based on a quick A/B test, it seems that GPT-5.5 xhigh uses about 60% of the reasoning tokens compared to GPT-5.4 xhigh. I gave both models the same straightforward tasks (about 3M input tokens). GPT-5.5 produced 190k output tokens (5.4: 217k) and 62k reasoning tokens (5.4: 102k). If this holds true in other tests, it would support OAI's claim that the model is vastly more token-efficient.
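
The "about 60%" figure can be checked directly from the token counts quoted in the post. The sketch below is only that arithmetic: the dictionary layout and the `usage_ratio` helper are illustrative names, not anything from Codex or an OpenAI API, and it assumes the output and reasoning counts are reported as separate totals (the post does not say whether output tokens include reasoning tokens).

```python
# Token counts as quoted in the post (GPT-5.5 xhigh vs GPT-5.4 xhigh).
# Assumption: "output" and "reasoning" are separate totals for the same run.
reported = {
    "output": {"gpt-5.5": 190_000, "gpt-5.4": 217_000},
    "reasoning": {"gpt-5.5": 62_000, "gpt-5.4": 102_000},
}


def usage_ratio(new_tokens: int, old_tokens: int) -> float:
    """Return the newer model's token usage as a fraction of the older model's."""
    return new_tokens / old_tokens


# Ratio of GPT-5.5 usage to GPT-5.4 usage for each token category.
ratios = {
    kind: usage_ratio(counts["gpt-5.5"], counts["gpt-5.4"])
    for kind, counts in reported.items()
}

for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.1%} of GPT-5.4's tokens")
# output: 87.6% of GPT-5.4's tokens
# reasoning: 60.8% of GPT-5.4's tokens
```

On these numbers the saving is concentrated in reasoning tokens (roughly 61% of 5.4's usage), while output tokens dropped only to roughly 88%. This is a single run on straightforward tasks, so it says nothing about harder workloads.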

Patrick Bade@nishffx·21h

@Avenoxai True. Its size made it special. I only used it for writing though, but it was capable of many more creative things.

Avenox@Avenoxai·21h

Good summary, but I think the 4.5 part is not quite true. The 4.5 model is a special one in a hard-to-describe way

Patrick Bade@nishffx

3.5 - very powerful, but also a hallucination god
4 - slightly better than 3.5, less hallucinations
4o - lightweight, multimodal and cool
4.5 - only for writing
o1 - cool tech but meh value
o1-pro - god model, worlds apart from anything else
o3 - tech-savvy but hallucination god #2
o3-pro - big disappointment, forced replacement for o1-pro
5.0 - order of magnitude more reliable than all previous models
5.1 - Opus time
5.2 - amazing agent for SWE, too much jargon
5.3 - even more amazing agent for SWE, too much jargon
5.4 - slightly better SWE agent than 5.3, much better language
5.5 - infers intent like an esper, higher agency

3.5, o1-pro and 5.0 were the big milestones.

Patrick Bade@nishffx·21h

@thsottiaux bold claim, but a very welcome one

Tibo@thsottiaux·21h

Finally making strides on frontend, GPT-5.5 might be the one

tylernotfound@tylernotfound

5.5 is the first model that operates as a valuable programming partner on real frontend work. High agency, valuable ideating, and idiomatic output. My development process has evolved every ~3 months since joining OAI, but Codex + 5.5 has created a step function improvement.

Patrick Bade@nishffx·21h

Ariel@redtachyon·23h

3.5 - silly, interesting, largely useless
4 - first actually useful model, at least on some things
4o - multimodal, misaligned, oneshot normies
4.5 - bigger, more raw, very interesting
o1 - first reasoner, impressive for its time
o3 - absolute beast, still an incredible model
5 - o3 in a trenchcoat
5.1 - people were mad at 5 so this was a bit better
5.2 - codex era, great agentic performance
5.3 - ?
5.4 - ??
5.5 - ???

Patrick Bade@nishffx·22h

Holy, GPT-5.5 is really good at prompting GPT-5.5 Pro, which then makes excellent use of those prompts. It’s compounding intelligence!

Patrick Bade@nishffx·1d

@paraddox @thsottiaux @deedydas Fully agree with that. Most people who talk about LLM coding capabilities take coding=webdev, which makes it hard to understand which models are good for backend engineering and software architecture tasks. Luckily, OAI has been pretty consistent in shipping the best eng models.

Ddox@paraddox·1d

@nishffx @thsottiaux @deedydas What most people build are not those kinds of "webapps". :) And I wouldn't call the other ones webapps. Those are complex software systems with a web frontend.

Deedy@deedydas·1d

GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. Couldn't find any reported SWE-Bench scores at all and an internal benchmark is reported instead. That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.

Patrick Bade@nishffx·1d

@paraddox @thsottiaux @deedydas oh some web apps do require serious engineering, but I agree that in general, there's a big difference between engineering and development. OAI models have been the best in engineering by far.

Ddox@paraddox·1d

@thsottiaux @deedydas 5.4 was beating the crap out of Opus 4.6/4.7 on real C++ engineering work. I assume 5.5 is better so there's probably no contest there. Building webapps isn't really proper SWE.

Patrick Bade@nishffx·1d

@paraschopra yeah it's a stupid term, but wrapper is too ambiguous so it actually makes sense to use harness.

Paras Chopra@paraschopra·1d

AI bois be like:

Patrick Bade@nishffx·1d

Time to track my Codex activity in more detail.

Patrick Bade@nishffx·2d

@andreeliasdev @levelsio amazing work for 15 days spent!

André → andreelias.dev@andreeliasdev·3d

After 15 days of hard work, I'm happy to share my game for @levelsio's 2026 #vibejam. The game is called Hollowlands and I'm trying my best on it. Learning a lot of new stuff in the process. My previous game development experience + AI is a powerful combination that I honestly think is underexplored by many. What used to take weeks of development is now being done in days, allowing me to focus more on the creative part of the process. Next step: Integrating multiplayer using @colyseus by @endel, who's also a fellow Brazilian 🇧🇷

Patrick Bade@nishffx·2d

@chongdashu Great idea

Chong-U@chongdashu·2d

@nishffx Yeah walkcycles aren't great. The best way is to use a video model then splice it up frame by frame.

Chong-U@chongdashu·2d

GPT Images 2.0 for pixel art game sprite animations.

GIF

Patrick Bade@nishffx·2d

@YifanBTH high-end users draining their compute -> let's limit low-end users more?

Yifan@YifanBTH·2d

super bad A/B test methodology aside, the way most people are using Claude Code was never going to be sustainable for Anthropic. if anything, max users are probably way heavier than pro users. if they want any chance of improving their atrocious uptime for top-end users, this kind of move was probably inevitable. AI was never going to stay this cheap. even now, with giant private valuations everywhere, users are still being heavily subsidized. Anthropic won’t be alone here. OpenAI will follow, and probably Google too. it’s also going to force people to think twice before spinning up the next useless one-shot vibe coded project.

George Pu@TheGeorgePu

Anthropic just pulled Claude Code from the Pro plan. Pro users wanting it need Max now. $100/month minimum. 5x jump. I'm on Max 20x so I'm fine. Flagging for anyone on Pro who's about to find out. No announcement. Just a pricing page edit.
