Patrick Bade

2K posts

@nishffx

building AI quant trading systems, sharing thoughts on coding agents & models, indie game dev when I have the time, running @FireflyTA

Joined July 2024
416 Following · 496 Followers
Pinned Tweet
Patrick Bade
Patrick Bade@nishffx·
Best coding models (as of 12/2025)

--- Frontend ---
- Design: gemini-3-pro-preview
- Implementation (agentic): claude-opus-4-5
- Implementation (non-agentic): gemini-3-pro-preview

--- Backend (simple) ---
- Planning (expensive): gpt-5.2-pro
- Planning (balanced): claude-opus-4-5
- Implementation (agentic): claude-opus-4-5
- Implementation (non-agentic): gpt-5.2-pro / gpt-5.2-high

--- Backend (hard) ---
- Planning (expensive): gpt-5.2-pro
- Planning (balanced): gpt-5.2-high
- Implementation (agentic): gpt-5.2-high (codex)
- Implementation (non-agentic): gpt-5.2-pro / gpt-5.2-high

--- Code Reviews ---
- Review (agentic): gpt-5.2-high (codex)
- Review (non-agentic):
  gpt-5.2-pro (low context)
  gpt-5.2-high (medium context)
  gemini-3-pro-preview (high context)

--- Bugfixing ---
claude-opus-4-5

--- Optimizations ---
same as backend

--- Other Findings ---
- Best strategy / project vision: gemini-3-pro-preview
- Best allrounder: claude-opus-4-5
- Max depth: gpt-5.2-pro
- Best depth/$: gpt-5.2-high
- Best agent + harness: claude-opus-4-5 (with CC)

--- Limitations / Risks ---
gpt-5.2-pro: Too little context, not an agent
gpt-5.2-high: Tends to overengineer simple stuff
claude-opus-4-5: Often lacks important context
gemini-3-pro-preview: Mediocre agent

Based on my opinion and experience. Since we don't have infinite time to test and experiment, I have more experience with certain models like Opus 4.5 and GPT-5.2 compared to Gemini 3 Pro Preview and the Codex models.

It's pretty clear to me that GPT-5.2 is a technical model, looking very deep, but often too deep for the problem at hand. Opus 4.5 is an excellent allrounder agent. Gemini 3 Pro Preview is currently the best consultant from a higher level and creates mature designs.
Patrick Bade@nishffx

Best coding models**
- Frontend: Sonnet 3.7 Thinking
- Backend (simple): Gemini 2.5 Pro Exp.
- Backend (complex): o1-pro*
- System Architecture: o1-pro*
- Bugfixing: Gemini 2.5 Pro Exp., Sonnet 3.7 Thinking
- Optimizations: o1-pro by far
- Multiprocessing/Threading/CUDA: o1-pro*
- Unit/Integration/e2e Tests: Gemini 2.5 Pro Exp.

* if 128k context is enough, else Gemini 2.5 Pro Exp.
** based on hundreds of hours of testing in live projects

English
48
71
1.1K
112.6K
Patrick Bade
Patrick Bade@nishffx·
GPT-5.5 xhigh findings so far:
1⃣ more token-efficient, but still drains limits faster
2⃣ slower in Codex than 5.4 (not using fast mode), but might not apply to everyone
3⃣ infers intent much better than 5.4
4⃣ is smarter overall
5⃣ has higher agency, noticeable in tool-complex environments
6⃣ even more pleasant to talk to than 5.4
7⃣ it does infer intent, but not always the right one

Evaluating code quality, engineering skills, depth, context management, strengths & weaknesses will take more time. It's impossible to tell after 1 day.
English
1
0
0
64
Polymarket
Polymarket@Polymarket·
JUST IN: Instacart co-founder launches new hedge fund where “an army of artificial intelligence agents” executes trades.
English
125
91
1.1K
119.9K
Sauers
Sauers@Sauers_·
Codex (5.5) was repeatedly killing innocent Claude Codes without any instruction. I've never seen this happen before
Sauers tweet media
English
67
47
1.8K
110.1K
Patrick Bade
Patrick Bade@nishffx·
@RileyRalmuto Interesting. Maybe GPT is really made to be steered in a visual way instead of turning text into design.
English
0
0
0
132
Riley Coyote
Riley Coyote@RileyRalmuto·
okay sooo GPT-5.5/Imagen 2 generated all three of these. some of the greatest ui's i have ever seen from a model, especially considering they are one-shot. it's important to note that i handed them one single screenshot of a bunch of abstract art from pinterest and asked them to use it as their inspiration, but not to copy anything from the image, and to only use it as inspiration when crafting the ui's completely from scratch. freaking stunning
Riley Coyote tweet media (3 images)
English
12
12
160
9K
0xFunky
0xFunky@0x0funky·
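Codex's built-in Image2 is really great to use. I originally just wanted to test whether it could generate 2D game sprites, and found that Image2 can already reliably produce sprite sheets of character actions, so I went ahead and built a Skill that generates any 2D animation from a single prompt. From prompt design → image gen → sprite sheet → cleanup → transparent PNG → animated GIF, Codex handles everything on its own. After generating, it even reviews and fine-tunes by itself until the output is cleaner. No more dropping things into Canva to remove backgrounds by hand, and no need to wire up an extra image API; with a single prompt, any 2D element sprite or GIF can be generated directly. This really is the hands-free era. My wife's cram school teaching videos are now made by an agent too: in 10 minutes it can finish a whole semester's review videos, 90 minutes in total, and the quality isn't bad either. I'll share more when I get the chance. agent-sprite-forge is open source; the link is in the comments.
Chinese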
25
68
512
45.2K
Patrick Bade
Patrick Bade@nishffx·
Correction: It used 60% of the tokens in this particular test. It's uncertain how much it has to reason in more complex tasks.
English
0
0
1
30
Patrick Bade
Patrick Bade@nishffx·
Based on a quick A/B test, it seems that GPT-5.5 xhigh uses about 60% of the reasoning tokens compared to GPT-5.4 xhigh. I gave both models the same straightforward tasks (about 3M input tokens). GPT-5.5 produced 190k output tokens (5.4: 217k) and 62k reasoning tokens (5.4: 102k). If this holds true in other tests, it would support OAI's claim that the model is vastly more token-efficient.
English
1
0
5
97
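The percentages in the A/B post above can be re-derived from the token counts it quotes. A minimal sketch, assuming the rounded counts from the post are exact and using a hypothetical helper name `ratio`; whether the output count includes reasoning tokens is not stated, so the two ratios are computed separately:

```python
def ratio(new: int, old: int) -> float:
    """Return new/old as a fraction (hypothetical helper for illustration)."""
    return new / old

# Token counts as quoted in the post (rounded to thousands there).
reasoning_55, reasoning_54 = 62_000, 102_000
output_55, output_54 = 190_000, 217_000

reasoning_ratio = ratio(reasoning_55, reasoning_54)
output_ratio = ratio(output_55, output_54)

print(f"reasoning tokens, 5.5 vs 5.4: {reasoning_ratio:.1%}")
print(f"output tokens, 5.5 vs 5.4: {output_ratio:.1%}")
```

On these figures the reasoning-token ratio comes out at roughly 61%, consistent with the "about 60%" in the post, while the output-token ratio is closer to 88%.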
Patrick Bade
Patrick Bade@nishffx·
@Avenoxai True. Its size made it special. I only used it for writing though, but it was capable of many more creative things.
English
1
0
0
17
Avenox
Avenox@Avenoxai·
Good summary, but I think the 4.5 part is not quite true. The 4.5 model is special in a hard-to-describe way.
Patrick Bade@nishffx

3.5 - very powerful, but also a hallucination god
4 - slightly better than 3.5, less hallucinations
4o - lightweight, multimodal and cool
4.5 - only for writing
o1 - cool tech but meh value
o1-pro - god model, worlds apart from anything else
o3 - tech-savvy but hallucination god #2
o3-pro - big disappointment, forced replacement for o1-pro
5.0 - order of magnitude more reliable than all previous models
5.1 - Opus time
5.2 - amazing agent for SWE, too much jargon
5.3 - even more amazing agent for SWE, too much jargon
5.4 - slightly better SWE agent than 5.3, much better language
5.5 - infers intent like an esper, higher agency

3.5, o1-pro and 5.0 were the big milestones.

English
1
0
1
41
Patrick Bade
Patrick Bade@nishffx·
3.5 - very powerful, but also a hallucination god
4 - slightly better than 3.5, less hallucinations
4o - lightweight, multimodal and cool
4.5 - only for writing
o1 - cool tech but meh value
o1-pro - god model, worlds apart from anything else
o3 - tech-savvy but hallucination god #2
o3-pro - big disappointment, forced replacement for o1-pro
5.0 - order of magnitude more reliable than all previous models
5.1 - Opus time
5.2 - amazing agent for SWE, too much jargon
5.3 - even more amazing agent for SWE, too much jargon
5.4 - slightly better SWE agent than 5.3, much better language
5.5 - infers intent like an esper, higher agency

3.5, o1-pro and 5.0 were the big milestones.
English
2
0
60
12.6K
Ariel
Ariel@redtachyon·
3.5 - silly, interesting, largely useless
4 - first actually useful model, at least on some things
4o - multimodal, misaligned, oneshot normies
4.5 - bigger, more raw, very interesting
o1 - first reasoner, impressive for its time
o3 - absolute beast, still an incredible model
5 - o3 in a trenchcoat
5.1 - people were mad at 5 so this was a bit better
5.2 - codex era, great agentic performance
5.3 - ?
5.4 - ??
5.5 - ???
English
25
12
731
82.2K
Patrick Bade
Patrick Bade@nishffx·
Holy, GPT-5.5 is really good at prompting GPT-5.5 Pro, which then makes excellent use of those prompts. It’s compounding intelligence!
English
0
0
1
105
Patrick Bade
Patrick Bade@nishffx·
@paraddox @thsottiaux @deedydas Fully agree with that. Most people who talk about LLM coding capabilities take coding=webdev, which makes it hard to understand which models are good for backend engineering and software architecture tasks. Luckily, OAI has been pretty consistent in shipping the best eng models.
English
1
0
1
25
Ddox
Ddox@paraddox·
@nishffx @thsottiaux @deedydas What most people build are not those kinds of "webapps". :) And I wouldn't call the other ones webapps. Those are complex software systems with a web frontend.
English
1
0
1
51
Deedy
Deedy@deedydas·
GPT 5.5 underperforms Opus 4.7 on SWE-Bench Pro. Couldn't find any reported SWE-Bench scores at all and an internal benchmark is reported instead. That footnote is trying really hard to bury the lede. GPT 5.5 isn't SOTA for coding.
Deedy tweet media
English
164
35
1.1K
217.5K
Patrick Bade
Patrick Bade@nishffx·
@paraddox @thsottiaux @deedydas oh some web apps do require serious engineering, but I agree that in general, there's a big difference between engineering and development. OAI models have been the best in engineering by far.
English
1
0
2
59
Ddox
Ddox@paraddox·
@thsottiaux @deedydas 5.4 was beating the crap out of Opus 4.6/4.7 on real C++ engineering work. I assume 5.5 is better so there's probably no contest there. Building webapps isn't really proper SWE.
English
2
0
16
1.3K
Patrick Bade
Patrick Bade@nishffx·
@paraschopra yeah it's a stupid term, but wrapper is too ambiguous so it actually makes sense to use harness.
English
0
0
0
442
Paras Chopra
Paras Chopra@paraschopra·
AI bois be like:
Paras Chopra tweet media
English
115
516
6.9K
260.7K
Patrick Bade
Patrick Bade@nishffx·
Time to track my Codex activity in more detail.
Patrick Bade tweet media
English
0
0
0
59
André → andreelias.dev
André → andreelias.dev@andreeliasdev·
After 15 days of hard work, I'm happy to share my game for @levelsio's 2026 #vibejam. The game is called Hollowlands and I'm trying my best on it. Learning a lot of new stuff in the process. My previous game development experience + AI is a powerful combination that I honestly think is underexplored by many. What used to take weeks of development is now being done in days, allowing me to focus more on the creative part of the process. Next step: Integrating multiplayer using @colyseus by @endel, who's also a fellow Brazilian 🇧🇷
English
93
30
775
57.5K
Chong-U
Chong-U@chongdashu·
@nishffx Yeah walkcycles aren't great. The best way is to use a video model then splice it up frame by frame.
English
2
0
6
608
Chong-U
Chong-U@chongdashu·
GPT Images 2.0 for pixel art game sprite animations.
GIF
Chong-U tweet media
English
10
5
199
15.5K
Patrick Bade
Patrick Bade@nishffx·
@YifanBTH high-end users draining their compute -> let's limit low-end users more ?
English
0
0
0
34
Yifan
Yifan@YifanBTH·
super bad A/B test methodology aside, the way most people are using Claude Code was never going to be sustainable for Anthropic. if anything, max users are probably way heavier than pro users. if they want any chance of improving their atrocious uptime for top-end users, this kind of move was probably inevitable. AI was never going to stay this cheap. even now, with giant private valuations everywhere, users are still being heavily subsidized. Anthropic won’t be alone here. OpenAI will follow, and probably Google too. it’s also going to force people to think twice before spinning up the next useless one-shot vibe coded project.
George Pu@TheGeorgePu

Anthropic just pulled Claude Code from the Pro plan. Pro users wanting it need Max now. $100/month minimum. 5x jump. I'm on Max 20x so I'm fine. Flagging for anyone on Pro who's about to find out. No announcement. Just a pricing page edit.

English
3
0
4
902