fullofcaffeine @FullOfCaffeine
4K posts · Planet Earth · Joined December 2008
5.3K Following · 733 Followers

fullofcaffeine @FullOfCaffeine·
@burkov @thsottiaux Yes! It requires quite a bit of steering to produce the output that Opus often gets right the first time. That means Opus is often much better for documentation.
BURKOV @burkov·
@thsottiaux It's not good at explaining a problem or a solution in simple English.
Tibo @thsottiaux·
Hello builders. What are we getting wrong with Codex, what can we improve?
fullofcaffeine @FullOfCaffeine·
GPT 5.4 is not good at writing user-friendly docs. To be more specific: GPT usually tends to be way too terse and technical, and uses terms that might not be familiar to the end user, e.g. in a public README. Steering often fixes that, though. Opus 4.5 is much better at capturing my intentions given the context; its prose is often much better by default. And of course, its design skills. Opus is still better! And finally, and foremost, the current Codex outage, happening just now: github.com/openai/codex/i….
fullofcaffeine @FullOfCaffeine·
@RCallsign @sudoingX Well, I can only hope it becomes cost- and time-effective! I use and like the closed-weight frontier models, but do you really want to live in a world where you depend solely on them? That's pretty depressing and dangerous.
Sudo su @sudoingX·
hey if you have a 3060, or any GPU with 8GB or more sitting in a drawer right now, that thing can run 9 billion parameters of intelligence autonomously. and you don't know it yet.

2 hours ago i posted that 9B hit a ceiling. 2,699 lines across 11 files. blank screen. said the limit for autonomous multi-file coding on 9 billion parameters is real.

then i audited every file. found 11 bugs. exact file, exact line, exact fix. duplicate variable declarations killing the script loader. a canvas reference never connected to the DOM. enemies with no movement logic. particle systems called on the class instead of the instance.

fed that list as a single prompt to the same Qwen 3.5 9B on the same RTX 3060 through Hermes Agent. it fixed all 11. surgically. patch-level edits across 4 files. no rewrites. no hallucinated changes. game boots. enemies spawn, move, collide. background renders. particles fire.

and here's what nobody is talking about. this is a 9 billion parameter model running a full agentic framework. Hermes Agent with 31 tools. file operations, terminal, browser, code execution. not a single tool call failed. the agent chain never broke. most people think you need 70B+ for reliable tool use. this is 9B on 12 gigs doing it clean.

the model didn't fail. my prompting strategy did. the ceiling is not the parameter count. the ceiling is how you prompt it.

this is not done. bullets don't fire yet. boss fights need wiring. but the screen that was black 2 hours ago now has a full game rendering in real time. iterating right now. anyone with a GPU from the last 5 years should be paying attention to what is happening right now.
Sudo su @sudoingX

9B on a 3060. 2,699 lines. 11 files. blank screen. Qwen 3.5 9B Q4 running through Hermes Agent wrote the full Octopus Invaders project autonomously. config, audio, particles, background, enemies, player, ui, game loop, README. structured the directory, separated concerns, documented everything. then it self-diagnosed 10 bugs and patched them across files. fixed variable references, missing classes, broken directory paths. even created a reusable Hermes skill for future game builds unprompted.

the code reads like a senior dev wrote it. clean architecture, proper separation, professional naming. but CONFIG.canvas is null on line 1 of initGame(). the game crashes before a single frame renders.

9B understands structure. it can architect, scaffold, and debug individual files. what it can't do is hold 10 files in context and wire them together correctly. duplicate Bullet classes across two files with incompatible interfaces. static method calls on instance-based classes. enemies that spawn but never move because there's no y += speed.

35B on a 3090 built 3,483 lines in one pass and it ran. 9B built 2,699 lines across multiple iterations and the screen is black. the ceiling for autonomous multi-file coding on 9B parameters is real.

still iterating. trying a single-file version of the same prompt next to isolate what 9B can actually close on.
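For readers unfamiliar with the bug categories the audit describes, here is a minimal sketch of two of them: a static call on an instance-based class, and an enemy that spawns but never moves. The `ParticleSystem` and `Enemy` classes are illustrative stand-ins, not taken from the actual Octopus Invaders code.

```typescript
// Hypothetical particle system: state lives on the instance,
// so its methods must be called on an instance, not the class.
class ParticleSystem {
  particles: number[] = [];
  emit(count: number): void {
    for (let i = 0; i < count; i++) this.particles.push(i);
  }
}

// The buggy version of Enemy had no movement logic at all:
// update() existed but never touched y, so enemies sat still.
class Enemy {
  y = 0;
  speed = 2;
  update(): void {
    this.y += this.speed; // the missing `y += speed` the audit found
  }
}

// Buggy pattern: `ParticleSystem.emit(10)` fails, emit isn't static.
// Fixed pattern: construct an instance and call the method on it.
const fx = new ParticleSystem();
fx.emit(10);

const enemy = new Enemy();
for (let frame = 0; frame < 3; frame++) enemy.update();

console.log(fx.particles.length, enemy.y); // 10 6
```

Both failures are syntactically plausible code, which is why they survive generation and only surface at runtime as a black screen.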

fullofcaffeine @FullOfCaffeine·
@sudoingX I have a spare Quadro RTX 5000 with 16GB. Wondering if it'd at least get close to that? Great content, btw!
Tyler @rezoundous·
Dear Codex, please get better at UI so I can unsubscribe from Claude and Gemini.
fullofcaffeine @FullOfCaffeine·
You can do that, and it might work reasonably well. The difference is that the agent needs to do less, and the oracle implements it deterministically. The oracle also implements and abstracts the browser communication/orchestration side (if you're using the browser), which would be a nightmare for the agent to do every time. If you're using the API only, you might be able to implement a similar system via instructions in AGENTS.md, or a skill, or just prompt it every time.
Brendan Smith, Ph.D. @mrbsmith58·
@FullOfCaffeine @VictorTaelin Will check it out! Quick question: how is Oracle different from having Codex zip relevant files and writing a markdown defining the files and the issue at hand?
Taelin @VictorTaelin·
Quick 2am success story: asked GPT-5.4 to simplify Bend2's elaborator; 4h later, no real improvements. Asked it to write a big prompt asking for help, passed it to 5.4 *Pro*, pasted the response back into Codex, which landed a massive simplification. Seems like the Pro version enlightened it. Perhaps a nice feature to have natively in Codex would be to just pause what it is doing and invoke the Pro version for a plan. This was my first time using Pro and it was definitely worth it.
fullofcaffeine @FullOfCaffeine·
@VictorTaelin You can use something like github.com/steipete/oracle to automate that. The browser integration is flaky, but I've made it work better locally, with some additional fixes. It also supports Pro via the API, which is, of course, reliable.
fullofcaffeine @FullOfCaffeine·
@VictorTaelin Pro is a beast. I love it. This workflow you described is great; it's almost a "secret sauce" kind of thing. Too bad it's very slow, but it's worth it most of the time.
fullofcaffeine @FullOfCaffeine·
Codex requests are extremely slow at the moment. In fact, it seems to be hanging at this very moment. GPT 5.4 high/xhigh. Tried to interrupt one of them to get out of the deadlock, asked Codex to continue, and got this error message: `■ stream disconnected before completion: An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID d424b136-c4ef-42ae-bf12-7bb0c0df7028 in your message.` Is it due to capacity/API issues? cc @thsottiaux
fullofcaffeine @FullOfCaffeine·
@georgepickett xhigh can be counterproductive sometimes. Wish Codex could auto-adjust between high/xhigh as needed! Haven't tried fast yet, though.
George Pickett @georgepickett·
btw - this is not a diss at Codex, more of a brag. I have 5.4 on xhigh and on fast mode. I know the implications of that. If I didn't get extreme value out of running agents like this, I would not be using it 10 hours/day.
George Pickett @georgepickett·
Burned 31% of a ChatGPT Pro sub in one 5hr session. Not looking forward to April 2, when the 2x rate limits go away.
fullofcaffeine @FullOfCaffeine·
@BLCNYY It's not always the right reasoning level; there are drawbacks depending on the case, not to mention it's slower. `High` is a sweet spot.
BLCNYY @BLCNYY·
Am I the only one who always uses the Extra High reasoning effort in Codex, regardless of how hard the task is? 🤔
fullofcaffeine @FullOfCaffeine·
@therealnoozo Bashing how? I've been seeing a lot of praise in my timeline instead. Love the BEAM, btw!
{:ok, %Pedro{}} @therealnoozo·
So many people bashing the BEAM lately. I couldn't care less. All I know is that I moved to Elixir years ago and no other language/ecosystem has made me this happy. I don't need a million libraries. My stack is app and PostgreSQL. And our apps are neither small nor simple by any count either. We have AI, video processing, transcriptions, and many other things in the same monolith. No regrets.
Numman Ali @nummanali·
@Draja441 It stops going in circles, over-engineering, and second-guessing itself.
Numman Ali @nummanali·
The rumours are true. After always being on XHigh in Codex, I can say with confidence that GPT 5.4 is better with High.
fullofcaffeine @FullOfCaffeine·
Agreed! Are you using xhigh all the time for all kinds of projects, though? I've found high to still be very good, but it's hard to benchmark that quantitatively. The main open loop I have now is deciding the optimal thinking level for a given project/task. Heck, if I had unlimited credits, fine, I wouldn't even mind the slowness, but xhigh uses a lot of tokens ;)
Ryan Carson @ryancarson·
$200/mo ChatGPT Pro + gpt-5.4 xhigh + Codex Mac App is the most asymmetric upside I've ever seen in a tool chain. A ridiculous cheat code for founders.
Tomas Cupr @tomcupr·
Opus 4.6 felt different for coding. Things just suddenly worked. GPT-5.4 xhigh in Codex feels like another leap. It's like Opus 4.6, but it goes deeper and considers much broader implications of its work and the whole development setup. Crazy. Amazing job @thsottiaux.
fullofcaffeine @FullOfCaffeine·
@SeloSlav Looks beautiful! Did you draw the assets, or were they AI-assisted/generated?
Martin Erlić @SeloSlav·
10 months ago I did something slightly insane. I started building an entire 2D top-down multiplayer game engine… inside React. 3500 files later, hundreds of hooks, and somehow the game loop still runs shockingly well thanks to AI-assisted optimization. But the architecture is now totally out of control. Time to extract the engine from React before this thing collapses under its own hooks.