Graham

5.2K posts

@grahamcodes

staff engineer @coinbase working on AI devx for the @base team. prev: tech lead on Coinbase Advanced Trade, early team @fluidityio (acq. @consensys)

United States · Joined October 2013

2.2K Following · 1.8K Followers

Pinned Tweet

Graham@grahamcodes·5 Jun

7.5K

Graham@grahamcodes·1d

@wesbos 🤠

Wes Bos@wesbos·1d

Only cool people can reply to this

686

693

108.8K

Graham@grahamcodes·1d

@mattlam_ @SIGKITTEN @sawyerhood In my testing it’s the best thing you can use other than Browser Use saas

Matthew Lam@mattlam_·1d

@SIGKITTEN @sawyerhood do you know how it compares to agent-browser?

358

SIGKITTEN@SIGKITTEN·1d

> there are like 100s browser agent clis (even Garry Tan has one) why use this one? because @sawyerhood s clanker web browser shit is always sota

Sawyer Hood@sawyerhood

Introducing the new dev-browser cli. The fastest way for an agent to use a browser is to let it write code. Just `npm i -g dev-browser` and tell your agent to "use dev-browser"

168

21.6K

Graham retweeted

dex@dexhorthy·4d

as @rauchg put it so well, shipping is much more than just coding. Shipping means testing, deploying, monitoring, maintaining, fixing at 2am, etc. Models can code but we're still figuring out if/how they can solve which parts of shipping. As models write more code, the SWE's job evolves from "write working code" to "produce working code" - we're all figuring out what that means.

2.6K

Graham@grahamcodes·1d

@midmajortapes Aztec legend on.soundcloud.com/WnIiiKNjg1YXWk…

541

Mid-Major Tapes 🔥@midmajortapes·1d

Man I miss 2017 Malik Pope...

318

25.6K

Graham retweeted

Wes Winder@weswinder·3d

this is the best thing i’ve ever seen lol why does it feel like riley is the only person building cool stuff all these parallel claude orchestration sessions and everything i see here is boring

Riley Walz@rtwlz

made my computer dramatically play BBC news music before every meeting

231

7.8K

688.7K

Graham retweeted

Ben Davis@davis7·3d

Been going deeper into the "code mode" stuff. Basically letting the agents write typescript to call MCPs, APIs, etc. instead of normal tool calls or bash commands. No clue what the final form of this is yet. Really like what @RhysSullivan is working on with executor. I think it or something like it is probably the future

146

34.3K

Graham@grahamcodes·3d

@StatisticsFTW @noahzweben They run on Anthropic servers. It helps them grow their moat/lock in.

Robert Balicki (👀 @IsographLabs)@StatisticsFTW·3d

@noahzweben Where do these run? Is this a Claude code that's running on some sort of cloud server? Why is Claude the appropriate place for cron job scheduling to live?

1.1K

Noah Zweben@noahzweben·3d

Use /schedule to create recurring cloud-based jobs for Claude, directly from the terminal. We use these internally to automatically resolve CI failures, push doc updates, and generally power automations that you want to exist beyond a closed laptop

176

318

4.3K

970.2K

Graham@grahamcodes·3d

@H04642924H @amyforsandiego @TSA @ICEgov Pre Check and CLEAR are both closed at the moment

132

Tom Hekman@H04642924H·3d

@amyforsandiego @TSA @ICEgov Was Pre-check just as bad?

867

Amy Reichert@amyforsandiego·3d

RIGHT NOW: San Diego International Airport is “organized chaos” this morning. @TSA line stretched out with a 70 minute wait just to reach screening. DHS canines on site. @ICEgov expected to arrive today. And this Friday marks the 6th missed paycheck for federal TSA workers.

223

1.1K

83.1K

Graham retweeted

Professor Campbell@abcampbell·5d

managing a team of AIs is exactly like managing a bunch of first year analysts/quants/devs
massively overconfident
insane ambition to practicality ratio
constantly distracted by the shiny thing of trying to do your job or someone else’s
confuse intelligence for judgment
need constant reminders of their todo list
terrible synthesizers
perpetually confusing goals with tasks
I need a nap

365

34.1K

Graham@grahamcodes·5d

@dillon_mulroy This is what the internal reasoning logs look like. Sometimes it just leaks through their masking layer randomly lmao. I’ve seen it a few times as well.

109

Dillon Mulroy@dillon_mulroy·6d

bro what is happening lmao

Dillon Mulroy@dillon_mulroy·6d

gpt 5.4 has started talking like a caveman out of the blue

117

11.4K

Graham@grahamcodes·5d

@nickbaumann_ Web devs feasting while native mobile devs starve

Nick@nickbaumann_·6d

We are dangerously close to putting Codex in autonomous loops where it picks up tickets, tests its own changes via Playwright, and records and uploads verification mp4s to PRs. If you've never asked Codex to test your app, do try it!

OpenAI Developers@OpenAIDevs

Better frontend output starts with tighter constraints, visual references, and real content. Here’s how to build intentional frontends with GPT-5.4 developers.openai.com/blog/designing…

1.4K

173.6K

Graham@grahamcodes·6d

@tannerlinsley Codex desktop app or T3 Code

220

Tanner Linsley@tannerlinsley·6d

Ghostty was fun, but time for something else. I still love opencode, too but with CC plans dead on it… I’m feeling lost. Full GUI? T3 Code? Opencode GUI? Warp? Back to cursor? Try CC again? Raw Codex? My 🧠 hurts and I just need to keep shipping.

413

1.3K

332.9K

Graham retweeted

dex@dexhorthy·18 Mar

hlyr.dev/blog/stop-clau…

10K

Graham retweeted

Matt Pocock@mattpocockuk·19 Mar

Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens. The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following. Don't treat 1M context window any differently. It's still 100K of smart, and 900K of dumb.

155

1.2K

158.1K

Graham@grahamcodes·19 Mar

@nayshins stupid overly defensive helpers that don’t trust type safety and do paranoid runtime checks of values all the time. isRecord and isString are ones I’ve seen Codex generate multiple times

135

Jake@nayshins·19 Mar

Has anyone documented all the code slop patterns yet? I want to lint for them and banish them to hades.

201

22.2K

Graham retweeted

Professor Campbell@abcampbell·18 Mar

x.com/i/article/2034…

110

949

380.2K

Graham retweeted

Conor@jconorgrogan·19 Mar

By 2H 2026 companies will realize that token efficiency per pull request is one of the most important differentiators for SWE talent. Intelligence too cheap to meter isn't going to happen anytime soon, and the free token corporate spigots are going to be reined in, hard

2.6K

Graham retweeted

Mario Zechner@badlogicgames·17 Mar

i can't speak for david. what i see is this: if you let agents build or extend a codebase with only minor or no supervision, you get unmaintainable garbage, because the agent makes terrible decisions that compound, both big and small. those decisions make it hard for both you and the agent to keep modifying the code base, until eventually it's unrecoverable. why does the agent make bad decisions? i can't tell for sure, but my gut tells me that training data can currently not capture the holistic thinking needed to design and evolve complex systems. that's one part of the problem. related to that, and oversimplified: agents output the "mean quality" of the code they saw during training. most of that code is very bad. specifically tests, which humans are terrible at writing. another part of the problem is that specification via prompt is not precise enough, so the agent has to fill in the blanks, giving it enough rope to hang itself. the more detailed your spec gets, so the agent gets constrained and less likely to produce crap, the closer you are to handwriting the code yourself, as that's the most detailed version of the spec that can exist. so then you gain nothing. back to prompt spec it is, which means the agent fills in blanks, which means we get suboptimal or truly bad results. using agents can still be a net productivity boost (see other posts in my thread), but it is not easy to come up with consistent workflows that produce both production quality maintainable code while retaining the speed advantages agents give you.

288

14.9K

Graham retweeted

boris@boristane·15 Mar

slop creep is what happens when you turn your brain off and hand the thinking to coding agents. each individual change is fine, but all together, you have a pile of crap. we're witnessing this happen in real-time across everything boristane.com/blog/slop-cree…

653

89.3K

Graham@grahamcodes·15 Mar

@snowmaker Somehow, I really miss it. I'm not sure if I actually enjoyed it or it's a strange form of nostalgia.

106

Jared Friedman@snowmaker·14 Mar

I realized something else AI has changed about coding: you don't get stuck anymore. Programming used to be punctuated by episodes of extreme frustration, when a tricky bug ground things to a halt. That doesn't happen anymore.

593

444

7.4K

915.6K
