Graham

5.2K posts

@grahamcodes

staff engineer @coinbase working on AI devx for the @base team. prev: tech lead on Coinbase Advanced Trade, early team @fluidityio (acq. @consensys)

United States · Joined October 2013
2.2K Following · 1.8K Followers
Pinned Tweet
Graham @grahamcodes
Graham retweeted
prinz @deredleritt3r
Anthropic has been testing a new model called "Mythos" with certain customers:
- a "step change" in AI capabilities, including "dramatically higher scores" in coding, academic reasoning and cybersecurity
- "currently far ahead of any other AI model in cyber capabilities"
- part of a new "Capybara" series of models, which are larger and more intelligent than Opus
- more expensive to run than Opus; not yet ready for general release
Jeremy Kahn @jeremyakahn

Exclusive: Anthropic left details of an unreleased model and an exclusive CEO retreat sitting in an unsecured data trove, in a significant security lapse. Great reporting from @FortuneMagazine's @beafreyanolan fortune.com/2026/03/26/ant…

Wes Bos @wesbos
Only cool people can reply to this
English
688
3
693
109.4K
Graham retweeted
dex @dexhorthy
as @rauchg put it so well, shipping is much more than just coding. Shipping means testing, deploying, monitoring, maintaining, fixing at 2am, etc. Models can code, but we're still figuring out if/how they can solve which parts of shipping.

As models write more code, the SWE's job evolves from "write working code" to "produce working code" - we're all figuring out what that means.
Graham retweeted
Ben Davis @davis7
Been going deeper into the "code mode" stuff. Basically letting the agents write TypeScript to call MCPs, APIs, etc. instead of normal tool calls or bash commands. No clue what the final form of this is yet. Really like what @RhysSullivan is working on with executor. I think it, or something like it, is probably the future.
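The "code mode" idea can be sketched roughly as follows. This is a hypothetical illustration, not executor's actual API: `fetchUsers` and `postSummary` stand in for MCP/API bindings a harness might expose, and are stubbed here with fixed data. Instead of several tool-call round trips through the model, the agent emits one TypeScript program that composes the calls locally.

```typescript
// Hypothetical agent-emitted "code mode" script. fetchUsers and postSummary
// are stand-ins for MCP/API bindings; here they are stubbed with fixed data.

interface User {
  name: string;
  active: boolean;
}

// Stub for an MCP/API call that would list users.
async function fetchUsers(): Promise<User[]> {
  return [
    { name: "ada", active: true },
    { name: "bob", active: false },
  ];
}

// Stub for an MCP/API call that would post a message somewhere.
async function postSummary(text: string): Promise<string> {
  return `posted: ${text}`;
}

// The composition happens in code: the full user list never round-trips
// through the model's context window, only the final summary does.
async function run(): Promise<string> {
  const users = await fetchUsers();
  const active = users.filter((u) => u.active).map((u) => u.name);
  return postSummary(`${active.length} active: ${active.join(", ")}`);
}
```

The appeal is that intermediate results stay inside the script instead of being serialized back to the model after every tool call; whether this is the "final form" is exactly the open question the tweet raises.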
Robert Balicki (👀 @IsographLabs)
@noahzweben Where do these run? Is this Claude Code running on some sort of cloud server? Why is Claude the appropriate place for cron job scheduling to live?
Noah Zweben @noahzweben
Use /schedule to create recurring cloud-based jobs for Claude, directly from the terminal. We use these internally to automatically resolve CI failures, push doc updates, and generally power automations that you want to exist beyond a closed laptop.
Amy Reichert @amyforsandiego
RIGHT NOW: San Diego International Airport is “organized chaos” this morning. @TSA line stretched out with a 70 minute wait just to reach screening. DHS canines on site. @ICEgov expected to arrive today. And this Friday marks the 6th missed paycheck for federal TSA workers.
Graham retweeted
Professor Campbell @abcampbell
managing a team of AIs is exactly like managing a bunch of first year analysts/quants/devs:
- massively overconfident
- insane ambition to practicality ratio
- constantly distracted by the shiny thing of trying to do your job or someone else's
- confuse intelligence for judgment
- need constant reminders of their todo list
- terrible synthesizers
- perpetually confusing goals with tasks

I need a nap
Graham @grahamcodes
@dillon_mulroy This is what the internal reasoning log looks like... sometimes it just leaks through their masking layer randomly lmao. I've seen it a few times as well.
Dillon Mulroy @dillon_mulroy
bro what is happening lmao
Dillon Mulroy @dillon_mulroy
gpt 5.4 has started talking like a caveman out of the blue
Graham @grahamcodes
@nickbaumann_ Web devs feasting while native mobile devs starve
Nick @nickbaumann_
We are dangerously close to putting Codex in autonomous loops where it picks up tickets, tests its own changes via Playwright, and records and uploads verification mp4s to PRs. If you've never asked Codex to test your app, do try it!
OpenAI Developers @OpenAIDevs

Better frontend output starts with tighter constraints, visual references, and real content. Here’s how to build intentional frontends with GPT-5.4 developers.openai.com/blog/designing…

Tanner Linsley @tannerlinsley
Ghostty was fun, but time for something else. I still love opencode too, but with CC plans dead on it… I'm feeling lost. Full GUI? T3 Code? Opencode GUI? Warp? Back to Cursor? Try CC again? Raw Codex? My 🧠 hurts and I just need to keep shipping.
Graham retweeted
Matt Pocock @mattpocockuk
Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens.

The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following.

Don't treat the 1M context window any differently. It's still 100K of smart, and 900K of dumb.
Graham @grahamcodes
@nayshins stupid overly defensive helpers that don't trust type safety and do paranoid runtime checks of values all the time. isRecord and isString are ones I've seen Codex generate multiple times.
Jake @nayshins
Has anyone documented all the code slop patterns yet? I want to lint for them and banish them to hades.
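The slop pattern Graham describes looks roughly like this in TypeScript. Only the helper names `isRecord` and `isString` come from the tweet; the surrounding function is a reconstructed illustration, not actual Codex output.

```typescript
// Reconstructed example of the "paranoid runtime check" slop pattern.
// Only the helper names isRecord/isString come from the tweet; the
// surrounding function is illustrative.

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isString(v: unknown): v is string {
  return typeof v === "string";
}

interface Config {
  name: string;
}

// Slop version: re-validates a value the type system already guarantees.
function greetDefensive(config: Config): string {
  if (!isRecord(config) || !isString(config.name)) {
    return "unknown"; // unreachable given the declared signature
  }
  return `hello, ${config.name}`;
}

// Trusting type safety: the same function, minus the checks.
function greet(config: Config): string {
  return `hello, ${config.name}`;
}
```

Runtime guards do belong at genuine trust boundaries (parsing JSON, reading external input); the slop is applying them to values the compiler has already verified.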
Graham retweeted
Conor @jconorgrogan
By 2H 2026, companies will realize that token efficiency per pull request is one of the most important differentiators for SWE talent. Intelligence too cheap to meter isn't going to happen anytime soon, and the free corporate token spigots are going to be reined in, hard.
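"Token efficiency per pull request" isn't a standard metric; one plausible way to compute it, sketched below with entirely made-up names (`PrUsage`, `tokensPerMergedPr`), is to charge all token spend, including abandoned attempts, against the PRs that actually merged:

```typescript
// Hypothetical metric: total token spend divided by merged PR count.
// All names here are invented for illustration.

interface PrUsage {
  pr: string;
  merged: boolean;
  promptTokens: number;
  completionTokens: number;
}

// Charges the spend on abandoned PRs against the merged ones, so wasted
// attempts make the number worse rather than disappearing.
function tokensPerMergedPr(usage: PrUsage[]): number {
  const mergedCount = usage.filter((u) => u.merged).length;
  if (mergedCount === 0) return Infinity;
  const totalTokens = usage.reduce(
    (sum, u) => sum + u.promptTokens + u.completionTokens,
    0,
  );
  return totalTokens / mergedCount;
}
```

For example, one merged PR that cost 1,500 tokens plus one abandoned PR that cost 3,000 comes out to 4,500 tokens per merged PR; a per-PR average that ignored the abandoned work would hide exactly the waste the tweet is talking about.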
Graham retweeted
Mario Zechner @badlogicgames
i can't speak for david. what i see is this: if you let agents build or extend a codebase with only minor or no supervision, you get unmaintainable garbage, because the agent makes terrible decisions that compound, both big and small. those decisions make it hard for both you and the agent to keep modifying the code base, until eventually it's unrecoverable.

why does the agent make bad decisions? i can't tell for sure, but my gut tells me that training data can currently not capture the holistic thinking needed to design and evolve complex systems. that's one part of the problem. related to that, and oversimplified: agents output the "mean quality" of the code they saw during training. most of that code is very bad. specifically tests, which humans are terrible at writing.

another part of the problem is that specification via prompt is not precise enough, so the agent has to fill in the blanks, giving it enough rope to hang itself. the more detailed your spec gets (constraining the agent and making it less likely to produce crap), the closer you are to handwriting the code yourself, as that's the most detailed version of the spec that can exist. so then you gain nothing. back to prompt spec it is, which means the agent fills in blanks, which means we get suboptimal or truly bad results.

using agents can still be a net productivity boost (see other posts in my thread), but it is not easy to come up with consistent workflows that produce production quality, maintainable code while retaining the speed advantages agents give you.
Graham retweeted
boris @boristane
slop creep is what happens when you turn your brain off and hand the thinking to coding agents. each individual change is fine, but all together, you have a pile of crap. we're witnessing this happen in real-time across everything. boristane.com/blog/slop-cree…