Graham

5.2K posts

@grahamcodes

staff engineer @coinbase working on AI devx for the @base team. prev: tech lead on Coinbase Advanced Trade, early team @fluidityio (acq. @consensys)

United States · Joined October 2013
2.2K Following · 1.8K Followers
Pinned Tweet
Graham @grahamcodes
Graham retweeted
prinz @deredleritt3r
Anthropic has been testing a new model called "Mythos" with certain customers:
- a "step change" in AI capabilities, including "dramatically higher scores" in coding, academic reasoning and cybersecurity
- "currently far ahead of any other AI model in cyber capabilities"
- part of a new "Capybara" series of models, which are larger and more intelligent than Opus
- more expensive to run than Opus; not yet ready for general release
Jeremy Kahn @jeremyakahn

Exclusive: Anthropic left details of an unreleased model and an exclusive CEO retreat sitting in an unsecured data trove, in a significant security lapse. Great reporting from @FortuneMagazine's @beafreyanolan fortune.com/2026/03/26/ant…

Wes Bos @wesbos
Only cool people can reply to this
English
688
3
693
109.4K
Graham retweeted
dex @dexhorthy
as @rauchg put it so well, shipping is much more than just coding. Shipping means testing, deploying, monitoring, maintaining, fixing at 2am, etc. Models can code, but we're still figuring out if/how they can solve which parts of shipping.

As models write more code, the SWE's job evolves from "write working code" to "produce working code" - we're all figuring out what that means.
Graham retweeted
Ben Davis @davis7
Been going deeper into the "code mode" stuff. Basically letting the agents write TypeScript to call MCPs, APIs, etc. instead of normal tool calls or bash commands. No clue what the final form of this is yet. Really like what @RhysSullivan is working on with executor. I think it, or something like it, is probably the future.
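The "code mode" idea can be sketched roughly as follows. This is a hypothetical illustration, not executor's actual API: `fetchUsers` and `postSummary` stand in for MCP/API bindings a harness might expose, and are stubbed here with fixed data. Instead of several tool-call round trips through the model, the agent emits one TypeScript program that composes the calls locally.

```typescript
// Hypothetical agent-emitted "code mode" script. fetchUsers and postSummary
// are stand-ins for MCP/API bindings; here they are stubbed with fixed data.

interface User {
  name: string;
  active: boolean;
}

// Stub for an MCP/API call that would list users.
async function fetchUsers(): Promise<User[]> {
  return [
    { name: "ada", active: true },
    { name: "bob", active: false },
  ];
}

// Stub for an MCP/API call that would post a message somewhere.
async function postSummary(text: string): Promise<string> {
  return `posted: ${text}`;
}

// The composition happens in code: the full user list never round-trips
// through the model's context window, only the final summary does.
async function run(): Promise<string> {
  const users = await fetchUsers();
  const active = users.filter((u) => u.active).map((u) => u.name);
  return postSummary(`${active.length} active: ${active.join(", ")}`);
}
```

The appeal is that intermediate results stay inside the script instead of being serialized back to the model after every tool call; whether this is the "final form" is exactly the open question the tweet raises.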
Robert Balicki (👀 @IsographLabs)
@noahzweben Where do these run? Is this Claude Code running on some sort of cloud server? Why is Claude the appropriate place for cron job scheduling to live?
Noah Zweben @noahzweben
Use /schedule to create recurring cloud-based jobs for Claude, directly from the terminal. We use these internally to automatically resolve CI failures, push doc updates, and generally power automations that you want to exist beyond a closed laptop.
Amy Reichert @amyforsandiego
RIGHT NOW: San Diego International Airport is “organized chaos” this morning. @TSA line stretched out with a 70 minute wait just to reach screening. DHS canines on site. @ICEgov expected to arrive today. And this Friday marks the 6th missed paycheck for federal TSA workers.
Graham retweeted
Professor Campbell @abcampbell
managing a team of AIs is exactly like managing a bunch of first year analysts/quants/devs:
- massively overconfident
- insane ambition to practicality ratio
- constantly distracted by the shiny thing of trying to do your job or someone else's
- confuse intelligence for judgment
- need constant reminders of their todo list
- terrible synthesizers
- perpetually confusing goals with tasks

I need a nap
Graham @grahamcodes
@dillon_mulroy This is what the internal reasoning log looks like... sometimes it just leaks through their masking layer randomly lmao. I've seen it a few times as well.
Dillon Mulroy @dillon_mulroy
bro what is happening lmao
Dillon Mulroy @dillon_mulroy
gpt 5.4 has started talking like a caveman out of the blue
Graham @grahamcodes
@nickbaumann_ Web devs feasting while native mobile devs starve
Nick @nickbaumann_
We are dangerously close to putting Codex in autonomous loops where it picks up tickets, tests its own changes via Playwright, and records and uploads verification mp4s to PRs. If you've never asked Codex to test your app, do try it!
OpenAI Developers @OpenAIDevs

Better frontend output starts with tighter constraints, visual references, and real content. Here’s how to build intentional frontends with GPT-5.4 developers.openai.com/blog/designing…

Tanner Linsley @tannerlinsley
Ghostty was fun, but time for something else. I still love opencode too, but with CC plans dead on it… I'm feeling lost. Full GUI? T3 Code? Opencode GUI? Warp? Back to Cursor? Try CC again? Raw Codex? My 🧠 hurts and I just need to keep shipping.
Graham retweeted
Matt Pocock @mattpocockuk
Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens.

The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following.

Don't treat the 1M context window any differently. It's still 100K of smart, and 900K of dumb.
Graham @grahamcodes
@nayshins stupid overly defensive helpers that don't trust type safety and do paranoid runtime checks of values all the time. isRecord and isString are ones I've seen Codex generate multiple times.
Jake @nayshins
Has anyone documented all the code slop patterns yet? I want to lint for them and banish them to hades.
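The slop pattern Graham describes looks roughly like this in TypeScript. Only the helper names `isRecord` and `isString` come from the tweet; the surrounding function is a reconstructed illustration, not actual Codex output.

```typescript
// Reconstructed example of the "paranoid runtime check" slop pattern.
// Only the helper names isRecord/isString come from the tweet; the
// surrounding function is illustrative.

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isString(v: unknown): v is string {
  return typeof v === "string";
}

interface Config {
  name: string;
}

// Slop version: re-validates a value the type system already guarantees.
function greetDefensive(config: Config): string {
  if (!isRecord(config) || !isString(config.name)) {
    return "unknown"; // unreachable given the declared signature
  }
  return `hello, ${config.name}`;
}

// Trusting type safety: the same function, minus the checks.
function greet(config: Config): string {
  return `hello, ${config.name}`;
}
```

Runtime guards do belong at genuine trust boundaries (parsing JSON, reading external input); the slop is applying them to values the compiler has already verified.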
Graham retweeted
Conor @jconorgrogan
By 2H 2026, companies will realize that token efficiency per pull request is one of the most important differentiators for SWE talent. Intelligence too cheap to meter isn't going to happen anytime soon, and the free corporate token spigots are going to be reined in, hard.
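"Token efficiency per pull request" isn't a standard metric; one plausible way to compute it, sketched below with entirely made-up names (`PrUsage`, `tokensPerMergedPr`), is to charge all token spend, including abandoned attempts, against the PRs that actually merged:

```typescript
// Hypothetical metric: total token spend divided by merged PR count.
// All names here are invented for illustration.

interface PrUsage {
  pr: string;
  merged: boolean;
  promptTokens: number;
  completionTokens: number;
}

// Charges the spend on abandoned PRs against the merged ones, so wasted
// attempts make the number worse rather than disappearing.
function tokensPerMergedPr(usage: PrUsage[]): number {
  const mergedCount = usage.filter((u) => u.merged).length;
  if (mergedCount === 0) return Infinity;
  const totalTokens = usage.reduce(
    (sum, u) => sum + u.promptTokens + u.completionTokens,
    0,
  );
  return totalTokens / mergedCount;
}
```

For example, one merged PR that cost 1,500 tokens plus one abandoned PR that cost 3,000 comes out to 4,500 tokens per merged PR; a per-PR average that ignored the abandoned work would hide exactly the waste the tweet is talking about.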
Graham retweeted
Mario Zechner @badlogicgames
i can't speak for david. what i see is this: if you let agents build or extend a codebase with only minor or no supervision, you get unmaintainable garbage, because the agent makes terrible decisions that compound, both big and small. those decisions make it hard for both you and the agent to keep modifying the code base, until eventually it's unrecoverable.

why does the agent make bad decisions? i can't tell for sure, but my gut tells me that training data can currently not capture the holistic thinking needed to design and evolve complex systems. that's one part of the problem. related to that, and oversimplified: agents output the "mean quality" of the code they saw during training. most of that code is very bad. specifically tests, which humans are terrible at writing.

another part of the problem is that specification via prompt is not precise enough, so the agent has to fill in the blanks, giving it enough rope to hang itself. the more detailed your spec gets (constraining the agent and making it less likely to produce crap), the closer you are to handwriting the code yourself, as that's the most detailed version of the spec that can exist. so then you gain nothing. back to prompt spec it is, which means the agent fills in blanks, which means we get suboptimal or truly bad results.

using agents can still be a net productivity boost (see other posts in my thread), but it is not easy to come up with consistent workflows that produce production quality, maintainable code while retaining the speed advantages agents give you.
Graham retweeted
boris @boristane
slop creep is what happens when you turn your brain off and hand the thinking to coding agents. each individual change is fine, but all together, you have a pile of crap. we're witnessing this happen in real-time across everything. boristane.com/blog/slop-cree…