Pete Hodgson (@thepete.net on bluesky)

7.7K posts

Pete Hodgson (@thepete.net on bluesky) banner
Pete Hodgson (@thepete.net on bluesky)

Pete Hodgson (@thepete.net on bluesky)

@ph1

@thepete.net on bluesky Independent consultant helping engineering teams tackle thorny problems. Sociotechnical architect 🧐. Formerly Earnest, ThoughtWorks.

Pacific Northwest, USA Katılım Mart 2009
469 Takip Edilen3.2K Takipçiler
Pete Hodgson (@thepete.net on bluesky) retweetledi
Joe Walnes
Joe Walnes@joewalnes·
Modern macOS contains a fully local inference model. No network calls, stays fully on device. Here's a single file script to turn it into an OpenAI API compatible completions server: github.com/joewalnes/ones…
Joe Walnes tweet media
English
3
1
8
936
Pete Hodgson (@thepete.net on bluesky)
@shubhamJReacts @mattpocockuk I think you have it backwards. Markdown is interpreted pretty permissively, but HTML way more so. HTML is probably the most permissively interpreted file format out there. Renderers and parsers will wade on no matter how malformed it is.
English
1
0
0
113
Pete Hodgson (@thepete.net on bluesky)
@techgirl1908 100% agree on "just give it more context" being unhelpful. But I remain skeptical on automatically managed memories, until I see compelling results. I'm not ready to trust the quality of context being injected behind the scenes, at least when it comes to coding agents.
English
0
0
1
54
Angie Jones
Angie Jones@techgirl1908·
The more I work with agents, the more I'm convinced that "just give it more context" can't be the whole answer. I'm not seeing enough discourse about memory. More specifically, memory design... like what gets stored, what gets retrieved, what gets summarized, what triggers the agent to look things up again. I'll be spending time with @oracledevelopers soon, getting hands-on with agentic memory patterns. Very excited to get into the weeds!
English
25
5
112
13.4K
Pete Hodgson (@thepete.net on bluesky) retweetledi
dex
dex@dexhorthy·
the funniest thing about the token grift is most folks who pushed token burn in q1 are now having a falling out with their CFOs because they don’t have a metric that correlates to business outcomes Inputs -> outputs -> outcomes If you can’t measure revenue, measure KPIs If you cant measure KPIs, measure customer outcomes If you cant measure customer outcomes, measure task throughput (features, tickets, bugs) If you cant measure task throughput, measure work throughput (PRs) If you cant measure PRs, measure LOC If you cant measure LOC, measure tokens if you’re a leader and you’re not focused on improving your ability to measure things that matter, you’re cooked
Alex Bouaziz@Bouazizalex

Token spend will be on your next performance review. Maybe not next quarter. But soon. Boards and CEOs are already asking. Everyone bought Claude Code, Cursor, and a dozen other AI tools. Nobody can tell you what came out of it. Adoption isn't proficiency, and most companies have zero idea who's actually getting value from any of it. Deel Engage closes that gap. We integrate with Anthropic and every major LLM. AI usage lands next to KPIs, feedback, and competencies in your reviews module. One view of AI maturity across every location, time zone, and employment type. No manual stitching. What we measure: token spend across every major LLM provider. Where direct data isn't available, we approximate from usage patterns. One number, consistent across every tool and team. Is it the whole story? No. It's gameable. Anyone can burn tokens to look busy. But it's a real signal in a space where most companies have zero. And as Anthropic and the other model providers ship deeper analytics, Engage absorbs them. Sharper signal, faster than you could build it. Your next review cycle is the test. Walk in with data, or walk in guessing. Deel Engage is the difference! Full article below

English
8
8
96
14.4K
Pete Hodgson (@thepete.net on bluesky) retweetledi
Andrej Karpathy
Andrej Karpathy@karpathy·
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code. But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along. So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

English
1.2K
2.5K
20.8K
4.4M
Pete Hodgson (@thepete.net on bluesky) retweetledi
Matt Pocock
Matt Pocock@mattpocockuk·
Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens. The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following. Don't treat 1M context window any differently. It's still 100K of smart, and 900K of dumb.
English
151
60
1.2K
159.9K
Pete Hodgson (@thepete.net on bluesky) retweetledi
boris
boris@boristane·
slop creep is what happens when you turn your brain off and hand the thinking to coding agents each individual change is fine, but all together, you have a pile of crap we're witnessing this happen in real-time across everything boristane.com/blog/slop-cree…
English
39
63
649
90.2K
Pete Hodgson (@thepete.net on bluesky)
Being an Old, I have a bit of nostalgia for The Good Old Days of OSS where you shared a thing and maybe some people used it, and there wasn't any influencing or fancy websites or weird drama. It's nice to rediscover that vibe in the 3D printing community...
Pete Hodgson (@thepete.net on bluesky) tweet media
English
1
0
2
146
Sam Parr
Sam Parr@thesamparr·
How is everyone getting team adoption for Claude? I spent a lot of time on Twitter, as do you. We see all this AI stuff popping up. We're on top of it, or at least sorta. I know what's going on and are testing all these fringe ideas. But how are all you people getting your team to actually use it effectively without spending all their time on Twitter and learning, which we know they won't and probably shouldn't be?
English
271
20
512
285.9K
Pete Hodgson (@thepete.net on bluesky)
@0xblacklight Amazing write-up! Can I steal your subagent context window visualization for a presentation (w. credit!)? Also FYI in "Distributing Tools with Skills" you say you can't package MCPs, scripts etc. in a skill. It's true, but Claude Code's plugins solve exactly for that.
English
1
0
1
108
Pete Hodgson (@thepete.net on bluesky) retweetledi
dex
dex@dexhorthy·
Here’s what’s gonna happen: - you replace your code review with feedback loops (sentry, datadog, support tickets, etc) - you stop reading the code - software factory fixes everything - one day something breaks at 3am, agent can’t fix it - nobody’s read the code in 3 months - you have 3 weeks of downtime trying to re-onboard and fix it - you lose significant % of your contracts and users - your company is now dead
dex@dexhorthy

@gregpr07 this may surprise you that thus is coming from me but I think we’re in for a 1-3 year period where stuff might break at 3am and if you’re relying on loops to fix it and nobody understands what’s under the hood, you’re looking at an existential threat to your company

English
250
559
6.8K
596.4K
Pete Hodgson (@thepete.net on bluesky) retweetledi
dax
dax@thdxr·
sent this to the team today everything great comes from being able to delay gratification for as long as possible and it feels like we're collectively losing our ability to do that
dax tweet media
English
256
705
6.9K
979.6K
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Exactly one year ago (10 mar 2025), Dario Amodei: "I think we will be there in 3-6 months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code." This turned out to be... too darn accurate.
English
234
165
3.8K
455.1K
dex
dex@dexhorthy·
Yes harness used to be called agent but we can’t have nice things
dex@dexhorthy

@leo_trapani Yeah well that used to be called “agent” but the word “harness” was made necessary because the saas slop industrial complex broke our terminology by trying to call everything an “agent”

English
3
0
13
3.2K
dex
dex@dexhorthy·
can we plz settle on what a "harness" is quickly harness == the coding agent (claude code, codex, etc) apparatus == the stuff you build around the coding agent (backpressure, mcp, ralph wiggum, etc etc) i'm tired of this blurry line where ppl use "harness" to mean both the agent and the stuff you build around it
English
94
13
282
35.5K
Pete Hodgson (@thepete.net on bluesky)
@somi_ai @dexhorthy Even better is to write a custom lint rule that checks for await with cookies, and make sure the agent lints after every edit. There are lots of more nuanced guardrails that can't be expressed just via static analysis, but where you can it's a huge win IMO
English
0
0
0
45
Somi AI
Somi AI@somi_ai·
@dexhorthy yeah we found this too. the fix was keeping CLAUDE.md under 200 lines and being brutally specific. vague rules like 'follow best practices' get ignored, but 'always use await with cookies()' gets followed every time
English
3
1
59
5.3K
Peter Steinberger 🦞
Peter Steinberger 🦞@steipete·
Been wrangling a lot of time how to deal with the onslaught of PRs, none of the solutions that are out there seem made for our scale. I spun up 50 codex in parallel, let them analyze the PR and generate a JSON report with various signals, comparing with vision, intent (much higher signal than any of the text), risk and various other signals. Then I can ingest all reports into one session and run AI queries/de-dupe/auto-close/merge as needed on it. Same for Issues. P rompt R equests really are just issues with additional metadata. Don't even need a vector db. Was thinking way too complex for a while. There's like 8 PRs for auto-update in the last 2 days alone (still need to ingest 3k PRs, only have 1k so far).
Peter Steinberger 🦞 tweet media
English
422
210
4.1K
570.3K