David Buxton
122 posts

@davidreads
Harriet co-founder; tinkerer


Sierra charges when an agent resolves a ticket and nothing for failures. Devin sells Agent Compute Units, not tokens: the same abstraction Databricks and Snowflake use with credits to decouple pricing from raw compute. Margin is decoupled from the inference line. Durable.
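To make the "decoupled from the inference line" argument concrete, here is a toy model in Python. Every number and name in it (`Month`, `per_token_revenue`, `outcome_revenue`, `inference_cost`, the 20% markup, the $5 per resolution) is a made-up assumption for illustration. None of it reflects Sierra's, Devin's, Databricks' or Snowflake's actual pricing. It compares a vendor that resells tokens at a markup with one that bills only for resolved tickets, and shows what happens to each when inference gets cheaper.

```python
from dataclasses import dataclass


@dataclass
class Month:
    tickets: int            # tickets the agent attempts (hypothetical)
    resolve_rate: float     # fraction the agent actually resolves (hypothetical)
    tokens_per_ticket: int  # inference tokens burned per attempt (hypothetical)


def inference_cost(m: Month, cost_per_mtok: float) -> float:
    """What the vendor pays its model provider: every attempt costs tokens, success or not."""
    return m.tickets * m.tokens_per_ticket / 1_000_000 * cost_per_mtok


def per_token_revenue(m: Month, price_per_mtok: float) -> float:
    """Token-reseller model: revenue tracks tokens consumed."""
    return m.tickets * m.tokens_per_ticket / 1_000_000 * price_per_mtok


def outcome_revenue(m: Month, price_per_resolution: float) -> float:
    """Outcome model: bill only resolved tickets, nothing for failures."""
    return m.tickets * m.resolve_rate * price_per_resolution


if __name__ == "__main__":
    m = Month(tickets=1000, resolve_rate=0.6, tokens_per_ticket=200_000)
    for cost in (10.0, 5.0):  # inference price per million tokens halves
        spend = inference_cost(m, cost)
        token_margin = per_token_revenue(m, cost * 1.2) - spend  # assumed 20% markup
        outcome_margin = outcome_revenue(m, 5.0) - spend         # assumed $5 per resolution
        print(f"cost ${cost:.0f}/MTok: per-token margin ${token_margin:,.0f}, "
              f"outcome margin ${outcome_margin:,.0f}")
```

With these invented numbers, halving inference cost from $10 to $5 per million tokens halves the reseller's revenue (from $2,400 to $1,200) and its margin (from $400 to $200). The outcome-priced vendor still bills $3,000, so its margin grows from $1,000 to $2,000. Failed attempts still burn tokens in both models. The difference is that price is set by outcomes rather than by the inference line.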





"We created a monster": companies rein in AI usage as costs strain budgets

“Compute costs are now beginning to enter the minds of both CFOs and boards. Consumers and businesses have been taught that AI is cheap or free and that is definitely not the case,” said Costi Perricos, global generative AI leader at Deloitte.

... some companies have told workers to use open-source models that can be run locally on their own servers or personal devices, reducing the bill they pay to AI labs and cloud providers.

... customers are still weighing higher costs against the promises they have made to investors about AI’s impact on their own bottom line and workers’ productivity.


Most software engineers are facing an identity crisis bordering on depression. As CTOs aggressively evangelize tokenmaxxing, a class divide ensues.

The lazy.

The lazy push code. They don't write it. They don't manually test it. They don't even read it. They're on autopilot: see Jira ticket, prompt for task, submit code. Many of them are barely at their computers all day. A comment on the PR asking why they did this? The lazy ask AI. A Slack message? The lazy ask AI. Need to prepare for standup? The lazy ask AI. As long as it sounds enough like them and isn't detected. Some of the lazy are even overemployed and work multiple jobs. The smart lazy ones get away with it, and are even rewarded. After all, software engineering for the lazy is just a dance to convince your colleagues you're smart and hardworking.

The craftsmen.

The craftsmen are tired. Very tired. 15 PRs in queue. Slack blowing up. The entire burden of review falls on the craftsman. The burden of understanding. They try. They work their way through the code, thoughtfully commenting to improve what ships. The response? A lazy replies: "That's a clever idea! You're absolutely right." with an incorrect change. It's fine, the craftsman says. I can fix them. They write a doc urging their colleagues to be better. The next day? A 20,000-line PR to review. Day after day, their workload grows. Bugs seep into production. No one seems to care. Another round of AI is thrown at it. Their animosity toward their colleagues rises. Eventually, they give up. It's just not what it used to be. The craft they loved is dead. One day they wake up a lazy.

This isn't all companies. Many companies are genuinely more productive, adopt the right principles and practices around AI development, and have highly talented teams that trust each other. It tends to happen in bigger companies that are 10+ years old, with higher talent variance. But it happens. A lot.





UP TO 95% TOKEN REDUCTION WITH ZERO CODE CHANGES

A Netflix engineer just open-sourced Headroom, and it’s one of the smartest ways I’ve seen to cut LLM costs.

It wraps Cursor or Claude in a local proxy that compresses your payload before it hits the LLM:
→ Intelligently shrinks logs, JSON, and code
→ Perfectly preserves logic accuracy
→ Keeps 100% of your data local
→ Stops Opus-tier models from wasting tokens on boilerplate

It has already crossed 35K stars, which says a lot. 100% free and open-source.

repo in 🧵↓
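The post doesn't describe how Headroom compresses payloads, so the following is only a minimal sketch of the general idea, not Headroom's code or API. It shows two cheap, deterministic transforms a local proxy could apply to a prompt before forwarding it: minifying JSON and collapsing runs of identical log lines into one line with a repeat count. The helper names `minify_json` and `collapse_repeated_lines` are hypothetical.

```python
import json
from itertools import groupby


def minify_json(text: str) -> str:
    """Re-serialize JSON with no indentation and no spaces after separators.

    The parsed value is unchanged; only formatting whitespace is dropped.
    """
    return json.dumps(json.loads(text), separators=(",", ":"))


def collapse_repeated_lines(log: str) -> str:
    """Replace each run of identical consecutive lines with one line plus a count."""
    out = []
    for line, run in groupby(log.splitlines()):  # groupby yields runs of equal consecutive lines
        n = sum(1 for _ in run)
        out.append(line if n == 1 else f"{line}  [x{n}]")
    return "\n".join(out)


if __name__ == "__main__":
    payload = json.dumps({"user": "ada", "items": [1, 2, 3]}, indent=2)
    log = "\n".join(["GET /health 200"] * 50 + ["GET /orders 500"])
    for name, before, after in (
        ("json", payload, minify_json(payload)),
        ("log", log, collapse_repeated_lines(log)),
    ):
        # Character counts are only a rough proxy for token counts.
        print(f"{name}: {len(before)} -> {len(after)} chars")
```

Both transforms keep the information a model needs (the JSON value, and each distinct log line with how often it repeated). A production proxy would presumably go further, for example summarizing code or truncating huge outputs, and those steps are where accuracy can actually be lost.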






One interesting trend: I’m seeing *so many* solutions around “building a context layer for engineering teams”: VC-funded, internally built infrastructure, and bootstrapped. AKA trying to solve the problem of “if only, e.g., Claude Code had the context from all your other systems.”







I've been using both GLM 5.2 and kimi 2.7 code in @warpdotdev, and both are very good quality. Not quite frontier imo, but you can get a ton of good building done with them.

What stands out, though:
1) you get 10-20x further with them for the same price compared to the frontier lab models
2) they are like 3x faster for me

Having used GPT 5.5 and Opus 4.8 regularly, I was surprised by how much I actually cared about the latency difference, in addition to the obvious cost savings. Highly suggest folks try them.







