Tadas Antanavicius

393 posts

@tadasayy

Consulting engineering leaders at 50-5,000-person orgs on effective adoption of agentic engineering via https://t.co/Z9EyZkkWs7 · tadasant on most other platforms.

United States · Joined September 2010

291 Following · 696 Followers

Tadas Antanavicius@tadasayy·12h

If the EU had it together, there could be a golden upcoming opportunity to convince Anthropic to relocate 👀

Armin Ronacher ⇌@mitsuhiko

Europeans please wake up.

Tadas Antanavicius@tadasayy·6d

The widespread reaction that loops need a crazy token budget misses @steipete's point. Nothing forces "loops" to be always-on token burners. One example among many: it's still a loop-prompted agent if you manually write a prompt to a "routing agent" that, in the course of its agentic loop, figures out which coding agent sessions to spawn in parallel (backend, DB, frontend). The insight is in designing those layers of agentic loops so that the right chunk of work gets done at each layer of the agentic sessions that ensue. That's Peter's point: "you shouldn't be [directly] prompting coding agents anymore".

Thiago ➔ hodle.com.br@ThiagoMot_

how to make this without infinite tokens?
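
The routing-agent pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not anyone's actual setup: `run_coding_session` is a hypothetical stub standing in for a real coding agent session, and `route` uses keyword matching as a stand-in for the routing agent's own model-driven loop. The names `AREA_KEYWORDS`, `route`, and `routing_agent` are illustrative only. The point it shows is that one prompt fans out to only the sessions it needs, in parallel, and nothing burns tokens until a prompt arrives.

```python
import asyncio
import re
from dataclasses import dataclass


@dataclass
class SessionResult:
    area: str
    summary: str


# Hypothetical keyword table standing in for the routing agent's own judgment.
AREA_KEYWORDS = {
    "backend": {"api", "endpoint"},
    "db": {"schema", "migration"},
    "frontend": {"ui", "page"},
}


async def run_coding_session(area: str, task: str) -> SessionResult:
    """Hypothetical stub for one scoped coding agent session."""
    await asyncio.sleep(0)  # stands in for the session's own agentic loop
    return SessionResult(area, f"{area}: handled {task!r}")


def route(prompt: str) -> dict[str, str]:
    """Stand-in router: pick the areas a prompt touches.

    A real routing agent would decide this inside its own model-driven loop;
    keyword matching is used here only so the sketch runs offline.
    """
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return {area: prompt for area, keys in AREA_KEYWORDS.items() if words & keys}


async def routing_agent(prompt: str) -> list[SessionResult]:
    plan = route(prompt)
    # Fan out only the sessions the router selected, in parallel. Nothing runs
    # until a prompt arrives, so there is no always-on token burn.
    return await asyncio.gather(
        *(run_coding_session(area, task) for area, task in plan.items())
    )


if __name__ == "__main__":
    prompt = "Add a migration for the orders schema and expose it via a new API endpoint"
    for result in asyncio.run(routing_agent(prompt)):
        print(result.summary)
```

In this sketch every selected session receives the full prompt; splitting it into per-area subtasks, and the model call a real router would make, are deliberately left out.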

Tadas Antanavicius@tadasayy·5 Jun

It was the combination of a lack of adoption and the challenge of subtle long-tail complexity (e.g. it started with just basic chat completions, but there is such a long tail of what this sort of interaction could be). The theoretical utility is there, but both the investment in a robust spec and the per-client implementation investment were just too high to be worth it. And I do personally think that, absent solving that long-tail challenge completely, it would be too limiting to be useful. Increasingly, sprinkles of inference like basic chat completions are less useful than workarounds (e.g. designing tools around the lack of Sampling) that hand control back to the calling agent with its full context and other tools. Maybe not true for every use case, but true for enough of them that I understand the lack of uptake here.

swyx@swyx·5 Jun

@tadasayy @walden_yan @colemurray oh! why what was wrong w sampling?

Walden@walden_yan·29 May

If you're building your own cloud agent like Devin or Ramp Inspect, there's lots of great details here on setting up VMs, computer use, memory, and more. Fun deep dive with the creator of OpenInspect on what setting up a cloud agent entails latent.space/p/cognition

Tadas Antanavicius@tadasayy·4 Jun

@ChainZenit You don't believe agentic coding sessions should run in the cloud, kicked off autonomously by external triggers (e.g. alerts)?

Strata@ChainZenit·4 Jun

@tadasayy Just another layer of tech debt for teams to maintain, honestly.

Tadas Antanavicius@tadasayy·4 Jun

Inability to save on costs may be true for teams where every individual does AI-assisted dev work from scratch every time they spin up a coding agent. I think the opportunity here is mostly for teams that have matured toward hardened, auto-triggered cloud agent workflows. The complexity of those is more predictable, and the paths (memorialized as Skills) are better trodden. It seems many @cognition customers are near this point: @ido_pesok wrote up recently how such async sessions are eclipsing interactive ones (x.com/ido_pesok/stat…). I bet @FactoryAI customers show a similar profile. But most companies are nowhere close to this yet.

Quinn Slack@sqs

The Factory team seems very smart, and I'm eager to see how well their cost-optimizing new model router works. If they can pull it off, that is a huge accomplishment. For Amp: we may try to build a model router that picks the best model for a task, but we don't intend to build a cost-optimizing model router based on the current state of the models. Here's why. Every time we've looked into using cheaper models in Amp, we've benchmarked on tasks that reflect how people use agents for coding today. On these real tasks, the expensive frontier model was not only the best (obviously), but also usually the fastest and cheapest, when measuring end-to-end task completion. Why? Cheaper-per-token models are less capable, which means that on complex real-world tasks they spend more tokens and time fixing mistakes along the way. You can find plenty of cases where cheaper models are indeed faster and cheaper end-to-end. But such cases were rarer than we expected, and the differences were fairly small. If you can easily detect such cases, then there is an opportunity here. But even then, on the AI hedonic treadmill, once people get a taste of frontier intelligence, they don't want to go back to using those more primitive prompts where cheaper models suffice. (Which is a good part of human behavior! It's how we decided to stop living in caves!) If your tasks can be handled just as well by non-frontier models, I would strongly advise you to uplevel how you use agents and what you produce to stay competitive against people who are using frontier models. In a power-law world, with rapid intelligence advances, try to get to the frontier and stay there.

Tadas Antanavicius@tadasayy·4 Jun

@sqs I mean "predictable" in a weaker sense here. For example, I would say the need for "someone to consider and merge your Dependabot PRs daily/weekly" is predictable. The coding agent session that fields those PRs probably doesn't need Opus; Sonnet will do just fine.

Quinn Slack@sqs·4 Jun

@tadasayy Ehhh...I dunno. Who says anything is predictable or certain here? It's like the industrial revolution is happening, the steam engine was invented, a lot changed in a short amount of time, and then people said "surely this pace of change will slow down"...and it didn't.

Tadas Antanavicius@tadasayy·3 Jun

@RhysSullivan I find the most useful plugins to be the inverse: a useful external MCP server plus internally built skills, e.g. Datadog + an internal `/triage-alert` skill.

Rhys@RhysSullivan·3 Jun

Are plugins just an MCP + skills? Why not just have the MCP ship skills? Then it's standard across all clients.

Tadas Antanavicius@tadasayy·31 May

I think that will be true in the long run, but it's not true right now for where the bulk of AI spend is going. Most leaders are not aware that engineers who were previously spending $20/day on tokens are now spending $30/day because subtle API-side tweaks drove up costs, not because the engineers used more.

Paulo Santos@ThinkFinance999·31 May

@tadasayy The public is not corporations. Corporations will want the cheapest that gets the job done reliably in production systems. When you run a thing a zillion times, you don't accept even 10% more cost, never mind 50% or 1000%.

Tadas Antanavicius@tadasayy·30 May

I bet this is market-induced fallout from everyone getting upset that "Anthropic nerfs their models" when they changed default reasoning levels from high to medium a few months back. Public opinion would rather have subtle cost increases for incremental performance gains like this (for now, while people don't yet care about optimizing for cost) than the opposite (default cost savings, which caused Anthropic's PR nightmare).

Shaun Smith@evalstate

Token cost creep (at API default settings):
$4.34 - Opus 4.6
$4.71 - Opus 4.7
$5.38 - Opus 4.8 (+24%)
GPT-5.5 was $2.03 for the same prompts.

Tadas Antanavicius@tadasayy·30 May

@evalstate 🤣

Shaun Smith@evalstate·30 May

@tadasayy That's a roll of fifties, not a coin in the picture btw.

Tadas Antanavicius@tadasayy·30 May

I agree with the metrics sentiment here (especially anti-tokenmaxxing), but I'm not convinced that most individuals and companies have actually crossed the chasm yet. Even at forward-leaning companies I work with, fewer than 50% of individual engineers have usage metrics that suggest "I am using AI agentically". They're still in the "I pair-program with AI in my sidebar" phase. There are three tiers: "No usage", "Some usage", and "Agentic usage". It's worth continuing to measure those tiers until everyone is in the third bucket.

Gergely Orosz@GergelyOrosz·29 May

Angie is 100% correct (Yes, encouraging usage did make sense early on, aka last year, when there was both resistance to usage and costs were manageable)

Angie Jones@techgirl1908

called it. and yeah a year is a month in AI times

Tadas Antanavicius@tadasayy·17 May

theinformation.com/articles/atlas…

Tadas Antanavicius@tadasayy·17 May

I suspect this trend of moving from flat fees to usage-based pricing will go well for some companies (e.g. model providers) and terribly for others. I don't want to pay obscure variable fees to interact with Jira. I just want to use my bucket of tokens attached to my favorite coding agent, where the cost may be variable but it's transparent, and the performance is predictable. I'd rather turn off AI features embedded in these systems of record than pay extra.

Tadas Antanavicius@tadasayy·14 May

Agreed that how the technical enforcement details map to the spirit of the updated ToS is very murky! That's a rabbit-hole conversation, though I think it's separate from my point about the downstream impact of the decision. I think it's pretty clear that Anthropic is willing to give discounted token usage to those who buy into their proprietary product stack, and not to others just using their API. We can debate whether that's the right business/PR move, and whether it's possible to enforce at the technical level, but at the end of the day the policy will absolutely move a meaningful number of people (most, IMO) away from relying solely on claude -p and toward more economical solutions.

Shaun Smith@evalstate·14 May

I don't disagree with your central thesis and certainly agree that frontier models are way overused. This: claude -p "review this PR" and claude < "review this PR" being charged differently is absurd. Savvy users now know they are in a game of ToS enforcement cat-and-mouse. Will "automation detection" be tripped by 3rd party voice transcription? Or with keyboard macro software? Or over SSH? I don't think these questions are hyperbole at all.

Tadas Antanavicius@tadasayy·14 May

Unpopular opinion: as a heavy user of claude -p on Claude Max plans, I actually think Anthropic is going to come out ahead with this move. If Anthropic had to turn off flat fees for programmatic usage, OpenAI will eventually have to as well. Anyone relying on these flat-fee subscriptions is going to drop the idea that they can count on them. It's now too risky to build infrastructure and workflows wholly reliant on the subsidy. And if most people do that, then it's a level playing field where the best mix of token cost vs. model performance wins. People will still use Opus and Sonnet. They'll probably spread out and incorporate other, cheaper models too. We'll be forced to be more economical and stop bringing the Ferrari for workloads where a Prius will do. The ecosystem is going to do what we always do: innovate and figure out how to accomplish the same outcomes with more efficient token spend, so the API fees become stomachable. I expect to churn off my Claude Max accounts, but I think I'll continue to pay a decent amount in API fees to Anthropic, and probably a bunch to other model providers too, wherever I find opportunities to downshift from Opus without meaningful performance loss.

Matt Pocock@mattpocockuk

This is the clarity we've been crying out for. But it's a poisoned chalice. This is a 10X cut to claude -p disguised as a monthly bonus. Anthropic is discouraging any kind of programmatic usage. And that's fine - no subsidy lasts forever. But it's time to try Codex.

Tadas Antanavicius@tadasayy·13 May

x.com/i/article/2054…

Tadas Antanavicius@tadasayy·26 Mar

I say this as someone very passionate about the promise of AI-enabled productivity. But clankers that scrape and automate web browsers are not the way. Service provider APIs must be respected. MCP servers let providers set the terms of what's meant for agents. That is how we all align.

Tadas Antanavicius@tadasayy·26 Mar

GitHub is being overrun by bots. Reddit just announced "bots must wear name tags". X is several months into a big anti-bot campaign. It's now a real problem - expect all those clankers impersonating humans to be fully blocked from doing anything useful very soon.
