

oxycblt
@oxycblt
✝ / 21 / eng @ razorbill / asymptotic self improvement







You can use Nano Banana Pro in a "base model mode". I've created a detailed blog post on how to do that and what it looks like.






There's a growing narrative that AI token consumption is too expensive and too wasteful. Engineers are "tokenmaxxing." CFOs are nervous. Budgets are blown. The concern isn't wrong. There is waste. But it misses the structural picture.

The Mental Model

AI spend = users × tasks/user × tokens/task × $/token

The first half — users and tasks per user — is ripping. Claude Code's adoption curve is steeper than Cursor's was at the same stage. Cowork is ramping faster than Claude Code. We're barely scratching the surface. The tension lives in the second half: tokens/task and $/token. That's where optimization happens, and where the real debate gets heated.

Two Levers

1. Same work, cheaper tokens. Model routing is the highest-impact play: a routing layer that sends trivial tasks to Haiku and reserves Opus for complex reasoning can cut 60-80% of spend on eligible tasks. OSS models for commodity tasks — self-hosting Llama or Qwen for boilerplate — mean zero per-token cost, swapped for GPU capex. Or the simplest strategy: wait. Token prices fall roughly 10x every 18 months.

2. Same work, fewer tokens. Prompt caching is low-hanging fruit — cache repeated system prompts, and reads cost 10% of the input price. Context window management — summarize history instead of re-sending full conversations. Thinking budget tuning — cap thinking tokens for simple completions, uncap them for hard problems. And agent loop pruning, possibly the biggest single source of waste: most agents burn 50-70% of their tokens on redundant tool calls, retries, and pointless sub-agent spawns. (Rough sketches of both levers below.)
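Not in the original post, but here is a minimal sketch of what the lever-1 routing layer could look like. The model names, prices, and complexity heuristic are all illustrative assumptions; production routers typically use a learned classifier, not keyword matching.

# Toy model router: trivial tasks go to a cheap model, hard ones to an expensive one.
# All names and prices below are made up for illustration.
PRICE_PER_MTOK = {"cheap-model": 1.00, "expensive-model": 15.00}  # $/million input tokens

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a real classifier: prompt length plus reasoning keywords."""
    score = min(len(prompt) / 2000, 1.0)
    if any(kw in prompt.lower() for kw in ("prove", "refactor", "debug", "architect")):
        score += 0.6
    return score

def route(prompt: str) -> str:
    return "expensive-model" if estimate_complexity(prompt) >= 0.5 else "cheap-model"

if __name__ == "__main__":
    for task in ("Rename this variable", "Debug a race condition in the scheduler"):
        model = route(task)
        print(f"{task!r} -> {model} (${PRICE_PER_MTOK[model]:.2f}/Mtok)")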
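And a back-of-envelope for lever 2's caching claim. The only figure taken from the post is the 10%-of-input-price cost for cache reads; the workload numbers are invented to make the arithmetic concrete.

# Prompt caching math on a repeated system prompt (illustrative workload).
CALLS_PER_DAY = 10_000
SYSTEM_PROMPT_TOKENS = 8_000
INPUT_PRICE_PER_MTOK = 3.00                                # $/million tokens, assumed
CACHED_READ_PRICE_PER_MTOK = 0.10 * INPUT_PRICE_PER_MTOK   # reads cost 10% of input

def daily_prefix_cost(price_per_mtok: float) -> float:
    """Daily cost of re-sending the system prompt at a given per-token price."""
    return CALLS_PER_DAY * SYSTEM_PROMPT_TOKENS / 1e6 * price_per_mtok

uncached = daily_prefix_cost(INPUT_PRICE_PER_MTOK)         # $240.00/day
cached = daily_prefix_cost(CACHED_READ_PRICE_PER_MTOK)     # $24.00/day
print(f"uncached ${uncached:.2f}/day vs cached ${cached:.2f}/day "
      f"-> {1 - cached / uncached:.0%} saved on the repeated prefix")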
Who Optimizes What

Every layer of the stack targets different metrics. Infra ($NVIDIA, $Cerebras, $Groq) optimizes tokens/watt and tokens/dollar. Model providers ($Anthropic, $OpenAI, $Google) optimize quality/token and thinking efficiency. The app layer (Cursor, Claude Code, Codex) optimizes cost/task and cache hit rates. Enterprise buyers optimize cost/engineer and ROI vs. headcount. Each layer's gains pressure the layers around it: faster hardware forces providers to compete on price, better models reduce the tokens apps need, application-layer routing erodes premium pricing, and enterprise CFOs demand all of the above.

Bear vs. Bull

The core question: does optimization compress AI revenue faster than new demand replaces it?

The bear case is real. Rationalization is the CFO's first instinct — when the budget blows up, the reaction is "finally back inside the envelope," not "let's 10x usage." Model routing drops revenue per task 10-20x. OSS is closing the gap fast. Caching is pure token destruction: a cache hit is zero revenue, and it generates no new demand. And thinking efficiency is self-cannibalization — if Anthropic improves extended thinking by 3x, billing for the same reasoning task drops by two-thirds.

The bull case is equally compelling. Current usage is cost-constrained, not demand-constrained. Companies blew their budgets and had to throttle; drop costs 5x and every killed use case comes back. Today only coding is at scale — testing, documentation, code review, and security auditing are all waiting for the economics. Penetration is still single digits. Agentic workflows are a token multiplier: a human-in-the-loop conversation runs thousands of tokens, while an autonomous agent on a complex task runs hundreds of thousands. New modalities — vision, audio, video — are net-new demand that dwarfs text. And there's Jensen Huang's framing: a $500K/year engineer should consume at least $250K/year in tokens; at $5K, you're dramatically under-leveraging AI.

Where This Lands

The optimizers will win every individual battle. Every caching trick, every routing layer, every pruned agent loop will work. Cost per task will drop dramatically. But the number of tasks, the number of users, and the complexity of what gets delegated to AI will grow faster than efficiency compresses spend. Token costs are going down. Token spend is going up. Both things are true, and they aren't in contradiction.

Full: open.substack.com/pub/robonomics…






HOW FUCKING DEEP DOES THE FISHER FIXATION GO???