Strider
3K posts

Sabitlenmiş Tweet
Strider retweetledi

Inference got a hundred times cheaper this year. The compute bill went up anyway.
If you understand why those two sentences are both true at the same time, you understand the most important thing happening in AI right now.
I work on inference for a living, at @nebiustf, where we run open-source managed inference at scale. Most of what follows is what I'm seeing from inside the bill.
12 months ago, the cost of 1M tokens of frontier-class reasoning was somewhere on the order of $60.
Today, an equivalent quality of output costs roughly $0.50.
Price /token of o1-level intelligence has dropped about a 128x in a year.
Price of GPT-4-level output has dropped roughly 100x since the original GPT-4 shipped.
By any normal reading of a technology cost curve, this should be deflationary. It should be saving customers money.
The opposite has happened. The total compute bill at every hyperscaler is going up, not down. Anthropic just signed multi-year capacity deals with both XAI and Amazon. Microsoft's Azure capex guide for 2026 starts with an eight. OpenAI is reportedly spending more on compute every quarter than it did in all of 2023. Nvidia paid roughly twenty billion dollars to acquire Groq, an inference-specialist company that did not exist as a serious commercial entity three years ago.
The cost curve and the demand curve crossed, and then the demand curve lapped the cost curve.
Here is what happened underneath.
A reasoning model burns roughly 10x the output tokens of a non-reasoning model on the same task, because it spends most of its tokens thinking out loud before answering. An agentic workflow chains roughly twenty times the requests of a single-shot completion, because it loops, calls tools, plans, retries, and synthesizes. A modern deep-research query (the kind a research analyst can fire off in fifteen seconds and then walk away from for ten minutes) costs more compute than 10 original GPT-4 queries combined. We made every individual token a hundred times cheaper, and then we built a generation of products that consume ten thousand times more tokens.
This is the Jevons paradox playing out at trillion-dollar scale, in compressed time, in front of everyone. Jevons noticed in 1865 that making coal-burning more efficient did not reduce coal consumption. It increased it, because efficiency unlocked uses that were previously uneconomic. Steam engines became more practical at smaller scales. Whole industries that could not afford coal at the old price suddenly could. Britain's coal consumption rose sharply, not despite the efficiency gains, but because of them.
The same thing is happening to AI compute right now and it is happening faster than any analogous historical cycle. Falling token prices did not contract demand. They unlocked agents, deep research, code-writing systems, multi-step reasoning, persistent memory, the entire next layer of AI products. Every product in that next layer consumes orders of magnitude more compute than the chat interfaces it is replacing.
The math at the aggregate level is brutal: 100x cheaper tokens times 10 000 more tokens equals a 100x larger total bill.
The implications stack quickly.
If you are running a hyperscaler, your 2026 capex guide is not a peak. It is a step on a curve. Inference is structurally always-on, twenty-four hours a day, in a way that training never was. Training is bursty. You spin up a cluster, run for weeks or months, and stop. Inference runs continuously, scales with usage, and the usage curve is exponential. Your power bill, your cooling bill, your transceiver count, your storage footprint, all of these were sized for a workload mix that no longer exists.
If you are running an AI software company built on top of someone else's closed API, you have a problem that did not exist a year ago. Your gross margins get worse as your customers get more value out of your product, because the more they use it, the more compute you pay for. The companies that win this are the ones that figured out vertical integration before the math caught them.
If you are watching this from a distance and trying to understand where the next bottlenecks form, the answer is everywhere downstream of "more inference compute, always-on, with massive memory state per session." The KV cache, the running memory state of a long conversation or an agent loop, is the silent monster of the inference era. It does not scale linearly with parameters. It scales linearly with context length and number of agent steps. A long agent session can hold tens of gigabytes of state per user, per session.
Multiply that by every concurrent user of every product, and you understand why $MU, $SNDK, $TOWCF, and the entire memory and packaging layer have re-rated the way they have.
The CPU-to-GPU ratio is evolving. Training is 1:8. Basic chat inference is 1:4. Agentic inference is 1:1, sometimes CPU-heavy. Google has split its TPU line in two, with a dedicated inference chip carrying tripled SRAM for KV cache. $INTC and $AMD just spent two earnings calls explaining that this shift is structural, not cyclical. The hardware map is redrawing in real time and the financial press is mostly still writing about training clusters.
The right framing of where we are right now is not that AI is hitting a wall. The framing a year ago that scaling was hitting a wall was the most expensive bad take of the cycle. The right framing is that AI got dramatically cheaper, dramatically more capable, and dramatically more useful, and the cost of running it at the new equilibrium of demand is much higher than the cost at the old equilibrium of demand, because the new equilibrium is enormous.
A meaningful share of what we actually do at Token Factory, day to day, is help customers stop their bills from running away from them. KV-cache management. Speculative decoding. Quantization. Routing. The kind of vertical integration that, eighteen months ago, every product team was happy to leave abstracted away behind a closed API. The reason this stack matters now is the same reason this whole essay matters: at the new equilibrium of inference demand, the cost of treating compute as a commodity is no longer survivable. The companies that figure out the layer beneath the API are the ones who keep their margins.
Cheaper tokens. More tokens.
Same coal as 1865.

English

@kwaker_oats_ I assume very few new joiners made enough money last cycle to still be around here
English

Okay first target reached. We did it Intel incels.
DeepDiveDenny@deepdivedenny
People are going to sell Intel way too soon. Don't sell a dime before $100.
English

@SamuraiTakedown inflation in europe is much lower than in the US you are not loosing that much tbh
English

@nobraintrader1 After that run I think it’s not the same asset anymore. Just normal market dynamics that the previously fastest runner is getting bid on great news especially after having such a massive correction
English

@MelonConnsoieur Keep your head up man and happy birthday may you blessed with health!
English
Strider retweetledi

Blog post: AI Bubble or Not? continuations.com/ai-bubble-or-n…
English

Pine Analytics wrote a bear case on TAO - worth reading in full. But it makes analytical errors I've seen before.
In early 2024 you could have written a similar report about Hyperliquid. Overvalued relative to current fee revenue, no moat since anyone can build a perp DEX, competitors can replicate the tech. That bear case was correct on the snapshot and catastrophically wrong on the trajectory. What it missed was the mechanism that made usage and token value the same thing. Pine is making the same mistake here.
The report evaluates Bittensor as a single company measured against aggregate external revenue. This is the wrong framework for a network where value accrual operates at the subnet token level, flows through protocol-native AMMs, and mechanically creates TAO demand regardless of which specific subnets win.
The revenue numbers are stale. Pine cites Chutes at $1.3-2.4M annualized from an October 2025 source. Chutes is at $5.8M today and growing fast. Targon is at $10M. Lium is at $5.3M. Probably a dozen subnets on track for $1M+ ARR this year. The aggregate is closer to $25M+, not $3-15M, and these are compounding quarter over quarter from a standing start barely a year ago.
The subsidy math uses those same outdated figures. Plug in current numbers and the ratio compresses from 22-40:1 to roughly 8:1. Still subsidized but compressing every quarter. And the framing misunderstands the supply side anyway. These miners are running idle hardware at marginal cost. The cost advantage is structural, not only subsidized.
The part Pine completely misses is the buyback flywheel. Leading subnets currently commit large portions of external revenue to alpha token buybacks through the enshrined DEX (this trend will only grow), and every buyback is mechanically a TAO purchase. Revenue flows into buybacks, the token appreciates, more TAO stakes into that subnet, emission share increases, compute improves, more revenue comes in. A competitor who copies the model weights starts that flywheel at zero. That's the switching cost.
Pine says no moat because models are open source. You may be able to fork the code, but you cannot fork the mining pools, the validator set, the subnet composability, or a billion dollars in cumulative emissions that took years to build.
The pricing vise argument has a shelf life. Templar trained Covenant-72B. Ridges is producing novel coding agents through competitive tournaments no single lab replicates. As subnets produce unique intelligence rather than serving existing weights, "download it from Hugging Face" stops working. And every model the Chinese labs release makes Bittensor better immediately. They bear the training cost, Bittensor captures the value downstream.
I will concede this bear case applies to pure emission-capture subnets with no revenue flywheel (or reasonable path to one), and there are plenty of those. But TAOFlow is designed to kill them. No buying flow means losing emissions. Sustained negative flow means deregistration. Bottom line: you don't value Y Combinator by averaging in the failures, especially when the accelerator kicks out the ones that aren't working.
Pine Analytics@PineAnalytics
English

ripping a pod with a criminally under followed homie tomorrow @Cheshire_Cap
if you have q’s you can lmk (i may ignore and selfishly ask mine)
English

Still not interested in knife catching gold here. Really feels like gold is either going to 0 or infinity, no in between.
We're close to the 2016 gold bottom OI and we're still arguably in a bull market. Not sure what to make of it.
Noodle@YippoNoodle
In 3 years of watching gold daily I’ve never seen PA like this. I genuinely don’t understand it so I’m just staying pat, not adding (a lot) or cutting. OI is getting deleted, almost 100k shaved off as people tap out. The beta 1 narrative has basically become self-fulfilling. Not trying to knife catch a paper slam yet. There’s still ~30 days until these futures expire and are forced to recouple with the physical market. Too early if that’s the trade.
English

@koreanjewcrypto @OracleHershiser no way fuck, hope he is at peace wherever he is now.
English

Got the really bad news that our brother @OracleHershiser passed over the weekend. He was one of the good ones gone too soon 😔
Rip king 🙏
English











