The Grid

97 posts

@The_GridAI

The spot market for AI Inference

Joined September 2025
62 Following · 865 Followers
Pinned Tweet
The Grid
The Grid@The_GridAI·
1/ The world’s most valuable commodity doesn’t have a live market…yet. Inference today lacks a standardized way to measure, trade, and allocate it. This problem is existential for builders. There is a better way. The Manifesto for the Intelligence Revolution is here 👇
The Grid
The Grid@The_GridAI·
Fixed pricing made sense when usage was predictable. For AI, demand is spiky and supply is constrained, so the value per token varies widely.
Carl D@CahlDee

"There's no world in which pricing doesn't significantly evolve when the technology is changing this quickly." - @nickaturley, Head of ChatGPT

Agreed. The problem the labs are facing is that they've set a fixed price when the value is variable. Let me explain...

Today, AI typically has a fixed cost. Nice for budget planning, but completely disconnected from the value of what's being delivered. This may continue at the retail level, but I believe we're likely to see it evolve to be more like the airline industry.

What's the value of a seat on a plane? Is it the same when all seats are available vs. when this flight and all those that follow are sold out? Of course not.

At any given time, AI providers can only serve so many tokens (AI output). When that limit is reached, they're tapped out. Seats on the plane are full.

To fly a plane, airlines have a base cost. They price aggressively low to get as close to covering that cost as possible. As they approach that level, and certainly as they pass it, seat prices go through the roof.

AI providers face the same challenge. They need to serve a given number of tokens every second to cover their costs. Often they're nowhere near that level. To build a sustainable business, they need to OVERCHARGE everyone at all times to cover the difference.

Instead, I expect we'll move to dynamic pricing. Pricing that reflects demand. @AnthropicAI already announced a light version of this with peak and off-peak rate limits. That's a start.

But the natural evolution is toward an order book. Suppliers set how many tokens they're willing to sell at which price levels (limit orders). Demand determines the market rate. This can be completely transparent to users: they just get the best price on the book (market buys).

Once derivatives evolve, you get budget planning through options and futures. The primitives exist. They're working at scale. And they're coming to AI.
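The order-book mechanics described above (suppliers post limit orders, buyers fill at the best available price via market buys) can be sketched in a few lines. This is an illustrative toy, not The Grid's actual implementation; the class name, prices, and quantities are made up for the example.

```python
import heapq

class TokenOrderBook:
    """Toy ask-side order book: suppliers post limit orders
    (price per 1M tokens, quantity in millions of tokens);
    buyers fill from the cheapest asks, like a market buy."""

    def __init__(self):
        self.asks = []  # min-heap of (price, quantity), cheapest first

    def place_ask(self, price, qty):
        heapq.heappush(self.asks, (price, qty))

    def market_buy(self, qty):
        """Fill qty (millions of tokens) from the cheapest asks.
        Returns (filled_qty, total_cost_in_dollars)."""
        filled, cost = 0, 0.0
        while qty > 0 and self.asks:
            price, avail = heapq.heappop(self.asks)
            take = min(qty, avail)
            filled += take
            cost += take * price
            qty -= take
            if avail > take:  # partial fill: remainder stays on the book
                heapq.heappush(self.asks, (price, avail - take))
        return filled, cost

book = TokenOrderBook()
book.place_ask(0.25, 10)  # 10M tokens offered at $0.25/M
book.place_ask(0.40, 5)   # 5M tokens offered at $0.40/M
filled, cost = book.market_buy(12)  # fills 10M @ $0.25, then 2M @ $0.40
```

The buyer never picks a supplier; the book routes the fill to the cheapest standing offers, which is the transparency the tweet describes.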

The Grid
The Grid@The_GridAI·
“Tokens are the new commodity.”

Commodities always evolve the same way:
▫️Grades emerge
▫️Tiers standardize
▫️Markets set the price

This is already happening with AI inference on The Grid:
Text Standard → speed & throughput
Text Prime → premium reasoning

You're buying a performance tier instead of a model. And, just like in every commodity market, price discovery is left to the market.
The Grid
The Grid@The_GridAI·
When the performance gap between top models is small and getting smaller, winning teams stop chasing benchmark points. Instead, they focus on running reliable inference at the lowest cost.
The Grid@The_GridAI

x.com/i/article/2031…

The Grid
The Grid@The_GridAI·
If intelligence becomes a utility, it will behave like one. The history of electricity, oil, and bandwidth is similar: fragmented supply and fluctuating demand led to a market. The same will be true of AI inference.
The Grid
The Grid@The_GridAI·
This is the real problem for builders. If two models reach the same ceiling given enough inference, the model name matters less and less. What most builders really want is a performance tier at the best available price.
Michael R. Bock@michaelrbock

1/ OpenAI just launched GPT-5.4 Pro, their premium model at 12x the API cost of standard GPT-5.4: $30/M input tokens and $180/M output vs. $2.50/$15.

I ran TaxCalcBench on Pro. The result: exactly tied with standard GPT-5.4.

12x the price, 0% improvement.

But the full story is more nuanced:

The Grid
The Grid@The_GridAI·
2/ When performance is this close across this many producers, choosing a single vendor stops being a technology decision. It becomes a procurement decision.
The Grid
The Grid@The_GridAI·
1/ The best AI model scores 89% on knowledge benchmarks. The average of the top 10 scores 84.5%. That’s a gap of just 4.5 points. But pricing across providers for the same model can vary far more. In many cases, you’re paying a premium your users will never notice.
The Grid
The Grid@The_GridAI·
@rohanpaul_ai Thanks for reading. What we're hearing time and time again is that people are fed up with having to redo the integration every time they move models.
Rohan Paul
Rohan Paul@rohanpaul_ai·
@The_GridAI Looks like a brilliant idea "Instead of choosing a specific model from a specific provider, you purchase Units of the instrument."
The Grid
The Grid@The_GridAI·
Electricity is just one of many inputs. The commodity is the inference itself, and it's metered in tokens.

The actual markets form around the labor or task being done: coding/USD, text/USD, image/USD, etc. These can be standardized because there are expected floors for quality/intelligence, latency, and other measurable specs.

It's like oil: the barrel (here, the LLM token) is the standard unit, but there are different grades (Brent, WTI, crude vs. refined) for different use cases. And these are the markets.
Steph Curdy
Steph Curdy@Steph_Curdy·
@The_GridAI Standard unit of inference would seem to be electricity because each model has different subjective performance per token, no?
The Grid
The Grid@The_GridAI·
@Steph_Curdy You need standardized units of inference based on the delivered outcome. LLMs are metered in tokens and defined by specs like an intelligence floor, latency, context window, etc. Once the unit is standardized, suppliers can compete behind it with different models and hardware.
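A standardized unit like this can be thought of as a spec that any supplier's offering either meets or doesn't. A hypothetical sketch follows; the class, field names, and thresholds are invented for illustration and are not The Grid's actual tier definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceTierSpec:
    """Hypothetical standardized unit of inference: suppliers
    compete behind the spec with any model and hardware that meet it."""
    name: str
    min_benchmark_score: float  # intelligence floor, 0-100
    max_p95_latency_ms: int     # latency ceiling for delivery
    min_context_tokens: int     # minimum context window

    def admits(self, score, p95_latency_ms, context_tokens):
        """True if a supplier's measured offering qualifies for this tier."""
        return (score >= self.min_benchmark_score
                and p95_latency_ms <= self.max_p95_latency_ms
                and context_tokens >= self.min_context_tokens)

# Illustrative tier (numbers are made up):
text_prime = InferenceTierSpec("Text Prime", 85.0, 2000, 128_000)

# Two suppliers with different models behind the same unit:
ok = text_prime.admits(score=89.0, p95_latency_ms=1500, context_tokens=200_000)
too_slow = text_prime.admits(score=91.0, p95_latency_ms=5000, context_tokens=200_000)
```

Once the unit is fixed like this, price is the only remaining variable, which is what makes an order book possible.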
Steph Curdy
Steph Curdy@Steph_Curdy·
GPUs make the commodity (inference) available, along with a whole host of other commodities (everything else GPUs can do?). How would one aggregate inference (the commodity) across all hardware in all geographies in order to get the best global pricing? Or what's the right question here?
The Grid
The Grid@The_GridAI·
The same is true for inference. There is no standard unit. No standard benchmark. So you can’t really compare prices across providers. Real price discovery occurs when everyone is quoting the same unit.
Yano 🟪@JasonYanowitz

This alone is a $1B+ idea. Price discovery for compute is a huge issue. If you want to price compute, you have to email/call every neocloud. Everyone gives a different price. Some deal direct, others push you to brokers. Plus, all the creative financing deals complicate price discovery. Compute markets solve this.

The Grid
The Grid@The_GridAI·
We're at @clawcon in NYC. Obviously there are heaps of lobsters 🦞 Who wants to link up with our CEO, @__sishir?
The Grid
The Grid@The_GridAI·
Nvidia’s data center revenue shows where AI is heading: inference is becoming an operating cost center.

Product teams used to optimize for:
▫️Cloud costs
▫️API costs
▫️Vendor lock-in

The next frontier will be managing inference costs.
The Grid
The Grid@The_GridAI·
You’re right that mission-critical systems won’t optimize for the lowest price. They optimize for reliability. But most inference usage isn’t mission-critical. "Good enough" at a significantly lower price is what will democratize AI and make it available to everyone. Low-cost Android phones aren't iPhone Pro Maxes, but they're what makes it possible for billions of people all over the world to have the internet in their pocket.
David Wall
David Wall@DavidWall9987·
The shrinking benchmark gap is a distraction from a fundamental category error. AI isn’t a “discount toaster” or a generic disk drive where “good enough for the price” is a winning strategy.

This industrial-age obsession with cost-per-token ignores the true pillars of AI competitiveness: Data Provenance, Trustworthiness, and Reasoning Stability. Why are we so focused on a race-to-the-bottom for price while ignoring the “hallucination tax” of cheap, commoditized models?

In mission-critical systems, the real moat isn’t a hot-swappable spot price; it’s the verifiable reliability of the intelligence. If the training data is a black box and the output lacks consistent stability, it doesn't matter how cheap the inference is: it's a liability, not a commodity. High-fidelity intelligence requires a commitment to quality that the “disk drive” mindset simply cannot deliver.
The Grid
The Grid@The_GridAI·
Models are becoming commodities. The question isn’t “what’s the top model?” It’s “what performance tier is good enough for the price?” Markets make that tradeoff explicit. That’s what we’re building.
Sishir@__sishir

x.com/i/article/2024…

The Grid retweeted
Carl D
Carl D@CahlDee·
Once LLM quality is "good enough", most use cases will lean toward speed and cost optimizations. Google gets this.
Aakash Gupta@aakashgupta

Google just priced intelligence at $0.25 per million input tokens. Let that math sink in.

Gemini 3.1 Flash-Lite costs 4x less than Claude 4.5 Haiku on input ($0.25 vs $1.00) and 3.3x less on output ($1.50 vs $5.00). It runs 2.5x faster time-to-first-token than Google’s own 2.5 Flash. And it scores 86.9% on GPQA Diamond, which beats larger Gemini models from previous generations.

This tells you everything about where the AI model war is actually being fought right now. Everyone’s watching the frontier models compete on reasoning benchmarks. The real war is in the efficiency tier, where the actual infrastructure bills get paid.

Here’s why. Enterprise AI is at roughly 10% adoption heading toward 50%. The workloads that drive that adoption curve aren’t complex reasoning tasks. They’re translation, content moderation, intent routing, catalog processing. Millions of calls per day where the difference between $0.25 and $1.00 per million tokens compounds into hundreds of thousands of dollars per month.

Google is doing something specific here. They’re using 3.1 Flash-Lite as a wedge to lock developers into the Vertex AI ecosystem on high-volume workloads, then upselling them to 3.1 Pro for complex reasoning at $2.00 per million input.

The cascading architecture play: the cheap model handles 90% of requests, the expensive model handles 10%. Total cost drops by 80%+ versus running everything through a frontier model.

OpenAI sees the same dynamic. GPT-5 Nano is priced at $0.05/$0.40 per million tokens. That’s 5x cheaper than Flash-Lite on input. The efficiency tier is becoming a loss leader for ecosystem capture.

The company that wins the next 2 years of enterprise AI is the one whose cheap model is good enough to run every log file, every customer chat, every moderation call without exhausting the cloud budget. Google just made their bid.
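The cascading-router arithmetic in the thread can be checked directly. A quick sketch using the thread's own input rates ($0.25/M for the cheap tier, $2.00/M for the Pro tier) and its assumed 90/10 routing split; on input rates alone the saving lands just under the "80%+" claim, which presumably also factors in output tokens.

```python
def blended_cost(cheap_rate, expensive_rate, cheap_share):
    """Blended $/1M input tokens when a router sends cheap_share of
    requests to the cheap model and the rest to the expensive one."""
    return cheap_share * cheap_rate + (1 - cheap_share) * expensive_rate

frontier_only = 2.00                       # everything on the Pro-tier model
cascaded = blended_cost(0.25, 2.00, 0.90)  # 0.9 * 0.25 + 0.1 * 2.00 = 0.425
savings = 1 - cascaded / frontier_only     # ~0.79, near the thread's "80%+"
```

The point of the sketch is that the savings come almost entirely from the routing split, not from the exact model chosen, which is the commoditization argument in miniature.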
