Thaki Cloud

39 posts

@thakicloud

Enterprise AI infrastructure provider with a full-stack, on-premises platform, for organizations that need to run AI workloads on their own infrastructure.

Seoul, South Korea · Joined May 2026

81 Following · 10 Followers

Pinned Tweet

Thaki Cloud@thakicloud·5d

x.com/i/article/2067…

Thaki Cloud@thakicloud·1d

Micron just signed a multi-year supply deal with Anthropic 🤝, co-designed memory architecture with it, and took an equity stake alongside. The headline is the $965B valuation. The real story is buried in the technical details.
Here's what most coverage is missing: LLM inference isn't compute-bound, it's memory-bound. During decoding, every generated token means rereading the full model weights from memory. GPUs sit idle waiting on bandwidth, not FLOPs. That's why Micron getting to co-design HBM, DRAM, and SSD architecture specifically for Anthropic's workloads matters more than the dollar figure attached to it.
We see this pattern constantly with enterprise teams running their own inference stacks: the lever that actually moves cost and throughput is memory tuning (KV cache placement, batching against HBM capacity, model placement across the hierarchy), not just adding GPUs.
If frontier labs are now treating memory architecture as a competitive moat worth co-designing, that's a strong signal for anyone deciding between API dependency and owning their inference stack.
Curious where others land: as memory becomes the bottleneck everyone's racing to control, does that make the case for self-hosted inference stronger or weaker?
We wrote up the full technical breakdown (memory hierarchy, what co-design actually changes, caveats) on our blog: thakicloud.github.io/en/news/micron…
Talk to us about your AI infrastructure: sales@thakicloud.com
#Micron #Anthropic #HBM #SeriesH #MemoryBandwidth #KVCache #PagedAttention #vLLM #AIInfrastructure #TCO #OnPremAI #EnterpriseAI #ThakiCloud
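A rough way to see the memory-bound claim in numbers is sketched below. It assumes a dense model whose weights are reread once per decode step and a KV cache that shares HBM with those weights. The function names and the hardware and model figures are hypothetical, picked only to keep the arithmetic readable; they do not describe any specific GPU or any system named in the post.

```python
def decode_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound ceiling for one sequence: each token rereads all weights."""
    return bandwidth_bytes_per_s / weight_bytes


def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_value: int) -> int:
    """Bytes of KV cache one token occupies: a key and a value per layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def max_cached_tokens(hbm_bytes: float, weight_bytes: float, kv_bytes_per_token: int) -> int:
    """Tokens of KV cache that fit in the HBM left over once the weights are resident."""
    return int((hbm_bytes - weight_bytes) // kv_bytes_per_token)


# Hypothetical accelerator and model, chosen only to make the arithmetic easy to follow.
WEIGHT_BYTES = 8e9 * 2    # 8B parameters stored at 2 bytes each
HBM_BANDWIDTH = 2e12      # 2 TB/s
HBM_CAPACITY = 80e9       # 80 GB

per_token = kv_cache_bytes_per_token(layers=32, kv_heads=8, head_dim=128, bytes_per_value=2)
print(decode_tokens_per_second(WEIGHT_BYTES, HBM_BANDWIDTH))     # 125.0
print(per_token)                                                 # 131072
print(max_cached_tokens(HBM_CAPACITY, WEIGHT_BYTES, per_token))  # 488281
```

Under these assumptions a single sequence tops out near 125 tokens per second no matter how many FLOPs the chip offers. Batching amortizes each weight read across more sequences, but only until the KV cache fills the remaining HBM, which is why cache placement and batch sizing come up as tuning levers.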

Thaki Cloud@thakicloud·5d

Uber burned its entire 2026 AI budget by April. 💸 Gartner says 40% of agent projects will be cancelled by 2027 — not technical failure. Economics. The agentic cost problem isn't a line item. It's a full-stack infrastructure problem. 📔Read More on Substack: thakicloud.substack.com/p/uber-claude-… #Uber #SamAltman #OpenAI #Anthropic #Gartner #AgenticAI #AIEconomics #EnterpriseAI #AIInfrastructure #LLMOps #AIAgents #TokenEconomics #AIObservability #PrivateCloud #AIStrategy #ThakiCloud

Thaki Cloud@thakicloud·5d

#SemiAnalysis just devoted a research report to taking down a viral stat: "half of 2026 US datacenter capacity is delayed or canceled." Their finding: the panic is built on AI-generated forecasts that take press releases at face value.
An announced 5GW campus with a "Contact Us" button and zero satellite evidence of construction gets counted as "delayed" capacity. A moratorium bill in a county with no datacenters gets counted as a setback. ERCOT's interconnection queue shows 410GW of requests, but only 45GW are real, trackable projects. Another 311GW is what they call "phantom" demand.
The irony: the same report discloses spending $171,476 in a single week on Claude Code internally, while warning that "Claude Coded" datacenter models are the exact source of the bad data circulating across financial media.
The lesson generalizes well beyond datacenters: AI is extraordinarily good at compiling information that looks authoritative and is structurally wrong, because it takes inputs (a press release, an LOI, an announcement) at face value without the domain judgment to know which inputs are real signals and which are noise. Verification (satellite imagery, permitting filings, on-the-ground checks) is still the thing doing the actual work here. The AI just made it faster to be wrong at scale.
#SemiAnalysis #ClaudeCode #DataCenters #AIInfrastructure #Anthropic #ERCOT #AIForecasting #EnterpriseAI #AIGovernance #DatacenterConstruction #ThakiCloud

Thaki Cloud@thakicloud·6d

Most enterprise AI bills aren't growing because AI is expensive. They're growing because no one is watching.
Max Brodeur-Urbas at Gumloop put out a sharp breakdown of the 7 patterns he sees repeatedly across companies whose AI spend is spiraling. A few that stood out:
One company switched internal agents from Claude Opus to an open source model at a 93% cost reduction, and nobody noticed a difference in quality. The frontier model was being used out of habit, not necessity.
At one large tech company, employees discovered that no one in the top 10 token consumers was laid off. Token consumption became a job security strategy, not a productivity tool. The incentive structure created the waste.
A healthcare team's monthly agent bill jumped from $12,000 to $68,000 in six weeks. The root cause, a retrieval fault pulling documents 8x larger than needed, only appeared through unified telemetry, two weeks after it had already hit the invoice.
The pattern across all seven sins is the same: AI spend grows in the dark. The fix isn't spending less on AI; it's building the observability, governance, and infrastructure discipline to see exactly what's happening and why.
Enterprises running AI on infrastructure they control have a structural advantage here. Every tool call, every token consumed, every agent decision is visible in logs they own, not buried in a managed platform's aggregate billing dashboard.
The curve doesn't have to go up and to the right forever. But it won't flatten on its own.
Read the original post: linkedin.com/pulse/7-deadly…
#Gumloop #MaxBrodeurUrbas #AISpend #EnterpriseAI #AIAgents #AgenticAI #AIGovernance #LLMOps #AIObservability #TokenEconomics #AIInfrastructure #PrivateCloud #AIStrategy #ThakiCloud
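As an illustration of the kind of telemetry check that could surface a retrieval fault like the one in the healthcare example, here is a minimal sketch. It assumes each agent call is logged as a record with an agent name and an input token count, and that a per-agent baseline is known. The record layout, the `flag_token_spikes` helper, the agent names, and the 3x threshold are all hypothetical; nothing here is taken from Gumloop's post or any specific platform.

```python
from collections import defaultdict
from statistics import mean


def flag_token_spikes(records, baseline, threshold=3.0):
    """Return {agent: ratio} for agents whose mean input tokens per call
    are at least `threshold` times their baseline."""
    per_agent = defaultdict(list)
    for rec in records:
        per_agent[rec["agent"]].append(rec["input_tokens"])

    flagged = {}
    for agent, tokens in per_agent.items():
        if agent not in baseline:
            continue  # no baseline to compare against, so skip the agent
        ratio = mean(tokens) / baseline[agent]
        if ratio >= threshold:
            flagged[agent] = round(ratio, 2)
    return flagged


# Hypothetical call log: the "intake" agent is retrieving oversized documents.
calls = [
    {"agent": "intake", "input_tokens": 16000},
    {"agent": "intake", "input_tokens": 16000},
    {"agent": "billing", "input_tokens": 2100},
]
baselines = {"intake": 2000, "billing": 2000}

print(flag_token_spikes(calls, baselines))  # {'intake': 8.0}
```

Run daily against call logs rather than monthly against an invoice, a check of this shape would flag an 8x jump in payload size well before it shows up in billing.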

Thaki Cloud@thakicloud·Jun 17

Uber burned its entire '26 AI budget by April. Gartner says 40% of agent projects will be cancelled by 2027. The agentic cost problem isn't a line item. It's an AI infrastructure problem. #Uber #SamAltman #OpenAI #Anthropic #Gartner #AgenticAI #ThakiCloud thakicloud.substack.com/p/uber-claude-…

Thaki Cloud@thakicloud·Jun 16

@MilkRoadAI The neocloud cost advantage over hyperscalers is real and durable. The more interesting question is what sits below that layer — dedicated infrastructure where you own the economics entirely, not just rent them more cheaply.

Milk Road AI@MilkRoadAI·Jun 15

Chamath Palihapitiya just dropped the number that explains the entire AI infrastructure trade (Save this).
A gigawatt of compute now costs $100 billion. When he started his Arizona data center project it was $4 to $5 billion, so it has gone up 20x in a single investment cycle. The implication is not just that AI infrastructure is expensive, but that the capital barrier to owning meaningful compute has become so high that only a handful of entities in the world can actually build it, and the companies that got there early are sitting on what may be the most durable pricing power in the history of the technology industry.
This is the neocloud trade. The neocloud market (purpose-built GPU cloud providers like CoreWeave, Nebius, and Lambda Labs) was worth $35 billion in 2026 and is projected to reach $236 billion by 2031, compounding at 46% annually. For context, that is faster growth than cloud computing itself posted in its first decade.
The reason is simple: hyperscalers like AWS, Azure, and Google are building for everything (storage, databases, enterprise software, networking), and their GPU pricing reflects the overhead of that full-stack infrastructure. Neoclouds build for one thing only: AI compute. The result is a 60% to 85% cost advantage on the same Nvidia silicon: bare metal H100s at $0.78 to $2.79 per GPU-hour on a neocloud versus $3.43 to $5.07 per GPU-hour on a hyperscaler. That spread does not close as AI demand scales; it widens, because hyperscalers have to amortize legacy infrastructure and margin expectations that neoclouds do not carry. Gartner projects that by 2030, neoclouds will capture 20% of the $267 billion AI cloud market, and Vultr's own analysis says at least 80% of GPU market share by end of 2026 will be held by a small group of scaled neocloud providers.
Now zoom into Nebius specifically, because it is the most interesting publicly traded proxy for this trade. Nebius is the infrastructure arm of the former Yandex, Russia's equivalent of Google, rebuilt from the ground up after Russia's invasion of Ukraine by Arkady Volozh and relisted on Nasdaq in October 2024. The team that built it already knew how to run internet-scale infrastructure at the lowest possible cost, which is exactly the operational DNA a neocloud requires.
In Q1 2026, Nebius reported revenue of $399 million, already generating serious cash on a young business, with revenue growing nearly eightfold year-over-year. Then in March 2026, Meta signed a five-year infrastructure agreement with Nebius worth up to $27 billion: $12 billion in committed dedicated GPU capacity, with deployments beginning early 2027, plus up to $15 billion more tied to Meta purchasing Nebius's unsold third-party capacity. The deal will be executed on one of the first large-scale deployments of Nvidia's Vera Rubin platform, the next-generation architecture after Blackwell, making Nebius one of a tiny number of operators in the world with confirmed priority access to the most advanced AI hardware available. Following the contract, Nebius guided to $7 to $9 billion in annualized recurring revenue for 2026, representing 540% year-over-year growth.
@chamath's point about the $100 billion capital moat is the bear case for new entrants and the bull case for incumbents. No one can afford to build the next CoreWeave or Nebius from scratch at current hardware and power costs. The companies that are already built, already contracted, and already deploying Nvidia's latest silicon have a moat that compounds with every GPU generation cycle, because they get allocations first, they deploy fastest, and their customers re-sign rather than wait for a new operator that does not yet exist.
Come join Milk Road Pro for our full breakdown: the complete neocloud competitive landscape, how to think about Nebius's valuation versus CoreWeave, and our entire AI thesis. Link below.
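The pricing and growth figures in the post can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the post; the helper names are hypothetical, and how the price ranges should be paired (cheapest against cheapest, cheapest against most expensive, and so on) is an assumption, since the post does not say.

```python
def discount(neocloud_price: float, hyperscaler_price: float) -> float:
    """Fractional saving of the neocloud price relative to the hyperscaler price."""
    return 1 - neocloud_price / hyperscaler_price


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# H100 prices per GPU-hour as quoted in the post.
NEO_LOW, NEO_HIGH = 0.78, 2.79
HYPER_LOW, HYPER_HIGH = 3.43, 5.07

pairings = {
    "neocloud low vs hyperscaler high": discount(NEO_LOW, HYPER_HIGH),
    "low vs low": discount(NEO_LOW, HYPER_LOW),
    "high vs high": discount(NEO_HIGH, HYPER_HIGH),
    "neocloud high vs hyperscaler low": discount(NEO_HIGH, HYPER_LOW),
}
for label, saving in pairings.items():
    print(f"{label}: {saving:.1%}")

# Market size: $35B in 2026 to $236B in 2031 is five years of compounding.
print(f"implied CAGR: {cagr(35, 236, 5):.1%}")
```

The pairings span roughly 19% to 85%, so the quoted 60% to 85% advantage appears to sit at the favorable end of the range. The market-size figures imply a growth rate of about 46% a year, in line with the post.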

Thaki Cloud@thakicloud·Jun 16

6 billion people are watching the World Cup. Almost none of them are watching the AI infrastructure running underneath it. #WorldCup2026 #FIFA #ThakiCloud #AIInfrastructure #EnterpriseAI linkedin.com/posts/thakiclo…

Thaki Cloud@thakicloud·Jun 12

DeepSeek just took 17% of production AI token volume in a single month. Its share of spend: 1%. The model market is getting more complex every month. The infrastructure question is whether your stack can keep up. #DeepSeek #Anthropic #TokenEconomics linkedin.com/posts/deepseek…
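The two shares in this post imply a price gap that is easy to work out. The sketch below assumes the 17% and 1% figures are measured over the same pool of production traffic and compares DeepSeek's average spend per token with the average for everything else. The function name is hypothetical, and because the inputs are rounded the result is only a rough ratio.

```python
def relative_price_per_token(volume_share: float, spend_share: float) -> float:
    """Average price per token relative to the rest of the market (1.0 means parity)."""
    own = spend_share / volume_share
    rest = (1 - spend_share) / (1 - volume_share)
    return own / rest


ratio = relative_price_per_token(volume_share=0.17, spend_share=0.01)
print(f"price per token vs. the rest: {ratio:.3f}")
print(f"roughly {1 / ratio:.0f}x cheaper per token")
```

On these numbers the average DeepSeek token costs about a twentieth of the average token from other models in the same pool.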

Thaki Cloud@thakicloud·Jun 11

@zephyr_z9 The 20GW vs 4.5GW gap is real. But the US constraint isn't generation capacity; it's transmission and interconnection. China adds more grid capacity each year than most countries have in total. The snapshot comparison flatters the US. The rate-of-change comparison is more interesting.

Zephyr@zephyr_z9·Jun 10

The biggest problem I have with the "China has more power than the US" narrative is that Chinese CSPs are tendering for 4.5GW of power for IT capacity this year. Meanwhile, that number is over 20GW in the US for 2026. "FLOPS/W: directionally true and Huawei’s real weakness. But power is the binding constraint in the US, not China. China adds more grid capacity yearly than most countries have in total."

GDP@bookwormengr

Comparing Ascend and Rubin on FLOPS is misguided, imho!
GENERATIONS
========
First of all they are different generations and different architecture! Also, they have different die count, process and Ascend 950 dies are smaller (smaller than even 910C). @jukan05
OPTICAL SCALE UP
========
They also have different design philosophies! Huawei’s approach is system over chips and they go for large scale up domains with optical networking that also means more space for networking IP. They use LPO that pushes DSP (lightweight) logic to ASIC. @iamfabian This also saves them trouble of buying expensive DSP IP.
FYI @teortaxesTex Rubin Ultra is 4 chiplets at 3nm (for the 50 FP4 SKU you are mentioning). Ascend is 7nm and 2 chiplets only. Also, Huawei reduced chiplet size significantly from 910c to 950. This improves the yield. I grant your point about Rubin dedicating more space for FP4, which Ascend may end up doing.
Here is the right way to assess, imho.
============
1. Rubin’s 50 PF is a 2027 part vs Ascend 950 shipping Q4 2026 in my estimate. The shipping comparison is B200/B300: ~10 and ~15 PF dense FP4. So per-chip it’s ~5-7x, not 25x. Still a real gap, but a very different one….
2. Per-chip FLOPS is the wrong unit of account. Training and inference run on systems, not chips!!! Atlas 950 SuperPoD puts 8,192 NPUs in ONE scale-up domain (16 EF FP4, 16 PB/s fabric, unified memory addressing) vs 72-144 GPUs for NVL72/NVL144. Huawei’s all-optical UnifiedBus (2.1µs latency, 200m+ reach, claimed 100x optical reliability) is what makes a rack-scale to hall-scale coherent domain transition possible at all.
3. Why does domain size matter? Bigger scale-up domains mean less reliance on slow scale-out networking for EP/TP-heavy workloads (MoE inference especially). They trade per-chip muscle for fabric, exactly the trade a networking company under chip sanctions should make!
4. FLOPS/W: directionally true and Huawei’s real weakness. But power is the binding constraint in the US, not China. China adds more grid capacity yearly than most countries have in total. Huawei is spending the resource it has (power, floor space, optics) to save the one it doesn’t (leading edge silicon below 7nm).
NVIDIA wins on chips alone - that will forever be the case. The contest is at the system level, so Huawei is playing on its strength - networking (LPO in particular). LPO works as racks, boards, connectors all designed by the same vendor. Though reliability of such large scale up domain is yet to be proven. Huawei is playing an interesting game.
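The per-chip ratios in point 1 can be reproduced from the system figures in point 2. The sketch below assumes the 16 EF FP4 quoted for the Atlas 950 SuperPoD is spread evenly across its 8,192 NPUs and that it is comparable to the dense FP4 figures quoted for the Nvidia parts. Both assumptions and the helper name are mine; the post does not state them.

```python
PF_PER_EF = 1000  # 1 exaFLOPS = 1000 petaFLOPS


def per_chip_pf(system_ef: float, chips: int) -> float:
    """Average petaFLOPS per chip for a system rated in exaFLOPS."""
    return system_ef * PF_PER_EF / chips


ascend_pf = per_chip_pf(system_ef=16, chips=8192)
print(f"Ascend 950 (implied): {ascend_pf:.2f} PF per NPU")

# Nvidia per-chip FP4 figures as quoted in the post.
for name, pf in {"B200": 10, "B300": 15, "Rubin (50 PF SKU)": 50}.items():
    print(f"{name}: {pf / ascend_pf:.1f}x")
```

This gives about 1.95 PF per NPU, and ratios of about 5.1x, 7.7x and 25.6x, which lines up with the post's "~5-7x, not 25x" once the comparison is restricted to parts that are shipping.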

Thaki Cloud@thakicloud·Jun 11

@JohnTinsman $50B in, $300-400B out. That's not a technology story — it's a capital allocation story. And it explains exactly why every organization that can't write a hyperscaler-sized check is rethinking what it means to own their own compute.

Thaki Cloud@thakicloud·Jun 11

@MilkRoadAI Everyone's counting GPUs. The actual bottleneck is the memory stacked on top of them. HBM supply is sold out through 2026, new fabs take years, and Jensen just said the second half of this year is going to be bigger than the first. That gap is real infrastructure risk.

Milk Road AI@MilkRoadAI·Jun 10

Jensen Huang just made a statement that every investor in AI infrastructure needs to hear (Save this).
He said that the AI buildout is accelerating, the second half of this year is going to be much larger than the first half, and next year is going to be very, very large.
Micron is best positioned to win from this because every Nvidia GPU requires High Bandwidth Memory stacked directly on the chip to feed it data fast enough to keep up. There is no AI compute without memory, and right now there is simply not enough memory to go around. Micron's entire HBM supply for 2026 was sold out under multi-year agreements before the year even started. Micron's own management has acknowledged they can only satisfy 50 to 65 percent of demand from some of their most important customers.
That is not a problem that gets fixed quickly, because new fabs take years to build. Micron's Idaho expansion does not come online until mid-2026, a second Idaho facility is not expected until 2028, and a new New York fab is looking at 2030. The demand Jensen just described is arriving right now, and the supply to meet it is years away.
The financial results already reflect this dynamic. Micron's Q2 fiscal 2026 revenue came in at $23.86 billion, nearly triple what it was a year earlier, beating consensus by roughly $3.8 billion. The HBM market alone is expected to grow from $35 billion today to $100 billion by 2028, and Micron has been consistently ahead of that forecast.
Jensen just told the world the second half of this year and all of next year are going to be larger than anything that came before. Micron is the company that supplies the memory those GPUs need to run, and it cannot build supply fast enough to keep up with demand.
Come join Milk Road Pro for our full deep dive on Micron, the HBM supply thesis and our AI trade thesis! Link below!

Thaki Cloud@thakicloud·Jun 11

@SemiAnalysis_ The village furnace vs. steel mill framing is right. What it leaves out is the regional mill — enterprise-grade, on-premises, datacenter economics without the data leaving the building. That's where the real volume is headed.

SemiAnalysis@SemiAnalysis_·Jun 10

Local LLMs are the Great Leap Forward for Inference. Every laptop is its own datacenter, sovereignty over your own tokens, and the people can seize the means of token generation. And that's why it's destined for poor results. (1/4)🧵

Thaki Cloud@thakicloud·Jun 11

@SemiAnalysis_ The most important AI infrastructure story isn't the models. It's who owns the compute underneath them. DeepSeek just made their answer clear.

SemiAnalysis@SemiAnalysis_·Jun 10

DeepSeek is going heavy-asset. On June 9, the company posted an opening for IDC planning engineers, a role explicitly scoped to the design and delivery of MW-to-GW scale infrastructure. It follows April's hiring of data center O&M engineers in Ulanqab, Inner Mongolia. Taken together, this is the first time DeepSeek has fully shown its hand on owning compute infrastructure rather than just renting it.

Thaki Cloud@thakicloud·Jun 11

@SemiAnalysis_ The "withhold models from subscriptions" prediction assumes the labs stay vertically integrated on inference. If third-party inference gets cheap enough fast enough, that leverage disappears. The subscription ceiling is really a compute cost floor in disguise.

SemiAnalysis@SemiAnalysis_·Jun 11

What's the better business model for an AI lab, subscription or API? (1/4)🧵

Thaki Cloud@thakicloud·Jun 11

@koreatechdesk True in AI infrastructure too. Access to great compute doesn't produce outcomes — disciplined operations around it do. The gap between capability and results is almost always an execution problem, not a resource problem.

koreatechdesk@koreatechdesk·Jun 11

💬 "Korea is an environment where good results can be made, but it does not guarantee the result." Advanced manufacturing helps. Operational discipline determines survival. 🏭⚙️📈 #Manufacturing #Hardware #Startups #Korea #Innovation #Factory #Industry koreatechdesk.com/korea-manufact…

Thaki Cloud@thakicloud·Jun 11

Anthropic may be spending $1,000 for every $100 you pay them on Claude subscriptions. A new analysis tracking real API costs on serious coding workloads makes a compelling case. #Anthropic #Claude #ClaudeCode #AIEconomics #EnterpriseAI #thakicloud #OpenAI linkedin.com/posts/anthropi…
