Subbu

77 posts

@subbdue

I develop AI inference Chips/SoCs for a living. Read my work and subscribe to my newsletter at https://t.co/RGu9NKu1Ze

Santa Clara, CA · Joined June 2009
33 Following · 706 Followers

Subbu @subbdue
One crazy metric from the vLLM x Mooncake article: “On a real Codex agentic trace … the distributed KV cache pool improves vLLM throughput by 3.8x and reduces P50 TTFT and E2E latency by 46x and 8.6x.” That's a 46x improvement in Time To First Token!
Quoting Matej Sirovatka @m_sirovatka:

KV Cache re-use is the most important thing for agentic rollouts. We've integrated Mooncake Store into prime-rl with vLLM, you can now use it as a drop-in replacement for native CPU/Disk offloading, giving you cross-node prefix cache reuse to make your agents go brrr🚀

Subbu @subbdue
@vllm_project is steadily stacking KV cache gains. It’s worth understanding this recipe: distributed KV caching (and GPUDirect RDMA on NVIDIA GPUs) is how real commercial inference deployments operate, and companies like @weka have been industrializing exactly this for a while now.

I have a couple of articles explaining this:
NVIDIA KV-Cache Context Memory Storage: chiplog.io/p/analysis-of-…
Origins of GPUDirect RDMA: chiplog.io/p/how-mellanox…

One crazy metric from the vLLM x Mooncake article: “On a real Codex agentic trace, the distributed KV cache pool improves vLLM throughput by 3.8x and reduces P50 TTFT and E2E latency by 46x and 8.6x.” That's a 46x improvement in Time To First Token!
Quoting vLLM @vllm_project:

🚀 New on the @vllm_project blog: Serving Agentic Workloads at Scale with vLLM x Mooncake.

Agentic traces grow to 80K+ tokens with 94%+ reusable prefixes, but local KV caches evict them and cross-instance routing misses them. By integrating Mooncake Store as a distributed KV cache pool, vLLM gets:
🚀 3.8x higher throughput
⚡ 46x lower P50 TTFT
⏱️ 8.6x lower E2E latency
📈 Cache hit rate 1.7% -> 92.2%
🌐 Scales near-linearly to 60 GB200 GPUs at >95% hit rate

🔥 Powered by a deep collaboration between @Inferact and @KT_Project_AI
📖 Read more: vllm.ai/blog/mooncake-… 🧵👇

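The problem the vLLM post describes, where each agent turn re-sends a long shared prefix that a per-instance cache may have evicted or that gets routed to a different instance, can be illustrated with a toy model of block-level prefix caching backed by one shared pool. This is a minimal sketch under stated assumptions: `SharedKVPool`, `block_hashes`, and `BLOCK_SIZE` are hypothetical names, a plain dict stands in for the distributed pool, and nothing here reflects the actual vLLM or Mooncake Store APIs.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tokens per KV block; hypothetical, kept small for the demo


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each full block, chaining in the previous hash so that a block's
    key identifies the entire prefix up to and including that block."""
    hashes: List[str] = []
    parent = ""
    full_len = (len(tokens) // block_size) * block_size  # skip a trailing partial block
    for start in range(0, full_len, block_size):
        block = tokens[start:start + block_size]
        payload = parent + "|" + ",".join(str(t) for t in block)
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class SharedKVPool:
    """Hypothetical stand-in for a KV cache pool shared by every serving
    instance. Values are placeholders; a real pool would hold KV tensors."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def insert(self, tokens: List[int], instance: str) -> None:
        # Publish every full block computed by `instance`; keep existing entries.
        for h in block_hashes(tokens):
            self.store.setdefault(h, f"kv-from-{instance}")

    def lookup_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached KV blocks."""
        hit_blocks = 0
        for h in block_hashes(tokens):
            if h not in self.store:
                break  # the first miss ends the reusable prefix
            hit_blocks += 1
        return hit_blocks * BLOCK_SIZE


pool = SharedKVPool()
turn1 = list(range(16))                    # first agent turn, computed on "node-A"
pool.insert(turn1, instance="node-A")
turn2 = turn1 + [100, 101, 102, 103, 104]  # next turn appends to the same context
print(pool.lookup_prefix(turn2), len(turn2))  # 16 21: another node can reuse 16 tokens
```

In this toy model, chaining the parent hash means two requests share a cache entry only when everything before that block is identical, which is what makes the prefix safe to reuse; keeping the pool shared means the hit no longer depends on routing the request back to the node that computed it.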
Subbu retweeted
Kairos @KairosPraxis
Absolutely sobering paragraph from @subbdue just as $MU and SK Hynix hit their 1T market caps.
[Image]
Subbu retweeted
Lawrence Hamtil @lhamtil
This is a very good history of the DRAM boom/bust cycles: "The product was a pure commodity, sold by the bit, indistinguishable across vendors. Five or six players were always willing to flood the market the moment demand softened. Every downturn turned into a price war, and every price war took out at least one company." chiplog.io/p/dram-was-the…
Subbu @subbdue
A very American, '80s-style logo and graphic design. Tastefully done!
Quoting David Hansen 🇺🇸 🇳🇿 @boxcardavid:

Westmag is building American robot actuators and drone motors at scale. In 2025, @westmagco raised $11M led by @a16z, with participation from @FoundersFund, @LuxCapital, NFDG, @MenloVentures, and other top investors. Since then, we’ve been building industrial capacity, crawling up supply chains, and securing high-volume customers. Now, we’re ramping production at our factory in South San Francisco to deliver against committed offtake orders from high-volume customers. Westmag is committed to scaling quickly in the US to deliver millions of drone motors and robot actuators to the surging domestic and global market. We’re building the great American motor and actuator company.

Subbu @subbdue
Other winners in the NVLink Fusion story: $CDNS and $SNPS. In last year’s Computex keynote, Jensen announced that NVLink IP would be distributed through them. Of course, MediaTek and Marvell are also key partners, helping enable companies like Ayar Labs.

The future of computing (especially AI inference) is clearly heading toward heterogeneous architectures. The NVIDIA + Groq deal basically cements it. NVLink Fusion is NVIDIA’s play to protect its turf while embracing this shift.

My article on NVLink Fusion, AI Inference & Heterogeneous Computing: chiplog.io/p/why-speculat…
Jensen’s keynote where he announced NVLink Fusion and its partners: youtube.com/live/TLzna9__D…
Quoting Ayar Labs @AyarLabs:

Today, @AyarLabs announced it has joined the @nvidia NVLink Fusion ecosystem, introducing co-packaged optics as a foundational building block for hyperscalers and system innovators deploying heterogeneous compute in NVIDIA AI factories. Press Release: bit.ly/4oa8epa

Subbu @subbdue
The NVIDIA-MediaTek partnership is turning out to be quite the love story. I wrote a deep-dive article on DGX Spark's GB10 SoC, MediaTek's role in it, and exactly how the chemistry between NVIDIA and MediaTek worked. Even with NVLink Fusion, MediaTek is an essential partner.

GB10 SoC Deep Dive: chiplog.io/p/analysis-of-…
NVLink Fusion & Heterogeneous Computing: chiplog.io/i/189813953/wh…
Quoting Dr. Ian Cutress @IanCutress:

If a baker makes a cake with their own recipe using ingredients from a mutual friend, then gives it to you, would you pass it off as your own? MediaTek makes a CPU with their own recipe with Arm ingredients and gives it to Nvidia. Broadcom makes a TPU with their own recipe but with Google ingredients and gives it back to Google.

Citrini @citrini
I’ve sold a lot of AI winners too early, I’m intent on selling very slowly with the agentic utility winners like $BAND $ATEN $SAIL… If the cycle has taught me anything, it’s that a name being up 700% from your entry is absolutely not a reason to sell.
Vikram Sekar @vikramskr
If you run into me at Computex, ask me for my newly minted NFC biz card — No more rummaging through paper cards from me. Let’s talk semi research hand-crafted to your order. 🤌
[Image]
Subbu @subbdue
Micron has seen 6 boom & bust cycles since 2000. In 2019, hyperscaler over-ordering crashed its revenue from $30B to $21B. In 2023, the pandemic unwind cut it in half again, from $30B to $15B.

Has the AI super-cycle finally broken this pattern? Or is a reckoning coming, especially since it's the same set of buyers, buying for the same reasons?

I analyze Micron through three lenses (Engineering, Finance, and Strategy), covering the current tailwinds and the possible headwinds. Full article here: chiplog.io/p/dram-was-the…
[Image]
The Information @theinformation
xAI’s GPU fleet is running at about 11% utilization, exposing how hard it is for AI labs to fully use expensive Nvidia hardware. Read more in our AI Agenda newsletter: thein.fo/4cHRjWI