Austin Lyons
@austinsemis

4K posts

Substack: https://t.co/zxI0P7bBdO | Podcast: https://t.co/fmcrAXKfr6 | Tech Analyst @creativestrat

Joined September 2007
1.3K Following · 4.6K Followers

Pinned Tweet
Austin Lyons @austinsemis ·
Why is @MarvellTech partnering with @nvidia and supporting NVLink? My hypothesis: because @AWS wants them to. Marvell is the XPU design partner for Trainium, and AWS announced back in December that Trainium4 would support NVLink. So this feels like Nvidia saying, "Yes, please Marvell, go ahead and use NVLink. Happy to help you."

Vik the interconnect guy can't help but point out the press release mentions "the companies will also collaborate on silicon photonics technology." Could that involve Celestial AI's photonic fabric? Could there be a future where NVLink scales up over optical links instead of copper? And if so, could Marvell/Celestial's photonic fabric show up in future Trainium chips?

After all, in Marvell's press release about the Celestial acquisition, AWS was on the record saying they "believe optical interconnects will play an important role in the future of AI infrastructure" and that "Celestial AI has made impressive progress, and we expect their combination with a large-scale semiconductor company like Marvell will help further accelerate optical scale-up innovation for next-generation AI deployments." Maybe a Trainium4 (or 5+?) will scale up using a photonic fabric? Good take @vikramskr!

So there is definitely a believable angle about customer demand pulling this Nvidia/Marvell partnership into existence. This is a great move for Nvidia and helps them participate in the expanding XPU TAM. But it's more than just getting a cut of the XPU TAM; it's another step in the multi-silicon datacenter story.

If I were Nvidia, I'd frame conversations with hyperscalers like this: hey AWS, if you're buying Vera Rubin GPUs (and maybe some @GroqInc LPUs?), why not deploy some racks of Trn4 in the same datacenter hall? Same MGX rack design, same Spectrum-X scale-out fabric, same management plane. Route cheap inference to the XPUs, send the hard reasoning workloads to our GPUs.

And while you're at it, maybe you look into our storage racks for hardware-accelerated KV-cache, Vera CPU racks for orchestration, BlueField DPUs… It's about Nvidia owning the broader "walled garden" and letting other silicon in. That's not a knock on Nvidia: the value prop to the customer is that a vertically integrated datacenter has better performance than a mix-and-match, less coherent approach.

So this NVLink Fusion + XPU play is making the whole datacenter floor Nvidia-native so everything speaks the same language, even when the compute is heterogeneous, and even if the physical medium shifts to optics in the future. $MRVL $NVDA youtube.com/watch?v=oWQG20…
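The "route cheap inference to the XPUs, hard reasoning to the GPUs" idea can be sketched as a tiny routing policy. This is purely illustrative, not any real AWS or Nvidia API; the `Request` fields, rack names, and token threshold are all hypothetical assumptions.

```python
# Hypothetical sketch of heterogeneous-rack routing; not a real API.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int
    needs_reasoning: bool  # e.g. multi-step agentic / chain-of-thought work

def route(req: Request) -> str:
    """Send cheap inference to XPU racks, hard reasoning to GPU racks."""
    # Threshold of 32K tokens is an arbitrary illustrative cutoff.
    if req.needs_reasoning or req.prompt_tokens > 32_000:
        return "gpu-rack"   # e.g. Vera Rubin GPUs
    return "xpu-rack"       # e.g. Trainium XPUs on the same scale-out fabric

print(route(Request(prompt_tokens=500, needs_reasoning=False)))  # xpu-rack
print(route(Request(prompt_tokens=800, needs_reasoning=True)))   # gpu-rack
```

The pitch in the tweet is that because both rack types share the MGX design, the Spectrum-X fabric, and one management plane, a router like this can treat them as one pool.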
Daulty @DaultyMaassen ·
@austinsemis Love it! But love your content even more! Keep it up!
Daud, MD @_Daudinho ·
@austinsemis @FrameworkPuter @AMD How does this stack up on $/perf for agentic workloads vs an Apple MacBook Air M4/M5? Planning on doing the home server too, as per y'all's pod recommendations, as opposed to on my personal laptop. Any recommendations given I'm def a noob w this stuff?
Austin Lyons @austinsemis ·
Lip-Bu + Elon + Cursor. Lip-Bu said he wanted unconventional ways to refactor silicon process technology!
Austin Lyons @austinsemis ·
Between Elon & LBT, seems @Intel_Foundry 14A is good to go: "Intel 14A maturity, yield and performance are outpacing Intel 18A at a similar point in time.... I am particularly pleased that our progress today has driven us to land more of our own future product tiles on Intel 14A as well." $INTC
Austin Lyons @austinsemis ·
I love Jonathan, the Intel earnings voice actor guy. Wonder if he sounds like that at the dinner table?
Semi Doped Podcast @semidoped ·
Meta's core business is ads. Ads are AI workloads. But not LLM workloads. @austinlyons chatted with @Meta VP Matt Steiner to understand Meta's heterogeneous compute stack.

Surprises:
- Recommender training needs a different compute-to-memory ratio than LLMs. Hence MTIA.
- Retrieval is memory-bound at Meta scale. Andromeda runs on a co-designed Grace Hopper SKU, not off-the-shelf.
- Adaptive ranking scales compute per user. Power users with long histories get more.
- Consolidating N ranking models into one (Lattice) improved performance, not just cost.
- KernelEvolve (LLM-written kernels) flipped heterogeneous fleet economics. SWE demand going UP.
- Meta wants ~100x more kernels per chip.

Chapters:
(00:00) Intro and scale
(00:39) How Meta's ad system works
(02:00) Meta Andromeda and the custom NVIDIA SKU
(03:30) Lattice: consolidating ranking models
(05:00) GEM, Meta's ads foundation model
(06:30) Adaptive ranking for power users
(08:17) The scale: 3B DAUs at sub-second latency
(09:40) Why longer interaction histories matter
(10:45) The anniversary gift analogy
(12:57) A decade of compute evolution
(15:21) Meta's infra as a CP-SAT problem
(16:07) Co-designing Grace Hopper with NVIDIA
(17:47) Matching compute shape to workload
(18:26) Influencing hardware and software roadmaps
(20:23) MTIA: why ads aren't LLMs
(22:07) The personalization blob and I/O ratios
(26:38) One trillion parameters at sub-second latency
(28:26) Heterogeneous hardware trade-offs
(29:30) KernelEvolve: LLMs writing custom kernels
(33:30) GenAI and recommender systems cross-pollination
(35:21) The 2-year infrastructure outlook
(37:00) Why demand for software engineering is rising
(38:53) How Matt stays on top of it all

$META @austinlyons @vikramskr
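The "different compute-to-memory ratio" point can be made concrete with a back-of-envelope arithmetic-intensity calculation (FLOPs per byte moved). The matrix sizes and embedding shapes below are illustrative assumptions, not Meta's actual numbers; they just show why a dense LLM layer is compute-heavy while a recommender embedding lookup is memory-heavy.

```python
# Illustrative arithmetic-intensity comparison; numbers are hypothetical.

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

# Dense LLM layer: a GEMM reuses each weight across many activations.
m, k, n = 4096, 4096, 4096
gemm_flops = 2 * m * k * n                 # multiply + add per output element
gemm_bytes = 2 * (m * k + k * n + m * n)   # fp16 (2 bytes) operands + output
llm_ai = arithmetic_intensity(gemm_flops, gemm_bytes)

# Recommender embedding lookup: fetch rows, then a cheap pooling sum.
rows, dim = 512, 128
emb_flops = rows * dim                     # roughly one add per element
emb_bytes = rows * dim * 2                 # fp16 reads
rec_ai = arithmetic_intensity(emb_flops, emb_bytes)

print(f"LLM GEMM intensity:  {llm_ai:.0f} FLOPs/byte")   # ~1365
print(f"Embedding intensity: {rec_ai:.1f} FLOPs/byte")   # 0.5
```

Three-plus orders of magnitude apart: one workload wants matmul throughput, the other wants memory bandwidth and capacity. Hence a chip like MTIA with a different compute-to-memory balance than an LLM-optimized GPU.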
Austin Lyons @austinsemis ·
28K views + many inbound DMs saying they appreciated the convo. Feels good man!
Quoting Semi Doped Podcast @semidoped (episode thread above)
chris @christophauto ·
@austinlyons1234 @BenBajarin @nvidia Since when is hardware receiving software updates to improve performance a new concept? Not sure I get the point of this tweet. Literally every hardware company has always done this? And “forever?” More like a decade.
Austin Lyons @austinsemis ·
Imagine buying an apartment building, and the developer keeps sneaking in overnight to upgrade the kitchens. For free. Forever. And now imagine you had 1 million apartments getting improved! GCP has over a million @nvidia GPUs racked up. And Nvidia keeps shipping software updates that make these already-deployed GPUs faster. The performance in the rack today is not the performance @Google originally bought. That's Nvidia’s deal with the hyperscalers. Pretty sweet. $GOOG blogs.nvidia.com/blog/google-cl…
Austin Lyons @austinsemis ·
@christophauto @BenBajarin @nvidia It's an interesting thought at GCP scale (1M GPUs). Performance updates for GPUs that mostly run LLM workloads mean the collective gain from software updates is vast, compared to, say, CPU software updates, where those CPUs run a ton of different workloads.
Austin Lyons @austinsemis ·
Everything that can be accelerated is, right? Nope. A surprising number of HPC and production rendering workloads still run on CPUs.

The GPUs available during the CPU-to-GPU wave weren't the right config or price point to make switching worthwhile for everyone. And now GPU makers are (rightly) focused on datacenter AI and neural rendering, making tradeoffs that don't help traditional simulation and production rendering (e.g. FP64 has been deprioritized while AI-focused formats like FP8 and FP4 get more silicon). So these customers were missed by the CPU-to-GPU wave and are deprioritized in the AI era.

That creates an opening for a newcomer like Bolt to make different architectural bets, for example up to 384 GB of memory per card vs 96 GB on Nvidia's RTX 6000 Pro and 32 GB on the 5090. That'll help with rendering's "scenes don't fit in GPU memory" problem!

@boltgraphics's announcement today is a 12nm test chip. There's still a long way to go until Bolt's 4Q'27 production, but the market opportunity is definitely there. prnewswire.com/news-releases/…
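The capacity argument is simple arithmetic. The card capacities below come from the post; the 150 GB scene size is a hypothetical film-scale example (geometry + textures), chosen only to show which cards would need out-of-core streaming.

```python
# Illustrative check: can a large rendering scene live entirely in VRAM?
GB = 1024**3

cards = {
    "Bolt (claimed)": 384 * GB,   # per the announcement
    "RTX 6000 Pro": 96 * GB,
    "RTX 5090": 32 * GB,
}

scene_bytes = 150 * GB  # hypothetical film-scale scene

for name, vram in cards.items():
    verdict = "fits in VRAM" if scene_bytes <= vram else "needs out-of-core streaming"
    print(f"{name:>14}: {verdict}")
```

Once a scene spills out of VRAM, the renderer pays for PCIe transfers on every bounce, which is a big part of why studios stayed on large-memory CPU nodes.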
Austin Lyons retweeted
Ben Bajarin @BenBajarin ·
I was at a small preview event last night on Google's new TPU 8t and 8i. Have a post coming shortly, but got some eye candy! Board pics of 8t and 8i.