

Austin Lyons

@austinsemis
Substack: https://t.co/zxI0P7bBdO | Podcast: https://t.co/fmcrAXKfr6 | Tech Analyst @creativestrat



Today we’re announcing an agreement with Amazon Web Services to bring tens of millions of AWS Graviton cores to our compute portfolio. This partnership marks an expansion of our diversified AI infrastructure and will help scale systems behind Meta AI and agentic experiences that serve billions of people. Learn more: go.meta.me/2bc5c5






($) TSMC's Margins in Uncharted Territory chipstrat.com/p/tsmcs-margin… $TSM





Meta's core business is ads. Ads are AI workloads, but not LLM workloads. @austinlyons chatted with @Meta VP Matt Steiner to understand Meta's heterogeneous compute stack.

Surprises:
- Recommender training needs a different compute-to-memory ratio than LLMs. Hence MTIA.
- Retrieval is memory-bound at Meta scale. Andromeda runs on a co-designed Grace Hopper SKU, not off-the-shelf.
- Adaptive ranking scales compute per user. Power users with long histories get more.
- Consolidating N ranking models into one (Lattice) improved performance, not just cost.
- KernelEvolve (LLM-written kernels) flipped heterogeneous fleet economics. SWE demand going UP.
- Meta wants ~100x more kernels per chip.

Chapters:
(00:00) Intro and scale
(00:39) How Meta's ad system works
(02:00) Meta Andromeda and the custom NVIDIA SKU
(03:30) Lattice: consolidating ranking models
(05:00) GEM, Meta's ads foundation model
(06:30) Adaptive ranking for power users
(08:17) The scale: 3B DAUs at sub-second latency
(09:40) Why longer interaction histories matter
(10:45) The anniversary gift analogy
(12:57) A decade of compute evolution
(15:21) Meta's infra as a CP-SAT problem
(16:07) Co-designing Grace Hopper with NVIDIA
(17:47) Matching compute shape to workload
(18:26) Influencing hardware and software roadmaps
(20:23) MTIA: why ads aren't LLMs
(22:07) The personalization blob and I/O ratios
(26:38) One trillion parameters at sub-second latency
(28:26) Heterogeneous hardware trade-offs
(29:30) KernelEvolve: LLMs writing custom kernels
(33:30) GenAI and recommender systems cross-pollination
(35:21) The 2-year infrastructure outlook
(37:00) Why demand for software engineering is rising
(38:53) How Matt stays on top of it all

$META @austinlyons @vikramskr
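To make the "adaptive ranking scales compute per user" idea concrete, here is a minimal sketch. This is not Meta's code; every name, threshold, and formula below is a hypothetical illustration of the general pattern: users with longer interaction histories get a larger candidate-scoring budget, capped so tail latency stays bounded.

```python
# Illustrative sketch only (not Meta's implementation).
# Idea: scale per-user ranking compute with interaction-history length.
# All names, constants, and the growth formula are assumptions.

def ranking_budget(history_len: int, base: int = 100, cap: int = 1000) -> int:
    """How many candidate ads to fully score for this user.

    New users get the base budget; power users with long histories
    get more, up to a cap that keeps worst-case latency bounded.
    """
    # Grow the budget with history length, then cap it.
    return min(cap, base + history_len // 10)

def rank_ads(candidates: list, history: list) -> list:
    """Cheap first-pass filter down to the per-user budget.

    In a real system the survivors would then go through the
    expensive ranking model; here we just return the shortlist.
    """
    budget = ranking_budget(len(history))
    return candidates[:budget]
```

The point of the sketch is the shape of the trade-off, not the numbers: compute spent per request becomes a function of the user, rather than a fleet-wide constant.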





Everything that can be accelerated is, right? Nope. A surprising number of HPC and production rendering workloads still run on CPUs.

The GPUs available during the CPU-to-GPU wave weren't the right config or price point to make switching worthwhile for everyone. And now GPU makers are (rightly) focused on datacenter AI and neural rendering, making tradeoffs that don't help traditional simulation and production rendering (e.g. FP64 has been deprioritized while AI-focused formats like FP8 and FP4 get more silicon).

So these customers were missed by the CPU-to-GPU wave and are deprioritized in the AI era. That creates an opening for a newcomer like Bolt to make different architectural bets, for example up to 384 GB of memory per card vs 96 GB on NVIDIA's RTX 6000 Pro and 32 GB on the 5090. That'll help with rendering's "scenes don't fit in GPU memory" problem!

@boltgraphics's announcement today is a 12nm test chip. There's still a long way to go until Bolt's 4Q'27 production, but the market opportunity is definitely there. prnewswire.com/news-releases/…
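A back-of-envelope sketch of the "scenes don't fit in GPU memory" point. The per-card capacities are the ones quoted in the post; the 200 GB scene size is a made-up example, not a benchmark.

```python
# Back-of-envelope check: does a production-rendering scene fit in
# card memory? Capacities below are from the post above; the scene
# size is a hypothetical example.

CARD_VRAM_GB = {
    "Bolt (claimed)": 384,
    "RTX 6000 Pro": 96,
    "RTX 5090": 32,
}

def fits(scene_gb: float, card: str) -> bool:
    """True if the whole scene fits in the card's memory."""
    return scene_gb <= CARD_VRAM_GB[card]

# A hypothetical 200 GB film scene (geometry + textures):
for card in CARD_VRAM_GB:
    print(card, "fits" if fits(200, card) else "does not fit")
```

When the scene doesn't fit, renderers fall back to out-of-core streaming or CPU rendering, which is exactly the niche the larger-memory bet targets.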
