SemiAnalysis
1.9K posts


Olympic gold medalist Alysa Liu recently went viral for her Teen Vogue rant on OpenAI Codex.
“I can see why Sam Altman open-sourced Codex. Clearly the experience is significantly worse than Claude Code. I was unable to feel the AGI using Codex. Using Claude Code, by contrast, I felt the enlightenment coming and support UBI.”



KING ALERT: Congrats to @roaner, @AnushElangovan, and the 10x AMD China engineering team for their amazing FP8 MI355 ROCm SGLang disaggregated performance beating NVIDIA Blackwell! They are also the Inference King! 👑
Ramine Roane@roaner

The mechanism is financial services. Securities, insurance, credit intermediation — the most AI-exposed sector in the economy — are 17.7% of Q5's budget and 2.1% of Q1's.
Under a 10% cost reduction assumption: Q5 saves $2,325/yr (1.5% of budget). Q1 saves $346 (1.0%).
Even as a share of their own spending, Q5 households benefit 50% more (1.5% vs. 1.0%).
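The arithmetic is easy to reproduce. Below is a minimal Python sketch; the annual budget totals and effective exposure-weighted shares are back-derived from the percentages quoted above, not taken from BLS tables, so treat them as illustrative assumptions.

```python
# Minimal sketch of the savings arithmetic above. Budgets and effective
# exposure-weighted shares are assumptions back-derived from the quoted
# figures ($2,325 = 1.5% of Q5's budget; $346 = 1.0% of Q1's).
COST_REDUCTION = 0.10  # assumed AI-driven cost reduction on exposed spending

quintiles = {
    # name: (annual budget in $, exposure-weighted share of spending)
    "Q1": (34_600, 0.10),   # bottom 20% of households
    "Q5": (155_000, 0.15),  # top 20% of households
}

for name, (budget, exposed_share) in quintiles.items():
    savings = budget * exposed_share * COST_REDUCTION
    print(f"{name}: ${savings:,.0f}/yr saved ({savings / budget:.1%} of budget)")
```

Run as-is this reproduces the $2,325 and $346 figures; the 1.5x gap in budget-share terms is exactly the 0.15 vs. 0.10 exposure-weighted shares.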


We mapped Felten et al.'s AI exposure scores onto BLS consumption data by income quintile.
The top 20% of households have 29% more of their spending basket exposed to AI-driven cost reductions than the bottom 20%.
AI deflation has a distributional problem.
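A minimal sketch of that mapping, with made-up exposure scores and spending shares standing in for Felten et al.'s scores and the BLS Consumer Expenditure data (only the 17.7% and 2.1% financial-services shares come from this thread):

```python
# Hedged sketch: spending-share-weighted AI exposure of a quintile's
# consumption basket. Exposure scores and most shares are placeholders;
# only the financial-services shares (17.7% vs 2.1%) are from the thread.
exposure = {  # hypothetical Felten-style exposure scores in [0, 1]
    "financial services": 0.95,
    "housing": 0.35,
    "food at home": 0.31,
}

def basket_exposure(shares: dict[str, float]) -> float:
    """Basket exposure = sum over categories of spending share * exposure."""
    return sum(s * exposure[cat] for cat, s in shares.items())

q1_shares = {"financial services": 0.021, "housing": 0.579, "food at home": 0.400}
q5_shares = {"financial services": 0.177, "housing": 0.573, "food at home": 0.250}

print(f"Q5 basket is {basket_exposure(q5_shares) / basket_exposure(q1_shares) - 1:.0%} "
      "more exposed than Q1's")
```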


Subscribe to ChipBook and make sure you stay on top of the latest in the global memory market: (2/2) semianalysis.com/chipbook/

Furthermore, XPO is designed from the ground up to deploy LPO. Signal reconditioning is not necessary, as the signal can travel from the switch ASIC to the faceplate via CPC connectors and flyover cables. This poses an existential risk to DSPs. Subscribers to our networking model can find out more in our flash note at (5/5) semianalysis.com/ai-networking-…

On FP8 disaggregated serving, MI355 beats B200 on both raw tok/s/GPU and cost per million tokens. In the image below, you can see that not only does MI355 beat B200, but the gap between MI355 and B200 widens over time thanks to MI355's fast FP8 software progression. This trend holds for MI355 MTP vs. B200 MTP and for MI355 non-MTP vs. B200 non-MTP. Great job to @roaner and @AnushElangovan's team!
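For reference, cost per million tokens follows directly from throughput and an assumed GPU-hour price; the sketch below shows the conversion with hypothetical prices and throughputs, not the benchmarked figures from the image.

```python
# Hedged sketch: converting tok/s/GPU into $ per million output tokens.
# GPU-hour prices and throughputs here are hypothetical placeholders.
def cost_per_mtok(gpu_hour_price: float, tok_per_s_per_gpu: float) -> float:
    tokens_per_hour = tok_per_s_per_gpu * 3600
    return gpu_hour_price / tokens_per_hour * 1e6

print(f"MI355: ${cost_per_mtok(2.50, 12_000):.3f}/Mtok")  # hypothetical inputs
print(f"B200:  ${cost_per_mtok(3.50, 11_000):.3f}/Mtok")  # hypothetical inputs
```

Note the lever: a cheaper GPU-hour and a higher tok/s/GPU compound, which is why software-driven throughput gains widen the cost gap over time.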



We wrote a chapter in this one: "Teaching Sand to Think." Go check it out!
Arena Magazine@arenamagdotcom
Announcing our first book: Silicon. A beautiful coffee table book about the world of transistors, chips, and the greatest technology revolution of all time. 384 pages. Almost five pounds. Preorders open now, shipping in May: arenamag.com/silicon

During training, MLA works like standard MHA but compresses the Q, K, and V vectors with low-rank projections. Notably, the K and V vectors are compressed into a single shared KV latent. During inference, MLA operates in the latent dimension via a weight-absorption trick, avoiding up-projecting the Q, K, and V vectors. Since all query heads share one KV latent, MLA then behaves like Multi-Query Attention (MQA). This dual MHA/MQA-mode design is also why DeepSeek Sparse Attention (DSA), which trains a separate lightning indexer for inference, is implemented on top of the MQA mode.
We show the absorption math below and omit RoPE for simplicity.
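A sketch of that absorption identity, with notation loosely following the DeepSeek-V2 paper (per-head up-projections $W_i^{UQ}$, $W_i^{UK}$, $W_i^{UV}$; shared KV latent $c_s^{KV} = W^{DKV} h_s$; query latent $c_t^{Q} = W^{DQ} h_t$), RoPE omitted:

$$
q_t^{(i)\top} k_s^{(i)}
= \big(W_i^{UQ} c_t^{Q}\big)^\top \big(W_i^{UK} c_s^{KV}\big)
= \underbrace{\big(W_i^{UK\top} W_i^{UQ} c_t^{Q}\big)}_{\tilde{q}_t^{(i)}}{}^{\top}\, c_s^{KV}
$$

Absorbing $W_i^{UK}$ into the query this way means every head attends directly over the single cached latent $c_s^{KV}$, which is the MQA mode. The value up-projection commutes with the attention-weighted sum the same way:

$$
o_t^{(i)} = \sum_s a_{t,s}^{(i)}\, W_i^{UV} c_s^{KV} = W_i^{UV} \sum_s a_{t,s}^{(i)} c_s^{KV},
$$

so $W_i^{UV}$ folds into the output projection $W^{O}$ and only $c^{KV}$ needs to be cached per token.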



Subscribe to ChipBook and get all of our memory data sets updated monthly: (2/2) semianalysis.com/chipbook/




