SemiAnalysis
@SemiAnalysis_
1.9K posts · Joined January 2024 · 22 Following · 66.1K Followers
SemiAnalysis @SemiAnalysis_
Meta looked at their NVIDIA bill and chose violence. MTIA v3 is real.
2 replies · 1 repost · 37 likes · 5.5K views
SemiAnalysis @SemiAnalysis_
Olympic gold medalist Alysa Liu recently went viral for her Teen Vogue rant on OpenAI Codex: “I can see why Sam Altman open-sourced Codex. Clearly the experience is significantly worse than Claude Code. I was unable to feel the AGI using Codex. Using Claude Code, by contrast, I felt the enlightenment coming, and I support UBI.”
36 replies · 19 reposts · 596 likes · 82.1K views
SemiAnalysis @SemiAnalysis_
KING ALERT: Congrats to @roaner, @AnushElangovan, and the 10x AMD China engineering team for their amazing FP8 MI355 ROCm SGLang disaggregated performance beating NVIDIA Blackwell! They are also the Inference King! 👑
[quoting Ramine Roane @roaner]

9 replies · 12 reposts · 128 likes · 24.3K views
SemiAnalysis @SemiAnalysis_
The strongest case against: healthcare is a Baumol sector. Q1 spends a larger share on it than Q5 (10.5% vs. 6.4%), and AI could cure the cost disease there. If that channel dominates, the distributional effect flips progressive.
0 replies · 0 reposts · 9 likes · 3.2K views
SemiAnalysis @SemiAnalysis_
The mechanism is financial services. Securities, insurance, credit intermediation — the most AI-exposed sector in the economy — are 17.7% of Q5's budget and 2.1% of Q1's. Under a 10% cost reduction assumption: Q5 saves $2,325/yr (1.5% of budget). Q1 saves $346 (1.0%). Even as a share of their own spending, Q5 benefits 50% more.
2 replies · 3 reposts · 12 likes · 3.7K views
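The savings arithmetic in the post above is a single multiplication per quintile: dollars saved = annual budget × sector spending share × assumed cost reduction. A minimal sketch — the budget totals below are illustrative assumptions, not the BLS figures behind the chart; only the 17.7%/2.1% shares and the 10% reduction come from the post:

```python
# Sketch of the savings arithmetic: savings = budget * sector_share * cost_reduction.
def ai_savings(annual_budget, sector_share, cost_reduction=0.10):
    """Dollar savings from an AI-driven cost cut in one spending category."""
    return annual_budget * sector_share * cost_reduction

# Hypothetical annual budgets for the top (Q5) and bottom (Q1) quintiles.
q5_budget, q1_budget = 130_000, 33_000
q5 = ai_savings(q5_budget, 0.177)   # financial services: 17.7% of Q5's budget
q1 = ai_savings(q1_budget, 0.021)   # ...and 2.1% of Q1's
print(f"Q5 saves ${q5:,.0f}/yr ({q5 / q5_budget:.1%} of budget)")
print(f"Q1 saves ${q1:,.0f}/yr ({q1 / q1_budget:.1%} of budget)")
```

With any budget levels, the asymmetry in shares alone drives the regressive result: Q5's savings are a larger fraction of its own spending.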
SemiAnalysis @SemiAnalysis_
We mapped Felten et al.'s AI exposure scores onto BLS consumption data by income quintile. The top 20% of households have 29% more of their spending basket exposed to AI-driven cost reductions than the bottom 20%. AI deflation has a distributional problem.
5 replies · 5 reposts · 33 likes · 7.1K views
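The mapping described above reduces to a spending-share-weighted average of sector exposure scores for each quintile's basket. A minimal sketch — the sector list, scores, and shares below are made up for illustration; the real inputs are Felten et al.'s occupational-exposure-derived scores and BLS CEX shares:

```python
def basket_exposure(shares, scores):
    """Spending-share-weighted AI exposure of a consumption basket.

    shares: {sector: share of total spending} (must sum to 1)
    scores: {sector: AI exposure score}
    """
    assert abs(sum(shares.values()) - 1.0) < 1e-9
    return sum(share * scores[sector] for sector, share in shares.items())

# Illustrative numbers only (not the Felten et al. / BLS values).
scores = {"financial_services": 0.9, "healthcare": 0.5, "food": 0.2, "housing": 0.3}
q5 = {"financial_services": 0.177, "healthcare": 0.064, "food": 0.259, "housing": 0.500}
q1 = {"financial_services": 0.021, "healthcare": 0.105, "food": 0.374, "housing": 0.500}
print(basket_exposure(q5, scores), basket_exposure(q1, scores))
```

Because financial services carries both the highest exposure score and the largest share gap between quintiles, it dominates the difference in the weighted averages.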
SemiAnalysis @SemiAnalysis_
South Korean memory makers are firing on all cylinders, across both DRAM and NAND. (1/2) 🧵
6 replies · 14 reposts · 117 likes · 14.5K views
SemiAnalysis @SemiAnalysis_
Everyone has an AI chip now. Almost none of them matter. Here are the ones that do.
1 reply · 6 reposts · 36 likes · 9.4K views
SemiAnalysis @SemiAnalysis_
Furthermore, XPO is designed from the ground up to deploy LPO. Signal reconditioning is not necessary, as the signal can travel from the switch ASIC to the faceplate over CPC connectors and flyover cables. This poses an existential risk to DSPs. Subscribers to our networking model can find out more in our flash note at semianalysis.com/ai-networking-…. (5/5)
0 replies · 0 reposts · 6 likes · 4.5K views
SemiAnalysis @SemiAnalysis_
The XPO-MSA introduces an alternative scaling solution. It is 4x denser than OSFP from a bandwidth perspective and can transmit 204.8T of switching capacity from a single 1RU box. (4/5)
1 reply · 0 reposts · 8 likes · 4.7K views
SemiAnalysis @SemiAnalysis_
Last week, Arista, together with dozens of partners, introduced a new MSA for a denser XPO form factor combining 8 OSFP cages and 64x200G SerDes lanes for a total bandwidth of 12.8T for the first go-to-market product. (1/5) 🧵
2 replies · 11 reposts · 121 likes · 15.9K views
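The headline bandwidth figures in this thread are straightforward lane arithmetic: 64 SerDes lanes at 200 Gb/s each gives 12.8 Tb/s per module. A quick sanity check — note that the module count implied for the 204.8T 1RU box is an inference from the quoted numbers, not a detail stated in the MSA:

```python
# Lane arithmetic behind the thread's numbers.
lanes, gbps_per_lane = 64, 200
module_tbps = lanes * gbps_per_lane / 1000  # 64 x 200G SerDes lanes -> 12.8 Tb/s
print(module_tbps)

# Module count a 204.8T 1RU box would imply at that per-module bandwidth --
# an inference from the quoted figures, not something the MSA specifies.
print(204.8 / module_tbps)
```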
SemiAnalysis @SemiAnalysis_
On FP8 disaggregated serving, MI355 beats B200 on both raw tok/s/GPU and cost per million tokens. In the image below, you can see that not only does MI355 beat B200, but over time the gap between MI355 and B200 widens due to MI355's fast software progression for FP8. This trend holds both for MI355 MTP vs. B200 MTP and for MI355 non-MTP vs. B200 non-MTP. Great job to @roaner and @AnushElangovan's team!
9 replies · 14 reposts · 198 likes · 19.8K views
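Cost per million tokens in comparisons like the one above is typically derived from hourly GPU cost and sustained per-GPU throughput. A minimal sketch — the price and throughput below are hypothetical placeholders, not SemiAnalysis's measured MI355/B200 numbers:

```python
def cost_per_million_tokens(gpu_cost_per_hour, tokens_per_sec_per_gpu):
    """$ per 1M tokens implied by hourly GPU cost and sustained throughput."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical inputs for illustration only.
print(cost_per_million_tokens(2.50, 10_000))
```

This is why the two metrics in the post move together: at a fixed GPU rental price, cost per million tokens is just the inverse of tok/s/GPU up to a constant, so software gains that raise throughput lower cost proportionally.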
SemiAnalysis @SemiAnalysis_
During training, MLA works like standard MHA but compresses the QKV vectors with low-rank projections. Notably, the KV vectors are compressed into a single KV latent. During inference, MLA operates in the latent dimension via a weight-absorption trick, avoiding up-projecting the QKV vectors. Since all query heads share one KV latent, MLA behaves like Multi-Query Attention (MQA). This MHA/MQA dual-mode design is also why DeepSeek Sparse Attention (DSA), which separately trains a lightning indexer for inference, is implemented on top of the MQA mode. We show the absorption math below and omit RoPE for simplicity.
4 replies · 16 reposts · 157 likes · 13K views
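The weight-absorption trick described above (with RoPE omitted, as in the post) is just matrix associativity: the key up-projection from the latent can be folded into the query projection, so attention scores are computed directly against the cached latents and per-position keys are never materialized. A numerical sketch with toy dimensions and hypothetical weight names:

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_latent, d_head = 32, 8, 16             # toy dims: model, KV latent, head
W_q  = rng.normal(size=(d, d_head))         # query projection (one head)
W_uk = rng.normal(size=(d_latent, d_head))  # key up-projection from the latent

x = rng.normal(size=(1, d))                 # one query token's hidden state
c = rng.normal(size=(5, d_latent))          # cached KV latents for 5 positions

# Naive: up-project latents to keys, then score (materializes the keys).
scores_naive = (x @ W_q) @ (c @ W_uk).T

# Absorbed: fold W_uk into W_q and score against the latents directly.
W_absorbed = W_q @ W_uk.T                   # shape (d, d_latent)
scores_absorbed = (x @ W_absorbed) @ c.T

assert np.allclose(scores_naive, scores_absorbed)
```

Because every query head gets its own absorbed matrix but all of them score against the same latent cache, inference-time MLA has the memory profile of MQA, which is the dual-mode behavior the post describes.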
SemiAnalysis @SemiAnalysis_
The ChipBook HBM Tracker is a great tool for monitoring the sustainability of high-end memory demand. (1/2) 🧵
9 replies · 22 reposts · 107 likes · 29.6K views