

Anton Smith
@Anton5mith
Product Lover | Network n3rd, k8s n00b | ex-Ericsson/Nokia | ex-Canonical | call me infra guy | Nothing is simple. Not even Nothing. Bruno Marchal

DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).

For reference, this level of capability is supposed to require clusters of closer to 16K GPUs; the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11x less compute).

If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints.

Does this mean you don't need large GPU clusters for frontier LLMs? No, but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms.

Very nice & detailed tech report too, reading through.
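The compute figures in the post can be sanity-checked with a quick back-of-the-envelope script. A minimal sketch: all numbers come from the post itself, except the 60-day figure for "2 months", which is my assumption.

```python
# Back-of-the-envelope check of the compute figures quoted above.
# Assumption: "2 months" ~ 60 days of wall-clock training time.

GPUS = 2048            # DeepSeek-V3 cluster size (from the post)
DAYS = 60              # assumed duration for "2 months"
HOURS_PER_DAY = 24

# Implied GPU-hours for DeepSeek-V3's run:
deepseek_gpu_hours = GPUS * DAYS * HOURS_PER_DAY
print(f"DeepSeek-V3 implied: {deepseek_gpu_hours / 1e6:.2f}M GPU-hours")
# ~2.95M, in the same ballpark as the 2.8M GPU-hours reported

# Compute ratio vs Llama 3 405B (30.8M GPU-hours, from the post):
LLAMA3_GPU_HOURS = 30.8e6
DEEPSEEK_REPORTED = 2.8e6
ratio = LLAMA3_GPU_HOURS / DEEPSEEK_REPORTED
print(f"Ratio: ~{ratio:.0f}x less compute")  # ~11x, matching the post
```

The numbers line up: 2048 GPUs running for roughly two months gives about 2.95M GPU-hours, consistent with the 2.8M figure, and 30.8M / 2.8M is the quoted ~11x.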

Chip Stocks Overnight Reaction to DeepSeek:

1. Arm, $ARM: -5.5%
2. Nvidia, $NVDA: -5.3%
3. Broadcom, $AVGO: -4.9%
4. Super Micro, $SMCI: -4.6%
5. Taiwan Semi, $TSM: -4.5%
6. Micron, $MU: -4.3%
7. Qualcomm, $QCOM: -2.8%
8. AMD, $AMD: -2.5%
9. Intel, $INTC: -2.0%

US markets are on track to erase over $1 trillion of market cap in Monday's session. All as earnings, tariffs, and the Fed meeting are in the spotlight. It's going to be another wild week.