
1.6 trillion parameters. 49B active per token. Too large for a single node.
We deployed DeepSeek-V4-Pro on Hyperstack using multi-node Kubernetes: 16 NVIDIA H100s across two worker nodes, hybrid Data + Expert Parallelism, and a 960 GB FP4+FP8 checkpoint loaded from local NVMe.
In this tutorial:
→ Multi-node Kubernetes cluster on Hyperstack (2x 8x NVIDIA H100-80G PCIe-NVLink)
→ LeaderWorkerSet API for coordinated 2-node inference
→ vLLM with hybrid DEP topology and MTP speculative decoding
→ 1M token context window with three reasoning tiers
→ Long-horizon autonomous code refactoring with self-correction
→ Plugging into Claude Code, OpenClaw, and OpenCode as a local backend
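For a taste of the vLLM side, here is a minimal sketch of a per-node launch command. The flag names come from recent vLLM releases; the model path, parallel sizes, and MTP settings are illustrative assumptions, not the exact values from the tutorial:

```shell
# Illustrative sketch - assumes vLLM with data-parallel + expert-parallel support.
# On each 8x H100 node (ranks/addresses are placeholders):
vllm serve deepseek-ai/DeepSeek-V4-Pro \
  --data-parallel-size 16 \
  --enable-expert-parallel \
  --speculative-config '{"method": "deepseek_mtp", "num_speculative_tokens": 1}'
```

The full multi-node wiring (LeaderWorkerSet manifests, NVMe checkpoint mounts, reasoning tiers) is covered step by step in the blog post.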
80.6 on SWE-Bench Verified. 93.5 on LiveCodeBench v6.
Full tutorial on the blog: bit.ly/4f1jamb
#DeepSeek #AgenticAI