Hyperstack (@Hyperstackcloud) · 6d

295 billion parameters. 21B active per token. A 600 GB BF16 checkpoint, too large for a single node.

We deployed Hy3-preview on Hyperstack using multi-node Kubernetes: 16 NVIDIA H100s across two worker nodes, hybrid Tensor + Expert Parallelism, and the checkpoint loaded from local NVMe.

In this tutorial:
→ Multi-node Kubernetes cluster on Hyperstack (two 8x H100-80G PCIe-NVLink nodes)
→ LeaderWorkerSet API for coordinated 2-node inference
→ vLLM with native multi-node tensor parallelism and MTP speculative decoding
→ 256K token context window with three reasoning tiers (no_think / low / high)
→ Multi-agent code review pipeline with parallel specialist agents and tool calling
→ Plugging into Claude Code, OpenClaw, and OpenCode as a local backend

80.6 on SWE-Bench Verified. 34.86 on LiveCodeBench v6.

Full tutorial on the blog: Deploy Hy3-preview on Hyperstack: A Multi-Node Kubernetes Guide

#Hyperstack #Hy3preview
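A quick back-of-the-envelope check of the memory figures quoted above, as a rough sketch only. It assumes 2 bytes per BF16 parameter and 1 GB = 10^9 bytes, and it ignores KV cache, activations, and runtime overhead. The helper names are illustrative and not taken from the tutorial.

```python
# Rough memory arithmetic using only the numbers quoted in the post.
# Assumptions (not from the post): 2 bytes per BF16 parameter, 1 GB = 1e9 bytes,
# and no accounting for KV cache, activations, or CUDA runtime overhead.

TOTAL_PARAMS = 295e9        # 295 billion parameters
BYTES_PER_PARAM_BF16 = 2    # BF16 stores each parameter in 16 bits
GPU_MEMORY_GB = 80          # H100-80G
GPUS_PER_NODE = 8
NODES = 2


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in GB."""
    return params * bytes_per_param / 1e9


def total_gpu_memory_gb(nodes: int, gpus_per_node: int, gpu_gb: int) -> int:
    """Aggregate GPU memory across the given number of nodes."""
    return nodes * gpus_per_node * gpu_gb


weights = weights_gb(TOTAL_PARAMS, BYTES_PER_PARAM_BF16)
one_node = total_gpu_memory_gb(1, GPUS_PER_NODE, GPU_MEMORY_GB)
two_nodes = total_gpu_memory_gb(NODES, GPUS_PER_NODE, GPU_MEMORY_GB)

print(f"BF16 weights:      ~{weights:.0f} GB")
print(f"One 8x H100 node:   {one_node} GB total GPU memory")
print(f"Two 8x H100 nodes:  {two_nodes} GB total GPU memory")
print(f"Headroom, 1 node:  ~{one_node - weights:.0f} GB")
print(f"Headroom, 2 nodes: ~{two_nodes - weights:.0f} GB")
```

Under these assumptions the weights alone come to roughly 590 GB, in line with the quoted 600 GB checkpoint. A single 8x H100-80G node offers 640 GB in aggregate, which leaves only a few GB per GPU for the KV cache and everything else, while two nodes leave substantially more room.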
