
295 billion parameters. 21B active per token. 600 GB BF16 checkpoint, too large for a single node.
We deployed Hy3-preview on Hyperstack using multi-node Kubernetes: 16 NVIDIA H100s across two worker nodes, hybrid Tensor + Expert Parallelism, and the checkpoint loaded from local NVMe.
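A rough back-of-the-envelope check of the sizing above, as a minimal sketch. It assumes decimal gigabytes, 2 bytes per BF16 weight, and 80 GB per H100, and it counts only the weights (no KV cache, activations, or runtime overhead). The variable and function names are illustrative, not taken from the tutorial.

```python
# Sizing sketch: BF16 weights vs. GPU memory on one and two 8x H100-80G nodes.
# Assumptions: decimal GB, weights only, all 80 GB per GPU usable.
PARAMS = 295_000_000_000   # total parameters (from the post)
BYTES_PER_PARAM = 2        # BF16 stores each weight in 2 bytes
GPU_MEMORY_GB = 80         # H100-80G
GPUS_PER_NODE = 8

def weights_gb(params: int, bytes_per_param: int) -> float:
    """Raw size of the weights in decimal gigabytes."""
    return params * bytes_per_param / 1e9

def cluster_memory_gb(nodes: int) -> int:
    """Total GPU memory across the given number of nodes."""
    return nodes * GPUS_PER_NODE * GPU_MEMORY_GB

weights = weights_gb(PARAMS, BYTES_PER_PARAM)
one_node = cluster_memory_gb(1)
two_nodes = cluster_memory_gb(2)

# Memory left per GPU after the weights are sharded evenly across all GPUs.
headroom_one = (one_node - weights) / GPUS_PER_NODE
headroom_two = (two_nodes - weights) / (2 * GPUS_PER_NODE)

print(f"weights: {weights:.0f} GB")
print(f"one node: {one_node} GB total, {headroom_one:.2f} GB free per GPU")
print(f"two nodes: {two_nodes} GB total, {headroom_two:.3f} GB free per GPU")
```

Under these assumptions the weights nominally fit in one node's 640 GB, but with only about 6 GB left per GPU, which is likely too little for the KV cache at long context. Spreading across two nodes leaves roughly 43 GB per GPU.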
In this tutorial:
→ Multi-node Kubernetes cluster on Hyperstack (two 8x H100-80G PCIe-NVLink nodes)
→ LeaderWorkerSet API for coordinated 2-node inference
→ vLLM with native multi-node tensor parallelism and MTP speculative decoding
→ 256K token context window with three reasoning tiers (no_think / low / high)
→ Multi-agent code review pipeline with parallel specialist agents and tool calling
→ Plugging into Claude Code, OpenClaw, and OpenCode as a local backend
80.6 on SWE-Bench Verified. 34.86 on LiveCodeBench v6.
Full tutorial on the blog: Deploy Hy3-preview on Hyperstack: A Multi-Node Kubernetes Guide
#Hyperstack #Hy3preview