
Azeez
@AtlasInference
Building Atlas, a Rust inference engine with custom CUDA kernels for the DGX Spark GB10, and we're gearing up for an open-source release. 102 tok/s on Qwen3.6


be patient anon. i could fake the AMD numbers tonight. i won't. so we wait.

📊 Pipeline Parallelism Improves Performance by 2.2x on Dual DGX Spark Cluster
✅ Direct Spark-to-Spark link, no switch.
✅ 240GB unquantized model
🤔 Guessing they used vLLM

Key Takeaways
🔹 200GbE ConnectX-7 fabric = real home clustering of two 128GB boxes
🔹 PP=2 beats TP=2 by 2.2× on 120B models (555 vs 252 tok/s @ batch 128)
🔹 Only 1 cross-box handoff per token vs 160 with tensor parallelism
🔹 TP wins only at batch=1 (single user). PP wins for any real serving load
🔹 Dell / GIGABYTE / HP = dead heat: choose by build & support

🏠 At-home win! Cluster two Sparks + PP to run huge models or boost batched inference speed

Full benchmarks + charts in ALT from @storagereview 👇
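The headline numbers in that post can be sanity-checked with a little arithmetic. This is a minimal back-of-envelope sketch, not how StorageReview computed anything. It assumes "unquantized" means BF16 weights at 2 bytes per parameter and that PP=2 splits the layers evenly across the two boxes. It also ignores KV cache, activations and runtime overhead. The function names are hypothetical.

```python
# Back-of-envelope checks for the dual-Spark figures quoted above.
# Assumptions (not from the benchmark): "unquantized" = BF16 weights at
# 2 bytes per parameter, and PP=2 splits layers evenly across the boxes.

BYTES_PER_PARAM_BF16 = 2
BOX_MEMORY_GB = 128  # memory per DGX Spark box, per the post


def speedup(new_tok_s: float, old_tok_s: float) -> float:
    """Throughput ratio between two configurations."""
    return new_tok_s / old_tok_s


def weight_footprint_gb(params_billions: float,
                        bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes); ignores KV cache and activations."""
    return params_billions * bytes_per_param


def per_box_weights_gb(total_gb: float, stages: int) -> float:
    """Weights each box holds if pipeline stages are split evenly."""
    return total_gb / stages


pp_vs_tp = speedup(555, 252)          # PP=2 vs TP=2 at batch 128
total = weight_footprint_gb(120)      # 120B-parameter model
per_box = per_box_weights_gb(total, stages=2)

print(f"PP=2 vs TP=2 at batch 128: {pp_vs_tp:.2f}x")
print(f"120B weights in BF16: {total:.0f} GB")
print(f"Per box with PP=2: {per_box:.0f} GB of {BOX_MEMORY_GB} GB")
```

Under these assumptions, 555 / 252 works out to the quoted ~2.2×, and 120B parameters at 2 bytes each gives the 240GB figure. That is roughly 120 GB of weights per box, which fits under 128 GB but leaves little headroom for KV cache at large batch sizes.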

Two of these linked together can run DeepSeek v4 Flash, and a single one can run Nemotron 120B and Qwen 3.6 27B!



