
GoogooGaggle#5
@ADCs934







Running Qwen 3.6 27B locally on hardware from 2016. 2× GTX 1080 Ti (Pascal, sm_61), 10-year-old GPUs. 14 tok/s generation, 65K context, full OpenAI API. (Launch command and back-of-envelope math in the sketches after this post.)

Hardware:
- HP Z840 workstation
- 2× Xeon E5-2650 v3 (40 threads)
- 128GB DDR4 ECC
- 2× GTX 1080 Ti (22GB VRAM total)

Stack:
- llama.cpp TurboQuant fork (TheTom/llama-cpp-turboquant) @no_stp_on_snek
- Qwen 3.6 27B UD-Q4_K_XL (17GB GGUF)
- Pipeline parallelism across both GPUs
- NUMA-aware thread distribution

The secret weapon: TurboQuant KV cache (ICLR 2026 paper)
- Standard llama.cpp: 65K context, OOM at 131K
- TurboQuant (q8_0 K + turbo4 V): 131K context at ZERO speed cost
- 2× the context. Same 14 tok/s. No quality loss.

What didn't work:
- KTransformers/SGLang → needs sm_80+ (Ampere)
- vLLM → FlashAttention needs sm_75+
- Speculative decoding → no net speedup on hybrid models
- Tensor parallel → incompatible with KV quantization

Pascal is the hard limit. Only raw CUDA math works.

The bottleneck is VRAM bandwidth: 484 GB/s per GPU at ~22% effective utilization. 14 tok/s is the physical ceiling for 2× GTX 1080 Ti. No software trick changes that. It's a hardware wall.

What's next:
- RTX 3090 → vLLM + MTP spec decode = 85 tok/s
- That's 6× the speed for the same money
- TurboQuant PR #21089 is open for llama.cpp mainline

Key learnings:
- Pipeline parallel > tensor parallel for identical GPUs
- NUMA awareness = +5-10% prefill on a dual-socket box
- TurboQuant is real and it's a gamechanger
- 10-year-old hardware can run frontier models locally

---

Thanks @DrTBehrens (support) and @badlogicgames for PI. With it we can work at 65K context, which wasn't possible with other tools.

---

see ya!
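
For anyone reproducing the stack above: a minimal launch sketch for llama.cpp's llama-server. The model filename, port, and context size are placeholders for this box; --split-mode layer is llama.cpp's per-layer (pipeline-style) split, while row would be the tensor-style split that clashed with KV quantization.

```python
# Launch llama-server (llama.cpp's OpenAI-compatible server) with
# layer splitting across both GPUs. Paths/ports are placeholders.
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "qwen3.6-27b-UD-Q4_K_XL.gguf",  # hypothetical filename
    "-ngl", "99",              # offload all layers to the GPUs
    "--split-mode", "layer",   # pipeline parallelism: layers per GPU
    "-c", "65536",             # 65K context (131K with the TurboQuant fork)
    "--numa", "distribute",    # spread threads across both sockets
    "--host", "0.0.0.0", "--port", "8080",
])
```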
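
Since the server speaks the OpenAI API, any standard client works against it. A minimal sketch with the openai Python package; the base URL, key, and model name are placeholders for this setup (llama-server doesn't check the key by default).

```python
# Talk to the local llama.cpp server through the stock OpenAI client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="qwen3.6-27b",  # llama-server serves whatever model it loaded
    messages=[{"role": "user", "content": "Summarize NUMA in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```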
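
Why the TurboQuant KV cache roughly doubles the context that fits: sizing math. I don't have published dims for Qwen 3.6 27B, so the layer/head numbers below are illustrative guesses (read the real ones out of the GGUF metadata), and I'm approximating the fork's turbo4 format with a q4_0-style 4-bit footprint.

```python
# Rough KV-cache sizing: fp16 vs q8_0 keys + 4-bit values.
# Model dims are GUESSES for illustration, not Qwen 3.6 27B's specs.
n_layers, n_kv_heads, head_dim = 48, 4, 128

def kv_gib(ctx_tokens, bytes_per_k_elem, bytes_per_v_elem):
    per_token = n_layers * n_kv_heads * head_dim * (bytes_per_k_elem + bytes_per_v_elem)
    return ctx_tokens * per_token / 1024**3

# fp16 = 2 B/elem; q8_0 = 34 B per 32-elem block = 1.0625 B/elem;
# a q4_0-style 4-bit block = 18 B per 32 elems = 0.5625 B/elem.
print(f"fp16  @ 65K: {kv_gib(65536, 2.0, 2.0):.1f} GiB")
print(f"fp16  @131K: {kv_gib(131072, 2.0, 2.0):.1f} GiB")        # OOMs next to 17GB of weights
print(f"mixed @131K: {kv_gib(131072, 1.0625, 0.5625):.1f} GiB")  # fits
```

With these assumed dims the fp16 cache at 131K would need ~12 GiB on top of 17GB of weights (over the 22GB total), while the mixed cache needs ~5 GiB and squeezes in.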
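
The 14 tok/s ceiling, sanity-checked: decode is memory-bound, so every generated token streams the full ~17GB of weights through VRAM, plus KV reads on top. Working backwards from the measured speed gives the effective bandwidth utilization, which lands right around the ~22% figure above once you count the KV traffic.

```python
# Memory-bound decode: tok/s ~= effective_bandwidth / bytes_read_per_token.
weights_gb = 17         # UD-Q4_K_XL GGUF, streamed once per token
peak_gbps  = 2 * 484    # combined bandwidth of both 1080 Tis
measured   = 14         # observed tok/s

effective_gbps = measured * weights_gb        # ~238 GB/s
utilization    = effective_gbps / peak_gbps   # ~0.25 of peak
print(f"{effective_gbps:.0f} GB/s effective, {utilization:.0%} of peak")
# KV-cache reads add to the 17GB per token, so true utilization is a
# bit lower, consistent with ~22%. That's the hardware wall: nothing
# in software raises Pascal's memory bandwidth.
```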
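
On the NUMA point: llama.cpp's --numa distribute handles this internally, but the idea is simple enough to show standalone. A Linux-only sketch that pins the calling process to one socket's CPUs so its memory stays node-local during prefill; the 0-19 / 20-39 CPU split is an assumption for this Z840, so verify with lscpu.

```python
import os

# Dual-socket Z840: 40 logical CPUs. ASSUMED numbering: CPUs 0-19 on
# NUMA node 0, 20-39 on node 1 (check lscpu or /sys/devices/system/node).
NODE_CPUS = {0: set(range(0, 20)), 1: set(range(20, 40))}

def pin_to_node(node: int) -> None:
    # Restrict this process (pid 0 = self) to one node's CPUs so its
    # threads and allocations stay local to that socket's memory.
    os.sched_setaffinity(0, NODE_CPUS[node])

pin_to_node(0)
print("running on CPUs:", sorted(os.sched_getaffinity(0)))
```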



JUST IN: An AI data center moratorium is now projected to pass this year as protests intensify nationwide. 85% chance.


NEW: on the @NewcomerMedia podcast, Anthropic's philosopher queen @AmandaAskell. Meet the person charged with developing Claude's personality and ethical core. I ask whether Claude experiences consciousness. She's not ruling it out.
















