Lucebox

What hardware actually powers open-source AI?
Not benchmarks. Not vendor marketing. Real-world community usage.

We're launching @huggingface Hardware:
→ trending GPUs & CPUs
→ VRAM distribution
→ inference hardware trends
→ what the OSS AI ecosystem really runs on

⚡ Luce Megakernel just proved the NVIDIA efficiency gap is a software problem, not a hardware one.

🔬 A 2020 RTX 3090 at 220W now matches Apple M5 Max efficiency and delivers 1.8x the throughput.

🔹 413 tok/s decode vs 267 tok/s on llama.cpp: same GPU, different software
🔹 1.87 tok/J, matching the Apple M5 Max at less than a third of the system cost
🔹 All 24 layers of Qwen3.5-0.8B fused into a single CUDA kernel, with zero CPU round trips
🔹 25x faster than the PyTorch Hugging Face baseline on the same hardware
🔹 Hybrid DeltaNet and attention architecture: the first megakernel ever built for this pattern

🔥 Full breakdown and live benchmark below 👇
youtu.be/e6jY4goVIu0
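The efficiency numbers in the post can be cross-checked with simple arithmetic: since 1 W = 1 J/s, tokens per joule is decode throughput divided by power draw. A minimal sketch, assuming the 220W figure is the power drawn while decoding at 413 tok/s (the post does not say how power was measured); the function names are illustrative, not part of any Luce tooling.

```python
# Cross-check of the headline numbers quoted in the post.
# Assumption: 220 W is the power drawn during decode at 413 tok/s.

def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Energy efficiency: (tok/s) / (J/s) = tok/J, because 1 W = 1 J/s."""
    return tokens_per_second / watts

def speedup(new_rate: float, baseline_rate: float) -> float:
    """Throughput ratio between two decode rates on the same GPU."""
    return new_rate / baseline_rate

megakernel_tps = 413.0  # Luce Megakernel decode on an RTX 3090
llamacpp_tps = 267.0    # llama.cpp decode on the same GPU
gpu_watts = 220.0       # power figure quoted in the post

eff = tokens_per_joule(megakernel_tps, gpu_watts)
ratio = speedup(megakernel_tps, llamacpp_tps)

print(f"efficiency: {eff:.2f} tok/J")  # 413 / 220 = 1.877..., prints 1.88
print(f"vs llama.cpp: {ratio:.2f}x")   # 413 / 267 = 1.546..., prints 1.55
```

The result (about 1.877 tok/J) is consistent with the post's 1.87 tok/J. The same-GPU gain over llama.cpp works out to roughly 1.55x; the 1.8x throughput figure in the post appears to be relative to the Apple M5 Max, whose raw decode rate the post does not state.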