
notallthere
202 posts

@notallthere_net
Let’s talk about local computer vision LLMs




The fresh new B70 PCIe 4.0 build is running Minimax now. Stock was about 13 tps; it's currently at 83 tps (decode) after applying my optimizations. That's a bit short of the 93 tps I had on the PCIe 5.0 mobo, but I might have missed some patches. Sanity-check tests passed. See reply for more.












NVIDIA has done the impossible and nobody's talking about it. They trained a 12-billion-parameter LLM in 4-bit precision on 10 trillion tokens.

For years, the AI industry has been stuck. If you wanted to train a world-class AI, you had to use 16-bit or 8-bit precision. Going lower, to 4-bit, was a death sentence for the model. It would become unstable, "hallucinate" its own math, and eventually collapse.

But NVIDIA proved that "impossible" was just a math problem. They used a new format called NVFP4. Instead of one rigid scale for a whole tensor, NVFP4 uses "micro-scaling": it groups numbers into tiny blocks and gives each block its own scaling factor. It’s like giving the AI a pair of high-definition glasses for its own data, allowing it to see fine details even with 75% less memory than 16-bit.

The result is a total paradigm shift:
- 2× to 3× faster arithmetic performance.
- 50% reduction in memory usage versus 8-bit.
- Near-zero loss in intelligence.

The researchers compared the 4-bit model against an 8-bit baseline. The curves are nearly identical. On MMLU, GSM8K, and coding benchmarks, the "tiny" 4-bit version performed within 0.1% of the more expensive model.

This is an economic earthquake. Training a frontier model used to require tens of thousands of GPUs and months of time. NVIDIA just showed we can get the same results with half the hardware and a fraction of the electricity.
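To make the "micro-scaling" idea concrete, here is a minimal pure-Python sketch of block-scaled 4-bit quantization. It is an illustration under stated assumptions, not NVIDIA's implementation: the function names are hypothetical, the per-block scale is kept as an ordinary Python float (real NVFP4 stores low-precision scales), and rounding is simple nearest-value. The 16-element block size and the 4-bit E2M1 value grid are my understanding of the format.

```python
# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumption: NVFP4-style 16-element blocks


def quantize_block(block):
    """Return (scale, codes): one scale for the block, one grid value per element."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def quantize(values, block_size=BLOCK_SIZE):
    """Split values into blocks and quantize each block with its own scale."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def dequantize(blocks):
    """Rebuild approximate values from (scale, codes) pairs."""
    out = []
    for scale, codes in blocks:
        out.extend(scale * c for c in codes)
    return out


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Two blocks with very different magnitudes: tiny values, then large ones.
values = [0.001 * (i + 1) for i in range(16)] + [10.0 * (i + 1) for i in range(16)]

per_block = dequantize(quantize(values, block_size=16))
per_tensor = dequantize(quantize(values, block_size=len(values)))  # one shared scale

small_err_block = max_abs_error(values[:16], per_block[:16])
small_err_tensor = max_abs_error(values[:16], per_tensor[:16])
print(small_err_block, small_err_tensor)
```

With a single shared scale, the large values dominate and every small value rounds to zero; with one scale per block, the small values keep their relative detail. On storage: assuming one 8-bit scale per 16-element block, the cost works out to 4 + 8/16 = 4.5 bits per value, which is slightly above a flat 4 bits.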















