NVIDIA AI
12.8K posts

@NVIDIAAI
Teaching your AI new tricks.

Blackwell's NVFP4 is cooking 🔥 I have been trying to figure out ways to maximize the @NVIDIAAI Spark when using LLMs, so I tried some concurrency tests. Now I need to figure out how to get an agent to take advantage of this.

Nemotron-3-Nano-Omni (30B-A3B-Reasoning) on DGX Spark GB10 @ 50k context, post-warmup:

FP8 vs NVFP4 throughput (tok/s)
user/request at a time= 1: 39 → 47 (+21%)
user/request at a time= 4: 96 → 132 (+38%)
user/request at a time= 8: 178 → 198 (+11%) ← peak
user/request at a time=16: 171 → 199 (+16%)

Same concurrency curve (sweet spot at n=8, dip at 9-10, secondary peak at 16), but NVFP4 wins at every single level. Biggest gains in the moderate concurrency range (n=4-7: +31-38%). At peak, +11% throughput and noticeably snappier latency.

Full benchmarks 👇
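For anyone checking the percentages in the post above, here is a minimal sketch that recomputes the FP8 to NVFP4 gains from the four reported data points. The names `reported` and `gain_percent` are made up for illustration, and the numbers are copied from the post, not re-measured.

```python
# Throughput (tok/s) as reported in the post: concurrency -> (FP8, NVFP4).
# These values are copied from the post; nothing here is a new measurement.
reported = {1: (39, 47), 4: (96, 132), 8: (178, 198), 16: (171, 199)}


def gain_percent(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100


# Round to whole percentages, matching the precision used in the post.
gains = {n: round(gain_percent(fp8, nvfp4)) for n, (fp8, nvfp4) in reported.items()}

for n, (fp8, nvfp4) in reported.items():
    print(f"n={n:>2}: {fp8} -> {nvfp4} tok/s ({gains[n]:+d}%)")
```

The rounded results agree with the +21%, +38%, +11% and +16% figures quoted in the post.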

"World models" is one of the buzziest yet ambiguous terms in AI right now. I started this video with many questions: - How are they different from video generation? - Can they do more than AI slop? - Can LeCun be trusted given that he wears knee-high white socks? Many thanks to @tjgalda and @NVIDIAAI for helping me answer (most) of these questions!

1/ We just installed IntelFactor at an enterprise knife factory.
2/ Real-time pass/fail on every blade.
3/ Workers love the dashboard on their phones.
Video in next post ↓

16 local AI agents streaming at once! MiniMax M2.7 NVFP4 — 2x GB10, no cloud APIs.