rishi



We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between 1⃣ Autoregressive, 2⃣ Diffusion, and 3⃣ Self-Speculation decoding by simply changing the attention pattern/mask. One model. Three decoding modes. No extra draft models. No architecture changes. Just significantly better efficiency across different concurrency levels. Up to 4× higher real throughput for a single user. 🤗 HF Collection: huggingface.co/collections/nv…, open license 🛜 Project page: research.nvidia.com/publication/20… 📰 Tech report: bit.ly/Nemotron-Labs-… Details below 👇
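To make "changing the attention pattern/mask" concrete, here is a minimal sketch of how one set of weights could be run under two different masks. This is an illustration, not the Nemotron-Labs-Diffusion implementation: the function names, the `block` parameter, and the block-wise mask layout are all assumptions. The self-speculation mode is left out because the post does not describe its mask.

```python
def causal_mask(n):
    """Autoregressive pattern: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_mask(n, block):
    """Assumed block-wise pattern: bidirectional inside a block, causal across blocks.

    Position i may attend to every position whose block index is <= its own.
    """
    return [[j // block <= i // block for j in range(n)] for i in range(n)]


def build_mask(mode, n, block=4):
    """Pick a mask by decoding mode (hypothetical mode names)."""
    if mode == "autoregressive":
        return causal_mask(n)
    if mode == "diffusion":
        return block_mask(n, block)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("autoregressive", "diffusion"):
        print(mode)
        for row in build_mask(mode, 4, block=2):
            print("".join("x" if allowed else "." for allowed in row))
```

With a block size of 1 the block-wise mask reduces to the causal one, which is one way to see why the same model could serve both modes.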



Leading the training for this model was a privilege. Training diffusion-style models will be the future, whether they are discrete/speculative or continuous.


We present ZAYA1-8B-Diffusion-Preview, the first diffusion language model trained on @AMD. Autoregressive LLMs generate one token at a time; diffusion generates a block in parallel, speeding up inference. We show a 4.6-7.7x decoding speedup with minimal quality degradation 🧵
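A rough way to see where a block-parallel speedup can come from is to count forward passes. The sketch below assumes one forward pass per token for autoregressive decoding and a fixed number of denoising steps per block for block decoding. The function names and the numbers in the example are hypothetical, not taken from the post, and the count ignores the per-pass cost, so it does not reproduce the reported 4.6-7.7x figure.

```python
import math


def autoregressive_passes(num_tokens):
    """One forward pass per generated token."""
    return num_tokens


def block_parallel_passes(num_tokens, block_size, steps_per_block):
    """Assumed model: each block of tokens takes a fixed number of denoising steps."""
    return math.ceil(num_tokens / block_size) * steps_per_block


if __name__ == "__main__":
    # Illustrative numbers only.
    tokens, block_size, steps = 256, 16, 4
    ar = autoregressive_passes(tokens)
    bp = block_parallel_passes(tokens, block_size, steps)
    print(f"autoregressive: {ar} passes, block-parallel: {bp} passes, ratio: {ar / bp:.1f}x")
```

Under these assumptions the ratio is simply block size divided by steps per block, so any gain depends on needing fewer denoising steps than there are tokens in a block.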

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵











Today we're releasing ZAYA1-74B-Preview, a major milestone in scaling pretraining on @AMD. ZAYA1-74B-Preview is a 4B active / 74B total MoE. This preview model is a strong pre-RL base checkpoint. The final post-trained reasoning model is coming soon. 🧵


