
🎉 Thrilled to announce EAGLE 3.1 - the next evolution of speculative decoding from @EagleCorp, developed by @hongyangzh, @dogacel0, and the EAGLE team in collaboration with vLLM @vllm_project and TorchSpec @lightseekorg!

💡 EAGLE 3.1 introduces a new FC normalization + post-normalization hidden-state feedback architecture that significantly improves long-context robustness, acceptance length, and serving stability across real-world inference environments.

Shoutout to @NVIDIA, which has been instrumental in the large-scale training, benchmarking, and inference validation of EAGLE 3.1, helping bring this next step in inference acceleration to production environments.

For EAGLE 3.1, the EAGLE team identified attention drift as a key bottleneck behind the degradation of acceptance length at deeper speculation steps.

✨ What's new:
• Up to 2× longer acceptance length in long-context settings
• Stronger long-context + chat-template robustness
• More stable serving across diverse prompts and environments
• Native vLLM support
• TorchSpec training support
• Open-source Kimi K2.6 EAGLE 3.1 draft model

🔗 Blog: vllm.ai/blog/2026-05-2…
