
Diffusion-based approach beats autoregressive models at solving puzzles and planning

🤖 Original Problem:
Autoregressive LLMs struggle with complex reasoning and long-term planning tasks despite their impressive capabilities. They have inherent difficulty maintaining global coherence and handling scenarios that require deliberate planning.

-----

🔧 Solution in this Paper:
• Introduces Multi-granularity Diffusion Modeling (MDM), which prioritizes subgoals by difficulty during learning
• Uses a multi-view learning framework in which challenging subgoals are decomposed into manageable, interrelated views
• Implements sequence-level and token-level reweighting mechanisms to improve training efficiency
• Employs an easy-first TopK decoding strategy at inference time for superior performance

-----

💡 Key Insights:
• Not all tokens are equally difficult for autoregressive models to learn
• Diffusion models can effectively learn difficult subgoals that elude autoregressive approaches
• The performance gap between MDM and autoregressive models widens as task difficulty increases
• Global coherence is better maintained through the multi-step denoising process

-----

📊 Results:
• Countdown: MDM achieves 91.5% accuracy vs. 45.8% for autoregressive models
• Sudoku: MDM reaches 100% accuracy vs. 20.7% for autoregressive models
• With just 6M parameters, MDM outperforms 303M-parameter GPT-2 and 13B-parameter LLaMA
• 10x faster inference with a single diffusion step while maintaining superior accuracy
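To make the token-level reweighting idea concrete, here is a minimal sketch of a per-token weighted loss. The paper's exact weighting scheme is not reproduced here; this sketch proxies token difficulty with `(1 - p_t)` (a focal-style choice), so harder tokens contribute more to the loss. The function name and `gamma` parameter are illustrative assumptions, not the paper's API.

```python
import numpy as np

def reweighted_token_loss(logits, targets, gamma=1.0):
    """Hedged sketch of token-level reweighting.

    Computes per-token cross-entropy and scales each token's loss by a
    difficulty weight (1 - p_t)**gamma, where p_t is the probability the
    model assigns to the correct token. This is one plausible choice;
    the paper's actual reweighting may differ.
    """
    # Numerically stable softmax over the vocabulary axis
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Probability assigned to each target token
    p_t = probs[np.arange(len(targets)), targets]
    ce = -np.log(p_t + 1e-12)

    # Harder tokens (low p_t) receive larger weights
    weights = (1.0 - p_t) ** gamma
    return float((weights * ce).mean())
```

With `gamma=0` the weights are all 1 and this reduces to plain mean cross-entropy, which makes the difficulty-dependent term easy to ablate.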

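The easy-first TopK decoding strategy can be sketched as follows: start from a fully masked sequence and, at each denoising step, commit only the K positions where the model is most confident, leaving harder positions for later steps. The `logits_fn` callable below is a hypothetical stand-in for the diffusion model; this is a sketch under that assumption, not the paper's implementation.

```python
import numpy as np

def easy_first_topk_decode(logits_fn, seq_len, vocab_size, k):
    """Sketch of easy-first TopK decoding for a masked diffusion model.

    logits_fn(tokens, mask) is assumed to return per-position logits of
    shape (seq_len, vocab_size) for the current partially decoded state.
    """
    MASK = -1
    tokens = np.full(seq_len, MASK)
    mask = np.ones(seq_len, dtype=bool)  # True = position still masked

    while mask.any():
        logits = logits_fn(tokens, mask)
        # Softmax per position to get confidence scores
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        conf = probs.max(axis=-1)
        conf[~mask] = -np.inf  # only rank still-masked positions

        # Commit the k "easiest" (most confident) masked positions
        k_eff = min(k, int(mask.sum()))
        pick = np.argsort(-conf)[:k_eff]
        tokens[pick] = probs[pick].argmax(axis=-1)
        mask[pick] = False

    return tokens
```

Setting `k = seq_len` collapses this to single-step decoding, which is one way to read the 10x-faster single-step inference result reported above.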











