
🤯 Think better visuals mean better world models? Think again.

💥 Surprise: Agents don’t need eye candy; they need wins.

Meet World-in-World, the first open benchmark that ranks world models by closed-loop task success, not pixels.

We uncover 3 shocks:
1️⃣ Visuals ≠ utility
2️⃣ Action data > bigger models
3️⃣ Scaling test-time compute = more success

🤗 huggingface.co/papers/2510.18…
🌍 world-in-world.github.io
📄 arxiv.org/abs/2510.18135
github.com/World-In-World…













