
🧠💡 What if your 7B model could beat GPT-4o and Qwen2.5-72B using just 11k training samples? No distillation. No warm-start. Just smart data and reinforcement learning. Inspired by Moravec’s Paradox, we let the model decide what's actually hard.

🚨 New paper: "SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement"

We show how ThinkLite-VL-7B achieves SoTA on MathVista: 75.1%, surpassing much larger models.

👇 Here’s how we did it:
🔗 Paper: arxiv.org/abs/2504.07934
🧠 Code: github.com/si0wang/ThinkL…

#AI #VisionLanguageModels #ReinforcementLearning #MachineLearning #LessIsMore
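The core idea, letting the model's own search effort decide which samples are "actually hard", can be sketched roughly as below. This is a minimal illustration under assumptions, not the paper's code: it assumes difficulty is proxied by the number of MCTS iterations needed to reach a verified-correct answer, and `solve_iters` and the threshold are hypothetical placeholders for the real search-and-verify loop.

```python
# Sketch: keep samples the model finds hard under an MCTS search budget.
# Assumption (not from the paper): difficulty = MCTS iterations to a
# verified-correct answer; None means unsolved within the budget.

def select_hard_samples(samples, solve_iters, threshold=20):
    """Return the samples worth keeping for RL training.

    samples:      list of training-example ids.
    solve_iters:  callable mapping a sample to the number of MCTS
                  iterations needed to solve it, or None if unsolved.
                  (Hypothetical interface.)
    threshold:    iteration count above which a sample counts as hard.
    """
    selected = []
    for s in samples:
        iters = solve_iters(s)
        if iters is None or iters >= threshold:
            selected.append(s)  # hard or unsolved -> informative for RL
    return selected

# Toy usage with a stubbed difficulty oracle.
difficulty = {"q1": 3, "q2": 25, "q3": None}  # None = never solved
hard = select_hard_samples(["q1", "q2", "q3"], difficulty.get)
print(hard)  # ['q2', 'q3']
```

Easy samples (solved in a few iterations) are filtered out, so the small RL budget is spent only where the model still struggles.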












