Simone Alghisi retweeted

🧮 Counting to 10 is easy, unless you’re a Vision-Language Model.
Our new study dissects why VLMs fail at basic reasoning & how we can teach them to do better (+21% accuracy).
🧠 Towards models that don’t just see, but reason!!
🔗 arxiv.org/abs/2510.19555

