Youngsun Wi
@WiYoungsun
PhD Student doing research on robotic manipulation 🦾🤖| Currently at @UMRobotics | Interned at @AIatMeta @NVIDIARobotics @amazondrives

🎉 Accepted to #CVPR2026

🔎 VLMs fall short on complex spatial reasoning. They struggle with:
• Precise geometric perception
• Multi-step reasoning grounded in 3D
• Dynamically adapting perception to task and context

🚀 We propose a solution: visual tool-augmented spatial reasoning — bridging perception and multi-step reasoning through diverse, error-aware, adaptive use of vision tools.

And we go one step further: 🤖 enabling robot control by treating robots themselves as tools.

Our framework is powered by:
⚡ Double Interactive RL (DIRL), a new training framework combining demonstrations with real exploration
🛠 Real interaction with specialized computer vision models during RL
🤖 Toolshed, a scalable, asynchronous system for multimodal execution of vision tools and robots-as-tools

🔗 Project: spacetools.github.io
Code: github.com/spacetools/Spa…

Toolshed is released with frontier-model demos. Full training & evaluation release coming soon.

Done during my internship @NVIDIA — big thanks to the amazing collaborators! 🙌

#CVPR2026 #EmbodiedAI #ComputerVision #Robotics #ReinforcementLearning #MultimodalAI

Human videos are great for robot learning, but they lack the sense of touch needed for complex manipulation tasks! Introducing OSMO, an open-source tactile glove that captures rich contact signals during human demonstrations, enabling direct transfer to robots.
