DeepThink Lab

DeepThink Lab retweeted

🎉 Accepted to #CVPR2026
🔎 VLMs fall short on complex spatial reasoning.
They struggle with:
• Precise geometric perception
• Multi-step reasoning grounded in 3D
• Adapting perception dynamically to task and context
🚀 We propose a solution: visual tool-augmented spatial reasoning —
bridging perception and multi-step reasoning through diverse, error-aware, adaptive vision tool use.
And we go one step further:
🤖 enabling robot control by treating robots themselves as tools.
Our framework is powered by:
⚡ Double Interactive RL (DIRL), a new training framework combining demonstrations + real exploration
🛠 Real interaction with specialized computer vision models during RL
🤖 Toolshed, a scalable, asynchronous system for multimodal execution of vision tools and robots-as-tools
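The asynchronous tool execution described above can be sketched very roughly as follows. This is an illustrative assumption, not the Toolshed API: the names `run_tool`, `dispatch`, and the toy `TOOLS` registry are all hypothetical, and the real system runs specialized vision models and robots rather than stub functions.

```python
import asyncio

# Hypothetical tool registry; in a real system each entry would wrap a
# specialized vision model or a robot interface, not a string formatter.
TOOLS = {
    "depth": lambda img: f"depth-map({img})",
    "detect": lambda img: f"boxes({img})",
}

async def run_tool(name: str, image: str) -> str:
    # Simulate a slow tool call running out of process.
    await asyncio.sleep(0.01)
    return TOOLS[name](image)

async def dispatch(requests):
    # Execute all tool calls concurrently, so one slow tool
    # does not block the others.
    tasks = [run_tool(name, img) for name, img in requests]
    return await asyncio.gather(*tasks)

results = asyncio.run(dispatch([("depth", "frame0"), ("detect", "frame0")]))
print(results)  # ['depth-map(frame0)', 'boxes(frame0)']
```

`asyncio.gather` preserves request order, so results line up with the requests even though the calls overlap in time.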
🔗 Project: spacetools.github.io
🔗 Code: github.com/spacetools/Spa…
Toolshed is released with frontier-model demos.
Full training & evaluation release coming soon.
Done during my internship @NVIDIA — big thanks to the amazing collaborators! 🙌
#CVPR2026 #EmbodiedAI #ComputerVision #Robotics #ReinforcementLearning #MultimodalAI

@AndrewLampinen Wow! Understanding memorization/generalization through the implicit bias or 'abilities' of networks is truly exciting!
In our ICLR2026 paper (arxiv.org/abs/2512.20963), we prove that diffusion models likewise generalize when they learn structures from the data and memorize when they store and match training samples.

Experiments indicate that our results hold for nonlinear Transformers (GPT-2) and nonlinear function classes, going beyond our simplified theoretical setting.
Link to arXiv version: arxiv.org/abs/2505.14808

Experiments on synthetic and real data show similar conclusions hold beyond our theoretical setting.
Link to arXiv version: arxiv.org/abs/2501.02364
DeepThink Lab retweeted

'Understanding Deep Representation Learning via Layerwise Feature Compression and Discrimination', by Peng Wang, Xiao Li, Can Yaras, Zhihui Zhu, Laura Balzano, Wei Hu, Qing Qu.
jmlr.org/papers/v26/24-…
#classification #deep #features

🔷 Learned representations can indicate whether the model is learning underlying data structures (with balanced, informative representations) or memorizing training data (with spiky representations). We elaborate below.
Read the full paper on arXiv: arxiv.org/abs/2512.20963
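One hedged way to make the balanced-vs-spiky distinction concrete is to look at the singular-value spectrum of a feature matrix. The sketch below is an illustrative diagnostic, not the paper's method: it computes an "effective rank" (the exponential of the spectrum's entropy), which is high when energy is spread across many directions and near 1 when it concentrates in a few.

```python
import numpy as np

def effective_rank(features: np.ndarray) -> float:
    # Entropy-based effective rank of a feature matrix:
    # high for balanced spectra, near 1 for spiky (low-rank) ones.
    s = np.linalg.svd(features, compute_uv=False)
    p = s / s.sum()                 # normalize singular values
    p = p[p > 0]                    # drop exact zeros before log
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
balanced = rng.standard_normal((256, 64))                              # near-isotropic
spiky = rng.standard_normal((256, 1)) @ rng.standard_normal((1, 64))   # rank-1

print(effective_rank(balanced))  # high, approaching 64
print(effective_rank(spiky))     # close to 1
```

On the toy inputs above, the isotropic Gaussian features score far higher than the rank-1 ("spiky") ones, mirroring the balanced-vs-spiky contrast in the post.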
