
Toward general dexterity
We are training robot policies that learn a rich, coherent physical representation of the world by conditioning on multimodal observations.
Here’s a small glimpse of what we’ve been building:
Task: Pick the ramen cup and place it in the box
Given the current world state, the model generates a multimodal trajectory; here we show the decoded video and the corresponding actions executed on the humanoid.
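To make that sentence concrete, here is a minimal sketch of the inference loop it describes: encode the current multimodal observations, sample a joint latent trajectory, then decode that single trajectory into both a predicted video and an executable action chunk. Everything in this snippet is a hypothetical placeholder (the `MultimodalPolicy` class, its methods, the 24-DoF action space, the camera setup); the post does not disclose the actual model, architecture, or API.

```python
# Hypothetical sketch of the described loop -- not the actual system.
import numpy as np

class MultimodalPolicy:
    """Stand-in for a policy that conditions on multimodal observations
    and generates a joint (video, action) trajectory."""

    def generate_trajectory(self, images, proprio, instruction, horizon=16):
        # Placeholder: a real model would sample a latent trajectory
        # conditioned on the images, proprioception, and instruction.
        return np.random.randn(horizon, 64)

    def decode_video(self, latents):
        # Decode the latent trajectory into predicted future frames
        # (the video shown in the demo).
        return np.zeros((len(latents), 224, 224, 3), dtype=np.uint8)

    def decode_actions(self, latents):
        # Decode the same latents into an action chunk, e.g. target
        # joint positions for the humanoid (24 DoF is an assumption).
        return np.tanh(latents[:, :24])

policy = MultimodalPolicy()
obs = {
    "images": np.zeros((2, 224, 224, 3), dtype=np.uint8),  # e.g. head + wrist cameras
    "proprio": np.zeros(24),                               # e.g. joint positions
}
latents = policy.generate_trajectory(
    obs["images"], obs["proprio"],
    "Pick the ramen cup and place it in the box",
)
video = policy.decode_video(latents)      # visualized prediction
actions = policy.decode_actions(latents)  # executed on the robot
```

The key structural point the snippet tries to capture is that video and actions are decoded from one shared trajectory, so the executed motion stays consistent with the model's visual prediction of the world.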