Our newest model, π0.7, has some interesting emergent capabilities: it can control a new robot, for which we had no shirt-folding data, to fold shirts; figure out how to use an appliance with language-based coaching; and perform a wide range of dexterous tasks, all in one model!
π, But Make It Fly ✈️
We fine-tuned π0, a VLA model pretrained entirely on manipulators, to fly a drone that picks up objects, navigates through gates, and composes both skills from language commands.
With RL, the robot can learn very precise tasks, like fastening a zip tie, and can actually do it more consistently and more quickly than even human teleoperation.
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
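As a rough illustration of the "tiny actor and critic on top of a frozen model" idea, here is a minimal sketch in plain Python. It is not our actual method: it assumes the RL token is just a fixed-length feature vector, uses a generic one-step actor-critic update with a Gaussian policy, and every name in it (`LinearHead`, `actor_critic_step`, `rl_token`) is hypothetical.

```python
import random


class LinearHead:
    """A tiny linear layer y = W x + b, trained with plain SGD (hypothetical helper)."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + b
                for row, b in zip(self.w, self.b)]

    def sgd_step(self, x, grad_out, lr):
        # grad_out is dLoss/dy; take one gradient-descent step on W and b.
        for i, g in enumerate(grad_out):
            for j, xj in enumerate(x):
                self.w[i][j] -= lr * g * xj
            self.b[i] -= lr * g


def actor_critic_step(actor, critic, rl_token, action, reward, next_rl_token,
                      done, gamma=0.99, sigma=0.5, lr=0.01):
    """One generic actor-critic update; the big model that emits rl_token stays frozen."""
    value = critic(rl_token)[0]
    next_value = 0.0 if done else critic(next_rl_token)[0]
    td_error = reward + gamma * next_value - value

    # Critic: semi-gradient TD(0) on 0.5 * td_error^2, so dLoss/dvalue = -td_error.
    critic.sgd_step(rl_token, [-td_error], lr)

    # Actor: Gaussian policy gradient with the TD error as the advantage estimate.
    # The loss is -td_error * log pi(action | rl_token), differentiated w.r.t. the mean.
    mean = actor(rl_token)
    grad_mean = [-td_error * (a - m) / sigma ** 2 for a, m in zip(action, mean)]
    actor.sgd_step(rl_token, grad_mean, lr)
    return td_error
```

Only the two small heads receive gradient updates here, which is the property that makes this kind of fine-tuning cheap compared with training the whole model.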
I'm extremely excited to announce that we've successfully run inference with π0.5 on our excavator!
We've collected a massive corpus of real-world data with natural language labels from operators in the industry and are using it to create some really cool policies. Here's our first demo of it successfully completing a task with just 200 trajectories. More on the way :)
Read our blog post: labs.actor/research/vla-e… @physical_int
Apart from solving new tasks, memory also makes our policies more robust: we show early signs of in-context adaptation, where the robot adapts its behavior on the fly by learning from its past mistakes.
We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory.
Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇