LeRobot @LeRobotHF
🚀 Introducing X-VLA: LeRobot’s new soft-prompted Vision-Language-Action model.
X-VLA is built to scale across many embodiments: different robots, cameras, action spaces, and environments, all handled by one unified transformer backbone.
- Generalist across robots (Franka, WidowX, Agibot, sim + real)
- Soft-prompt domain IDs let the model adapt to new hardware with tiny learnable embeddings
- Flow-matching + transformer core for smooth, continuous 50 Hz control
- Pretrained on a mixed-embodiment dataset spanning 7+ platforms and diverse tasks
- Fine-tune on any dataset using one of the 6 checkpoints we provide out of the box
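
For readers new to the two ideas in the bullets above, here is a rough, dependency-free sketch of how soft-prompt domain IDs and flow-matching sampling can work in general. It is not the X-VLA implementation or the LeRobot API: `SoftPromptTable`, `euler_flow_sample`, and the toy velocity field are hypothetical names invented for illustration, and a trained transformer would take the place of the hand-written velocity function.

```python
import random


class SoftPromptTable:
    """Hypothetical table holding one small block of learnable vectors per domain ID."""

    def __init__(self, num_domains, prompt_len, dim, seed=0):
        rng = random.Random(seed)
        # In a real model these values would be trainable parameters.
        self.prompts = [
            [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(prompt_len)]
            for _ in range(num_domains)
        ]

    def prepend(self, domain_id, tokens):
        # Put the embodiment-specific prompt vectors in front of the shared input tokens.
        return self.prompts[domain_id] + tokens


def euler_flow_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: 3 domains, each with 2 prompt vectors of width 4 (all sizes are assumptions).
table = SoftPromptTable(num_domains=3, prompt_len=2, dim=4)
tokens = [[0.0] * 4 for _ in range(5)]
sequence = table.prepend(1, tokens)  # 2 prompt vectors + 5 input tokens

# Toy velocity field for a straight-line path toward a fixed target action.
# A learned network would predict this velocity from the observation instead.
target_action = [0.5, -0.25]


def toy_velocity(x, t):
    return [(g - xi) / (1.0 - t) for g, xi in zip(target_action, x)]


action = euler_flow_sample(toy_velocity, x0=[1.0, 2.0], num_steps=10)
```

In this sketch, adapting to new hardware only requires adding a new row of prompt vectors while the shared backbone is reused, and the sampler turns a noisy starting point into a continuous action by following the velocity field.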