Results:
15 min of data → huge gains
Up to 3× faster execution
Beats human teleop on insertion
This is the real Physical AI loop:
foundation model → deploy → online RL → get better while working
Robots that improve on the job: that's where this is going.
Physical Intelligence keeps dropping bangers.
VLAs can do everything… until the last millimeter.
Screw insertion, cable plugging, tight alignment → demos aren’t enough.
RLT adds online RL on top of a frozen VLA so the robot can refine precision skills in minutes, not weeks.
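The "refine in minutes" idea can be sketched as a tiny residual correction learned online on top of a frozen base policy. The hill-climbing update below is a crude stand-in for RLT's actual RL algorithm; every name and number here is an illustrative assumption.

```python
# Minimal sketch: online residual refinement on a frozen base policy.
# frozen_vla, ResidualRefiner, and the 0.8 target are all made up.

def frozen_vla(obs):
    """Stand-in for a frozen VLA: proposes a coarse 1-D action."""
    return 0.5  # lands near, but not on, the 0.8 target


def reward(action, target=0.8):
    """Dense precision reward: higher when closer to the target."""
    return -abs(action - target)


class ResidualRefiner:
    """Tiny online learner: adds a correction to the frozen action."""

    def __init__(self, step=0.05):
        self.delta = 0.0
        self.step = step

    def update(self, base_action):
        # Probe both sides of the current residual and keep the better
        # one (a 1+1-style perturbation search, one trial per side).
        r_plus = reward(base_action + self.delta + self.step)
        r_minus = reward(base_action + self.delta - self.step)
        self.delta += self.step if r_plus > r_minus else -self.step


refiner = ResidualRefiner()
for _ in range(100):  # minutes of online trials, not weeks of demos
    refiner.update(frozen_vla(obs=None))

refined_action = frozen_vla(obs=None) + refiner.delta
```

The frozen model supplies the coarse skill; only the small residual adapts online, which is what makes the refinement cheap.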
A full-stack system: open-set perception, 3D reconstruction, physics simulation, and VLM-based scoring. Robots don’t just act: they simulate, evaluate, and choose what works.
Strong success rates across diverse tasks.
This is the key to general-purpose manipulation in the wild.
It is time to rethink robot learning:
instead of copying demos, the robot builds a digital twin of the world and plans inside it, unlocking zero-shot manipulation on completely unseen objects and tasks.
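The simulate → evaluate → choose loop is easy to picture in miniature: roll candidate actions through the twin, score the outcomes, act on the winner. The one-line physics and scorer below are illustrative stand-ins, not the paper's system.

```python
# Toy digital-twin planning loop: simulate, evaluate, choose.
# twin_rollout and score are made-up stand-ins for the real stack.

def twin_rollout(push_force, mass=1.0, friction=0.2, dt=0.1, steps=20):
    """Toy twin: one impulsive push on a sliding object, then
    Coulomb-style friction bleeds off its velocity."""
    v = push_force / mass  # impulse -> initial velocity
    x = 0.0
    for _ in range(steps):
        x += v * dt
        v = max(0.0, v - friction * dt)
    return x


def score(final_x, goal_x=0.5):
    """Stand-in for a VLM-based scorer: closer to the goal is better."""
    return -abs(final_x - goal_x)


# Simulate every candidate push inside the twin, then act on the winner.
candidates = [0.2 * k for k in range(1, 21)]
best_push = max(candidates, key=lambda f: score(twin_rollout(f)))
```

No demo of this exact object is needed: the twin plus scorer substitutes for imitation, which is the zero-shot claim in miniature.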
Result: real-time control with video-level reasoning.
Why this works:
Robot control becomes easier if the model already knows how the world evolves.
With video priors:
10× better sample efficiency
2× faster convergence
SOTA on LIBERO, SIMPLER, and real bimanual dexterity.
VLAs learn control from images, but they don’t understand physics.
Video models do.
mimic-video proposes Video-Action Models:
use a pretrained video diffusion model to predict future trajectories, then decode actions from its latent plan.
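The predict-then-decode split can be sketched in a few lines: a "video model" extrapolates future states (the latent plan), and an inverse-dynamics head reads actions off consecutive predicted states. Everything here is a toy assumption, not mimic-video's API.

```python
import numpy as np

# Toy video-action model: predict future frames, decode actions between
# them via inverse dynamics. All names and dynamics are illustrative.


def video_model_predict(state, horizon=4):
    """Stand-in for a pretrained video model: extrapolates future
    2-D states as a straight-line drift toward a goal at (1, 1)."""
    goal = np.array([1.0, 1.0])
    frames = [state + (goal - state) * (t / horizon)
              for t in range(1, horizon + 1)]
    return np.stack(frames)


def decode_actions(state, frames):
    """Inverse-dynamics decoder: action = displacement between frames."""
    prev = np.vstack([state[None], frames[:-1]])
    return frames - prev


state = np.zeros(2)
plan = video_model_predict(state)      # latent plan: future states
actions = decode_actions(state, plan)  # actions that realize the plan
```

The physics knowledge lives in the video predictor; the action head only has to translate its plan, which is why the video prior does the heavy lifting.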
Results are strong:
+23% button pressing
+11% insertion
+28% real robot pick-place
Shows something important:
For flow-based robot policies, action coherence matters more than stronger conditioning.
Test-time guidance may be a big direction for VLA control.
Flow / diffusion VLA policies are powerful, but imitation learning makes them copy human noise too.
Jerks, pauses, jitter → action incoherence → manipulation failures.
ACG fixes this with test-time guidance that makes flow policies generate smoother, more stable actions.
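Test-time guidance on a flow sampler can be sketched as adding a smoothness-gradient term to each integration step. The flow field, guidance weight, and action chunk below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Toy test-time smoothness guidance for a flow-style action sampler.

rng = np.random.default_rng(0)
# Pretend the learned flow reproduces a jittery, demo-like action chunk.
noisy_target = np.sin(np.linspace(0, np.pi, 16)) + 0.3 * rng.standard_normal(16)


def jerkiness(x):
    """Sum of squared second differences: high for jittery chunks."""
    return float(np.sum(np.diff(x, n=2) ** 2))


def smoothness_grad(x):
    """Gradient of 0.5 * jerkiness(x) with respect to x."""
    d2 = np.diff(x, n=2)  # d2[i] = x[i] - 2 x[i+1] + x[i+2]
    g = np.zeros_like(x)
    g[:-2] += d2
    g[1:-1] -= 2.0 * d2
    g[2:] += d2
    return g


def sample_chunk(guidance=0.0, steps=100, dt=0.05):
    """Euler-integrate the flow; optionally subtract the smoothness
    gradient at every step (the test-time guidance term)."""
    x = np.zeros_like(noisy_target)  # deterministic start for the sketch
    for _ in range(steps):
        velocity = noisy_target - x  # toy flow field toward the target
        x += dt * (velocity - guidance * smoothness_grad(x))
    return x


plain = sample_chunk(guidance=0.0)
guided = sample_chunk(guidance=1.0)
```

The policy weights never change: only the sampling trajectory is steered, so the jitter copied from demos is filtered out at inference time.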
Key trick in HALO:
Treat payload as structured sim-to-real gap, not noise.
Stage 1 → calibrate nominal robot
Stage 2 → identify payload mass + CoM
Unlike broad domain randomization, which becomes too conservative, this avoids over-randomization and preserves agility.
Heavy payloads break sim-to-real for humanoids.
HALO fixes this with differentiable system ID in MuJoCo XLA + a two-stage calibration that separates base model errors from payload dynamics.
Result: zero-shot RL policies that still work when the robot carries heavy loads.
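The two-stage calibration can be sketched as gradient descent through a differentiable simulator, first on the nominal robot, then on the payload with the nominal model frozen. The one-line dynamics and all masses below are made up, and payload CoM is omitted for brevity.

```python
import numpy as np

# Two-stage system ID sketch: Stage 1 calibrates the nominal robot
# (no payload); Stage 2 identifies the payload, nominal model frozen.

TRUE_ROBOT_MASS = 30.0
TRUE_PAYLOAD_MASS = 8.0
forces = np.linspace(10.0, 100.0, 10)


def simulate_accel(force, mass):
    """Toy differentiable dynamics: a = F / m."""
    return force / mass


# "Real robot" acceleration logs, with and without the payload.
accel_free = simulate_accel(forces, TRUE_ROBOT_MASS)
accel_loaded = simulate_accel(forces, TRUE_ROBOT_MASS + TRUE_PAYLOAD_MASS)


def fit_mass(accels, init, frozen_mass=0.0, lr=5.0, iters=3000):
    """Gradient descent on sim-vs-real MSE (gradient written by hand
    here; a differentiable simulator would supply it automatically)."""
    m = init
    for _ in range(iters):
        total = m + frozen_mass
        residual = simulate_accel(forces, total) - accels
        grad = np.mean(2.0 * residual * (-forces / total**2))
        m -= lr * grad
    return m


robot_mass = fit_mass(accel_free, init=20.0)        # Stage 1: nominal
payload_mass = fit_mass(accel_loaded, init=1.0,     # Stage 2: payload
                        frozen_mass=robot_mass)
```

Because Stage 1 has already absorbed the base-model error, Stage 2's estimate is pure payload: the structured sim-to-real gap, not noise.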
Real robot results are impressive:
A humanoid carrying heavy weights achieves
• 73% less position drift
• 72% lower jump angle error
• 100% success on agile motions
Differentiable simulation + SysID might be the missing piece for reliable humanoid sim-to-real.
Results are wild:
TacVLA beats Pi0.5 + diffusion policies on real robot tasks
• 83% success on constraint-locked disassembly
• 70% in-box picking with occlusion
• recovers from human disturbance
Touch + VLA + gating might be the recipe for real-world manipulation.
VLAs struggle with contact-rich manipulation because vision isn’t enough.
TacVLA adds tactile sensing to VLA policies with contact-aware gating, so touch is used only when contact happens.
Result: much better disassembly, picking, and robustness under occlusion.
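Contact-aware gating is simple to sketch: tactile features only enter the fused representation when the sensor actually reports contact. The threshold, feature sizes, and fusion below are illustrative assumptions, not TacVLA's architecture.

```python
import numpy as np

# Sketch of contact-aware gating: touch is used only when contact
# happens; in free space the policy relies on vision alone.


def contact_gate(tactile_force, threshold=0.5):
    """Binary gate: 1 when the tactile sensor reports real contact."""
    return 1.0 if np.max(np.abs(tactile_force)) > threshold else 0.0


def fuse(visual_feat, tactile_feat, tactile_force):
    g = contact_gate(tactile_force)
    # Free space: tactile channel zeroed out. In contact: blended in.
    return np.concatenate([visual_feat, g * tactile_feat])


visual = np.ones(4)
tactile = np.full(4, 5.0)

free_space = fuse(visual, tactile, tactile_force=np.zeros(3))
in_contact = fuse(visual, tactile, tactile_force=np.array([0.0, 0.0, 2.0]))
```

Gating keeps idle tactile noise from corrupting the policy in free space, while letting touch dominate exactly when vision is occluded by contact.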