How can we enable finetuning of humanoid manipulation policies, directly in the real world?
In our new paper, Residual Off-Policy RL for Finetuning BC Policies, we demonstrate real-world RL on a bimanual humanoid with 5-fingered hands (29 DoF) and improve pre-trained policies with ~15-75 minutes of robot interaction.
By learning residual corrections on top of frozen BC policies with a sample-efficient off-policy RL algorithm, we can finetune policies directly on the hardware. To our knowledge, this is one of the first examples of real-world RL finetuning on a humanoid with bimanual dexterous hands.
(If you know of other examples, let me know!)
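The core idea, as described above, is that the final action is the frozen BC policy's action plus a small learned correction, so the RL agent only has to explore near the BC policy's behavior. A minimal sketch of that action composition (names like `base_policy`, `residual_policy`, and `residual_scale` are illustrative assumptions, not from the paper's codebase):

```python
import numpy as np

ACTION_DIM = 29  # e.g. a 29-DoF bimanual humanoid
OBS_DIM = 64

def base_policy(obs):
    """Frozen BC policy: returns a nominal action (stand-in here)."""
    return np.tanh(obs[:ACTION_DIM])

def residual_policy(obs, params):
    """Small learned correction head; its params are what off-policy RL trains."""
    return np.tanh(params @ obs)  # bounded residual

def act(obs, params, residual_scale=0.1):
    # Final action = frozen BC action + small learned residual, clipped to limits.
    a_bc = base_policy(obs)
    delta = residual_scale * residual_policy(obs, params)
    return np.clip(a_bc + delta, -1.0, 1.0)

rng = np.random.default_rng(0)
obs = rng.standard_normal(OBS_DIM)
params = np.zeros((ACTION_DIM, OBS_DIM))  # zero-init: start exactly at the BC policy
a = act(obs, params)
assert np.allclose(a, base_policy(obs))  # zero residual at initialization
```

Zero-initializing the residual head means the combined policy starts out identical to the pre-trained BC policy, which is one common way residual RL setups avoid destroying the base behavior early in finetuning.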