Want your robot to make a cup of coffee but don’t want to spend hours collecting demos?
We introduce SPIRE, a system that solves long-horizon manipulation tasks with limited demos through planning, Behavior Cloning, and Reinforcement Learning.
#CoRL2024 #NVIDIAResearch 👇
🧵 1/
SPIRE decomposes long-horizon tasks into shorter subtasks using task and motion planning. Challenging subtasks that planning cannot solve are deferred to a human teleoperator. SPIRE then trains Behavior Cloning policies on the collected data and fine-tunes them with RL.
🧵 2/
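As a rough illustration, here is what that three-stage pipeline might look like in Python. Every name below (planner.decompose, human.teleoperate, bc_trainer.fit, rl_finetuner.finetune) is a hypothetical placeholder, not SPIRE’s actual API:

```python
# Sketch of a SPIRE-style pipeline: plan, defer hard subtasks to a human,
# then behavior-clone and RL-fine-tune. All interfaces are hypothetical.

def collect_demonstrations(task, planner, human):
    """Decompose with task and motion planning; defer hard subtasks to a human."""
    demos = []
    for subtask in planner.decompose(task):           # TAMP decomposition
        if planner.can_solve(subtask):
            demos.append(planner.execute(subtask))    # planner handles easy subtasks
        else:
            demos.append(human.teleoperate(subtask))  # human handles the rest
    return demos

def train_spire(task, planner, human, bc_trainer, rl_finetuner):
    demos = collect_demonstrations(task, planner, human)
    policy = bc_trainer.fit(demos)           # stage 1: Behavior Cloning
    return rl_finetuner.finetune(policy)     # stage 2: RL fine-tuning (🧵 4/)
```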
SPIRE trains a residual policy with RL to fine-tune the BC policy, and a KL penalty term keeps it from deviating too far from the BC policy. This enables structured exploration and learning from sparse task-completion rewards - no more painful reward tuning!
🧵 4/
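Here is a minimal PyTorch sketch of the residual-plus-KL idea, assuming Gaussian policies. The networks, unit variances, and the beta coefficient are illustrative assumptions, not the paper’s exact formulation:

```python
import torch
from torch.distributions import Normal, kl_divergence

obs_dim, act_dim, beta = 32, 7, 0.1                # illustrative sizes/coefficient
bc_net = torch.nn.Linear(obs_dim, act_dim)         # stands in for the frozen BC policy
res_net = torch.nn.Linear(obs_dim, act_dim)        # trainable residual policy
for p in bc_net.parameters():
    p.requires_grad_(False)                        # BC policy stays frozen

obs = torch.randn(1, obs_dim)
bc_dist = Normal(bc_net(obs), 1.0)
# The residual shifts the BC action mean rather than acting from scratch.
res_dist = Normal(bc_net(obs).detach() + res_net(obs), 1.0)
action = res_dist.rsample()

# Sparse task-completion reward, regularized to stay close to the BC policy.
sparse_reward = 0.0                                # becomes 1.0 only on task success
kl = kl_divergence(res_dist, bc_dist).sum(-1)
objective = sparse_reward - beta * kl              # maximize with any RL algorithm
```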
With RL fine-tuning, SPIRE massively improves both the success rate and task completion speed over a naive BC agent, allowing us to train superior policies with as few as 10 demos.
🧵 5/
A common issue with RL-based fine-tuning is that newly learned behavior can discard the safety considerations implicit in human teleoperation data. In contrast, our fine-tuning scheme keeps SPIRE agents close to human behavior, enabling safe deployment.
🧵 6/
SPIRE agents reach an 80% success rate on 8 of 9 challenging long-horizon tasks while using only 60% of the time BC agents need to finish a task. It also matches the proficiency of alternatives with only 17% of the demonstrations.
🧵 8/
Exploration in complex environments has always been a major challenge in RL, especially in procedurally generated environments where state-oriented methods can be ineffective. In our ICLR 2023 submission, we propose the concept of achievements as a solution. (1/n)