Xinhu Li
@xinhuliusc

7 posts
Joined September 2021
95 Following · 35 Followers
Xinhu Li@xinhuliusc·
Our solution is to allow the robot to explore and find shorter paths to the goal, without any environmental reward. We use the constrained demonstrations to infer a state-only reward signal that measures task progress, and self-label reward for unknown states using “temporal interpolation”. This enables the agent to use actions and behaviors the expert never had access to.
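The self-labeling idea above can be sketched in a few lines. This is a minimal illustration under assumptions of my own (Euclidean nearest-neighbour lookup, progress defined as the normalized time index along the demonstration), not the paper's actual method:

```python
import math

def demo_progress_labels(n):
    """Progress label for each of n demonstration states:
    0.0 at the start of the trajectory, 1.0 at the goal."""
    return [t / (n - 1) for t in range(n)]

def interpolated_reward(state, demo_states, labels, k=2):
    """Self-label an unseen state by inverse-distance interpolation
    of the progress labels of its k nearest demonstration states."""
    dists = [math.dist(state, s) for s in demo_states]
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    weights = [1.0 / (dists[i] + 1e-8) for i in nearest]
    total = sum(weights)
    return sum(w * labels[i] for w, i in zip(weights, nearest)) / total

# Toy 2-D demo: a straight-line trajectory from (0, 0) to (4, 0),
# as if the expert were confined to a plane by the interface.
demo = [(float(t), 0.0) for t in range(5)]
labels = demo_progress_labels(len(demo))

# A state the expert never visited (off the constrained plane)
# still receives a sensible progress estimate.
r = interpolated_reward((2.0, 1.0), demo, labels)
```

The point of the sketch is that reward is a function of state only, so states reached by actions the expert could never take still get a usable training signal.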
Xinhu Li@xinhuliusc·
Teleoperating robots is difficult due to their high degrees of freedom (DoF). As a result, most demonstrations are collected under interface or safety constraints (e.g., keyboard / joystick). For example, a joystick can move a robotic arm only in a 2D plane, even though the robot can operate in a higher-dimensional space. Direct imitation of such behavior yields suboptimal robots: the demonstrations are inherently constrained and limit what robots can learn.
Xinhu Li retweeted
Stephen James@stepjamUK·
I've heard this a lot recently: "We trained our robot on one object and it generalised to a novel object - these new VLA models are crazy!"

Let's talk about what's actually happening in that "A" (Action) part of your VLA model. The Vision and Language components? They're incredible. Pre-trained on internet-scale data, they understand objects, spatial relationships, and task instructions better than ever. But the Action component? That's still learned from scratch on your specific robot demonstrations.

Here's the reality: Your VLA model has internet-scale understanding of what a screwdriver looks like and what "tighten the screw" means. But the actual motor pattern for "rotating wrist while applying downward pressure"? That comes from your 500 robot demos.

What this means for "generalisation":
• Vision generalisation: recognises novel objects instantly (thanks to pre-training)
• Language generalisation: understands new task instructions (thanks to pre-training)
• Action generalisation: still limited to motor patterns seen during robot training

Ask that same robot to "unscrew the bottle cap" and it fails because:
• Vision: recognises bottle and cap
• Language: understands "unscrew"
• Action: never learned the "twist while pulling" motor pattern

The hard truth about VLA models: The "VL" gives you incredible zero-shot understanding. The "A" still requires task-specific demonstrations. We've cracked the perception and reasoning problem. We haven't cracked the motor generalisation problem.
Xinhu Li retweeted
Erdem Bıyık@ebiyik_·
This paper has now received the "Outstanding Paper Award on Empirical Reinforcement Learning Research" at #rlc2025 @RL_Conference 🥳 Congratulations to all my co-authors! If you're interested in recruiting a best-paper-award-winning student, Xinhu Li will be applying for PhD programs this year!
Ayush Jain@Ayushj240

At @RL_Conference🍁, I'm presenting a talk and a poster on Aug 6, Track 1: Reinforcement Learning Algorithms. We find that Deterministic Policy Gradient methods like TD3 often get stuck at local optima under complex Q-functions, and propose a novel actor architecture! 🧵

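The failure mode the quoted tweet describes can be seen in a toy 1-D example. This is my own illustration, not the paper's setup or proposed architecture: when the Q-function over actions is multimodal, a deterministic-policy-gradient-style update follows the critic's action gradient, climbs the nearest mode, and can settle on a much worse one.

```python
import math

def q(a):
    """Toy bimodal Q over a 1-D action: a wide, lower mode near
    a = -1 and a sharp, higher mode near a = +1."""
    return math.exp(-(a + 1.0) ** 2) + 2.0 * math.exp(-10.0 * (a - 1.0) ** 2)

def grad_q(a, eps=1e-5):
    """Numerical dQ/da (stands in for the critic's action gradient)."""
    return (q(a + eps) - q(a - eps)) / (2 * eps)

def dpg_ascent(a0, lr=0.05, steps=2000):
    """DPG-style update: repeatedly step the action along dQ/da."""
    a = a0
    for _ in range(steps):
        a += lr * grad_q(a)
    return a

# Initialized at a = -0.5, the update climbs the nearest mode and
# settles near a = -1, never reaching the higher mode near a = +1.
a_stuck = dpg_ascent(-0.5)
```

Gradient ascent is purely local, so no amount of extra steps fixes this: escaping the suboptimal mode requires changing either the exploration scheme or, as the quoted work proposes, the actor architecture itself.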
Xinhu Li@xinhuliusc·
Honored to be a co-author on our work that received the Outstanding Paper Award on Empirical Reinforcement Learning Research at @RL_Conference! Huge thanks to my amazing co-authors and mentors @JosephLim_AI and @ebiyik_!
Ayush Jain@Ayushj240

Honored that our @RL_Conference paper won the Outstanding Paper Award on Empirical Reinforcement Learning Research! 📜Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-Functions 📎openreview.net/forum?id=H3jcT… Grateful to my advisors @JosephLim_AI and @ebiyik_!
