Runpei Dong

115 posts


@RunpeiDong

CS PhD student @UofIllinois | Previously @Tsinghua_IIIS and XJTU | Interested in robot learning & machine learning

Champaign, IL · Joined April 2020
1.4K Following · 427 Followers
Pinned Tweet
Runpei Dong
Runpei Dong@RunpeiDong·
Want to ask your humanoid to fetch novel objects from a novel table? We introduce HERO, the first to achieve open-vocabulary visual loco-manipulation from human language queries! Now we can ask the humanoid to see and grasp the target object (e.g., a carrot instead of a tissue box) with whole-body coordination. How do we do it? Key designs and findings in this work:
- We propose a residual-aware end-effector tracking policy that tracks the target accurately in closed loop.
- We find that the robot's forward kinematics are quite inaccurate, so we propose residual neural forward models that correct the end-effector FK and the base-leg odometry.
- We design a modular system powered by visual foundation models that achieves generalizable grasping, with an 83% success rate on novel daily objects and scenes.
Check out our new project: Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
Project page: hero-humanoid.github.io
Paper: arxiv.org/pdf/2602.16705
2
5
20
1.9K
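A minimal sketch of the residual forward-model idea from the pinned tweet above: a small learned correction added on top of the robot's analytic end-effector FK. Everything below (architecture, inputs, training target) is an illustrative assumption, not HERO's released implementation.

```python
# Sketch: residual correction on top of analytic forward kinematics.
# Architecture and inputs are assumptions for illustration only.
import torch
import torch.nn as nn

class ResidualFK(nn.Module):
    def __init__(self, num_joints: int, hidden: int = 256):
        super().__init__()
        # Small MLP mapping arm joint angles to a 3-D position correction.
        self.net = nn.Sequential(
            nn.Linear(num_joints, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, q: torch.Tensor, ee_pos_fk: torch.Tensor) -> torch.Tensor:
        # Corrected end-effector position = nominal FK + learned residual.
        return ee_pos_fk + self.net(q)

# A plausible training target would be externally measured end-effector positions
# (e.g., mocap), so the residual absorbs calibration error that analytic FK misses.
```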
Runpei Dong retweeted
Tairan He
Tairan He@TairanHe99·
GR00T-VisualSim2Real is now open source! VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids. Repo: github.com/NVlabs/GR00T-V…
Tairan He@TairanHe99

Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality. Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe: 1. RL 2. Simulation 3. GPUs Website: viral-humanoid.github.io Arxiv: arxiv.org/abs/2511.15200 Deep dive with me: 🧵

6
96
601
96.1K
Runpei Dong retweeted
Peter Stone
Peter Stone@PeterStone_TX·
I'm super excited and proud about this result. A fantastic team at Sony AI, led by Peter Duerr, put together just the right mix of science and engineering to create the first robot to beat a professional athlete in a real-world competitive sport!
8
13
118
4.6K
Runpei Dong retweeted
Yaru Niu
Yaru Niu@yaru_niu·
A touch-aware humanoid manipulation policy that cleans the lab for you🧹🧪 Introducing Humanoid Touch Dream: a real-world system for dexterous, contact-rich humanoid loco-manipulation. Our key idea is simple: the policy predicts future hand forces and tactile latents alongside actions, within a single-stage training framework. humanoid-touch-dream.github.io 1/7
12
56
277
68.7K
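An illustrative sketch of the single-stage idea described above: one backbone whose heads predict actions plus future hand forces and tactile latents, trained with one joint loss. Module names, sizes, and loss weights are assumptions, not the Humanoid Touch Dream code.

```python
# Sketch: policy with auxiliary force/tactile prediction heads, trained jointly.
import torch
import torch.nn as nn

class TouchAwarePolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, force_dim, tactile_latent_dim, hidden=512):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, act_dim)               # control output
        self.force_head = nn.Linear(hidden, force_dim)               # future hand forces
        self.tactile_head = nn.Linear(hidden, tactile_latent_dim)    # tactile latents

    def forward(self, obs):
        h = self.backbone(obs)
        return self.action_head(h), self.force_head(h), self.tactile_head(h)

def single_stage_loss(policy, obs, act_gt, force_gt, tactile_gt, w_force=0.1, w_tac=0.1):
    act, force, tac = policy(obs)
    # One joint objective: behavior cloning plus auxiliary contact prediction.
    return (nn.functional.mse_loss(act, act_gt)
            + w_force * nn.functional.mse_loss(force, force_gt)
            + w_tac * nn.functional.mse_loss(tac, tactile_gt))
```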
Runpei Dong retweeted
Shaowei Liu
Shaowei Liu@stevenpg8·
📢MoRight: Motion Control Done Right "What if your video model actually understood cause and effect?" Existing motion-controlled video models entangle camera and object motion, and treat everything as kinematic displacement. MoRight changes both.
🔥 Motion Causality: MoRight decomposes motion into actions & consequences. Give an action → MoRight predicts consequences (aka motion simulation). Give a desired outcome → MoRight recovers the driving action (aka motion planning). Not merely displacing pixels.
🎬 Disentangled Control: MoRight separates camera and object motion, allowing users to independently control each of them. No entanglement.
Project Page: research.nvidia.com/labs/sil/proje…
Paper: arxiv.org/abs/2604.07348
4
34
234
30.9K
Runpei Dong retweeted
Yining Hong
Yining Hong@yining_hong·
I wrote a blog "Three Levels of TTT" — Test-Time Training, Meta Training, World Models, 3D & Self-Supervised Learning: evelinehong.github.io/ttt_three_leve…
The three levels are:
🧠 Episode — hippocampus encodes fast, neocortex consolidates slow. No labels needed.
🌱 Individual Lifetime — there is no train/test split. Every minute is testing as well as training.
🌍 Natural Selection & Evolution — continuous adaptation integrates into the species' prior.
Each level is the meta-training of the level below. Each level is the test-time training of the level above. Priors flow down to the lower level; consolidated adaptations flow up to the next higher level.
🧠 The self-supervised signal needs no labels — it comes from the structure of experience itself: what did I expect vs. what happened? What follows what? What appears together?
🌱 This consolidates across a lifetime — every minute is testing as well as training. Given the priors of the human species, an infant develops 3D perception, object permanence, intuitive physics — not from instruction, but from reaching, crawling, acting. The world teaches the rest through self-supervised learning.
🌍 But what gives us those priors? Two front-facing eyes, exactly the right distance apart for depth to emerge. Pain and proprioception as free error signals. A face-detection circuit running at birth. Billions of years of test-time feedback from individual lives, accumulated and frozen into hardware. Evolution doesn't optimize behavior — it optimizes the prior you start from.
7
38
266
18.5K
Runpei Dong retweeted
Siyuan Huang
Siyuan Huang@siyuanhuang95·
Excited to introduce OmniClone, a robust teleoperation system for humanoid mobile manipulation. While systems like TWIST2 and SONIC paved the way, we focused on closing the critical stability and scaling gaps.
1/ 📊 Moving past "vibe-based" testing. We’ve built a comprehensive diagnostic benchmark to systematically evaluate whole-body teleoperation. No more trial and error: get the actionable insights needed for true policy optimization.
2/ 👤 Universal Human-to-Robot Mapping. Teleop often breaks when switching operators. OmniClone mitigates biases from hardware fluctuations and, crucially, diverse human body shapes, ensuring high-stability control regardless of the person in the suit.
3/ 🚀 System Optimizations for Whole-Body Manipulation Policies. By optimizing for affordability and reproducibility, OmniClone provides the high-fidelity pipeline necessary to collect data and train humanoid whole-body policies at scale.
The model checkpoints and deployment code are now fully released; welcome to play with them! 📦
📄 Paper: arxiv.org/abs/2603.14327
🌐 Project: omniclone.github.io
💻 Code: github.com/yixxuan-li/Omn…
4
28
137
13.5K
Runpei Dong retweeted
Shuran Song
Shuran Song@SongShuran·
Turning a behavior "Prior" into a high-performing "Pro" in hours⚡️ with DICE-RL (Distribution Contractive RL Finetuning)💡 Check out @s_zhanyi 's 🧵 for the secret sauce 😉
Zhanyi Sun@s_zhanyi

We find that RL post-training can substantially improve BC policies without teaching them anything fundamentally new. So what is RL doing? In DICE-RL, it contracts a broad behavior prior toward high-value modes. (1/n) zhanyisun.github.io/dice.rl.2026/

1
9
66
10.9K
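A rough sketch of the "contract a broad behavior prior toward high-value modes" intuition from the thread above, written as an advantage-weighted re-fit of the BC policy on its own data. This is a generic construction for illustration, not the DICE-RL algorithm; the policy's log_prob interface is an assumption.

```python
# Sketch: re-fit a behavior-cloned policy on its own data, up-weighting
# high-advantage actions so probability mass contracts onto good modes.
import torch

def contractive_update(policy, optimizer, obs, actions, advantages, beta=1.0):
    weights = torch.softmax(advantages / beta, dim=0)   # emphasize high-value actions
    log_probs = policy.log_prob(obs, actions)           # assumes policy exposes log_prob
    loss = -(weights.detach() * log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```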
Runpei Dong retweeted
Zhikai Zhang
Zhikai Zhang@Zhikai273·
🎾Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills. Project: zzk273.github.io/LATENT/ Code: github.com/GalaxyGeneralR…
162
637
4.1K
1.4M
Runpei Dong
Runpei Dong@RunpeiDong·
Humanoid visual loco-manipulation is extremely challenging, especially for tasks like apple peeling. This is a very impressive result, congrats!
Sharpa@SharpaRobotics

We believe we’re the first robotics company to demonstrate a robot peeling an apple with dual dexterous human-like hands. This breakthrough closes a key gap in robotics, achieving bimanual, contact-rich manipulation and moving far beyond the limits of simple grippers. 🧵↓
Today’s AI models (VLMs) are excellent at perception but struggle with action. Controlling high-degree-of-freedom hands for tasks like this is incredibly complex, and precise finger-level teleoperation is nearly impossible for humans.
Our first step was a shared-autonomy system: rather than controlling every finger, the operator triggers pre-learned skills like a “rotate apple or tennis ball” primitive via a keyboard press or pedal. This makes scalable data collection and RL training possible.
How does the AI manage this? We created "MoDE-VLA" (Mixture of Dexterous Experts). It fuses vision, language, force, and touch data by using a team of specialist "experts," making control in high-dimensional spaces stable and effective.
The combination of these two innovations allows for seamless, contact-rich manipulation. The human provides high-level guidance, and the robot executes the complex in-hand coordination required. This work paves the way for robots that can safely handle delicate tasks in human environments.
Want the full technical details? 📄 Read the full research paper: arxiv.org/abs/2603.08122
Visit us at NVIDIA GTC Booth #1838, Hall 3 to learn more!
#Robotics #AI #DexterousManipulation #VLA #NVIDIAGTC @nvdia @nvidiagtc

1
0
1
91
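A toy sketch of a mixture-of-experts action head over fused vision/language/force/touch features, in the spirit of the "MoDE-VLA" description in the quoted thread above. The gating scheme, expert count, and shapes are illustrative assumptions, not Sharpa's model.

```python
# Sketch: soft-gated mixture of expert MLPs over a fused multimodal feature.
import torch
import torch.nn as nn

class MixtureOfExpertsHead(nn.Module):
    def __init__(self, feat_dim, act_dim, num_experts=4, hidden=256):
        super().__init__()
        self.gate = nn.Linear(feat_dim, num_experts)   # soft routing weights
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, act_dim))
            for _ in range(num_experts)
        ])

    def forward(self, fused_features):
        # fused_features: concatenation of vision/language/force/touch embeddings.
        w = torch.softmax(self.gate(fused_features), dim=-1)                  # (B, E)
        outs = torch.stack([e(fused_features) for e in self.experts], dim=1)  # (B, E, A)
        return (w.unsqueeze(-1) * outs).sum(dim=1)                            # (B, A)
```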
Runpei Dong retweeted
Yunzhu Li
Yunzhu Li@YunzhuLiYZ·
For a long time, I was skeptical about action-conditioned video prediction for robotics. Many models look impressive, but once you ask them to handle long-horizon manipulation with real physical interaction, things quickly fall apart (e.g., Genie is amazing but mostly focused on navigation). This project changed my mind.
I'm beyond excited to share Interactive World Simulator, a project we have been working on for the past ~1.5 years 🤖 One of the first world models that produces convincing results for long-horizon robotic manipulation involving complex physical interactions, across a diverse range of objects (rigid objects, deformables, ropes, object piles). It directly unlocks scalable data generation for robotic policy training and policy evaluation.
Try it yourself (no installation needed): yixuanwang.me/interactive_wo… Play directly with the simulator in your browser.
Key Takeaways:
1️⃣ 15 Hz long-horizon action-conditioned video prediction for 10+ minutes on a single RTX 4090 GPU
2️⃣ Visual and dynamic fidelity: people often ask how much sim data equals one real data point. In our experiments, it turns out to be close to one-to-one using the Interactive World Simulator
3️⃣ Stress testing matters: we emphasize interactive stress testing to understand robustness and stability and to build trust in the simulator
4️⃣ The model is trained with only ~6 hours of real-world random interaction data on a single GPU. Imagine what happens if we scale this 1000× or even 1M×
Huge credit to @YXWangBot, who led this effort with countless hours of work on data collection, training recipes, and system design. I'm incredibly proud of the work he did here! Enjoy the demos and videos. We also fully open-sourced the codebase for anyone interested in applying this to their own tasks. #Robotics #RobotLearning #WorldModels #EmbodiedAI
Yixuan Wang@YXWangBot

1/ World models are getting popular in robotics 🤖✨ But there’s a big problem: most are slow and break physical consistency over long horizons.
2/ Today we’re releasing Interactive World Simulator: An action-conditioned world model that supports stable long-horizon interaction.
3/ Key result: ✅ 10+ minutes of interactive prediction ✅ 15 FPS ✅ on a single RTX 4090🔥
4/ Why this matters: it unlocks two critical robotics applications: 🚀 Scalable data generation for policy training 🧪 Faithful policy evaluation
5/ You can play with our world model NOW at yixuanwang.me/interactive_wo… NO git clone, NO pip install, NO python. Just click and play!
NOTE ⚠️ ALL videos here are generated purely by our model in pixel space! They are **NOT** from a real camera
More details coming 👇 (1/9) #Robotics #AI #MachineLearning #WorldModels #RobotLearning #ImitationLearning

2
52
368
75.3K
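A sketch of the kind of action-conditioned autoregressive rollout loop the Interactive World Simulator thread above describes (long horizons at 15 Hz, each predicted frame fed back in with the next action). The world_model.predict interface and controller callable are placeholder assumptions, not the released API.

```python
# Sketch: feed each predicted frame back in with the next action,
# holding a fixed prediction-rate budget.
import time

def interactive_rollout(world_model, frame, controller, hz=15, minutes=10):
    dt = 1.0 / hz
    for _ in range(int(minutes * 60 * hz)):
        t0 = time.time()
        action = controller(frame)                  # e.g., teleop input or a policy
        frame = world_model.predict(frame, action)  # next frame, purely in pixel space
        time.sleep(max(0.0, dt - (time.time() - t0)))  # hold the 15 Hz budget
    return frame
```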
Runpei Dong retweeted
Y Combinator
Y Combinator@ycombinator·
Origami Robotics is building high-DOF robotic hands with in-joint motors and a co-designed data-collection glove to eliminate the embodiment gap by collecting high-quality, real-world data at scale. Congrats on the launch, @DanielXieee and @QuanliangX! ycombinator.com/launches/Pcl-o…
29
44
299
86.9K
Runpei Dong retweeted
Yuanhang Zhang
Yuanhang Zhang@Yuanhang__Zhang·
Robust humanoid perceptive locomotion is still underexplored, especially when different cameras see different terrains, paths get narrow, and payloads disturb balance...
Introducing RPL, tackling this with one unified policy:
• Challenging terrains (slopes, stairs, and stepping stones);
• Multiple directions;
• Payloads.
Trained in sim. Validated long-horizon in the real world. Watch the robot walk it all🦿 Details below👇
5
57
277
57.5K
Runpei Dong
Runpei Dong@RunpeiDong·
Thrilled to share our work AlphaOne🔥 at @emnlpmeeting 2025! @jyzhang1208 and I will be presenting this work online; please feel free to join and talk to us!
📆Date: 8:00-9:00, Nov 7, Friday (Beijing Standard Time, UTC+8)
📺Session: Gather Session 4
Junyu Zhang@jyzhang1208

💥Excited to share our paper “AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time” at #EMNLP2025 🚀 this Friday, Nov. 7, during Gather Session 4. Come say hi virtually!👋 📄Paper: arxiv.org/pdf/2505.24863 🪩Website & Code: alphaone-project.github.io #AI #LLMs #Reasoning

0
1
6
805
Runpei Dong retweeted
Pieter Abbeel
Pieter Abbeel@pabbeel·
ResMimic: learns a whole-body loco-manipulation policy on top of a general motion-tracking policy. Key ideas: (i) pre-train general motion tracking, (ii) post-train a task-specific residual policy with: (a) an object-tracking reward, (b) a contact reward, (c) a virtual object-force curriculum
Siheng Zhao@SihengZhao

ResMimic: a two-stage residual framework that unleashes the power of a pre-trained general motion-tracking policy. Enables expressive whole-body loco-manipulation with payloads up to 5.5 kg without task-specific design, generalizes across poses, and exhibits reactive behavior.

5
24
200
24.7K
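A minimal sketch of the two-stage residual composition described above: a frozen pre-trained motion-tracking policy whose action is corrected by a task-specific residual policy that also sees the object state. The interfaces below are illustrative assumptions, not the ResMimic code.

```python
# Sketch: base tracking action plus a learned task-specific residual.
import torch

@torch.no_grad()
def tracking_action(base_policy, obs, motion_ref):
    # Stage 1: pre-trained general motion tracking, kept frozen in stage 2.
    return base_policy(obs, motion_ref)

def combined_action(base_policy, residual_policy, obs, motion_ref, obj_state):
    # Stage 2: the residual policy sees the object state and corrects the base
    # action so the whole body also tracks the object (trained with object-tracking
    # and contact rewards plus a virtual object-force curriculum, per the tweet).
    a_base = tracking_action(base_policy, obs, motion_ref)
    a_res = residual_policy(obs, obj_state)
    return a_base + a_res
```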
Runpei Dong retweeted
Zhen Wu
Zhen Wu@zhenkirito123·
Humanoid motion tracking performance is greatly determined by retargeting quality! Introducing 𝗢𝗺𝗻𝗶𝗥𝗲𝘁𝗮𝗿𝗴𝗲𝘁🎯, generating high-quality interaction-preserving data from human motions for learning complex humanoid skills with 𝗺𝗶𝗻𝗶𝗺𝗮𝗹 RL: - 5 rewards, - 4 DR terms, - Proprio. ONLY, - NO history/curriculum. Ready for agile, human-like 🤖? (Best with 🎧) 🔗 omniretarget.github.io 🎥 1/9
31
151
671
801K