Runpei Dong

115 posts


@RunpeiDong

CS PhD student @UofIllinois | Previously @Tsinghua_IIIS and XJTU | Interested in robot learning & machine learning

Champaign, IL · Joined April 2020
1.4K Following · 427 Followers
Pinned Tweet
Runpei Dong
Runpei Dong@RunpeiDong·
Want to ask your humanoid to fetch novel objects from a novel table? We introduce HERO, the first to achieve open-vocabulary visual loco-manipulation from human language queries! Now we can ask the humanoid to see and grasp the target object (e.g., a carrot instead of a tissue box) with whole-body coordination. How do we do it? Key designs and findings in this work:
- We propose a residual-aware end-effector tracking policy that tracks the target accurately in closed loop.
- We find that the robot's forward kinematics are quite inaccurate, so we propose residual neural forward models that correct the end-effector FK and the base-leg odometry.
- We design a modular system powered by visual foundation models that achieves generalizable grasping, with an 83% success rate on novel daily objects and scenes.
Check out our new project: Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
Project page: hero-humanoid.github.io
Paper: arxiv.org/pdf/2602.16705
2
5
20
1.9K
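A minimal sketch of the residual forward-model idea from the pinned tweet above: a small learned correction added on top of the robot's analytic end-effector FK. Everything below (architecture, inputs, training target) is an illustrative assumption, not HERO's released implementation.

```python
# Sketch: residual correction on top of analytic forward kinematics.
# Architecture and inputs are assumptions for illustration only.
import torch
import torch.nn as nn

class ResidualFK(nn.Module):
    def __init__(self, num_joints: int, hidden: int = 256):
        super().__init__()
        # Small MLP mapping arm joint angles to a 3-D position correction.
        self.net = nn.Sequential(
            nn.Linear(num_joints, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, q: torch.Tensor, ee_pos_fk: torch.Tensor) -> torch.Tensor:
        # Corrected end-effector position = nominal FK + learned residual.
        return ee_pos_fk + self.net(q)

# A plausible training target would be externally measured end-effector positions
# (e.g., mocap), so the residual absorbs calibration error that analytic FK misses.
```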
Runpei Dong retweeted
Tairan He
Tairan He@TairanHe99·
GR00T-VisualSim2Real is now open source! VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids. Repo: github.com/NVlabs/GR00T-V…
Tairan He@TairanHe99

Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality. Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe: 1. RL 2. Simulation 3. GPUs Website: viral-humanoid.github.io Arxiv: arxiv.org/abs/2511.15200 Deep dive with me: 🧵

6
96
601
96.1K
Runpei Dong retweeted
Peter Stone
Peter Stone@PeterStone_TX·
I'm super excited and proud about this result. A fantastic team at Sony AI, led by Peter Duerr, put together just the right mix of science and engineering to create the first robot to beat a professional athlete in a real-world competitive sport!
8
13
118
4.6K
Runpei Dong retweeted
Yaru Niu
Yaru Niu@yaru_niu·
A touch-aware humanoid manipulation policy that cleans the lab for you🧹🧪 Introducing Humanoid Touch Dream: a real-world system for dexterous, contact-rich humanoid loco-manipulation. Our key idea is simple: the policy predicts future hand forces and tactile latents alongside actions, within a single-stage training framework. humanoid-touch-dream.github.io 1/7
12
56
277
68.7K
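An illustrative sketch of the single-stage idea described above: one backbone whose heads predict actions plus future hand forces and tactile latents, trained with one joint loss. Module names, sizes, and loss weights are assumptions, not the Humanoid Touch Dream code.

```python
# Sketch: policy with auxiliary force/tactile prediction heads, trained jointly.
import torch
import torch.nn as nn

class TouchAwarePolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, force_dim, tactile_latent_dim, hidden=512):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, act_dim)               # control output
        self.force_head = nn.Linear(hidden, force_dim)               # future hand forces
        self.tactile_head = nn.Linear(hidden, tactile_latent_dim)    # tactile latents

    def forward(self, obs):
        h = self.backbone(obs)
        return self.action_head(h), self.force_head(h), self.tactile_head(h)

def single_stage_loss(policy, obs, act_gt, force_gt, tactile_gt, w_force=0.1, w_tac=0.1):
    act, force, tac = policy(obs)
    # One joint objective: behavior cloning plus auxiliary contact prediction.
    return (nn.functional.mse_loss(act, act_gt)
            + w_force * nn.functional.mse_loss(force, force_gt)
            + w_tac * nn.functional.mse_loss(tac, tactile_gt))
```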
Runpei Dong retweeted
Shaowei Liu
Shaowei Liu@stevenpg8·
📢MoRight: Motion Control Done Right "What if your video model actually understood cause and effect?" Existing motion-controlled video models entangle camera and object motion, and treat everything as kinematic displacement. MoRight changes both.
🔥 Motion Causality: MoRight decomposes motion into actions & consequences. Give an action → MoRight predicts consequences (aka motion simulation). Give a desired outcome → MoRight recovers the driving action (aka motion planning). Not merely displacing pixels.
🎬 Disentangled Control: MoRight separates camera and object motion, allowing users to independently control each of them. No entanglement.
Project Page: research.nvidia.com/labs/sil/proje…
Paper: arxiv.org/abs/2604.07348
4
34
234
30.9K
Runpei Dong retweeted
Yining Hong
Yining Hong@yining_hong·
I wrote a blog "Three Levels of TTT" — Test-Time Training, Meta Training, World Models, 3D & Self-Supervised Learning: evelinehong.github.io/ttt_three_leve…
The three levels are:
🧠 Episode — hippocampus encodes fast, neocortex consolidates slow. No labels needed.
🌱 Individual Lifetime — there is no train/test split. Every minute is testing as well as training.
🌍 Natural Selection & Evolution — continuous adaptation integrates into the species' prior.
Each level is the meta-training of the level below. Each level is the test-time training of the level above. Priors flow down to the lower level; consolidated adaptations flow up to the next higher level.
🧠 The self-supervised signal needs no labels — it comes from the structure of experience itself: what did I expect vs. what happened? What follows what? What appears together?
🌱 This consolidates across a lifetime — every minute is testing as well as training. Given the priors of the human species, an infant develops 3D perception, object permanence, intuitive physics — not from instruction, but from reaching, crawling, acting. The world teaches the rest through self-supervised learning.
🌍 But what gives us those priors? Two front-facing eyes, exactly the right distance apart for depth to emerge. Pain and proprioception as free error signals. A face-detection circuit running at birth. Billions of years of test-time feedback from individual lives, accumulated and frozen into hardware. Evolution doesn't optimize behavior — it optimizes the prior you start from.
7
38
266
18.5K
Runpei Dong retweeted
Siyuan Huang
Siyuan Huang@siyuanhuang95·
Excited to introduce OmniClone, a robust teleoperation system for humanoid mobile manipulation. While systems like TWIST2 and SONIC paved the way, we focused on closing the critical stability and scaling gaps.
1/ 📊 Moving past "vibe-based" testing. We’ve built a comprehensive diagnostic benchmark to systematically evaluate whole-body teleoperation. No more trial and error: get the actionable insights needed for true policy optimization.
2/ 👤 Universal Human-to-Robot Mapping. Teleop often breaks when switching operators. OmniClone mitigates biases from hardware fluctuations and, crucially, diverse human body shapes, ensuring high-stability control regardless of the person in the suit.
3/ 🚀 System Optimizations for Whole-Body Manipulation Policies. By optimizing for affordability and reproducibility, OmniClone provides the high-fidelity pipeline necessary to collect data and train humanoid whole-body policies at scale.
The model checkpoints and deployment code are now fully released; welcome to play with them! 📦
📄 Paper: arxiv.org/abs/2603.14327
🌐 Project: omniclone.github.io
💻 Code: github.com/yixxuan-li/Omn…
4
28
137
13.5K
Runpei Dong retweeted
Shuran Song
Shuran Song@SongShuran·
Turning a behavior "Prior" into a high-performing "Pro" in hours⚡️ with DICE-RL (Distribution Contractive RL Finetuning)💡 Check out @s_zhanyi 's 🧵 for the secret sauce 😉
Zhanyi Sun@s_zhanyi

We find that RL post-training can substantially improve BC policies without teaching them anything fundamentally new. So what is RL doing? In DICE-RL, it contracts a broad behavior prior toward high-value modes. (1/n) zhanyisun.github.io/dice.rl.2026/

1
9
66
10.9K
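A rough sketch of the "contract a broad behavior prior toward high-value modes" intuition from the thread above, written as an advantage-weighted re-fit of the BC policy on its own data. This is a generic construction for illustration, not the DICE-RL algorithm; the policy's log_prob interface is an assumption.

```python
# Sketch: re-fit a behavior-cloned policy on its own data, up-weighting
# high-advantage actions so probability mass contracts onto good modes.
import torch

def contractive_update(policy, optimizer, obs, actions, advantages, beta=1.0):
    weights = torch.softmax(advantages / beta, dim=0)   # emphasize high-value actions
    log_probs = policy.log_prob(obs, actions)           # assumes policy exposes log_prob
    loss = -(weights.detach() * log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```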
Runpei Dong retweeted
Zhikai Zhang
Zhikai Zhang@Zhikai273·
🎾Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills. Project: zzk273.github.io/LATENT/ Code: github.com/GalaxyGeneralR…
162
637
4.1K
1.4M
Runpei Dong
Runpei Dong@RunpeiDong·
Humanoid visual loco-manipulation is extremely challenging, especially for tasks like apple peeling. This is a very impressive result, congrats!
Sharpa@SharpaRobotics

We believe we’re the first robotics company to demonstrate a robot peeling an apple with dual dexterous human-like hands. This breakthrough closes a key gap in robotics, achieving bimanual, contact-rich manipulation and moving far beyond the limits of simple grippers. 🧵↓
Today’s AI models (VLMs) are excellent at perception but struggle with action. Controlling high-degree-of-freedom hands for tasks like this is incredibly complex, and precise finger-level teleoperation is nearly impossible for humans.
Our first step was a shared-autonomy system: rather than controlling every finger, the operator triggers pre-learned skills like a “rotate apple or tennis ball” primitive via a keyboard press or pedal. This makes scalable data collection and RL training possible.
How does the AI manage this? We created "MoDE-VLA" (Mixture of Dexterous Experts). It fuses vision, language, force, and touch data by using a team of specialist "experts," making control in high-dimensional spaces stable and effective.
The combination of these two innovations allows for seamless, contact-rich manipulation. The human provides high-level guidance, and the robot executes the complex in-hand coordination required. This work paves the way for robots that can safely handle delicate tasks in human environments.
Want the full technical details? 📄 Read the full research paper: arxiv.org/abs/2603.08122
Visit us at NVIDIA GTC Booth #1838, Hall 3 to learn more!
#Robotics #AI #DexterousManipulation #VLA #NVIDIAGTC @nvdia @nvidiagtc

1
0
1
91
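A toy sketch of a mixture-of-experts action head over fused vision/language/force/touch features, in the spirit of the "MoDE-VLA" description in the quoted thread above. The gating scheme, expert count, and shapes are illustrative assumptions, not Sharpa's model.

```python
# Sketch: soft-gated mixture of expert MLPs over a fused multimodal feature.
import torch
import torch.nn as nn

class MixtureOfExpertsHead(nn.Module):
    def __init__(self, feat_dim, act_dim, num_experts=4, hidden=256):
        super().__init__()
        self.gate = nn.Linear(feat_dim, num_experts)   # soft routing weights
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, act_dim))
            for _ in range(num_experts)
        ])

    def forward(self, fused_features):
        # fused_features: concatenation of vision/language/force/touch embeddings.
        w = torch.softmax(self.gate(fused_features), dim=-1)                  # (B, E)
        outs = torch.stack([e(fused_features) for e in self.experts], dim=1)  # (B, E, A)
        return (w.unsqueeze(-1) * outs).sum(dim=1)                            # (B, A)
```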
Runpei Dong retweeted
Yunzhu Li
Yunzhu Li@YunzhuLiYZ·
For a long time, I was skeptical about action-conditioned video prediction for robotics. Many models look impressive, but once you ask them to handle long-horizon manipulation with real physical interaction, things quickly fall apart (e.g., Genie is amazing but mostly focused on navigation). This project changed my mind.
I'm beyond excited to share Interactive World Simulator, a project we have been working on for the past ~1.5 years 🤖 One of the first world models that produces convincing results for long-horizon robotic manipulation involving complex physical interactions, across a diverse range of objects (rigid objects, deformables, ropes, object piles). It directly unlocks scalable data generation for robotic policy training and policy evaluation.
Try it yourself (no installation needed): yixuanwang.me/interactive_wo… Play directly with the simulator in your browser.
Key Takeaways:
1️⃣ 15 Hz long-horizon action-conditioned video prediction for 10+ minutes on a single RTX 4090 GPU
2️⃣ Visual and dynamic fidelity: people often ask how much sim data equals one real data point. In our experiments, it turns out to be close to one-to-one using the Interactive World Simulator
3️⃣ Stress testing matters: we emphasize interactive stress testing to understand robustness and stability and to build trust in the simulator
4️⃣ The model is trained with only ~6 hours of real-world random interaction data on a single GPU. Imagine what happens if we scale this 1000× or even 1M×
Huge credit to @YXWangBot, who led this effort with countless hours of work on data collection, training recipes, and system design. I'm incredibly proud of the work he did here! Enjoy the demos and videos. We also fully open-sourced the codebase for anyone interested in applying this to their own tasks. #Robotics #RobotLearning #WorldModels #EmbodiedAI
Yixuan Wang@YXWangBot

1/ World models are getting popular in robotics 🤖✨ But there’s a big problem: most are slow and break physical consistency over long horizons.
2/ Today we’re releasing Interactive World Simulator: An action-conditioned world model that supports stable long-horizon interaction.
3/ Key result: ✅ 10+ minutes of interactive prediction ✅ 15 FPS ✅ on a single RTX 4090🔥
4/ Why this matters: it unlocks two critical robotics applications: 🚀 Scalable data generation for policy training 🧪 Faithful policy evaluation
5/ You can play with our world model NOW at yixuanwang.me/interactive_wo… NO git clone, NO pip install, NO python. Just click and play!
NOTE ⚠️ ALL videos here are generated purely by our model in pixel space! They are **NOT** from a real camera
More details coming 👇 (1/9) #Robotics #AI #MachineLearning #WorldModels #RobotLearning #ImitationLearning

2
52
368
75.3K
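A sketch of the kind of action-conditioned autoregressive rollout loop the Interactive World Simulator thread above describes (long horizons at 15 Hz, each predicted frame fed back in with the next action). The world_model.predict interface and controller callable are placeholder assumptions, not the released API.

```python
# Sketch: feed each predicted frame back in with the next action,
# holding a fixed prediction-rate budget.
import time

def interactive_rollout(world_model, frame, controller, hz=15, minutes=10):
    dt = 1.0 / hz
    for _ in range(int(minutes * 60 * hz)):
        t0 = time.time()
        action = controller(frame)                  # e.g., teleop input or a policy
        frame = world_model.predict(frame, action)  # next frame, purely in pixel space
        time.sleep(max(0.0, dt - (time.time() - t0)))  # hold the 15 Hz budget
    return frame
```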
Runpei Dong retweeted
Y Combinator
Y Combinator@ycombinator·
Origami Robotics is building high-DOF robotic hands with in-joint motors and a co-designed data-collection glove to eliminate the embodiment gap by collecting high-quality, real-world data at scale. Congrats on the launch, @DanielXieee and @QuanliangX! ycombinator.com/launches/Pcl-o…
29
44
299
86.9K
Runpei Dong retweeted
Yuanhang Zhang
Yuanhang Zhang@Yuanhang__Zhang·
Robust humanoid perceptive locomotion is still underexplored, especially when different cameras see different terrains, paths get narrow, and payloads disturb balance...
Introducing RPL, tackling this with one unified policy:
• Challenging terrains (slopes, stairs, and stepping stones);
• Multiple directions;
• Payloads.
Trained in sim. Validated long-horizon in the real world. Watch the robot walk it all🦿 Details below👇
5
57
277
57.5K
Runpei Dong
Runpei Dong@RunpeiDong·
Thrilled to share our work AlphaOne🔥 at @emnlpmeeting 2025! @jyzhang1208 and I will be presenting this work online; please feel free to join and talk to us!
📆Date: 8:00-9:00, Nov 7, Friday (Beijing Standard Time, UTC+8)
📺Session: Gather Session 4
Junyu Zhang@jyzhang1208

💥Excited to share our paper “AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time” at #EMNLP2025 🚀 this Friday, Nov. 7, during Gather Session 4. Come say hi virtually!👋 📄Paper: arxiv.org/pdf/2505.24863 🪩Website & Code: alphaone-project.github.io #AI #LLMs #Reasoning

0
1
6
805
Runpei Dong retweeted
Pieter Abbeel
Pieter Abbeel@pabbeel·
ResMimic: learns a whole-body loco-manipulation policy on top of a general motion-tracking policy. Key ideas: (i) pre-train general motion tracking, (ii) post-train a task-specific residual policy with: (a) an object-tracking reward, (b) a contact reward, (c) a virtual object-force curriculum
Siheng Zhao@SihengZhao

ResMimic: a two-stage residual framework that unleashes the power of a pre-trained general motion-tracking policy. Enables expressive whole-body loco-manipulation with payloads up to 5.5 kg without task-specific design, generalizes across poses, and exhibits reactive behavior.

5
24
200
24.7K
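A minimal sketch of the two-stage residual composition described above: a frozen pre-trained motion-tracking policy whose action is corrected by a task-specific residual policy that also sees the object state. The interfaces below are illustrative assumptions, not the ResMimic code.

```python
# Sketch: base tracking action plus a learned task-specific residual.
import torch

@torch.no_grad()
def tracking_action(base_policy, obs, motion_ref):
    # Stage 1: pre-trained general motion tracking, kept frozen in stage 2.
    return base_policy(obs, motion_ref)

def combined_action(base_policy, residual_policy, obs, motion_ref, obj_state):
    # Stage 2: the residual policy sees the object state and corrects the base
    # action so the whole body also tracks the object (trained with object-tracking
    # and contact rewards plus a virtual object-force curriculum, per the tweet).
    a_base = tracking_action(base_policy, obs, motion_ref)
    a_res = residual_policy(obs, obj_state)
    return a_base + a_res
```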
Runpei Dong retweeted
Zhen Wu
Zhen Wu@zhenkirito123·
Humanoid motion tracking performance is greatly determined by retargeting quality! Introducing 𝗢𝗺𝗻𝗶𝗥𝗲𝘁𝗮𝗿𝗴𝗲𝘁🎯, generating high-quality interaction-preserving data from human motions for learning complex humanoid skills with 𝗺𝗶𝗻𝗶𝗺𝗮𝗹 RL: - 5 rewards, - 4 DR terms, - Proprio. ONLY, - NO history/curriculum. Ready for agile, human-like 🤖? (Best with 🎧) 🔗 omniretarget.github.io 🎥 1/9
31
151
671
801K