Yuke Zhu

368 posts

Yuke Zhu banner
Yuke Zhu

Yuke Zhu

@yukez

Associate Professor @UTCompSci | Director @NVIDIAAI Co-Leading GEAR | CS PhD @Stanford | Building generalist robot autonomy in the wild | Opinions are my own

Austin, TX Katılım Ağustos 2008
478 Takip Edilen22.1K Takipçiler
Yuke Zhu
Yuke Zhu@yukez·
I will be in Vienna in two weeks to give a keynote at #ICRA2026. I'll share our recent progress on building generalist humanoid robots and show some of the latest results. Check out my talk on June 3: 2026.ieee-icra.org/program/keynot…
Yuke Zhu tweet media
English
3
5
89
5K
Yuke Zhu retweetledi
Fernando Castañeda
Fernando Castañeda@FerCastanedaGR·
Now you can use GR00T N1.7 and SONIC together to enable tasks that require TRUE whole-body coordination!! Including simultaneous precise hand and foot placement, like opening a trash can with the foot pedal and throwing an object inside! Try it yourself, it is so fun!
Zhengyi “Zen” Luo@zhengyiluo

Open-sourcing the whole package here! The last piece of our SONIC open-source, data collection, gr00t VLA post-training, inference just hit the repo! Train your Autonomous policies on G1 Whole-body with SONIC and gr00t N1.7! 🧑‍💻Code: github.com/NVlabs/GR00T-W… 📑Docs: nvlabs.github.io/GR00T-WholeBod…

English
0
10
35
8.6K
Yuke Zhu retweetledi
Tairan He
Tairan He@TairanHe99·
GR00T-VisualSim2Real is now open source! VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids. Repo: github.com/NVlabs/GR00T-V…
Tairan He@TairanHe99

Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality. Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe: 1. RL 2. Simulation 3. GPUs Website: viral-humanoid.github.io Arxiv: arxiv.org/abs/2511.15200 Deep dive with me: 🧵

English
6
98
615
114.2K
Yuke Zhu retweetledi
tingwu.wang
tingwu.wang@TingwuWang·
What is missing to bring real-time motion research into AAA games and real-world robotics? We present MotionBricks, a step toward bridging this gap with two key components: - a single generative latent motion backbone covering 350,000+ motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability. - a unified smart primitive interface for locomotion, object / scene interaction, with fine-grained control over generated behaviors. Webpage: nvlabs.github.io/motionbricks/ Code: github.com/NVlabs/GR00T-W… Paper: arxiv.org/abs/2604.24833 (ACM TOG / SIGGRAPH 2026)
English
27
150
1.2K
150.2K
Yuke Zhu retweetledi
Yu Lei
Yu Lei@_OutofMemory_·
🤖Co-training is everywhere (sim↔real[e.g. GR00T, LBM], human↔robot[e.g. PI, EgoScale], even non-robot data[e.g. PI, LBM). But why does it work? How can we improve it further? Taking sim-and-real imitation learning in diffusion/ flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments. 😮Key insight: it’s all about representations. - Alignment → enables transfer - Discernibility → enables adaptation ⚖️Both are necessary — it's better to have more aligned representations, but the model must be able to discern the domains. We term this as structured representation alignment. ⬇️Let’s take a deep dive into that: Paper: arxiv.org/pdf/2604.13645 Website: science-of-co-training.github.io
Yu Lei tweet media
English
5
66
385
61.7K
Yuke Zhu retweetledi
Max Fu
Max Fu@letian_fu·
Robotics: coding agents’ next frontier. So how good are they? We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve code reliability. From @NVIDIA @Berkeley_AI @CMU_Robotics @StanfordAILab capgym.github.io 🧵
English
20
127
633
165.8K
Yuke Zhu retweetledi
Huihan Liu
Huihan Liu@huihan_liu·
Catastrophic forgetting has long been a challenge in continual learning. However, our new study found that pretrained Vision-Language-Action (VLA) models are surprisingly resistant to forgetting! Zero forgetting, or even positive backward transfer, is possible with simple experience replay. arxiv.org/abs/2603.03818
Huihan Liu tweet media
English
17
108
668
53.3K
Yuke Zhu
Yuke Zhu@yukez·
Today, we publicly released RoboCasa365, a large-scale simulation benchmark for training and systematically evaluating generalist robot models. Built upon our original RoboCasa framework, it offers: • 2,500 realistic kitchen environments; • 365 everyday tasks (basic skills + long-horizon mobile manipulation); • Over 3,200 objects with many articulated fixtures/appliances. All are designed for fully controlled, reproducible benchmarking of robotic policies. Progress in robotic foundation models is real. But it’s still hard to answer basic questions like: How close are we to general-purpose autonomy? What factors drive generalization? What are the model/data scaling curves like? Real-world eval is slow and noisy, and existing sims (like LIBERO, which we built 3 years ago) often lack sufficient task and scene diversity. This benchmark comes with 2,200+ hours of demonstrations and 500K+ trajectories to support studies of multi-task training, pretraining, and continual learning at scale. Check it out at robocasa.ai
English
13
63
339
23.3K
Yuke Zhu
Yuke Zhu@yukez·
CoRL is coming to Austin, TX this November! As General Chair, I'm thrilled to welcome the robot learning community. 2026 feels like a pivotal year as AI-powered robotic systems begin deploying at scale for real-world tasks. This year, I hope CoRL will be the forum that connects cutting-edge research with industrial practice. Please submit your best work and join us in Austin. DM me what you'd love to see CoRL do better! corl.org
Conference on Robot Learning@corl_conf

Calling all researchers! 🤖The CoRL 2026 website is officially live at corl.org with key dates for your submissions: 🗓 May 25: Abstract Submission 🗓 May 28: Full Paper Submission 🗓 Nov 9-12: Conference in Austin, TX Send us your coolest work! #RobotLearning

English
5
7
166
15.7K
Yuke Zhu retweetledi
Jim Fan
Jim Fan@DrJimFan·
We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate. Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution. Our recipe is called "EgoScale": - Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks. - Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency. - Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone. The scalable path to robot dexterity was never more robots. It was always us. Deep dives in thread:
English
149
282
1.7K
281.4K
Yuke Zhu retweetledi
Jim Fan
Jim Fan@DrJimFan·
Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It's Simulation 2.0. Time for robotics to take the bitter lesson pill. Real-world robot learning is bottlenecked by time, wear, safety, and resets. If we want Physical AI to move at pretraining speed, we need a simulator that adapts to pretraining scale with as little human engineering as possible. Our key insights: (1) human egocentric videos are a scalable source of first-person physics; (2) latent actions make them "robot-readable" across different hardware; (3) real-time inference unlocks live teleop, policy eval, and test-time planning *inside* a dream. We pre-train on 44K hours of human videos: cheap, abundant, and collected with zero robot-in-the-loop. Humans have already explored the combinatorics: we grasp, pour, fold, assemble, fail, retry—across cluttered scenes, shifting viewpoints, changing light, and hour-long task chains—at a scale no robot fleet could match. The missing piece: these videos have no action labels. So we introduce latent actions: a unified representation inferred directly from videos that captures "what changed between world states" without knowing the underlying hardware. This lets us train on any first-person video as if it came with motor commands attached. As a result, DreamDojo generalizes zero-shot to objects and environments never seen in any robot training set, because humans saw them first. Next, we post-train onto each robot to fit its specific hardware. Think of it as separating "how the world looks and behaves" from "how this particular robot actuates." The base model follows the general physical rules, then "snaps onto" the robot's unique mechanics. It's kind of like loading a new character and scene assets into Unreal Engine, but done through gradient descent and generalizes far beyond the post-training dataset. A world simulator is only useful if it runs fast enough to close the loop. We train a real-time version of DreamDojo that runs at 10 FPS, stable for over a minute of continuous rollout. This unlocks exciting possibilities: - Live teleoperation *inside* a dream. Connect a VR controller, stream actions into DreamDojo, and teleop a virtual robot in real time. We demo this on Unitree G1 with a PICO headset and one RTX 5090. - Policy evaluation. You can benchmark a policy checkpoint in DreamDojo instead of the real world. The simulated success rates strongly correlate with real-world results - accurate enough to rank checkpoints without burning a single motor. - Model-based planning. Sample multiple action proposals → simulate them all in parallel → pick the best future. Gains +17% real-world success out of the box on a fruit packing task. We open-source everything!! Weights, code, post-training dataset, eval set, and whitepaper with tons of details to reproduce. DreamDojo is based on NVIDIA Cosmos, which is open-weight too. 2026 is the year of World Models for physical AI. We want you to build with us. Happy scaling! Links in thread:
English
81
177
1.2K
207.4K
Yuke Zhu
Yuke Zhu@yukez·
We have seen rapid progress in humanoid control — specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: putting generalist humanoids to do real work. To progress toward this goal, we developed SONIC (nvlabs.github.io/GEAR-SONIC/), a Behavior Foundation Model for real-time, whole-body motion generation that supports teleoperation and VLA inference for loco-manipulation. Today, we’re open-sourcing SONIC on GitHub. We are excited to see what the community builds upon SONIC and to collectively push humanoid intelligence toward real-world deployment at scale. 🌐 Paper: arxiv.org/abs/2511.07820 📃 Code: github.com/NVlabs/GR00T-W…
English
11
67
351
66.6K
Yuke Zhu retweetledi
Zi-ang Cao
Zi-ang Cao@ziang_cao·
🚀 Introducing CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation! Current humanoids face a trade-off: they are either Agile & Stiff OR Slow & Soft. CHIP breaks this barrier. We enable on-the-fly switching between Compliant (wiping 🧼, collaborative holding 📦) and Stiff (lifting dumbbells 🏋️, opening doors 🚪💪) behaviors—all while maintaining agile skills like running! 🏃💨 Website: nvlabs.github.io/CHIP/ Join me for a deep dive on how CHIP enables adaptive control for complex tasks. 🧵↓
English
10
51
215
25.2K
Yuke Zhu
Yuke Zhu@yukez·
📢 New paper from GEAR team @NVIDIARobotics We released DreamZero, a World Action Model that turns video world models into zero-shot robot policies. Built on a pretrained video diffusion backbone, it jointly predicts future video frames and actions. 🌐 dreamzero0.github.io
Joel Jang@jang_yoel

Introducing DreamZero 🤖🌎 from @nvidia > A 14B “World Action Model” that achieves zero-shot generalization to unseen tasks & few-shot adaptation to new robots > The key? Jointly predicting video & actions in the same diffusion forward pass Project Page: dreamzero0.github.io 🧵 (1/10)

English
0
8
102
8.4K
Yuke Zhu retweetledi
Haoru Xue ✈️ CVPR
Haoru Xue ✈️ CVPR@HaoruXue·
Reality of robotics: humanoid kung fu is solved before they can open doors with RGB. Here we are. Introducing the frontier of sim2real at NVIDIA GEAR. 100% sim data. RGB input only. Code name: 𝗗𝗼𝗼𝗿𝗠𝗮𝗻. We are opening the sim-to-real door. doorman-humanoid.github.io 🧵
English
14
81
513
364.9K
Yuke Zhu
Yuke Zhu@yukez·
Robustness is key for real-world robot deployment — and RL is key to robustness. Proud of our work scaling GPU-based simulations and vision-based sim2real to build robust policies for humanoid loco-manipulation tasks.
Tairan He@TairanHe99

Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality. Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe: 1. RL 2. Simulation 3. GPUs Website: viral-humanoid.github.io Arxiv: arxiv.org/abs/2511.15200 Deep dive with me: 🧵

English
3
10
74
12.1K