Zhenyang Chen

32 posts

@DanielZhenyang

Robotics Ph.D at Georgia Tech

Joined December 2021
430 Following · 114 Followers
Zhenyang Chen retweeted
Tairan He@TairanHe99·
GR00T-VisualSim2Real is now open source! VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids. Repo: github.com/NVlabs/GR00T-V…
Tairan He@TairanHe99

Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality.

Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe:
1. RL
2. Simulation
3. GPUs

Website: viral-humanoid.github.io
Arxiv: arxiv.org/abs/2511.15200
Deep dive with me: 🧵

Zhenyang Chen@DanielZhenyang·
when you ask Codex to optimize docs in a codebase, this is what happens: it first happily deleted the old CLAUDE.md and wrote this 😂😭 #claude #codex #ai #code #robotics
[attached image]
Zhenyang Chen retweeted
Saining Xie@sainingxie·
vision🍌 is here vision-banana.github.io if you got into computer vision the way I did, starting with pixel-level labeling tasks like segmentation, edges, depth, or surface normals, you’ll probably feel the same seeing these results -- something big has quietly shifted, and it’s going to change how we approach these problems for good 🧵
Zhenyang Chen retweeted
Jeff Bezos@JeffBezos·
ZXX
Zhenyang Chen retweeted
Younghyo Park@younghyo_park·
What's different between these two BC policies? It's the same architecture, training budget, and data collection setup — the only difference is the controller gains! Controller gains are an understudied design parameter in robot learning. In our new work (w/ @BronarsToni*, @pulkitology), we show how they act as an inductive bias across BC, RL, and Sim2Real transfer, with real consequences on performance. Here's what we found 🧵
* Equal Contribution
📄arxiv: arxiv.org/abs/2604.02523
🔗website: younghyopark.me/tune-to-learn/
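The idea that gains shape how commands map to motion can be seen with a textbook PD position controller on a toy unit point mass. This is a generic sketch, not the paper's setup; `kp`, `kd`, and the toy dynamics are my own assumptions:

```python
def simulate_pd(kp: float, kd: float, q_target: float = 1.0,
                steps: int = 2000, dt: float = 0.001) -> float:
    """Drive a unit point mass toward q_target with a PD law; return final |error|."""
    q, qd = 0.0, 0.0
    for _ in range(steps):
        tau = kp * (q_target - q) - kd * qd  # PD control torque
        qd += tau * dt                        # unit mass: qdd = tau (Euler step)
        q += qd * dt
    return abs(q_target - q)

# Stiff gains track the commanded target tightly; soft gains leave a larger
# residual error — the same action command produces a different motion.
stiff_err = simulate_pd(kp=400.0, kd=40.0)
soft_err = simulate_pd(kp=20.0, kd=4.0)
```

A policy trained under one gain setting therefore implicitly learns the command-to-motion mapping that setting induces, which is the sense in which gains act as an inductive bias.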
Zhenyang Chen retweeted
Patrick OShaughnessy@patrick_oshag·
My conversation with Sergey Levine (@svlevine). Sergey is the co-founder of @physical_int -- a company building foundation models that can control any robot to do any task in any environment. The company's thesis is that generality is more scalable than specialization, meaning that a model trained across many different robots and tasks will ultimately outperform any system built to do one thing well (eg, just wash dishes).

Sergey is a researcher by background, but I think you will appreciate how practical and commercially grounded this conversation is. We discuss:
- Why changing a diaper will be the last task a robot masters
- The simulation v. real-world data debate
- How multimodal LLMs give robots common sense
- Moravec's Paradox + Robot Olympics
- Why robots can do long-horizon tasks now
- A realistic timeline for robots in our homes

I should note that I am an investor in Physical Intelligence -- I made the investment because I believe it is one of the most important companies tackling the problem of robotics. Enjoy!

Timestamps:
0:00 Intro
2:39 Defining Physical Intelligence
5:19 The Challenge of Building General Models
6:34 The Stakes and Future of General Purpose Robotics
8:15 Pros and Cons of Humanoid Robots
10:12 Historical Milestones in Robotics Research
15:31 Combining Generative AI and Deep RL
21:24 Moravec's Paradox
25:33 Kitchen Robots
29:30 Simulation vs. Real-World Data
30:48 The Robot Olympics
36:31 The Physiological Reality of Embodiment
38:56 Controversies in the Robotics Community
44:18 What Makes a Great Researcher
48:27 How Businesses Should Prepare for Robotics
54:09 Tracking Progress Through Research Papers
57:02 The Next Step: Mid-Level Reasoning
1:02:00 The Kindest Thing
Zhenyang Chen retweeted
Abhishek Gupta@abhishekunique7·
Excited to share the project that has surprised me the most in the last year! Large-scale RL in simulation, no demos and no reward engineering can solve dynamic, dexterous and contact rich tasks. The learned behaviors are reactive, forceful and use the environment for recovery in ways that are extremely challenging to bake in or teleoperate! You can play with the policies yourself to see: weirdlabuw.github.io/omnireset/ And, the learned behavior transfers to real world robots from RGB camera inputs! So what’s the trick - using simulator resets carefully! Let’s unpack (1/10)
Zhenyang Chen retweeted
Patrick Yin@patrickhyin·
We’re releasing OmniReset, a framework for training robot policies using large-scale RL and diverse resets for contact-rich, dexterous manipulation. OmniReset pushes the frontier of robustness and dexterity, without any reward engineering or demonstrations. Try the policies yourself in our interactive simulator! weirdlabuw.github.io/omnireset/ (1/N 🧵)
Zhenyang Chen retweeted
Jacob Zietek@JacobZietek·
Robotics has spent decades optimizing for research. Deployment requires a completely different kind of person: operators, industrialists, and outsiders the field typically ignores. There's a wave of people who want to build in robotics. The field doesn't know what to do with them. New essay, Robotics Needs Fewer Roboticists* below 👇
Zhenyang Chen retweeted
Lucas Maes@lucasmaes_·
JEPAs are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning in <1 second. 📑: le-wm.github.io
Zhenyang Chen@DanielZhenyang·
Like this work a lot. Whole-body mobile manipulation is hard. The demos HoMMI is showing and the design choices they made are interesting. We are also pushing in this direction. Stay tuned
Xiaomeng Xu@XiaomengXu11

Can we learn whole-body mobile manipulation directly from human demonstrations? Introducing Whole-Body Mobile Manipulation Interface (HoMMI) Egocentric + UMI, 0 teleop -> bimanual & whole-body manipulation, long-horizon navigation, active perception hommi-robot.github.io

Zhenyang Chen@DanielZhenyang·
This is how we should evaluate robotics system papers: not as isolated components, but as integrated systems. What matters is how components interact—and how those interactions unlock new capabilities.
Guanya Shi@GuanyaShi

I’m so tired of writing rebuttals to this kind of “lack of novelty” review: “This paper trivially combines A, B, and C, so the algorithmic novelty is limited.” Technically, most (if not all) robotics papers are convex combinations of existing ideas. I still deeply appreciate A+B+C papers—especially when they deliver:
- New capabilities: the “trivial combination” unlocks behaviors we simply couldn’t achieve before
- Sensible & organic design: A+B+C is clearly the right composition—not some arbitrary A′+B+C′
- Nontrivial interactions: careful analysis of the dynamics, coupling, or failure modes between A, B, C
- Rehabilitating old ideas: A was dismissed for years, but paired with modern B/C, it suddenly works—and teaches us why
- System-level & "interface" insight: the contribution is not any single piece, but how the pieces talk to each other
- Scaling laws or regimes: identifying when/why A+B+C works (and when it doesn’t)
- Engineering clarity: making something actually work robustly in the real world is not “trivial”
- New problem formulations: sometimes the real novelty is in the reformulation—only under this view does A+B+C make sense.
Maybe worth keeping these in mind when reviewing the next A+B+C paper : )

Zhenyang Chen retweeted
Danfei Xu@danfei_xu·
Introducing EgoVerse: an ecosystem for robot learning from egocentric human data. Built and tested by 4 research labs + 3 industry partners, EgoVerse enables both science and scaling 1300+ hrs, 240 scenes, 2000+ tasks, and growing Dataset design, findings, and ecosystem 🧵
Zhenyang Chen retweeted
Yuke Zhu@yukez·
Today, we publicly released RoboCasa365, a large-scale simulation benchmark for training and systematically evaluating generalist robot models. Built upon our original RoboCasa framework, it offers:
• 2,500 realistic kitchen environments;
• 365 everyday tasks (basic skills + long-horizon mobile manipulation);
• Over 3,200 objects with many articulated fixtures/appliances.
All are designed for fully controlled, reproducible benchmarking of robotic policies.

Progress in robotic foundation models is real. But it’s still hard to answer basic questions like: How close are we to general-purpose autonomy? What factors drive generalization? What are the model/data scaling curves like? Real-world eval is slow and noisy, and existing sims (like LIBERO, which we built 3 years ago) often lack sufficient task and scene diversity.

This benchmark comes with 2,200+ hours of demonstrations and 500K+ trajectories to support studies of multi-task training, pretraining, and continual learning at scale. Check it out at robocasa.ai
Zhenyang Chen@DanielZhenyang·
With all the effort the community has put into humanoid robot hardware and learning, we will see a smaller and smaller embodiment gap between humans and robots. And human data scaling will shine ✨
Ruijie Zheng@ruijie_zheng12

Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread on 🧵what we learned. 👇

Zhenyang Chen retweeted
Danfei Xu@danfei_xu·
1/ Ever wonder why many VLA demos look smooth only after 5-10× video speed-up? Running VLAs in real time (or faster) is not one problem. It’s several tightly coupled ones. Here is a "mini literature survey" thread on recent papers (RTC, SAIL, VLASH) in this paradigm.
[attached image]
Zhenyang Chen@DanielZhenyang·
@IliaLarchenko very cool work and congrats on the 1st place. You mentioned that training a smaller model from scratch failed; have you tried initializing pi05 with random weights? And what other smaller models have you tried?
Ilia@IliaLarchenko·
- BEHAVIOR has a fixed set of 50 tasks. We do not need to generalize to new text prompts, so we removed text entirely and replaced it with 50 trainable task embeddings (one per task).
- The training dataset contained multiple modalities (RGB, depth, segmentation) as well as extra subtask annotations, but we stuck to the simple approach: RGB images + robot state only.
- We predict 30-step action chunks (1s) and use delta actions with per-timestamp normalization.
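As a rough illustration of the last point, delta actions with per-timestep normalization can be sketched as below. The function names, shapes, and toy data are my own assumptions, not the released code; the point is only that normalization statistics are computed separately for each chunk index, since step-1 deltas tend to be much smaller than step-30 deltas:

```python
import numpy as np

def to_delta_chunk(states, actions):
    """Convert absolute action targets to deltas relative to the current state.

    states:  (B, D)    current robot state per sample
    actions: (B, T, D) absolute action chunk over T future steps
    returns: (B, T, D) delta actions
    """
    return actions - states[:, None, :]

def per_timestep_stats(deltas):
    """Mean/std computed per chunk index t (axis 0 is the batch)."""
    mean = deltas.mean(axis=0)       # (T, D)
    std = deltas.std(axis=0) + 1e-8  # (T, D), small offset avoids divide-by-zero
    return mean, std

def normalize(deltas, mean, std):
    return (deltas - mean) / std

# Toy data: 30-step chunks (1 s at 30 Hz), 7-D actions, batch of 64.
rng = np.random.default_rng(0)
states = rng.normal(size=(64, 7))
actions = states[:, None, :] + rng.normal(size=(64, 30, 7)).cumsum(axis=1) * 0.01
deltas = to_delta_chunk(states, actions)
mean, std = per_timestep_stats(deltas)
norm = normalize(deltas, mean, std)
```

After this, every chunk index has roughly zero-mean, unit-variance targets, so early and late steps contribute comparably to the training loss.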
Ilia@IliaLarchenko·
A couple of days ago, I presented our 1st place solution for the 2025 BEHAVIOR Challenge at @NeurIPSConf . Now, we've open-sourced our solution: code, model weights, and a detailed tech report. Let me unpack what we did 👇