Hanjung Kim

75 posts

@KimD0ing

Research Scientist Intern @nvidia GEAR | Ph.D. student @ Yonsei University | prev. @nyuuniversity

Santa Clara, CA · Joined February 2023
289 Following · 179 Followers
Pinned Tweet
Hanjung Kim
Hanjung Kim@KimD0ing·
How can we effectively leverage human videos for robot learning by bridging the inherent embodiment gap? We introduce UniSkill, a scalable method for learning universal, cross-embodiment skill representations from large-scale in-the-wild video data. 1/n
4 replies · 30 reposts · 187 likes · 28.4K views
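For readers curious how a cross-embodiment skill representation like this could be consumed downstream, here is a minimal, hypothetical sketch (the names and interfaces are my own, not the UniSkill code): a skill vector is extracted from a pair of video frames, whether a human or a robot appears in them, and a robot policy is conditioned on that vector.

```python
# Hypothetical sketch, not the UniSkill implementation: extract an
# embodiment-agnostic skill vector from a frame pair, then condition a policy on it.
import numpy as np

def extract_skill(skill_encoder, frame_t, frame_t_plus_k):
    """skill_encoder maps (current frame, future frame) -> fixed-size skill vector.
    The frames can come from human or robot video; the encoder never sees actions."""
    return skill_encoder(np.stack([frame_t, frame_t_plus_k]))

def act(policy, robot_obs, skill):
    """A robot policy conditioned on the skill vector instead of a task label."""
    return policy(np.concatenate([robot_obs.ravel(), skill]))
```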
Hanjung Kim retweeted
Max Fu
Max Fu@letian_fu·
Robotics: coding agents’ next frontier. So how good are they? We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve code reliability. From @NVIDIA @Berkeley_AI @CMU_Robotics @StanfordAILab capgym.github.io 🧵
19 replies · 128 reposts · 628 likes · 153.3K views
Hanjung Kim retweeted
Irmak Guzey
Irmak Guzey@irmakkguzey·
Learning from human data requires human-like hardware. Humans use their wrists constantly, but table-top manipulators lack this flexibility. We build upon RUKA and introduce RUKA-v2: a tendon-driven hand with a 2-DOF wrist and finger abduction/adduction 👋✌️
7 replies · 28 reposts · 113 likes · 8.1K views
Hanjung Kim retweeted
Danfei Xu
Danfei Xu@danfei_xu·
Introducing EgoVerse: an ecosystem for robot learning from egocentric human data. Built and tested by 4 research labs + 3 industry partners, EgoVerse enables both science and scaling: 1300+ hrs, 240 scenes, 2000+ tasks, and growing. Dataset design, findings, and ecosystem 🧵
33 replies · 159 reposts · 821 likes · 230.4K views
Chan Hee (Luke) Song
Chan Hee (Luke) Song@luke_ch_song·
🎓I defended my PhD at @OhioState! Grateful to my advisor @ysu_nlp and all my collaborators along the way :) Excited to be starting at @nvidia (just in time for #NVIDIAGTC😆) and continuing my research on spatial intelligence in multimodal foundation models.
[image]
9 replies · 3 reposts · 89 likes · 4.2K views
Hanjung Kim retweeted
Ruijie Zheng
Ruijie Zheng@ruijie_zheng12·
Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread on 🧵what we learned. 👇
24 replies · 65 reposts · 331 likes · 94.8K views
Hanjung Kim retweeted
Jim Fan
Jim Fan@DrJimFan·
We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet.

We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate.

Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution.

Our recipe is called "EgoScale":
- Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks.
- Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency.
- Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone.

The scalable path to robot dexterity was never more robots. It was always us. Deep dives in thread:
148 replies · 286 reposts · 1.8K likes · 275.7K views
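The scaling-law claim above is easy to sanity-check on one's own runs. Below is a minimal sketch (placeholder numbers, not the paper's data) of fitting a log-linear relationship between human-video hours and action-prediction loss and reporting R², the statistic quoted in the tweet.

```python
# Minimal sketch with made-up placeholder data, not the EgoScale results:
# fit loss ≈ a·log(hours) + b and report the R² of the fit.
import numpy as np

hours = np.array([500, 1000, 2500, 5000, 10000, 20000], dtype=float)  # hypothetical
loss = np.array([0.92, 0.85, 0.76, 0.69, 0.62, 0.55])                 # hypothetical

x = np.log(hours)
a, b = np.polyfit(x, loss, 1)
pred = a * x + b
r2 = 1 - np.sum((loss - pred) ** 2) / np.sum((loss - loss.mean()) ** 2)
print(f"loss ≈ {a:.3f}·log(hours) + {b:.3f},  R² = {r2:.3f}")
```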
Hanjung Kim retweeted
Jim Fan
Jim Fan@DrJimFan·
Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It's Simulation 2.0. Time for robotics to take the bitter lesson pill.

Real-world robot learning is bottlenecked by time, wear, safety, and resets. If we want Physical AI to move at pretraining speed, we need a simulator that adapts to pretraining scale with as little human engineering as possible. Our key insights: (1) human egocentric videos are a scalable source of first-person physics; (2) latent actions make them "robot-readable" across different hardware; (3) real-time inference unlocks live teleop, policy eval, and test-time planning *inside* a dream.

We pre-train on 44K hours of human videos: cheap, abundant, and collected with zero robot-in-the-loop. Humans have already explored the combinatorics: we grasp, pour, fold, assemble, fail, retry—across cluttered scenes, shifting viewpoints, changing light, and hour-long task chains—at a scale no robot fleet could match.

The missing piece: these videos have no action labels. So we introduce latent actions: a unified representation inferred directly from videos that captures "what changed between world states" without knowing the underlying hardware. This lets us train on any first-person video as if it came with motor commands attached. As a result, DreamDojo generalizes zero-shot to objects and environments never seen in any robot training set, because humans saw them first.

Next, we post-train onto each robot to fit its specific hardware. Think of it as separating "how the world looks and behaves" from "how this particular robot actuates." The base model follows the general physical rules, then "snaps onto" the robot's unique mechanics. It's kind of like loading a new character and scene assets into Unreal Engine, but done through gradient descent and generalizes far beyond the post-training dataset.

A world simulator is only useful if it runs fast enough to close the loop. We train a real-time version of DreamDojo that runs at 10 FPS, stable for over a minute of continuous rollout. This unlocks exciting possibilities:
- Live teleoperation *inside* a dream. Connect a VR controller, stream actions into DreamDojo, and teleop a virtual robot in real time. We demo this on Unitree G1 with a PICO headset and one RTX 5090.
- Policy evaluation. You can benchmark a policy checkpoint in DreamDojo instead of the real world. The simulated success rates strongly correlate with real-world results - accurate enough to rank checkpoints without burning a single motor.
- Model-based planning. Sample multiple action proposals → simulate them all in parallel → pick the best future. Gains +17% real-world success out of the box on a fruit packing task.

We open-source everything!! Weights, code, post-training dataset, eval set, and whitepaper with tons of details to reproduce. DreamDojo is based on NVIDIA Cosmos, which is open-weight too. 2026 is the year of World Models for physical AI. We want you to build with us. Happy scaling! Links in thread:
82 replies · 176 reposts · 1.2K likes · 204.8K views
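The "model-based planning" bullet above is the classic sample-and-score loop. Here is a minimal, hypothetical sketch (the `world_model.rollout` and `score_fn` interfaces are assumptions, not DreamDojo's actual API) of sampling action proposals, imagining each future in the world model, and executing the best candidate's first action.

```python
# Hypothetical sketch of sampling-based planning with a learned world model;
# the interfaces below are assumptions, not DreamDojo's actual API.
import numpy as np

def plan(world_model, score_fn, obs, num_proposals=16, horizon=12, action_dim=7, seed=0):
    rng = np.random.default_rng(seed)
    best_score, best_plan = -np.inf, None
    for _ in range(num_proposals):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))  # random-shooting proposals
        imagined_frames = world_model.rollout(obs, actions)           # simulate the future in pixels
        score = score_fn(imagined_frames)                             # e.g., task progress on the last frame
        if score > best_score:
            best_score, best_plan = score, actions
    return best_plan[0]  # execute only the first action, then replan (MPC-style)
```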
Hanjung Kim retweeted
Zhengyi “Zen” Luo
Zhengyi “Zen” Luo@zhengyiluo·
SONIC is now open-source! Generalist whole-body teleoperation for EVERYONE! Our team has long been building comprehensive pipelines for whole-body control, kinematic planning, and teleoperation, and they will all be shared. This will be a continuous update; the inference code + model are already there, with training code and GR00T integration coming soon!
Code: github.com/NVlabs/GR00T-W…
Docs: nvlabs.github.io/GR00T-WholeBod…
Site: nvlabs.github.io/GEAR-SONIC/
35 replies · 202 reposts · 905 likes · 210.6K views
Hanjung Kim retweeted
Seonghyeon Ye
Seonghyeon Ye@SeonghyeonYe·
VLAs (from VLMs) ❌ => WAMs (from Video Models) ✅
Why WAMs?
1️⃣ World Physics: VLMs know the internet, but Video Models implicitly model the physical laws essential for manipulation.
2️⃣ The "GPT Direction": VLAs are like BERT (rely heavily on task-specific post-training). WAMs are like GPT (pre-train & prompt), unlocking incredible zero-shot transfer!
What I want to see in 2026:
📈 Scaling Laws: We will see much clearer scaling laws for robotics compared to VLAs.
🤝 Human-to-Robot Transfer: Unlocking massive transfer capabilities using video as a shared representation space.
🤖 Zero-Shot Mastery: Moving from short-horizon tasks to long-horizon, dexterous manipulation without task-specific demonstrations.
We recently open-sourced the checkpoints, training and inference code. Dive into the research! 👇
📄 Paper: arxiv.org/abs/2602.15922
💻 Code: github.com/dreamzero0/dre…
🤗 HF: huggingface.co/GEAR-Dreams/Dr…
[image]
5 replies · 64 reposts · 517 likes · 74.7K views
Hanjung Kim retweeted
Siddhant Haldar
Siddhant Haldar@haldar_siddhant·
Robot foundation models are limited by costly real data, while simulation data is plentiful but visually mismatched to reality. We present Point Bridge, a method that enables zero-shot sim-to-real transfer for robot learning with minimal visual alignment. pointbridge3d.github.io
4 replies · 41 reposts · 221 likes · 19.2K views
Hanjung Kim retweeted
Mahi Shafiullah 🏠🤖
Mahi Shafiullah 🏠🤖@notmahi·
Why buy a robot when you can build your own? Meet YOR, our new open-source bimanual mobile manipulator robot – built for researchers and hackers alike for only ~$10k. 🧵👇
7 replies · 22 reposts · 171 likes · 37.3K views
Hanjung Kim retweeted
Jeff Cui
Jeff Cui@jeffacce·
We don't need the name of an object to pick it up; we simply need to know where it is and what it looks like. Introducing Contact-Anchored Policies (CAPs): instead of language, we explicitly condition on contacts. Our policy learns object pickup with only 16 hours of data! 🧵
5 replies · 28 reposts · 111 likes · 12.7K views
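As a rough illustration of conditioning on contacts rather than language, here is a hypothetical sketch (not the CAP code; the feature sizes and names are made up): the desired contact point is simply appended to the policy's observation vector in place of a text embedding.

```python
# Hypothetical sketch, not the Contact-Anchored Policies implementation:
# feed a 3D contact point to the policy instead of a language embedding.
import numpy as np

def build_policy_input(visual_features, proprio, contact_point_xyz):
    """contact_point_xyz: where the gripper should make contact, in the robot frame."""
    return np.concatenate([visual_features, proprio, np.asarray(contact_point_xyz, dtype=float)])

obs = build_policy_input(np.zeros(512), np.zeros(7), (0.42, -0.10, 0.03))
print(obs.shape)  # (522,)
```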
Hongsuk Benjamin Choi
Hongsuk Benjamin Choi@redstone_hong·
Some exciting takeaways in addition to Brent's post:
• We show flow policies working for sim2real humanoid locomotion & motion tracking without distillation or shortcut models.
• The same recipe works for both from-scratch RL and BC → RL fine-tuning for manipulation---no bells and whistles.
Code will be released: github.com/amazon-far/fpo…
Brent Yi@brenthyi

New project! Flow Policy Gradients for Robot Control tldr; a simple online RL recipe for training and fine-tuning flow policies for robots co-led w/ @redstone_hong: hongsukchoi.github.io/fpo-control

3 replies · 22 reposts · 104 likes · 9.9K views
Hanjung Kim retweeted
Seonghyeon Ye
Seonghyeon Ye@SeonghyeonYe·
We just gave robots "imagination," and the results are wild. 🤯 This robot wasn't trained to untie shoes or shake hands. It's never seen these tasks before. It simply "dreams" the future outcome, then acts to make it real. 🧵👇
4 replies · 22 reposts · 83 likes · 16.2K views
Hanjung Kim retweeted
Joel Jang
Joel Jang@jang_yoel·
Introducing DreamZero 🤖🌎 from @nvidia > A 14B “World Action Model” that achieves zero-shot generalization to unseen tasks & few-shot adaptation to new robots > The key? Jointly predicting video & actions in the same diffusion forward pass Project Page: dreamzero0.github.io 🧵 (1/10)
18 replies · 49 reposts · 262 likes · 59.6K views
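The "jointly predicting video & actions in the same diffusion forward pass" idea can be pictured as denoising one concatenated sequence. Below is a minimal toy sketch (the `denoiser` interface and the update rule are assumptions of mine, not DreamZero's sampler).

```python
# Hypothetical toy sketch, not DreamZero's sampler: video tokens and action
# tokens are concatenated and denoised together in a single forward pass.
import numpy as np

def joint_denoise_step(denoiser, video_tokens, action_tokens, t, step_size=0.1):
    x = np.concatenate([video_tokens, action_tokens], axis=0)  # one joint sequence
    eps_hat = denoiser(x, t)                                   # single forward pass predicts noise for both
    x = x - step_size * eps_hat                                 # toy update; real diffusion samplers differ
    n_video = video_tokens.shape[0]
    return x[:n_video], x[n_video:]                             # split back into video / action predictions
```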
Hanjung Kim retweeted
Moo Jin Kim
Moo Jin Kim@moo_jin_kim·
We release Cosmos Policy 💫: a state-of-the-art robot policy built on a video diffusion model backbone.
- policy + world model + value function — in 1 model
- no architectural changes to the base video model
- SOTA in LIBERO (98.5%), RoboCasa (67.1%), & ALOHA tasks (93.6%) 🧵👇
17 replies · 110 reposts · 868 likes · 147.2K views
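A rough way to picture "policy + world model + value function in one model" is a shared backbone with three heads. The sketch below is a hypothetical illustration of that structure, not Cosmos Policy's actual architecture.

```python
# Hypothetical sketch of one backbone with three heads; this illustrates the
# idea in the tweet, not Cosmos Policy's actual architecture.
def forward(backbone, heads, observation):
    features = backbone(observation)                      # shared video-model representation
    action = heads["policy"](features)                    # policy head: what to do next
    next_frames = heads["world_model"](features, action)  # world-model head: what will happen
    value = heads["value"](features)                      # value head: how good the state is
    return action, next_frames, value
```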
Ulyana Piterbarg
Ulyana Piterbarg@ulyanapiterbarg·
Very happy to share that I moved to the Bay Area and joined the Gemini team at @googledeepmind ! Grateful to be working with a great team on long horizons, RL for LLMs, and agents I'm looking forward to seeing old friends again and making new ones, DMs are open :)
[image]
34 replies · 8 reposts · 458 likes · 23.6K views
Hanjung Kim retweeted
Irmak Guzey
Irmak Guzey@irmakkguzey·
We just released AINA, a framework for learning robot policies from Aria 2 demos, and are now open-sourcing the code: github.com/facebookresear…. It includes:
✅ Aria 2 data processing into 3D observations, as shown
✅ Training of point-based policies
✅ Calibration
Give it a try!
GIF
4 replies · 32 reposts · 139 likes · 22.4K views