Yevgen Chebotar
@YevgenChebotar

45 posts

Robotic foundation models @NVIDIA 🤖 Previously @GoogleDeepMind (RT-2, VLAs, Offline RL) and @Figure_robot (Helix)

Joined March 2017
343 Following · 2.1K Followers
Yevgen Chebotar retweeted
Ruijie Zheng @ruijie_zheng12
Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread 🧵 on what we learned. 👇
Yevgen Chebotar retweeted
Zhengyi “Zen” Luo @zhengyiluo
SONIC is now open-source! Generalist whole-body teleoperation for EVERYONE! Our team has long been building comprehensive pipelines for whole-body control, kinematic planning, and teleoperation, and they will all be shared. This will be a continuous update: the inference code and model are already there, with training code and GR00T integration coming soon! Code: github.com/NVlabs/GR00T-W… Docs: nvlabs.github.io/GR00T-WholeBod… Site: nvlabs.github.io/GEAR-SONIC/
Yevgen Chebotar @YevgenChebotar
We are moving towards world-model-based backbones for robotic policies, pre-trained on web-scale videos and outputting robotic actions directly within the same diffusion model! This unlocks new levels of transfer to unseen tasks and motions, something that has been missing in VLM-based pre-training ever since the early days of RT-2 VLA models, which, although generalizing well to new objects and semantics, always struggled with new “verbs” and “motions”. We also observe signs of efficient cross-embodiment transfer to new robots with a small amount of data. There are a lot of optimizations that allow us to run a 14B world action model in real time! My favorite trick is DreamZero-Flash, which, while still denoising videos and actions jointly, introduces separate noise schedules so that actions can be denoised much faster than videos, enabling higher-frequency control! Check out the website, paper, and especially the eval gallery! dreamzero0.github.io
Joel Jang @jang_yoel

Introducing DreamZero 🤖🌎 from @nvidia > A 14B “World Action Model” that achieves zero-shot generalization to unseen tasks & few-shot adaptation to new robots > The key? Jointly predicting video & actions in the same diffusion forward pass Project Page: dreamzero0.github.io 🧵 (1/10)
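To make the DreamZero-Flash idea concrete, here is a minimal sketch, not the DreamZero implementation: the module names, dimensions, and step counts below are illustrative assumptions. It shows one denoiser jointly processing a video stream and an action stream under separate noise schedules, so the action stream finishes denoising (and can be executed) long before the video stream does.

```python
import torch

class JointDenoiser(torch.nn.Module):
    """Toy stand-in for a world-action model denoising both streams jointly."""
    def __init__(self, video_dim=64, action_dim=7):
        super().__init__()
        self.video_head = torch.nn.Linear(video_dim + action_dim, video_dim)
        self.action_head = torch.nn.Linear(video_dim + action_dim, action_dim)

    def forward(self, video, action):
        joint = torch.cat([video, action], dim=-1)  # one joint forward pass
        return self.video_head(joint), self.action_head(joint)

@torch.no_grad()
def sample(model, video_steps=50, action_steps=10, video_dim=64, action_dim=7):
    video = torch.randn(1, video_dim)
    action = torch.randn(1, action_dim)
    for step in range(video_steps):
        v_eps, a_eps = model(video, action)
        # Video follows the slow schedule: a small denoising step per iteration.
        video = video - v_eps / video_steps
        # Actions follow a faster schedule and are fully denoised after
        # `action_steps` iterations -- early enough for high-frequency control.
        if step < action_steps:
            action = action - a_eps / action_steps
    return video, action

video, action = sample(JointDenoiser())
print(action.shape)  # torch.Size([1, 7])
```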

Yevgen Chebotar @YevgenChebotar
Excited to join the NVIDIA GEAR team to help build the next generation of open robotic foundation models!
Yevgen Chebotar @YevgenChebotar
The path to VLAs lies through VLMs. A very nice intro for everyone interested in working with Vision-Language Models: arxiv.org/abs/2405.17247
Yevgen Chebotar @YevgenChebotar
Some personal updates! Excited to join the team @Figure_robot to help build AI for the robot age! 🤖
Yevgen Chebotar @YevgenChebotar
RT-H learns a hierarchy all the way from high-level tasks through low-level “language motions” to robot actions! ✅ Improved performance and generalization through better data sharing ✅ Automated grounded “bottom-up” labeling ✅ Ability to intervene and correct with language
Suneel Belkhale @suneel_belkhale

Is language capable of representing low-level *motions* of a robot? RT-Hierarchy learns an action hierarchy using motions described in language, like “move arm forward” or “close gripper” to improve policy learning. 📜: arxiv.org/abs/2403.01823 🏠: rt-hierarchy.github.io (1/10)
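As a rough illustration of the two-level inference described above (the policies below are hypothetical stubs, not the RT-H models): the high level maps a task to a “language motion”, the low level maps that motion to an action, and a human correction simply overrides the predicted motion.

```python
from typing import Optional, Tuple

def high_level_policy(task: str, observation: dict) -> str:
    """Stand-in for the task -> language-motion model (e.g. a VLM)."""
    return "move arm forward"  # would be predicted from (task, image)

def low_level_policy(motion: str, observation: dict) -> Tuple[float, float, float]:
    """Stand-in for the language-motion -> robot-action model."""
    motion_to_delta = {
        "move arm forward": (0.05, 0.0, 0.0),  # toy end-effector deltas
        "close gripper": (0.0, 0.0, -1.0),
    }
    return motion_to_delta.get(motion, (0.0, 0.0, 0.0))

def step(task: str, observation: dict, human_correction: Optional[str] = None):
    # Interventions happen at the language-motion level, not on raw actions.
    motion = human_correction or high_level_policy(task, observation)
    return motion, low_level_policy(motion, observation)

print(step("pick up the can", {}))                   # autonomous rollout
print(step("pick up the can", {}, "close gripper"))  # corrected with language
```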

Yevgen Chebotar retweeted
Ted Xiao @xiao_ted
Had a great time today with @YevgenChebotar and @QuanVng visiting @USCViterbi to give a talk on “Robot Learning in the Era of Foundation Models”. Slides out soon, packed with works from *just the past 5 months* 🤯 Thanks to @daniel_t_seita for hosting!
Sasha @_shydrie
@YevgenChebotar Would it be possible to get the website up again? It is currently unavailable.
Yevgen Chebotar @YevgenChebotar
Offline RL strikes back! In our new Q-Transformer paper, we introduce a scalable framework for offline reinforcement learning using Transformers and autoregressive Q-Learning to learn from mixed-quality datasets! Website and paper: q-transformer.github.io 🧵
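A hedged sketch of the autoregressive Q-learning idea, as assumed from the paper's description (tensor shapes are toy, and reusing one placeholder tensor for every intermediate dimension is purely for brevity): each discretized action dimension becomes its own Bellman step, so the max is taken over one dimension's bins at a time instead of over the exponentially large joint action space.

```python
import torch

num_dims, num_bins, gamma = 3, 256, 0.98  # toy sizes

def td_targets(q_next_dim, q_next_state, reward, done):
    """Targets for Q(s, a_1..a_i) over a batch.

    q_next_dim:   [B, num_bins] Q-values over the bins of the *next action
                  dimension* at the same state
    q_next_state: [B, num_bins] Q-values over the first action dimension
                  at the next state s'
    reward, done: [B] scalars for the environment transition
    """
    targets = torch.empty(reward.shape[0], num_dims)
    for i in range(num_dims):
        if i < num_dims - 1:
            # Intermediate dimensions: max over the next dimension's bins;
            # no reward or discount, since the environment has not stepped.
            targets[:, i] = q_next_dim.max(dim=-1).values
        else:
            # Final dimension: an ordinary Bellman backup into s'.
            targets[:, i] = reward + gamma * (1.0 - done) * q_next_state.max(dim=-1).values
    return targets

B = 4
t = td_targets(torch.rand(B, num_bins), torch.rand(B, num_bins),
               torch.rand(B), torch.zeros(B))
print(t.shape)  # torch.Size([4, 3])
```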
Yevgen Chebotar @YevgenChebotar
Exciting times for Robot Learning! 60 datasets from 22 different robots and 21 institutions, combined into a single Open X-Embodiment data repository, resulting in over 1 million episodes and improved RT-X models! An amazing and very important collaboration across the world! 🤖🌐
Quan Vuong @QuanVng

RT-X: generalist AI models lead to 50% improvement over RT-1 and 3x improvement over RT-2, our previous best models. 🔥🥳🧵 Project website: robotics-transformer-x.github.io

Yevgen Chebotar @YevgenChebotar
Our real robot policies significantly improve upon RT-1 and other baselines when trained on a limited amount of human demonstrations, by leveraging autonomously collected negatives and the dynamic-programming properties of Q-learning.
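A toy illustration of why the negatives help (the 4-state chain, actions, and rewards below are invented for this sketch): Q-learning's dynamic programming propagates value through states shared between successful demonstrations and failed autonomous rollouts, so even zero-reward episodes teach which actions lead away from success.

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
Q = np.zeros((n_states, n_actions))

# (s, a, r, s', done): one successful demo and one failed autonomous rollout
# that overlap in state 1, so the failure still informs the choice there.
transitions = [
    (0, 0, 0.0, 1, False), (1, 0, 0.0, 2, False), (2, 0, 1.0, 3, True),  # demo
    (0, 1, 0.0, 1, False), (1, 1, 0.0, 0, False),                        # negative
]

for _ in range(200):  # sweep the offline dataset until values converge
    for s, a, r, s2, done in transitions:
        target = r + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += 0.5 * (target - Q[s, a])

print(Q[1])  # action 0 (toward the goal) ends up valued above action 1
```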
Yevgen Chebotar @YevgenChebotar
Excited to present RT-2, a large unified Vision-Language-Action model! By converting robot actions to strings, we can directly train large vision-language models to output actions while retaining their web-scale knowledge and generalization capabilities! robotics-transformer2.github.io
Google DeepMind @GoogleDeepMind

Today, we announced RT-2: a first-of-its-kind vision-language-action model to control robots. 🤖 It learns from both web and robotics data and translates this knowledge into generalised instructions. Find out more: dpmd.ai/introducing-rt2
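The actions-as-strings trick is easy to sketch. The exact token format below is an assumption made for illustration (RT-2 describes discretizing each action dimension into 256 bins): continuous actions map to bin indices rendered as text, so a VLM's ordinary text output can be decoded back into an action.

```python
import numpy as np

def action_to_string(action, low=-1.0, high=1.0, bins=256):
    """Map a continuous action vector to a space-separated token string."""
    clipped = np.clip(action, low, high)
    ids = np.round((clipped - low) / (high - low) * (bins - 1)).astype(int)
    return " ".join(str(i) for i in ids)

def string_to_action(text, low=-1.0, high=1.0, bins=256):
    """Invert the mapping when decoding the VLM's generated text."""
    ids = np.array([int(t) for t in text.split()])
    return low + ids / (bins - 1) * (high - low)

a = np.array([0.1, -0.5, 0.9])
s = action_to_string(a)
print(s)                    # "140 64 242" -- plain text a VLM can emit
print(string_to_action(s))  # approximately recovers the original action
```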
