Eric Chan

56 posts

@ericryanchan

Chief Scientist of Rhoda AI; prev. PhD student, Stanford University

Joined December 2020
106 Following · 707 Followers

Pinned Tweet
Eric Chan @ericryanchan
Today, we announce our team’s progress in pursuing a different type of foundation model for robotics: the Direct Video Action Model (DVA), our best effort to turn robotics into a generative modeling problem we can scale. Technical blog: rhoda.ai/research/direc…
12 replies · 29 reposts · 197 likes · 19K views
Rhoda AI @rhoda_ai_
Most robot demos are “golden runs”: a perfect take selected from many attempts. But real-world deployment is about continuous operation. Watch our DVA model tackle a real-world decanting task for 1.5 hours straight: uncut, with zero human intervention. 🧵👇
4 replies · 11 reposts · 40 likes · 2K views
Eric Chan @ericryanchan
@rhoda_ai_ The most challenging part of a real-world task is handling all of the edge cases. A powerful base model is needed to achieve high robustness without requiring a lot of robot data.
0 replies · 0 reposts · 3 likes · 62 views
Eric Chan retweeted
Yilun Du @du_yilun
Robot video foundation models can build very powerful robot manipulation policies! These policies enable complex, dexterous manipulation, solve tasks that require long-term visual memory, and do in-context demonstration learning!
Quoting Rhoda AI @rhoda_ai_:
To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem. At Rhoda, we are solving it by reformulating robot policies as video generation. Today, we introduce the Direct Video-Action Model (DVA)
0 replies · 3 reposts · 24 likes · 2.2K views
Eric Chan retweeted
Stephen James @stepjamUK
Excited to see @rhoda_ai_ come out of stealth! As their advisor, I've had a front-row seat to their work on Direct Video-Action Models, which reformulates robot control as video generation. The data efficiency here is super promising: complex industrial tasks learned from just ~10 hours of robot data. Big things ahead!
Quoting Rhoda AI @rhoda_ai_:
To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem. At Rhoda, we are solving it by reformulating robot policies as video generation. Today, we introduce the Direct Video-Action Model (DVA)
2 replies · 3 reposts · 14 likes · 1.3K views
Eric Chan @ericryanchan
Vincent has been an inspiration for me since I started in AI — it's not an exaggeration that I wouldn't have done research at all if it were not for him. Thank you for the kind words!
Quoting Vincent Sitzmann @vincesitzmann:
These are very impressive results! The Rhoda team has decisively gotten "video models for robotics" to work. They train a generalist real-time, causal video model that they then quickly fine-tune using task-specific data to generate video plans (1/n)
0 replies · 1 repost · 12 likes · 3.1K views
Eric Chan @ericryanchan
@QianqianWang5 is a brilliant researcher and we're very lucky to have her on the team! I'm especially excited about her explorations of long-context handling, since it's so important for pushing generalization and task complexity.
Quoting Qianqian Wang @QianqianWang5:
Very excited to share our exploration of a new robotics foundation model at Rhoda AI. We train a causal video model from scratch, unlocking new capabilities for robust, long-horizon closed-loop robot control. Learn more: rhoda.ai/research/direc…
0 replies · 0 reposts · 21 likes · 648 views
Vincent Sitzmann @vincesitzmann
These are very impressive results! The Rhoda team has decisively gotten "video models for robotics" to work. They train a generalist real-time, causal video model that they then quickly fine-tune using task-specific data to generate video plans (1/n)
Quoting Rhoda AI @rhoda_ai_:
To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem. At Rhoda, we are solving it by reformulating robot policies as video generation. Today, we introduce the Direct Video-Action Model (DVA)
1 reply · 1 repost · 39 likes · 7.1K views
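The "fine-tuned video model generates video plans" pattern described above can be sketched as a closed control loop. This is a toy illustration only: `predict_next_frames`, `extract_actions`, the frame shapes, and the re-planning schedule are all hypothetical stand-ins; only the overall loop (imagine a short future video, read actions off it, execute a few, re-observe) comes from the thread.

```python
import numpy as np

def predict_next_frames(context, horizon=4):
    # Stand-in for a fine-tuned causal video model: here it just repeats
    # the last observed frame. A real model would generate a short
    # imagined video of the task unfolding.
    return [context[-1].copy() for _ in range(horizon)]

def extract_actions(frames):
    # Stand-in for an action head that reads robot actions off the
    # generated video plan (e.g. one end-effector command per frame).
    return [np.zeros(7) for _ in frames]  # toy 7-DoF action per frame

def control_loop(observe, act, steps=3, horizon=4):
    context = [observe()]
    executed = []
    for _ in range(steps):
        plan = predict_next_frames(context, horizon)  # imagine the future
        actions = extract_actions(plan)               # read actions off it
        for a in actions[: horizon // 2]:             # execute only the first
            act(a)                                    # half, then re-plan
            executed.append(a)
        context.append(observe())                     # closed loop: re-observe
    return executed

acts = control_loop(lambda: np.zeros((8, 8, 3)), lambda a: None)
print(len(acts))  # 3 steps × 2 executed actions = 6
```

Executing only a prefix of each plan before re-observing is one common way such loops stay closed-loop rather than open-loop; whether Rhoda's system does this is not stated in the thread.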
Eric Chan @ericryanchan
But the long context also gives a very natural way of doing one-shot learning: we can simply shove the example demonstration into the context window. This may eventually let us do real tasks without any robot data at all! x.com/rhoda_ai_/stat…
Quoting Rhoda AI @rhoda_ai_:
Because we support long-context visual memory, our robots can learn on the fly. Show the robot a single human demonstration, and it understands both the intent and the motion. It can even extrapolate to novel objects and environments it's never seen before. 🧺✍️
0 replies · 2 reposts · 7 likes · 1.1K views
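The one-shot setup Eric describes — "shove the example demonstration into the context window" — amounts to prepending demo frames to the model's visual context. A minimal sketch, with entirely hypothetical names, shapes, and context budget; only the prepend-the-demo idea comes from the thread:

```python
import numpy as np

def build_context(demo_frames, live_frames, max_context=512):
    """One-shot conditioning: the example demonstration is prepended to
    the context window, followed by the robot's live observations. If the
    combined length exceeds the budget, the oldest live frames are
    dropped; the demonstration is kept intact so the intent is never
    lost. (Hypothetical policy -- the thread does not specify eviction.)"""
    budget_for_live = max_context - len(demo_frames)
    if budget_for_live < 0:
        raise ValueError("demonstration alone exceeds the context window")
    return demo_frames + live_frames[-budget_for_live:]

# Toy (H, W, C) arrays stand in for video frames.
demo = [np.zeros((8, 8, 3)) for _ in range(100)]  # one human demonstration
live = [np.ones((8, 8, 3)) for _ in range(600)]   # long live stream

ctx = build_context(demo, live, max_context=512)
print(len(ctx))  # 512: full demo (100) + most recent 412 live frames
```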
Eric Chan @ericryanchan
Another key advantage: our models gain the ability to handle long context almost for free by training on lots of long videos. For robotics, this is important for handling tasks that require long-term memory. x.com/rhoda_ai_/stat…
Quoting Rhoda AI @rhoda_ai_:
Most robots have "amnesia": they only see a few frames at a time. 🧠 In contrast, our model natively supports hundreds of frames of visual context, enabling it to:
→ Keep track of the world state
→ Handle complex, multi-step tasks end-to-end
1 reply · 2 reposts · 7 likes · 1.6K views
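The "amnesia" point above can be made concrete with a toy contrast between a few-frame policy and a long-context one; everything here (the flag-frame stream, window sizes, the membership check) is an illustrative assumption, not Rhoda's implementation.

```python
from collections import deque

def make_policy(context_len):
    """Toy illustration: a policy that only sees the last few frames
    cannot recall an event (here, a 'flag' frame) once it scrolls out
    of its window, while a long-context policy still can."""
    window = deque(maxlen=context_len)  # old frames fall off the left

    def step(frame):
        window.append(frame)
        return "flag" in window  # has this policy seen the flag?

    return step

stream = ["flag"] + ["blank"] * 200  # the key event happens once, early on

short = make_policy(4)    # typical few-frame policy
long_ = make_policy(512)  # long-context model
short_recalls = [short(f) for f in stream][-1]
long_recalls = [long_(f) for f in stream][-1]
print(short_recalls, long_recalls)  # False True
```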
Eric Chan @ericryanchan
@tianyuanzhang99 Thank you, Tianyuan! Yes, we see the same benefits — it’s absolutely more scalable because there is so much more data, and in addition, the ability to simulate next states opens up a world of possibilities for planning, eval, and inference-time scaling!
0 replies · 0 reposts · 1 like · 127 views
Tianyuan Zhang @tianyuanzhang99
Congrats! Excited to see video-generated actions being deployed in the real world. The video model learns both world simulation and action planning. More importantly, it's not data-bound yet, but compute-bound.
Quoting Eric Chan @ericryanchan:
@startupjag @rhoda_ai_ Incredibly excited to introduce a new type of foundation model for robotics. At its core, robotics is a data problem, but that doesn't mean collecting data directly is the only solution.
2 replies · 0 reposts · 22 likes · 5.9K views
Eric Chan @ericryanchan
Thrilled to announce what we’ve been working on for the last 17 months, at the intersection of real-time video generation and robotics! We’ve published a technical blog that showcases some of the things we’ve learned along the way.
Quoting Rhoda AI @rhoda_ai_:
To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem. At Rhoda, we are solving it by reformulating robot policies as video generation. Today, we introduce the Direct Video-Action Model (DVA)
1 reply · 2 reposts · 15 likes · 1.5K views
Eric Chan retweeted
Vinod Khosla @vkhosla
The bar for robotics isn’t lab demos — it’s autonomous operation in real production environments. What impressed me about @rhoda_ai_ was seeing that level of performance with remarkably little robot training data. Pretraining on internet-scale video to build a strong physical prior may seem unconventional today, but approaches like this are what will ultimately unlock general-purpose robotics.
Quoting Jagdeep Singh @startupjag:
After operating in stealth for the last 18 months at @rhoda_ai_, we’re excited today to finally show the world what we’ve been working on. We believe we’re on a path to physical AGI with the launch of our brand new foundation model, the Direct Video Action (DVA) model.
22 replies · 43 reposts · 294 likes · 68.3K views
Jagdeep Singh @startupjag
After operating in stealth for the last 18 months at @rhoda_ai_, we’re excited today to finally show the world what we’ve been working on. We believe we’re on a path to physical AGI with the launch of our brand new foundation model, the Direct Video Action (DVA) model.
53 replies · 79 reposts · 599 likes · 234K views
Eric Chan @ericryanchan
@startupjag @rhoda_ai_ Incredibly excited to introduce a new type of foundation model for robotics. At its core, robotics is a data problem, but that doesn't mean collecting data directly is the only solution.
0 replies · 0 reposts · 23 likes · 8.7K views
Eric Chan retweeted
Rhoda AI @rhoda_ai_
The gap between robotics in the lab and robotics in the real world has been one of the hardest unsolved problems in the industry. We’re excited to come out of stealth and show the research community how we’re tackling the issue. Bloomberg article in the comments.
13 replies · 14 reposts · 91 likes · 18.9K views
Rhoda AI @rhoda_ai_ · 03.10.26
15 replies · 35 reposts · 252 likes · 47.1K views
Eric Chan @ericryanchan
@rhoda_ai_ Excited to share what we've been working on :)
1 reply · 0 reposts · 7 likes · 717 views
Eric Chan retweeted
Nataniel Ruiz @natanielruizg
Excited to show some surprising inventions on generative multiplayer games we made at Google with Stanford. We call the work MultiGen. I've always been inspired by early studios like id Software with Doom or Blizzard with Warcraft bringing networked video games to the next level. We are at the point in history where we can make strides like them, but for generative games.

It's a strange feeling to be in the age of generative video games while still discovering how exactly to train the models and design the tools that make them useful. All of the tools that have been invented for classic game engines need to be redesigned for generative games. For example, level and world design is not entirely possible with existing technology. We introduce editable memory for diffusion game engines, which allows for the design of new levels via a minimap. But we can easily imagine how this can be expanded with different creation tools. The end goal of this research direction is to allow game designers to guide the generation process of their world, at the granularity that they prefer.

Editable memory also allows us to add multiplayer to Generative Doom. We were amazed when we saw GameNGen some years ago, and now you can play it live with friends in real time, on your couch or even online. Shared representations like our editable memory seem like the future for this type of experience. Models are, in some cases, expensive and approximate encoders but great interpolators and extrapolators. Leveraging their strengths lets you have completely new experiences that can be realized now, not in the distant future.

This work was started at my previous team and continued in collaboration with Stanford. Congratulations to all for the discoveries.

32 replies · 79 reposts · 571 likes · 98.2K views