Changan Chen

120 posts


@changanvr

Building @RhodaAI. Exploring new paradigms for scaling visual intelligence toward physical AGI. Prev: @Stanford @UTAustin @FAIR.

Bay Area, San Francisco · Joined December 2015
450 Following · 780 Followers
Changan Chen @changanvr ·
@RhodaAI Learning from human demonstrations is a crucial step toward enabling robots to quickly acquire new tasks without relying on teleoperation.
Replies 0 · Reposts 0 · Likes 1 · Views 103
Rhoda AI @RhodaAI ·
Teaching a robot a new task typically means stopping operations, collecting teleoperated demonstrations, and retraining. That process takes hours at a minimum. We wanted to know if we could collapse it to seconds — from a single human demo, on the fly, no retraining required. Early research preview: we can.
Replies 9 · Reposts 15 · Likes 84 · Views 6.7K
Changan Chen @changanvr ·
@GeneralistAI @BerkayAntmen @RhodaAI Great to see the shell game getting traction! Long-context memory is genuinely hard to get right, and it's encouraging that the robot learning field is paying more attention to long-context visual memory.
Replies 0 · Reposts 0 · Likes 8 · Views 370
Generalist @GeneralistAI ·
GEN-1 plays the 🐚 shell game, trained on just 1 hr of robot data. It also generalizes to unseen objects, like @BerkayAntmen 's car keys. Physical AI models should be capable of benchmark tasks like this one. It's interesting for all the reasons @RhodaAI calls out -- requires visual memory, and the model must track the cups from the very start, at high frame rates. Interestingly, GEN-1 appears to exhibit a degree of "active perception." It's subtle; the hands can sometimes appear to "follow" the cups, using its own movements to help attend to where it thinks the object should be. Read more about GEN-1 in our blog post in the comments below ↓
Rhoda AI @RhodaAI

Here’s something we’ve never seen done before. Real-world tasks are long and ambiguous. Solving them requires visual memory and state tracking. Most robot policies only see the last few frames. Ours doesn't. We put our DVA, FutureVision, to the perfect testbed: the shell game 🐚. The DVA nails it.

Replies 10 · Reposts 38 · Likes 272 · Views 856.2K
Changan Chen @changanvr ·
@rhodaai While this may seem like a simple task, it requires long-context memory and the ability to reason about object motion across an entire sequence. This capability is naturally supported by our video models that are trained on large-scale video data.
Replies 0 · Reposts 0 · Likes 3 · Views 637
Rhoda AI @RhodaAI ·
Here’s something we’ve never seen done before. Real-world tasks are long and ambiguous. Solving them requires visual memory and state tracking. Most robot policies only see the last few frames. Ours doesn't. We put our DVA, FutureVision, to the perfect testbed: the shell game 🐚. The DVA nails it.
Replies 8 · Reposts 38 · Likes 234 · Views 82.4K
Ivan Skorokhodov @isskoro ·
After almost 3 years at Snap, several dozen research and engineering projects, and countless wild moments lived through with an incredible team, I've decided to start a new adventure in robotics and join @rhodaai to bring the technological singularity a little closer.
[image attached]
Replies 9 · Reposts 1 · Likes 68 · Views 3.4K
Changan Chen @changanvr ·
We are entering a new era of robot deployment. With our model, teams can iterate on solutions and data collection more rapidly thanks to highly efficient training. What once took months can now be achieved in as little as 19 days, bringing development to an entirely new pace.
Rhoda AI @RhodaAI

1/ We are speed running industrial robotics. It took us just 19 days from the first day of data collection to filming a 2.5-hour continuous run of our model autonomously breaking down industrial containers — zero human intervention. The data efficiency of our DVA model is fundamentally changing how fast we bring robots out of the lab and into the factory. Autonomous operation with 3 hours of data collection at a customer factory.

Replies 1 · Reposts 6 · Likes 16 · Views 3.7K
Changan Chen @changanvr ·
@karpathy This is cool! I wonder what you think about the following potential issues: 1. How does the agent avoid getting stuck in local minima during exploration? 2. How well does training on a small model translate to larger-scale models with significantly more compute and data?
Replies 0 · Reposts 0 · Likes 0 · Views 72
Andrej Karpathy @karpathy ·
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds settings of the neural network architecture, the optimizer, and the hyperparameters that yield lower validation loss by the end. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
[image attached]
Replies 1.1K · Reposts 3.7K · Likes 28.4K · Views 11M
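A minimal sketch of the kind of agent-in-the-loop optimizer described in the post above; the file name train.py, the agent_edit() placeholder, the "val_loss:" output convention, and the fixed step cap are assumptions made for illustration, not the actual autoresearch code.

```python
# Hedged sketch of an agent-in-the-loop training optimizer, NOT the actual
# autoresearch repo: train.py, agent_edit(), and the loss-parsing convention
# are all assumptions for illustration.
import subprocess

def run_training(timeout_s: int = 300) -> float:
    """Run one fixed-budget training run and return its final validation loss."""
    out = subprocess.run(
        ["python", "train.py"],             # hypothetical one-file trainer
        capture_output=True, text=True, timeout=timeout_s + 60,
    )
    # Assume the trainer prints "val_loss: <float>" as its last line.
    return float(out.stdout.strip().splitlines()[-1].split(":")[1])

def agent_edit(script: str, history: list[tuple[str, float]]) -> str:
    """Placeholder: the coding agent proposes an edited training script."""
    raise NotImplementedError("call your coding agent here")

best_loss = run_training()
script = open("train.py").read()
history = [(script, best_loss)]

for step in range(100):                     # "indefinitely" in spirit, capped here
    candidate = agent_edit(script, history)
    open("train.py", "w").write(candidate)
    loss = run_training()
    history.append((candidate, loss))
    if loss < best_loss:                    # keep only improvements
        best_loss, script = loss, candidate
        subprocess.run(["git", "commit", "-am", f"step {step}: val {loss:.4f}"])
    else:
        open("train.py", "w").write(script)  # revert to best-known script
```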
Changan Chen @changanvr ·
Over the past year and a half, I’ve spoken with many robot learning researchers, and a recurring concern is how to achieve a 99.99% success rate. Many video demos showcase a single flawless run, but that doesn’t address the long-tail challenges of real-world tasks, where countless corner cases arise. What we’ve observed with video models is that, once sufficiently powerful and pre-trained, they can capture multi-modal distributions effectively. In practice, this means the model can learn diverse behaviors for handling edge cases without requiring hundreds of hours of video data.
Rhoda AI @RhodaAI

Most robot demos are “golden runs”: a perfect take selected from many attempts. But real-world deployment is about continuous operation. Watch our DVA model tackle a real-world decanting task for 1.5 hours straight: uncut, zero human intervention. 🧵👇

Replies 0 · Reposts 0 · Likes 4 · Views 186
Changan Chen reposted
0x796F @0x796F ·
You can now train @physical_int style robots in 1 day for only $5k. Anvil’s devkits have all the hardware, software, controls, cameras, and more ready-to-go. (1/5)
Replies 22 · Reposts 73 · Likes 576 · Views 324.1K
Devendra Chaplot @dchaplot ·
I'm joining SpaceX and xAI, working closely with Elon and team to build superintelligence. Together SpaceX and xAI combine physical and digital intelligence under a leader who understands hardware at the deepest level. Add a high-agency culture with frontier-scale resources, and you get the possibility to achieve something truly unique. I’m excited to advance the fields I’ve obsessed over for years, from robotics research to building AI models on the founding teams of Mistral and TML. Both were extraordinary journeys with extraordinary people that shaped how I think about building intelligence from the ground up. Grateful for everything that brought me here and can’t wait to get started.
[image attached]
Replies 2.9K · Reposts 2.1K · Likes 28.2K · Views 43.5M
Changan Chen reposted
Yilun Du @du_yilun ·
Robot video foundation models can build very powerful robot manipulation policies! These policies enable complex, dexterous manipulation, solve tasks that require long-term visual memory, and do in-context demonstration learning!
Rhoda AI @RhodaAI

To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem. At Rhoda, we are solving it by reformulating robot policies as video generation. Today, we introduce the Direct Video-Action Model (DVA)

Replies 0 · Reposts 3 · Likes 24 · Views 2.5K
Changan Chen @changanvr ·
@vkhosla @rhodaai Extremely grateful for backing us! Indeed, web-scale video data and video generation as the modeling objective give the model a strong physical prior, enabling it to pick up new tasks in the physical world quickly and effectively.
Replies 0 · Reposts 0 · Likes 1 · Views 111
Vinod Khosla @vkhosla ·
The bar for robotics isn’t lab demos — it’s autonomous operation in real production environments. What impressed me about @rhodaai was seeing that level of performance with remarkably little robot training data. Pretraining on internet-scale video to build a strong physical prior may seem unconventional today, but approaches like this are what will ultimately unlock general-purpose robotics.
Jagdeep Singh @startupjag

After operating in stealth for the last 18 months at @rhodaai, we’re excited today to finally show the world what we’ve been working on. We believe we’re on a path to physical AGI with the launch of our brand new foundation model, the Direct Video-Action (DVA) model.

Replies 23 · Reposts 37 · Likes 292 · Views 68.8K
Changan Chen @changanvr ·
This matters a lot for production robotics. In real deployments, workflows change constantly. What we need is a system where setup → data collection → training → deployment can happen within a few days, not months.
Replies 1 · Reposts 0 · Likes 1 · Views 122
Changan Chen @changanvr ·
Today, we introduce Direct Video-Action Models (DVA), the first native causal video model for robot control. By reformulating robot policy as video generation, DVA can leverage web-scale video data, making it extremely scalable.
Rhoda AI @RhodaAI

To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem. At Rhoda, we are solving it by reformulating robot policies as video generation. Today, we introduce the Direct Video-Action Model (DVA)

Replies 2 · Reposts 1 · Likes 8 · Views 434
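A minimal sketch of one way "robot policy as video generation" could be wired up; the module names, shapes, the frame-prediction head, and the action head below are assumptions for illustration, not the actual DVA architecture, which is not described in these posts.

```python
# Hedged sketch, NOT the DVA architecture: a causal video backbone that
# autoregressively predicts future frame latents, plus a small head that
# decodes robot actions from the same latents. All names/shapes are assumptions.
import torch
import torch.nn as nn

class VideoPolicySketch(nn.Module):
    def __init__(self, frame_dim=512, action_dim=7, n_layers=4, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=frame_dim, nhead=n_heads, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.next_frame_head = nn.Linear(frame_dim, frame_dim)  # video-generation objective
        self.action_head = nn.Linear(frame_dim, action_dim)     # robot-control objective

    def forward(self, frame_latents):
        # frame_latents: (batch, time, frame_dim) from a pretrained video tokenizer (assumed)
        t = frame_latents.shape[1]
        causal_mask = nn.Transformer.generate_square_subsequent_mask(t)
        h = self.backbone(frame_latents, mask=causal_mask)
        return self.next_frame_head(h), self.action_head(h)

# Pretraining on web-scale video would supervise only next-frame prediction;
# robot data would additionally supervise the action head (assumption).
model = VideoPolicySketch()
latents = torch.randn(2, 16, 512)
pred_frames, pred_actions = model(latents)
print(pred_frames.shape, pred_actions.shape)  # (2, 16, 512) (2, 16, 7)
```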