Changan Chen

120 posts

Changan Chen

@changanvr

Building @RhodaAI Exploring new paradigms for scaling visual intelligence toward physical AGI. Prev: @Stanford @UTAustin @FAIR.

Bay Area, San Francisco Katılım Aralık 2015

450 Takip Edilen780 Takipçiler

Changan Chen@changanvr·5d

We got our large-scale foundational video model running real-time on an edge device! Amazing work led by @leooeld!

Rhoda AI@RhodaAI

Can a large foundation video model run as a real-time robot policy at the edge, on a single RTX 5090? • ✅ No quantization • ✅ No distillation • ✅ Full denoising (all the way from noise to clean video) We just proved it's possible. 👇🎬

English

2.3K

Changan Chen@changanvr·23 Nis

@RhodaAI Learning from human demonstrations is a crucial step toward enabling robots to quickly acquire new tasks without relying on teleoperation.

English

103

Rhoda AI@RhodaAI·23 Nis

Teaching a robot a new task typically means stopping operations, collecting teleoperated demonstrations, and retraining. That process takes hours at a minimum. We wanted to know if we could collapse it to seconds — from a single human demo, on the fly, no retraining required. Early research preview: we can.

English

6.7K

Changan Chen@changanvr·21 Nis

@GeneralistAI @BerkayAntmen @RhodaAI Great to see the shell game getting traction! Long-context memory is genuinely hard to get right. Good to see the robot learning field paying more attention to long-context visual memory.

English

370

Generalist@GeneralistAI·21 Nis

GEN-1 plays the 🐚 shell game, trained on just 1 hr of robot data. It also generalizes to unseen objects, like @BerkayAntmen 's car keys. Physical AI models should be capable of benchmark tasks like this one. It's interesting for the all the reasons @RhodaAI calls out -- requires visual memory, and the model must track the cups from the very start, at high frame rates. Interestingly, GEN-1 appears to exhibit a degree of "active perception." It's subtle; the hands can sometimes appear to "follow" the cups, using its own movements to help attend to where it thinks the object should be. Read more about GEN-1 in our blog post in the comments below ↓

Rhoda AI@RhodaAI

Here’s something we’ve never seen done before. Real-world tasks are long and ambiguous. Solving them requires visual memory and state tracking. Most robot policies only see the last few frames. Ours doesn't. We put our DVA, FutureVision, to the perfect testbed: the shell game 🐚. The DVA nails it.

English

272

856.2K

Changan Chen@changanvr·10 Nis

Amazing work on long-context video memory and reasoning led by @QianqianWang5!

Rhoda AI@RhodaAI

English

2.8K

Changan Chen@changanvr·10 Nis

@rhodaai While this may seem like a simple task, it requires long-context memory and the ability to reason about object motion across an entire sequence. This capability is naturally supported by our video models that are trained on large-scale video data.

English

637

Rhoda AI@RhodaAI·9 Nis

English

234

82.4K

Changan Chen@changanvr·28 Mar

@isskoro @rhodaai We’re thrilled to have you join the team!

English

158

Ivan Skorokhodov@isskoro·28 Mar

After almost 3 years at Snap, several dozens of research and engineering projects, countless wild moments lived through together with the incredible team, I've decided to start a new adventure in robotics and join @rhodaai to bring the technological singularity a little closer

English

3.4K

Changan Chen@changanvr·25 Mar

We are entering a new era of robot deployment. With our model, teams can iterate on solutions and data collection more rapidly thanks to highly efficient training. What once took months can now be achieved in as little as 19 days, bringing development to an entirely new pace.

Rhoda AI@RhodaAI

1/ We are speed running industrial robotics. It took us just 19 days from the first day of data collection to filming a 2.5-hour continuous run of our model autonomously breaking down industrial containers — zero human intervention. The data efficiency of our DVA model is fundamentally changing how fast we bring robots out of the lab and into the factory. Autonomous operation with 3 hours of data collection at a customer factory.

English

3.7K

Changan Chen@changanvr·21 Mar

@karpathy This is cool! I wonder what you think about the following potential issues: 1. How does the agent avoid getting stuck in local minima during exploration? 2. How well does training on a small model translate to larger-scale models with significantly more compute and data?

English

Andrej Karpathy@karpathy·7 Mar

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)

English

1.1K

3.7K

28.4K

11M

Changan Chen@changanvr·18 Mar

Over the past year and a half, I’ve spoken with many robot learning researchers, and a recurring concern is how to achieve a 99.99% success rate. Many video demos showcase a single flawless run, but that doesn’t address the long-tail challenges of real-world tasks, where countless corner cases arise. What we’ve observed with video models is that, once sufficiently powerful and pre-trained, they can capture multi-modal distributions effectively. In practice, this means the model can learn diverse behaviors for handling edge cases without requiring hundreds of hours of video data.

Rhoda AI@RhodaAI

Most robot demos are “golden runs”: a perfect take selected from many attempts. But real-world deployment is about Continuous Operation. Watch our DVA model tackle a real-world decanting task for 1.5 hours straight: Uncut, Zero human intervention. 🧵👇

English

186

Changan Chen retweetledi

0x796F@0x796F·16 Mar

You can now train @physical_int style robots in 1 day for only $5k. Anvil’s devkits have all the hardware, software, controls, cameras, and more ready-to-go. (1/5)

English

576

324.1K

Changan Chen@changanvr·14 Mar

@dchaplot Congrats Devendra!

Català

959

Devendra Chaplot@dchaplot·14 Mar

I'm joining SpaceX and xAI, working closely with Elon and team to build superintelligence. Together SpaceX and xAI combine physical and digital intelligence under a leader who understands hardware at the deepest level. Add a high-agency culture with frontier-scale resources, and you get the possibility to achieve something truly unique. I’m excited to advance the fields I’ve obsessed over for years, from robotics research to building AI models on the founding teams of Mistral and TML. Both were extraordinary journeys with extraordinary people that shaped how I think about building intelligence from the ground up. Grateful for everything that brought me here and can’t wait to get started.

English

2.9K

2.1K

28.2K

43.5M

Changan Chen retweetledi

Yilun Du@du_yilun·11 Mar

Robot video foundation models can build very powerful robot manipulation policies! These policies enable complex, dexterous manipulation, solve tasks that require long-term visual memory, and do in-context demonstration learning!

Rhoda AI@RhodaAI

To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem. At Rhoda, we are solving it by reformulating robot policies as video generation. Today, we introduce the Direct Video-Action Model (DVA)

English

2.5K

Changan Chen@changanvr·11 Mar

@sainingxie @ylecun @amilabs Congrats and look forward to what you will be building!

English

199

Saining Xie@sainingxie·10 Mar

i’m joining forces with @ylecun and an incredible group of people to start AMI Labs @amilabs. AMI isn’t a conventional lab. we don’t intend to become one. a lot to say about why this moment matters, but for now we’re heads down building. join us: amilabs.xyz

AMI Labs@amilabs

Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe. We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world. We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one. Read more: amilabs.xyz AMI - Real world. Real intelligence.

English

153

162

2.8K

489.1K

Changan Chen@changanvr·11 Mar

@vkhosla @rhodaai Extremely grateful for backing us up! Indeed, web-scale video data and video generation as the modeling gives the model a strong physical prior that enables the model to pick up new tasks in the physical world quickly and effectively.

English

111

Vinod Khosla@vkhosla·10 Mar

The bar for robotics isn’t lab demos — it’s autonomous operation in real production environments. What impressed me about @rhodaai was seeing that level of performance with remarkably little robot training data. Pretraining on internet-scale video to build a strong physical prior may seem unconventional today, but approaches like this are what will ultimately unlock general-purpose robotics.

Jagdeep Singh@startupjag

After operating in stealth for the last 18 months @rhodaai , we’re excited today to finally show the world what we’ve been working on. We believe we’re on a path to physical AGI with the launch of our brand new foundation model, the Direct Video Action (DVA) model.

English

292

68.8K

Changan Chen@changanvr·11 Mar

See the technical blog for more details: rhoda.ai/research/direc…

English

107

Changan Chen@changanvr·11 Mar

This matters a lot for production robotics. In real deployments, workflows change constantly. What we need is a system where setup → data collection → training → deployment can happen within a few days, not months.

English

122

Changan Chen@changanvr·11 Mar

Today, we introduce Direct Video-Action Models (DVA), the first native causal video model for robot control. By reformulating robot policy as video generation, DVA is extremely scalable by leveraging web-scale video data and using video generation as the modeling objective.

Rhoda AI@RhodaAI

English

434

Keşfet

@leooeld @RhodaAI @GeneralistAI @BerkayAntmen @QianqianWang5 @rhodaai @isskoro @karpathy