RoboPapers
@RoboPapers
235 posts
@chris_j_paxton, @micoolcho & @DJiafei geeking out weekly with authors of robotics AI papers. On YouTube / X / Spotify / Substack

Joined February 2025
2 Following · 4.3K Followers
RoboPapers @RoboPapers ·
Achieving generalizable manipulation is the north star of robot learning, and while fine-tuned VLAs have delivered incredible results on specific tasks, that north star has remained elusive. Perhaps what is needed is a different approach. DreamZero proposes World Action Models (WAMs), which jointly model action and video to achieve state-of-the-art performance on benchmarks like MolmoSpaces and RoboArena. @SeonghyeonYe of @NVIDIARobotics joins us to talk about building a 14B-parameter autoregressive diffusion model that achieves state-of-the-art generalization on real-world tasks and on the best available benchmarks. Watch episode #68 of RoboPapers, with @micoolcho and @chris_j_paxton, now!
RoboPapers @RoboPapers ·
Robotics research is moving fast, and being able to modify and improve hardware is crucial to maintaining velocity. That's why Menlo Research has started work on its own open-source humanoid project, Asimov @asimovinc. And they are moving fast: roughly six months after starting the project, they already have a full humanoid with arms, legs, and a head that can walk forwards and backwards. @selim__1903 and @AlexE_00 of Menlo Research join us to talk about the development of this open-source humanoid. Watch episode 67 of RoboPapers, with @chris_j_paxton and @DJiafei, now! And follow @asimovinc for daily updates on humanoid robot development.
RoboPapers @RoboPapers ·
How should we represent robot actions for autoregressive transformers? Most robot policies use diffusion or flow to generate continuous action sequences, but this isn’t how large language models work; they predict output tokens, which has many advantages. But coming up with a set of useful action tokens, so we can skip the slow and expensive diffusion steps, is difficult. @liu730chaoqi says action tokens need three qualities: reasonable compression, universal decodability, and a left-to-right causally ordered token space, and he proposes Ordered Action Tokenization as a solution to all three. Watch Episode 66 of RoboPapers now, with @micoolcho and @chris_j_paxton, to learn more!
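The three qualities above can be illustrated with a toy coarse-to-fine tokenizer: each continuous action value becomes a short token sequence ordered from most to least significant, so earlier tokens carry the most information. This is only a hedged sketch of the general idea of a causally ordered, decodable token space; the function names and binning scheme here are illustrative assumptions, not the paper's actual Ordered Action Tokenization method.

```python
# Toy sketch: ordered coarse-to-fine tokenization of a continuous
# action dimension in [-1, 1]. Earlier tokens are coarser, so the
# token sequence is causally ordered left to right. This is an
# illustrative assumption, not the method from the episode's paper.

def encode(x, levels=3, base=4):
    """Encode a scalar in [-1, 1] as `levels` tokens, coarse first."""
    tokens = []
    lo, hi = -1.0, 1.0
    for _ in range(levels):
        step = (hi - lo) / base
        idx = min(int((x - lo) / step), base - 1)  # which sub-interval
        tokens.append(idx)
        lo, hi = lo + idx * step, lo + (idx + 1) * step  # zoom in
    return tokens

def decode(tokens, base=4):
    """Decode tokens to the midpoint of the final interval; a
    truncated prefix still decodes to a coarse estimate."""
    lo, hi = -1.0, 1.0
    for idx in tokens:
        step = (hi - lo) / base
        lo, hi = lo + idx * step, lo + (idx + 1) * step
    return (lo + hi) / 2

toks = encode(0.37)          # e.g. [2, 2, 3]
approx = decode(toks)        # close to 0.37
coarse = decode(toks[:1])    # prefix alone still gives a rough value
```

Note how a prefix of the token sequence is already a usable (coarse) action, which is what makes a left-to-right ordering natural for autoregressive prediction.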
RoboPapers @RoboPapers ·
Pretraining is essential for good performance across a wide variety of robotics tasks, so most vision-language-action models build on a vision-language model (VLM) trained on broad image-language data. But how does the choice of VLM translate to downstream robotics performance? Jianke Zhang and @GYanjiang join us to talk about this key part of the robot policy, comparing a wide variety of VLMs and how they perform. Interestingly, they find that performance on auxiliary tasks like question answering did not lead to downstream improvements in control. To learn more, watch episode 65 of RoboPapers now, with @chris_j_paxton and @DJiafei!
RoboPapers @RoboPapers ·
Human motion is instinctual: we know how to interact with the world around us almost without thinking about it. @ziwenzhuang_leo and @ShaotingZ38103 joined us on RoboPapers to talk about their ambitious Project Instinct, which provides the tools, algorithms, and environments needed to build humanoid whole-body control that can handle contact with the environment. Watch Episode #64 of RoboPapers with @micoolcho and @DJiafei now!
RoboPapers @RoboPapers ·
The holy grail of robotics is performing previously unseen, out-of-distribution manipulation tasks zero-shot in a new environment. NovaFlow proposes an approach which (1) generates a video, (2) computes predicted flow (how points move through the scene), and (3) uses this flow as an objective to generate a motion. With this procedure, NovaFlow produces motions in unseen scenes, for unseen tasks, and can transfer across embodiments. We are joined by @Hongyu_Lii and @jiahuifu_carol from RAI. Watch Episode #63 of RoboPapers with @chris_j_paxton and @micoolcho now to learn more!
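The "flow as an objective" step above can be sketched in miniature: given tracked points and the flow-predicted positions they should reach, pick the motion command that best matches those targets. This toy restricts the motion to a 2D translation and uses mean squared error as the objective, which is a deliberate simplification for illustration; NovaFlow's actual pipeline works from flow extracted from generated video and optimizes far richer motions.

```python
# Toy sketch of using predicted point flow as a motion objective.
# Simplifying assumption: the robot motion is a pure 2D translation,
# so the least-squares optimum is just the mean displacement demanded
# by the flow. This is illustrative, not NovaFlow's actual optimizer.

def best_translation(points, targets):
    """Translation (dx, dy) minimizing squared error between the
    translated points and their flow targets: the mean displacement."""
    n = len(points)
    dx = sum(t[0] - p[0] for p, t in zip(points, targets)) / n
    dy = sum(t[1] - p[1] for p, t in zip(points, targets)) / n
    return (dx, dy)

# Three tracked points on an object.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Predicted flow says every point should end up shifted by (+0.5, -0.2).
targets = [(0.5, -0.2), (1.5, -0.2), (0.5, 0.8)]
cmd = best_translation(points, targets)  # recovers roughly (0.5, -0.2)
```

The same pattern generalizes: swap the translation-only motion model for a full trajectory parameterization and the closed-form mean for an optimizer, and the flow targets remain the objective being minimized.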