Danny Driess
@DannyDriess
164 posts
Research Scientist @physical_int. Formerly Google DeepMind

Joined August 2021
334 Following · 3.9K Followers
Pinned Tweet
Danny Driess@DannyDriess·
How to build vision-language-action models that train fast, run fast & generalize? In our new paper, we formalize & analyze the approach of our π-0.5 model & further improve it with a single stage recipe. Blog: pi.website/research/knowl… Paper: pi.website/download/pi05_…
Danny Driess retweeted
Physical Intelligence@physical_int·
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
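The architecture hinted at above can be pictured concretely: the large model stays frozen and emits one extra "RL token" embedding, and only a tiny actor and critic read it. A minimal illustrative sketch in PyTorch; the embedding width, layer sizes, and class names are all my assumptions, not the actual π-0.6 interface:

```python
import torch
import torch.nn as nn

EMB = 256  # assumed width of the frozen model's "RL token" embedding

class TinyActor(nn.Module):
    """Small policy head that reads only the RL-token embedding."""
    def __init__(self, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB, 128), nn.ReLU(), nn.Linear(128, action_dim))

    def forward(self, rl_token: torch.Tensor) -> torch.Tensor:
        return self.net(rl_token)

class TinyCritic(nn.Module):
    """Small value head on the same embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, rl_token: torch.Tensor) -> torch.Tensor:
        return self.net(rl_token).squeeze(-1)

# The frozen VLA would produce rl_token; only these two tiny heads receive
# gradients, which is why RL fine-tuning can take minutes rather than days.
```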
Danny Driess retweeted
Kyle Vedder@KyleVedder·
This robustness allows the policy to do diverse, long-horizon tasks in unseen environments. For example, the demo kitchen was built *after* the potatoes policy was fully trained: I just wrote the high-level prompt to tell it where to go look for items, and it did the rest.
Danny Driess retweeted
Marcel Torné@marceltornev·
We equipped PI policies with memory! And taught our robots to do long-horizon, real-world tasks such as preparing the items for a recipe, cooking a grilled cheese, and cleaning the kitchen!
Physical Intelligence@physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

Danny Driess retweeted
Karl Pertsch@KarlPertsch·
This one has been a long time coming: today we’re introducing MEM, an approach for giving VLAs short-term and long-term memory. Memory is such an obvious capability, but adding it isn’t easy (most VLAs today are memory-less). A short thread on challenges, solutions, and the new capabilities MEM unlocks for us.
Danny Driess@DannyDriess·
One aspect I am particularly excited about is that memory enables the model to adapt its strategy while solving the task, something we might call "in-context adaptation". In this example, it is unclear from a single image whether the fridge opens from the left or the right. Hence, a model without memory (left) might fail to open the fridge repeatedly. In contrast, with memory (right), our model learns "in-context" that the fridge opens differently, and adjusts its strategy accordingly.
Danny Driess@DannyDriess·
The key idea behind Multi-Scale Embodied Memory (MEM): use different modalities to represent memory at different time scales. 📹 For short horizon memory, we developed an efficient video encoder that lets the model remember fine-grained details about its recent interactions. 📜 For long horizon memory, we train the model to summarize events in text, allowing it to remember events for up to 15 min.
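The two time scales described above can be illustrated with a toy data structure: recent observations stay as dense features in a bounded buffer, while older events are compressed to text. This is an assumption-laden sketch of the idea, not PI's actual MEM implementation:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class MultiScaleMemory:
    # Short horizon: bounded buffer of recent frame features (the efficient
    # video encoder would produce these; here they are opaque items).
    short_horizon: deque = field(default_factory=lambda: deque(maxlen=32))
    # Long horizon: text summaries of past events, cheap enough to keep
    # around for many minutes of context.
    long_horizon: list = field(default_factory=list)

    def observe(self, frame_feature) -> None:
        self.short_horizon.append(frame_feature)  # oldest frames fall off

    def summarize(self, event_text: str) -> None:
        self.long_horizon.append(event_text)

    def context(self):
        """What the policy would condition on at each step."""
        return list(self.short_horizon), " ".join(self.long_horizon)
```

The design point is that language is a far cheaper representation than video, so only the recent past needs the expensive modality.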
Danny Driess@DannyDriess·
Many real-world tasks require memory to be successful. Yet, most robots don’t have any form of memory. Today, we are going to change that. We developed a system called MEM that introduces memory into VLAs at multiple scales.
Physical Intelligence@physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

Danny Driess retweeted
Physical Intelligence@physical_int·
General-purpose AI models are behind some of the most exciting applications we now can't live without. We envision that an analogous “physical intelligence layer” built with models like π0.6 will similarly spur a new wave of applications for the physical world. We’ve recently begun working with a handful of companies that have deployed their robots to do real-world, useful things. pi.website/blog/partner/?…
Danny Driess@DannyDriess·
What I like about this: if I want to explain to someone how to solve a task, I rarely use language alone. I might point at things or wave in the air, without restricting myself to a single interface for communicating my intent. This work brings that idea into VLAs.
Danny Driess@DannyDriess·
Check out our latest work on steerable policies. Instead of having only language as the interface to a VLA, steerable policies follow point queries, motion traces, atomic subtasks, and more, which lets us make better use of the VLMs controlling them. More in @verityw_'s thread
Will Chen@verityw_

How can robot policies be trained to best leverage VLMs' CoT reasoning and in-context learning for generalization? The key is Steerable Policies: vision-language-action models that can be flexibly controlled in many ways! steerable-policies.github.io 1/9

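One way to picture the "many interfaces" point above: the policy's command input becomes a union of modalities rather than a single string, so a VLM planner can pick whichever one best expresses its intent. The types and fields below are my own illustration, not the paper's API:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class LanguageCommand:
    text: str                      # e.g. "pick up the sponge"

@dataclass
class PointQuery:
    x: float                       # pixel coordinates a VLM can point at
    y: float

@dataclass
class MotionTrace:
    waypoints: List[Tuple[float, float]]  # coarse path to follow

@dataclass
class Subtask:
    name: str                      # an atomic skill like "open_drawer"

# A steerable policy accepts any of these command modalities.
Command = Union[LanguageCommand, PointQuery, MotionTrace, Subtask]
```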
Danny Driess@DannyDriess·
The idea behind significantly improving performance on hard real-world tasks is to train a value function, condition the model on advantages computed from that value function, and run an iterative improvement loop where the model learns from its own data.
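The loop described above (value function → advantages → conditioning → retrain on the model's own rollouts) hinges on one small computation. A sketch with NumPy, where the binary label is what the model would be conditioned on; the simple sign-of-advantage thresholding is my assumption:

```python
import numpy as np

def advantage_labels(returns: np.ndarray, values: np.ndarray) -> np.ndarray:
    """1 where the achieved return beat the value baseline, else 0.

    At training time the policy is conditioned on this label alongside its
    observations; at test time we condition on 1 to request
    better-than-average behavior from the model's own experience.
    """
    return (returns - values > 0).astype(np.int64)
```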
Danny Driess retweeted
Russ Tedrake@RussTedrake·
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the technology, and to share a lot of details for how we're achieving it. youtube.com/watch?v=BEXFnr…
Danny Driess@DannyDriess·
Had a blast on the Unsupervised Learning Podcast with @hausman_k! We covered the past, present, and future of robot learning 🤖 Big thanks to @jacobeffron for being a fantastic host!
Jacob Effron@jacobeffron

New Unsupervised Learning with @hausman_k & @DannyDriess (@physical_int) on building generalist robotics foundation models and:
- What’s next in AI x robotics
- Biggest outstanding questions
- How they 10x’d model training speed
- Open sourcing π 0
- Breakthroughs in generalization
Spotify: bit.ly/4lG8Xf3 Apple: bit.ly/3TvQp5g YouTube: youtu.be/cpGQa5Q4yII

Sergey Levine@svlevine·
Fun project at PI: knowledge insulation for VLAs. We figured out how to train VLAs with continuous actions much more effectively by insulating the VLM and training it with discrete actions, while the action expert learns on top. 5-7x faster, and importantly way better language following 👇
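The insulation trick can be shown in a few lines: the backbone receives a discrete-action loss, while the action expert reads its features through a stop-gradient (`detach`), so the expert's continuous-action gradients never flow back into the VLM. This is an illustrative toy with stand-in linear layers, not the actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InsulatedVLA(nn.Module):
    def __init__(self, obs_dim=32, d=64, n_tokens=256, action_dim=7):
        super().__init__()
        self.vlm = nn.Linear(obs_dim, d)          # stand-in for the VLM backbone
        self.token_head = nn.Linear(d, n_tokens)  # discrete action tokens
        self.expert = nn.Linear(d, action_dim)    # continuous action expert

    def losses(self, obs, token_targets, cont_targets):
        feats = self.vlm(obs)
        # The backbone is trained with discrete action tokens...
        vlm_loss = F.cross_entropy(self.token_head(feats), token_targets)
        # ...while the expert sees detached features: its continuous-action
        # loss cannot backprop into (and disturb) the VLM. That stop-gradient
        # is the "insulation".
        expert_loss = F.mse_loss(self.expert(feats.detach()), cont_targets)
        return vlm_loss, expert_loss
```

Backpropagating only the expert loss leaves the backbone's gradients empty, which is the property the tweet credits for better language following.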