Karl Pertsch @KarlPertsch
Robot Foundation Models @physical_int
454 posts · Joined July 2015
280 Following · 4.2K Followers

Pinned Tweet
Karl Pertsch @KarlPertsch
This one has been a long time coming: today we’re introducing MEM, an approach for giving VLAs short-term and long-term memory. Memory is such an obvious capability, but adding it isn’t easy (most VLAs today are memory-less). A short thread on challenges, solutions, and the new capabilities MEM unlocks for us.
Karl Pertsch retweeted
Physical Intelligence @physical_int
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
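The recipe in the tweet -- freeze the large model, expose one extra "RL token" output, and train only a tiny actor and critic on it -- can be illustrated with a toy bandit. Everything below is a hypothetical numpy sketch: the `rl_token` stand-in, head sizes, and update rule are assumptions for illustration, not π-0.6's actual mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the frozen VLA: observation -> one extra
# "RL token" embedding. W_frozen is never updated during RL.
EMB, ACT = 32, 4
W_frozen = rng.normal(size=(8, EMB)) * 0.1

def rl_token(obs):
    return np.tanh(obs @ W_frozen)

class TinyActorCritic:
    """A few thousand parameters that learn fast while the VLA stays fixed."""
    def __init__(self):
        self.W_pi = np.zeros((EMB, ACT))  # actor head (action logits)
        self.w_v = np.zeros(EMB)          # critic head (value estimate)

    def probs(self, z):
        logits = z @ self.W_pi
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def act(self, z):
        return rng.choice(ACT, p=self.probs(z))

    def update(self, z, a, reward, lr=0.1):
        # One-step REINFORCE-with-baseline, touching only the two heads.
        adv = reward - z @ self.w_v       # advantage vs. critic's estimate
        self.w_v += lr * adv * z          # nudge value toward observed reward
        grad = -adv * self.probs(z)
        grad[a] += adv                    # = adv * (onehot(a) - probs)
        self.W_pi += lr * np.outer(z, grad)

# Toy "precise task": only action 2 succeeds.
ac = TinyActorCritic()
z = rl_token(np.ones(8))
for _ in range(500):
    a = ac.act(z)
    ac.update(z, a, reward=1.0 if a == 2 else 0.0)
```

Because the gradient only flows through the two small heads, each update is cheap and the base model cannot be degraded -- which is what makes fine-tuning in "hours or even minutes" plausible.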
Karl Pertsch retweeted
Karl Pertsch @KarlPertsch
Yup, though off-the-shelf VLMs today are often not well suited as high-level policies for more complex tasks (many papers have shown this: they struggle with fine-grained interaction understanding, failures, etc.), and robot-fine-tuned models so far need to be taught explicitly to remember. Agree, though, that in the future this gap will hopefully be bridged.
Ryan Punamiya @ryan_punamiya
@KarlPertsch Thanks for the clarification, Karl. I wonder if there is room to discuss whether this could just be an added line in the base prompt of the high-level VLM, i.e. "summarize sub-task history".
Karl Pertsch retweeted
Danny Driess @DannyDriess
Many real-world tasks require memory to be successful. Yet most robots don't have any form of memory. Today, we are going to change that: we developed a system called MEM that introduces memory into VLAs on multiple scales.
Physical Intelligence@physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

Karl Pertsch retweeted
Marcel Torné @marceltornev
We equipped PI policies with memory! And taught our robots to do long-horizon, real-world tasks such as preparing the items for a recipe, cooking a grilled cheese sandwich, and cleaning the kitchen!
Physical Intelligence@physical_int

We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

Karl Pertsch retweeted
Physical Intelligence @physical_int
We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇
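The two memory scales described above can be pictured as a rolling buffer of recent observations (short-term visual memory) plus a growing text log of completed subtasks folded back into the prompt (long-term semantic memory). This is only an illustrative sketch with invented names, not PI's actual MEM architecture, which is described in their paper.

```python
from collections import deque

class TwoScaleMemory:
    """Illustrative two-scale memory (hypothetical, not the real MEM):
    - short-term: rolling window of recent visual features
    - long-term: text log of completed subtasks folded into the prompt
    """
    def __init__(self, window=8):
        self.frames = deque(maxlen=window)  # short-term visual memory
        self.events = []                    # long-term semantic memory

    def observe(self, frame_features):
        """Push the newest observation; the oldest falls out of the window."""
        self.frames.append(frame_features)

    def record(self, subtask):
        """Log a completed subtask, e.g. 'put the pan on the stove'."""
        self.events.append(subtask)

    def context(self):
        """What the policy would condition on at the current step."""
        summary = "; ".join(self.events) if self.events else "nothing yet"
        return {
            "recent_frames": list(self.frames),
            "prompt_suffix": f"Completed so far: {summary}.",
        }
```

The short-term window stays bounded (constant latency), while the long-term log grows only by a short text line per subtask -- one plausible way to cover hour-long tasks like the kitchen cleanup without feeding the whole video history to the model.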
Karl Pertsch @KarlPertsch
This was one of the longest-running research projects at pi — adding memory to your models stretches all parts of your infra and needs innovation on the whole stack. The project started as @HomerWalke's internship project with @DannyDriess, but had lots of help from countless people at pi to get over the finish line. Special shoutout to @marceltornev who worked tirelessly to teach our models the long-horizon behaviors you saw in the videos above! For more details, check out our blog & paper: pi.website/research/memory
Karl Pertsch @KarlPertsch
Finally, many prior works have reported that policies get *worse* on dexterous tasks when adding memory (because of spurious correlations, causal confusion, etc.). We find that by equipping π-0.6 with MEM and training it on our most diverse data mix, we can match π-0.6's performance on tasks that do not require memory (while clearly outperforming it on memory tasks). This is IMO one of the biggest results here: we have a recipe for adding memory to VLAs without significant tradeoffs, in both latency and performance!
Yu Xiang @YuXiang_IRVL
Has anyone built a URDF for the Panda arm + Robotiq 2F-85 gripper setup used in the DROID dataset? Thanks! 🙏
Karl Pertsch @KarlPertsch
Check out Will's new project! By increasing the "prompting surface" of a VLA (keypoints, language at multiple levels of abstraction) we can get VLMs to steer them much more effectively. This allows us to "import" lots of useful VLM capabilities (reasoning, in-context learning) and push overall system performance!
Will Chen@verityw_

How can robot policies be trained to best leverage VLMs' CoT reasoning and in-context learning for generalization? The key is Steerable Policies: vision-language-action models that can be flexibly controlled in many ways! steerable-policies.github.io 1/9

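The "prompting surface" idea -- conditioning a policy on keypoints and on language at multiple levels of abstraction -- can be sketched as a command structure with optional channels that a VLM fills in. All names below are hypothetical; the actual interface is documented at steerable-policies.github.io.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SteeringCommand:
    """One hypothetical 'prompting surface': a VLM may steer the policy
    through any subset of these channels (names invented for illustration)."""
    task: str                                      # high-level instruction
    subtask: Optional[str] = None                  # mid-level language
    keypoints: List[Tuple[float, float]] = field(default_factory=list)

def to_prompt(cmd: SteeringCommand) -> str:
    """Flatten whichever channels are present into one conditioning string."""
    parts = [f"task: {cmd.task}"]
    if cmd.subtask:
        parts.append(f"subtask: {cmd.subtask}")
    if cmd.keypoints:
        pts = " ".join(f"({x:.2f},{y:.2f})" for x, y in cmd.keypoints)
        parts.append(f"keypoints: {pts}")
    return " | ".join(parts)
```

The design point is that a reasoning VLM can pick whichever channel its chain-of-thought produces most reliably (a pixel keypoint, a subtask string, or just the task), and the policy accepts all of them through the same interface.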
Karl Pertsch @KarlPertsch
For a bit of context: RoboArena runs real-world evals. You train a policy and host a server that receives camera images and returns robot actions; the RoboArena system then connects your policy server to real robot stations run by volunteers around the world, who compare your policy against others in pairwise, double-blind comparisons. We aggregate all those comparisons into a global leaderboard :) Check out robo-arena.github.io/results to see the eval rollouts as they are uploaded!
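Aggregating pairwise, double-blind comparisons into a global leaderboard is commonly done with Elo-style or Bradley-Terry ratings. Here is a minimal Elo sketch; it is illustrative only, and RoboArena's actual aggregation method may differ.

```python
def elo_leaderboard(comparisons, k=32.0, base=1000.0):
    """Fold (winner, loser) pairs into Elo ratings, highest first.
    Illustrative only: RoboArena's real aggregation may differ."""
    ratings = {}
    for winner, loser in comparisons:
        rw = ratings.setdefault(winner, base)
        rl = ratings.setdefault(loser, base)
        # Expected win probability for the winner under current ratings.
        expected = 1.0 / (1.0 + 10.0 ** ((rl - rw) / 400.0))
        delta = k * (1.0 - expected)
        ratings[winner] = rw + delta   # upset wins move ratings more
        ratings[loser] = rl - delta
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```

Pairwise comparison sidesteps the hard problem of defining an absolute success score across heterogeneous volunteer stations: each comparison only asks which of two policies did better on the same robot and scene.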
eigenron @eigenron
robot farms where you can simply SSH into a robotic arm or an embodied system to test your VLM/VLA/robot policies. who's building this?
Karl Pertsch @KarlPertsch
Check out Tony's new work on training and evaluating robot reward models -- accurate, scalable rewards are key for learning from online robot experience! In RoboReward we introduce a benchmark and new reward models that significantly outperform off-the-shelf VLMs in their ability to provide accurate rewards. See Tony's thread for all the details!
Tony Lee@tonyh_lee

Reliable rewards are a bottleneck for real-world RL for robotics: human labels are costly, and handcrafted rewards are brittle. In RoboReward 🤖💰, we study VLMs as reward models and find they are unreliable across tasks, embodiments, and scenes. Paper: arxiv.org/abs/2601.00675

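One simple way to quantify whether a reward model is "reliable" is to check how often thresholding its scalar score reproduces a binary human success label. The metric below is a hypothetical sketch of that idea; RoboReward's actual evaluation metrics are defined in the paper (arxiv.org/abs/2601.00675).

```python
def reward_accuracy(pred_rewards, human_labels, threshold=0.5):
    """Fraction of episodes where thresholding a scalar reward-model score
    agrees with the binary human success label (one plausible reliability
    metric; not necessarily the one used in RoboReward)."""
    agree = sum(
        (score >= threshold) == bool(label)
        for score, label in zip(pred_rewards, human_labels)
    )
    return agree / len(pred_rewards)
```

A metric like this, computed separately per task, embodiment, and scene, is one way to expose the kind of unreliability the thread describes for off-the-shelf VLM judges.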