

Karl Pertsch
@KarlPertsch
Robot Foundation Models @physical_int



Do you ever find that finetuning a VLA overfits to the target task, to the point where generalist ability is lost and even minor deviations beyond the SFT data break the policy? We found an extremely simple solution: directly merge the base and finetuned policy in weight space 🤯 👇🧵
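For intuition, here's a minimal sketch of weight-space merging in PyTorch, assuming both checkpoints share the same architecture. The function and parameter names (`merge_policies`, `alpha`) are illustrative, not our actual code:

```python
import torch

def merge_policies(base_model, finetuned_model, alpha=0.5):
    """Linearly interpolate parameters: (1 - alpha) * base + alpha * finetuned."""
    base_sd = base_model.state_dict()
    ft_sd = finetuned_model.state_dict()
    with torch.no_grad():
        merged = {
            name: (1 - alpha) * param + alpha * ft_sd[name]
            for name, param in base_sd.items()
        }
    finetuned_model.load_state_dict(merged)
    return finetuned_model
```

Small `alpha` stays close to the generalist base; `alpha = 1.0` recovers the pure SFT policy.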



We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇

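As a rough mental model only (the class and function names below are hypothetical, not our actual architecture), the two memories could look like:

```python
from collections import deque

class ShortLongMemory:
    """Hypothetical sketch: a rolling visual buffer plus a persistent semantic log."""

    def __init__(self, window=16):
        self.frames = deque(maxlen=window)  # short-term visual memory
        self.semantic = []                  # long-term semantic memory
        self.steps = 0

    def observe(self, frame, summarize=None):
        self.frames.append(frame)
        self.steps += 1
        # Every `window` steps, compress recent frames into a language summary
        # (e.g. "plate moved to dish rack") that outlives the visual buffer.
        if summarize is not None and self.steps % self.frames.maxlen == 0:
            self.semantic.append(summarize(list(self.frames)))

    def context(self):
        # The policy conditions on recent frames plus the semantic log,
        # which is what lets it track progress through a long task.
        return list(self.frames), list(self.semantic)
```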





🧵(6) DROID Eval: CoVer-VLA achieves a 14% gain in task progress and a 9% gain in success rate on the challenging red-team PolaRiS benchmark. In the pan-cleaning task, π₀.₅ shows incorrect intent, grasping the pan handle; in contrast, CoVer-VLA correctly uses the sponge to scrub the pan.

General-purpose AI models are behind some of the most exciting applications we now can't live without. We envision that an analogous “physical intelligence layer” built with models like π₀.₆ will similarly spur a new wave of applications for the physical world. We’ve recently begun working with a handful of companies that have deployed their robots to do real-world, useful things. pi.website/blog/partner/?…


How can robot policies be trained to best leverage VLMs' CoT reasoning and in-context learning for generalization? The key is Steerable Policies: vision-language-action models that can be flexibly controlled in many ways! steerable-policies.github.io 1/9
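Conceptually, steering means one policy that accepts several kinds of conditioning. The interface below is an assumption for illustration, not the project's actual API (see steerable-policies.github.io):

```python
from dataclasses import dataclass, field

@dataclass
class Steering:
    instruction: str                           # plain language command
    reasoning: str | None = None               # optional CoT plan to follow
    demos: list = field(default_factory=list)  # in-context examples

def act(policy, observation, steer: Steering):
    # The same VLA consumes whichever conditioning signals are provided,
    # so it can be steered with language, a reasoning trace, or demos.
    prompt = [steer.instruction]
    if steer.reasoning:
        prompt.append(f"Plan: {steer.reasoning}")
    for demo in steer.demos:
        prompt.append(f"Example: {demo}")
    return policy(observation, "\n".join(prompt))
```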


Reliable rewards are critical for effective RL, yet in most robotic applications obtaining such rewards requires significant task-specific human effort. Can we do better? Check out RoboReward, our new generalist, language-conditioned reward model for real-world robot RL!

Reliable rewards are a bottleneck for real-world RL for robotics: human labels are costly, and handcrafted rewards are brittle. In RoboReward 🤖💰, we study VLMs as reward models and find they are unreliable across tasks, embodiments, and scenes. Paper: arxiv.org/abs/2601.00675
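Roughly, a generalist reward model slots into real-world RL by labeling rollouts from the task description plus video; the `reward_model.score` call below is a hypothetical stand-in, not the RoboReward API:

```python
def label_rollout(reward_model, task: str, frames) -> list[float]:
    # One scalar reward per timestep, conditioned on the language task
    # description, replacing handcrafted per-task rewards or human labels.
    return [reward_model.score(task, frames[: t + 1]) for t in range(len(frames))]
```

The labeled rollouts then feed a standard policy-improvement step, so no task-specific reward engineering is needed.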



