jpizarrom

596 posts

@jpizarrom

Germany · Joined October 2009
174 Following · 184 Followers
Pinned Tweet
jpizarrom @jpizarrom ·
🤖 Early-stage experiment: finetuning SmolVLA + RECAP-style advantage signals (inspired by the π*0.6 paper) as BC actor + one-step flow actor with AWR on SO100 @LeRobotHF linkedin.com/feed/update/ur… Previous experiments: ACFQL + HIL-SERL x.com/jpizarrom/stat… x.com/jpizarrom/stat…
jpizarrom @jpizarrom

@qiyang_li @zhiyuan_zhou_ @svlevine This policy was trained with ACFQL/QC-FQL on @LeRobotHF @huggingface on top of HIL-SERL github.com/huggingface/le… linkedin.com/posts/jpizarro…

jpizarrom @jpizarrom ·
@q17224 @LeRobotHF My current experiments focus on the feasibility of learning. The next step is reliability.
jpizarrom @jpizarrom ·
@RohanSeelan @LeRobotHF I recently found the paper "Refined Policy Distillation: From VLA Generalists to RL Experts" arxiv.org/abs/2503.05833 Essentially, I am trying to check whether it is feasible to do something similar within ACFQL + VLA-BC finetuning with RECAP signals + one-step distillation.
jpizarrom @jpizarrom ·
@DrorIfrah @FRANKAROBOTICS @LeRobotHF The next steps focus on improving reliability and evaluation, as I was primarily focused on the feasibility of learning. Additionally, I plan to try updating ACFQL to support multi-task, longer-horizon scenarios. x.com/jpizarrom/stat…
jpizarrom @jpizarrom ·
@svlevine Thanks a lot for sharing such amazing work. I am implementing SmolVLA with a RECAP-style advantage signal as the BC actor in ACFQL + HIL-SERL on @LeRobotHF. It seems to stabilize training, allowing the one-step actor to start learning. x.com/jpizarrom/stat…
Sergey Levine @svlevine ·
To read about π*0.6 and Recap, check out the official blog post: pi.website/blog/pistar06 The blog post also links to a full research paper about the method, as well as a model card for the π0.6 model.
Sergey Levine @svlevine ·
We just released results for our newest VLA from Physical Intelligence: π*0.6. This one is trained with RL, which makes it quite a bit better: it often doubles throughput and enables real-world tasks like folding real laundry and making espresso drinks at the office.
jpizarrom retweeted
lil’km @_lilkm_ ·
Thrilled to have worked on integrating the Earth Rover from @frodobots into @LeRobotHF! Everything's fully open source: dataset, software, hardware. Super excited to start training models on this massive 7K-hour dataset from 40+ cities. Democratizing urban navigation research for all!
FrodoBots @frodobots

We're open-sourcing our Earth Rover platform with @huggingface & @sigrobotics!
🤖 Integrated hardware (electronics, software, 3D files) with @LeRobotHF
🌎 7,000 hours of driving data from 40+ cities, curated by UC Berkeley researchers
Thread ↓

jpizarrom retweeted
Remi Cadene @RemiCadene ·
Grab your ticket to the Hardware Meetup at our event place ~Neon Noire~ A chance to meet the team at UMA! Speakers: Steven Palma (LeRobot Hugging Face), Aurélien Cord (CTO Stanley Robotics), Roch Molléro (CTO Gobano Robotics), and myself. luma.com/y48iwply
Remi Cadene @RemiCadene

Humanity is at a turning point. I am launching UMA to build general-purpose mobile and humanoid robots from Europe. Proud to start with people I admired for years, and grateful for all your support! Reach out to us @UMA_Robots ❤️

Qiyang (Colin) Li @qiyang_li ·
Excited to be in San Diego attending NeurIPS this week! Come check out our paper "Reinforcement Learning with Action Chunking" (w/ @zhiyuan_zhou_, @svlevine) in Exhibit Hall C,D,E (#415) on Thursday 12/04, 11 a.m. to 2 p.m. x.com/qiyang_li/stat…
Qiyang (Colin) Li @qiyang_li

Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N
