Joseph Amigo
@Jsphamigo

12 posts

PhD Candidate at NYU & LAAS-CNRS - Robotics, Reinforcement Learning, Deep Learning

New York · Joined February 2018
77 Following · 18 Followers

Rising Zhang (张瑞星) @xing_rui12683:
I checked my reviews. The result: 2 moderately AI-edited, 2 heavily, and 1 lightly. That's because I wrote the reviews in Chinese and GPT helped me translate them into English. It's not a surprising result, but I think I am a responsible reviewer :)

Graham Neubig @gneubig:

ICLR authors, want to check if your reviews are likely AI-generated? ICLR reviewers, want to check if your paper is likely AI-generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?

Joseph Amigo @Jsphamigo:
@Rk4342R We’re in the process of cleaning up the code and pushing it to GitHub. I believe it should be available by the end of next week.

Joseph Amigo @Jsphamigo:
@KyleMorgenstein @Rk4342R Reducing the size of the replay buffer generally impeded learning (however, for the Go2 walking experiment, it was beneficial to use 1e5 instead of 1e6).
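
The buffer under discussion stores off-policy transitions that the forward/dynamics model trains on. Below is a minimal FIFO sketch, assuming PyTorch and batched transitions from parallel envs; the class and argument names are illustrative, not the authors' code, and `capacity` is the 1e5-vs-1e6 knob mentioned above.

```python
import torch


class ReplayBuffer:
    """Minimal FIFO replay buffer over (obs, action, next_obs) transitions.

    Illustrative sketch only (not the paper's code): `capacity` is the
    1e5-vs-1e6 setting discussed above, and `add` takes a batch of
    transitions from N parallel envs.
    """

    def __init__(self, capacity: int, obs_dim: int, act_dim: int, device: str = "cpu"):
        self.capacity = capacity
        self.obs = torch.zeros(capacity, obs_dim, device=device)
        self.act = torch.zeros(capacity, act_dim, device=device)
        self.next_obs = torch.zeros(capacity, obs_dim, device=device)
        self.ptr = 0   # next write position (wraps around: FIFO)
        self.size = 0  # number of valid entries

    def add(self, obs: torch.Tensor, act: torch.Tensor, next_obs: torch.Tensor):
        # Batched insert from parallel envs; oldest entries get overwritten.
        n = obs.shape[0]
        idx = (self.ptr + torch.arange(n, device=obs.device)) % self.capacity
        self.obs[idx] = obs
        self.act[idx] = act
        self.next_obs[idx] = next_obs
        self.ptr = (self.ptr + n) % self.capacity
        self.size = min(self.size + n, self.capacity)

    def sample(self, batch_size: int):
        # Uniform sampling over the valid portion of the buffer.
        idx = torch.randint(self.size, (batch_size,), device=self.obs.device)
        return self.obs[idx], self.act[idx], self.next_obs[idx]
```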

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R Did you try training the forward model without the replay buffer? I've trained forward models purely online with RL policies before, but usually with far, far more envs, and hence a larger batch size.

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R Yes, that's my lab's interest in it! Inspired by this + RSL's recent work with differentiable physics, we're working on formulating a standard set of differentiable rewards that are less restrictive than, e.g., Raibert heuristic tracking and still transfer well to hardware.

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R How did you scale the mini-batch/batch size with the number of environments? Starting with a vanilla implementation, but then trying to explicitly optimize for time efficiency.

Joseph Amigo @Jsphamigo:
@KyleMorgenstein @Rk4342R "coming from PPO it's shocking to see so few get such good performance" -> for now, however, the price is the need for a differentiable reward function.
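
To make the differentiability requirement concrete: first-order methods of this kind backpropagate through the reward (and through simulator states) into the policy, so every reward term must be a smooth function of the state. Here is a hedged sketch of a generic velocity-tracking term in PyTorch; it is an illustrative example, not the paper's actual reward.

```python
import torch


def velocity_tracking_reward(base_lin_vel: torch.Tensor,
                             cmd_vel: torch.Tensor,
                             sigma: float = 0.25) -> torch.Tensor:
    """Smooth, differentiable tracking reward: exp(-||v - v_cmd||^2 / sigma).

    Every op here (subtraction, square, exp) is differentiable, so gradients
    can flow from the reward, through a differentiable simulator, into the
    policy parameters. Illustrative example only, not the paper's reward.
    """
    err = torch.sum((base_lin_vel - cmd_vel) ** 2, dim=-1)
    return torch.exp(-err / sigma)
```

Non-smooth terms (indicator bonuses, hard contact schedules) would cut this gradient path, which is presumably why reward design becomes the constraint being discussed here.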

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R Thanks for the pointer! I was wondering where some of the choices came from, like AdamW for SAPO. What is the bottleneck for the number of parallel envs, VRAM? Coming from PPO, it's shocking to see so few get such good performance lol

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R I see, but with a larger diversity of trajectories there may still be gains to be had from scaling the batch size.

Joseph Amigo @Jsphamigo:
@KyleMorgenstein @Rk4342R Thank you! For the value function, we use regular TD-lambda. For the dynamics model, the std for each feature in the obs is different. Depending on the task, I also recommend using design choices I to V of the "4.2 Design Choices" section in arxiv.org/pdf/2412.12089.
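
A hedged sketch of the two pieces mentioned above, assuming PyTorch and [T, N]-shaped rollout tensors (the shapes and network architecture are assumptions, not the paper's): standard TD(lambda) value targets computed backward over a rollout, and a Gaussian dynamics head with one learned std per observation feature.

```python
import torch
import torch.nn as nn


def td_lambda_returns(rewards: torch.Tensor,  # [T, N] rewards
                      values: torch.Tensor,   # [T+1, N] values, incl. bootstrap V(s_T)
                      gamma: float = 0.99,
                      lam: float = 0.95) -> torch.Tensor:
    """Standard TD(lambda) targets, computed backward over the rollout.
    Illustrative sketch: ignores episode-termination masking for brevity."""
    T = rewards.shape[0]
    returns = torch.zeros_like(rewards)
    next_ret = values[-1]  # bootstrap from the final value estimate
    for t in reversed(range(T)):
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        next_ret = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_ret)
        returns[t] = next_ret
    return returns


class GaussianDynamicsModel(nn.Module):
    """Dynamics model whose predicted next-obs distribution has one learned
    std per observation feature (not a single scalar std broadcast over obs)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, obs_dim),
        )
        # One log-std parameter per observation feature.
        self.log_std = nn.Parameter(torch.zeros(obs_dim))

    def forward(self, obs: torch.Tensor, act: torch.Tensor):
        mean = self.mean_net(torch.cat([obs, act], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())
```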

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R Nice work! For learning the value function, do you just do regular TD-lambda over the rollout states? And for the dynamics model, do you use a single std expanded to the obs size, or is the std for each row in the obs different?