Joseph Amigo
@Jsphamigo

12 posts

PhD Candidate at NYU & LAAS-CNRS - Robotics, Reinforcement Learning, Deep Learning

New York · Joined February 2018
77 Following · 18 Followers

Rising Zhang (张瑞星) @xing_rui12683:
I checked my reviews. The result: 2 moderately AI-edited, 2 heavily, and 1 lightly. That's because I wrote the reviews in Chinese and GPT helped me translate them into English. It's not a surprising result, but I think I am a responsible reviewer :)

Graham Neubig @gneubig:

ICLR authors, want to check if your reviews are likely AI-generated? ICLR reviewers, want to check if your paper is likely AI-generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?

Joseph Amigo @Jsphamigo:
@Rk4342R We’re in the process of cleaning up the code and pushing it to GitHub. I believe it should be available by the end of next week.

Joseph Amigo @Jsphamigo:
@KyleMorgenstein @Rk4342R Reducing the size of the replay buffer generally impeded learning (however, for the Go2 walking experiment, it was beneficial to use 1e5 instead of 1e6).
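
The buffer under discussion stores off-policy transitions that the forward/dynamics model trains on. Below is a minimal FIFO sketch, assuming PyTorch and batched transitions from parallel envs; the class and argument names are illustrative, not the authors' code, and `capacity` is the 1e5-vs-1e6 knob mentioned above.

```python
import torch


class ReplayBuffer:
    """Minimal FIFO replay buffer over (obs, action, next_obs) transitions.

    Illustrative sketch only (not the paper's code): `capacity` is the
    1e5-vs-1e6 setting discussed above, and `add` takes a batch of
    transitions from N parallel envs.
    """

    def __init__(self, capacity: int, obs_dim: int, act_dim: int, device: str = "cpu"):
        self.capacity = capacity
        self.obs = torch.zeros(capacity, obs_dim, device=device)
        self.act = torch.zeros(capacity, act_dim, device=device)
        self.next_obs = torch.zeros(capacity, obs_dim, device=device)
        self.ptr = 0   # next write position (wraps around: FIFO)
        self.size = 0  # number of valid entries

    def add(self, obs: torch.Tensor, act: torch.Tensor, next_obs: torch.Tensor):
        # Batched insert from parallel envs; oldest entries get overwritten.
        n = obs.shape[0]
        idx = (self.ptr + torch.arange(n, device=obs.device)) % self.capacity
        self.obs[idx] = obs
        self.act[idx] = act
        self.next_obs[idx] = next_obs
        self.ptr = (self.ptr + n) % self.capacity
        self.size = min(self.size + n, self.capacity)

    def sample(self, batch_size: int):
        # Uniform sampling over the valid portion of the buffer.
        idx = torch.randint(self.size, (batch_size,), device=self.obs.device)
        return self.obs[idx], self.act[idx], self.next_obs[idx]
```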

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R Did you try training the forward model without the replay buffer? I've trained forward models purely online with RL policies before, but usually with far, far more envs, and hence a larger batch size.

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R Yes, that's my lab's interest in it! Inspired by this + RSL's recent work with differentiable physics, we're working on formulating a standard set of differentiable rewards that are less restrictive than, e.g., Raibert heuristic tracking and still transfer well to hardware.

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R How did you scale the mini-batch/batch size with the number of environments? Starting with a vanilla implementation, but then trying to explicitly optimize for time efficiency.

Joseph Amigo @Jsphamigo:
@KyleMorgenstein @Rk4342R "coming from PPO it's shocking to see so few get such good performance" -> for now, however, the price is the need for a differentiable reward function.
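
To make the differentiability requirement concrete: first-order methods of this kind backpropagate through the reward (and through simulator states) into the policy, so every reward term must be a smooth function of the state. Here is a hedged sketch of a generic velocity-tracking term in PyTorch; it is an illustrative example, not the paper's actual reward.

```python
import torch


def velocity_tracking_reward(base_lin_vel: torch.Tensor,
                             cmd_vel: torch.Tensor,
                             sigma: float = 0.25) -> torch.Tensor:
    """Smooth, differentiable tracking reward: exp(-||v - v_cmd||^2 / sigma).

    Every op here (subtraction, square, exp) is differentiable, so gradients
    can flow from the reward, through a differentiable simulator, into the
    policy parameters. Illustrative example only, not the paper's reward.
    """
    err = torch.sum((base_lin_vel - cmd_vel) ** 2, dim=-1)
    return torch.exp(-err / sigma)
```

Non-smooth terms (indicator bonuses, hard contact schedules) would cut this gradient path, which is presumably why reward design becomes the constraint being discussed here.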

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R Thanks for the pointer! I was wondering where some of the choices came from, like AdamW for SAPO. What is the bottleneck for the number of parallel envs, VRAM? Coming from PPO, it's shocking to see so few get such good performance lol

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R I see, but with a larger diversity of trajectories there may still be gains to be had from scaling the batch size.

Joseph Amigo @Jsphamigo:
@KyleMorgenstein @Rk4342R Thank you! For the value function, we use regular TD-lambda. For the dynamics model, the std for each feature in the obs is different. Depending on the task, I also recommend using design choices I to V of the "4.2 Design Choices" section in arxiv.org/pdf/2412.12089.
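
A hedged sketch of the two pieces mentioned above, assuming PyTorch and [T, N]-shaped rollout tensors (the shapes and network architecture are assumptions, not the paper's): standard TD(lambda) value targets computed backward over a rollout, and a Gaussian dynamics head with one learned std per observation feature.

```python
import torch
import torch.nn as nn


def td_lambda_returns(rewards: torch.Tensor,  # [T, N] rewards
                      values: torch.Tensor,   # [T+1, N] values, incl. bootstrap V(s_T)
                      gamma: float = 0.99,
                      lam: float = 0.95) -> torch.Tensor:
    """Standard TD(lambda) targets, computed backward over the rollout.
    Illustrative sketch: ignores episode-termination masking for brevity."""
    T = rewards.shape[0]
    returns = torch.zeros_like(rewards)
    next_ret = values[-1]  # bootstrap from the final value estimate
    for t in reversed(range(T)):
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        next_ret = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_ret)
        returns[t] = next_ret
    return returns


class GaussianDynamicsModel(nn.Module):
    """Dynamics model whose predicted next-obs distribution has one learned
    std per observation feature (not a single scalar std broadcast over obs)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, obs_dim),
        )
        # One log-std parameter per observation feature.
        self.log_std = nn.Parameter(torch.zeros(obs_dim))

    def forward(self, obs: torch.Tensor, act: torch.Tensor):
        mean = self.mean_net(torch.cat([obs, act], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())
```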

Kyle🤖🚀🦭 @KyleMorgenstein:
@Jsphamigo @Rk4342R Nice work! For learning the value function, do you just do regular TD-lambda over the rollout states? And for the dynamics model, do you use a single std expanded to the obs size, or is the std for each row in the obs different?