Abhinav Moudgil

89 posts

@amoudgl

PhD student @Mila_Quebec

Joined July 2014
394 Following · 291 Followers

Abhinav Moudgil (@amoudgl):
@Antrunt @BorisAKnyazev @ebelilov Thanks! It's the same as with standard optimizers: performance degrades smoothly as you deviate further from the optimal LR. 1e-4, or 1/20 of the tuned AdamW LR, are good starting points for Celo2. More on LR sensitivity in the appendix:
[Image]
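The reply above gives two starting points for the Celo2 learning rate: 1e-4, or the tuned AdamW LR divided by 20. Below is a minimal Python sketch of that rule of thumb, assuming plain float learning rates. The function names `celo2_starting_lr` and `lr_sweep`, and the sweep factor of 3, are hypothetical illustrations and not part of any Celo2 codebase.

```python
# Rule of thumb from the reply: start Celo2 at 1e-4, or at 1/20 of a tuned
# AdamW learning rate. These helpers are hypothetical illustrations only.
DEFAULT_CELO2_LR = 1e-4
ADAMW_DIVISOR = 20.0


def celo2_starting_lr(tuned_adamw_lr=None):
    """Return a starting LR for Celo2 under the stated rule of thumb."""
    if tuned_adamw_lr is None:
        # No tuned AdamW baseline available: fall back to the fixed default.
        return DEFAULT_CELO2_LR
    if tuned_adamw_lr <= 0:
        raise ValueError("tuned_adamw_lr must be positive")
    # Scale the tuned AdamW LR down by 20x.
    return tuned_adamw_lr / ADAMW_DIVISOR


def lr_sweep(center, num_per_side=2, factor=3.0):
    """Hypothetical log-spaced grid around `center`, in ascending order."""
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    return [center * factor ** k for k in range(-num_per_side, num_per_side + 1)]
```

Since performance reportedly degrades smoothly away from the optimal LR, a short log-spaced sweep around the starting point, e.g. `lr_sweep(celo2_starting_lr(3e-4))`, is one reasonable way to refine it.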

Abhinav Moudgil (@amoudgl):
Introducing Celo2: Towards Learned Optimization Free Lunch. We show that learned optimizers can generalize to practical tasks, such as GPT-3 1.3B pretraining and several out-of-distribution vision/RL tasks, from limited meta-training (~4.5 GPU hours)! 🧵
[Image]

Abhinav Moudgil (@amoudgl):
Finally, I'd like to acknowledge the Google TPU Research Cloud program, which made this research possible, and sincerely thank @mrtnm, @kvfrans, and @_chris_lu_ for their open-source contributions: clean JAX codebases with TPU/FSDP support.