Abhinav Moudgil

@amoudgl

PhD student @Mila_Quebec

Joined July 2014

394 Following · 291 Followers

Abhinav Moudgil @amoudgl · 4d

@Antrunt @BorisAKnyazev @ebelilov Thanks! It's the same as with standard optimizers: performance degrades smoothly as you deviate further from the optimal LR. 1e-4 or 1/20 of the tuned AdamW LR are good starting points for Celo2. More on LR sensitivity in the appendix:

Antrunt @Antrunt · 4d

@amoudgl @BorisAKnyazev @ebelilov Nice work! How important is the LR compared to schedule free optimizers like Prodigy or the default ones like AdamW?

Abhinav Moudgil @amoudgl · 4d

Introducing Celo2: Towards Learned Optimization Free Lunch We show that learned optimizers can generalize to practical tasks like GPT-3 1.3B pretraining and several out-of-distribution vision/RL tasks from limited meta-training (~4.5 GPU hours)! 🧵

Abhinav Moudgil @amoudgl · 4d

Finally, I'd like to acknowledge the Google TPU Research Cloud program that made this research possible and sincerely thank @mrtnm @kvfrans @_chris_lu_ for their open-source contributions in releasing clean JAX codebases with TPU/FSDP support.

Abhinav Moudgil @amoudgl · 4d

Work done with @BorisAKnyazev and @ebelilov. If you are interested in this line of research or related topics, our lab is hiring: x.com/ebelilov/statu…

Eugene Belilovsky @ebelilov

I have open positions including Postdocs, PhD, master's students, and PhD interns. For more information eugenium.github.io/Projects/postd…
