Sabitlenmiş Tweet

Sharing some highlights from our work on small-scale proxies for large-scale Transformer training instabilities: arxiv.org/abs/2309.14322
With fantastic collaborators @peterjliu, @Locchiu, @_katieeverett, many others (see final tweet!), @hoonkp, @jmgilmer, @skornblith!
(1/15)

English


























