Katie Everett

89 posts


@_katieeverett

Machine learning researcher @GoogleDeepMind + PhD student @MIT. Opinions are my own.

Joined August 2013
632 Following · 2.6K Followers
Katie Everett retweeted
Poetiq@poetiq_ai·
Poetiq has officially shattered the ARC-AGI-2 SOTA 🚀 @arcprize has officially verified our results:
- 54% Accuracy – first to break the 50% barrier!
- $30.57 / problem – less than half the cost of the previous best!
We are now #1 on the leaderboard for ARC-AGI-2!
Katie Everett@_katieeverett·
@damien_ferbach @cypaquette @poseypaquet @gauthier_gidel arXiv refs:
* Hestness et al. 2017: arXiv:1712.00409
* Kaplan et al. 2020: arXiv:2001.08361
* Shen et al. 2024: arXiv:2406.16690
* Beck et al. 2024: arXiv:2405.04517
* Bahri et al. 2021: arXiv:2102.06701
* Sorscher et al. 2022: arXiv:2206.14486
* Brandfonbrener et al. 2024: arXiv:2411.12925
Katie Everett@_katieeverett·
1. We often observe power laws between loss and compute: loss = a * flops ^ b + c
2. Models are rapidly becoming more efficient, i.e. use less compute to reach the same loss
But: which innovations actually change the exponent in the power law (b) vs change only the constant (a)?
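A minimal sketch of what fitting such a power law looks like in practice, assuming synthetic (flops, loss) data; the values of a, b, c and the flop range below are illustrative placeholders, not numbers from any of the referenced papers.

```python
# Minimal sketch: fitting loss = a * flops^b + c to (compute, loss) pairs.
# All data here is synthetic; a, b, c and the flop range are illustrative
# assumptions, not values taken from the papers cited in the thread.
import numpy as np
from scipy.optimize import curve_fit

def power_law(flops, a, b, c):
    # Saturating power law: loss falls as flops^b (b < 0) toward an
    # irreducible loss floor c.
    return a * flops**b + c

# Synthetic "observations" generated from a known law plus small noise.
rng = np.random.default_rng(0)
flops = np.logspace(15, 21, 20)                  # 1e15 .. 1e21 FLOPs
true_a, true_b, true_c = 2.0e3, -0.15, 1.8
loss = power_law(flops, true_a, true_b, true_c) * (1 + 0.01 * rng.standard_normal(20))

# Fit in the usual way; the initial guess p0 matters because the problem is badly scaled.
popt, _ = curve_fit(power_law, flops, loss, p0=(1e3, -0.1, 1.0), maxfev=20000)
a_hat, b_hat, c_hat = popt
print(f"fitted a={a_hat:.3g}, b={b_hat:.3f}, c={c_hat:.3f}")

# On a log-log plot of the reducible loss (loss - c) vs flops, changing a
# shifts the line vertically, while changing b changes the slope itself.
slope = np.polyfit(np.log(flops), np.log(loss - c_hat), 1)[0]
print(f"log-log slope ≈ {slope:.3f} (should roughly match the exponent b)")
```

The log-log view is why the thread's distinction matters: a change in a gives the same multiplicative gain at every scale, while a change in b gives gains that keep growing with compute.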
Katie Everett retweeted
Damien Ferbach@damien_ferbach·
It's very difficult to improve the *exponent* in scaling laws for loss vs compute, especially by changing the optimizer! Our new paper shows that scaling momentum correctly can *provably* improve the scaling exponent on a theoretical model. Empirically, it works on LSTMs too!
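To make "scaling momentum" concrete, here is a minimal heavy-ball (SGD with momentum) sketch in Python, written so the momentum coefficient depends explicitly on a scale parameter. The update rule is the standard one; the beta_for_scale rule below is a hypothetical placeholder to illustrate the idea, not the schedule analyzed in the paper from the tweet.

```python
# Minimal sketch of SGD with heavy-ball momentum where the momentum
# coefficient is an explicit function of problem scale. The heavy-ball
# update itself is standard; `beta_for_scale` is a made-up placeholder,
# NOT the prescription from the paper referenced above.
import numpy as np

def beta_for_scale(scale, beta0=0.9):
    # Hypothetical rule: push beta toward 1 as the scale grows, so the
    # effective gradient-averaging window grows with the problem size.
    return 1.0 - (1.0 - beta0) / np.sqrt(scale)

def sgd_momentum(grad_fn, theta, lr, beta, steps):
    # Standard heavy-ball update: v <- beta * v + grad;  theta <- theta - lr * v
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = beta * v + grad_fn(theta)
        theta = theta - lr * v
    return theta

# Toy quadratic objective f(theta) = 0.5 * theta^T H theta with an
# ill-conditioned Hessian, just to have something to optimize.
H = np.diag(np.logspace(0, -3, 50))
grad_fn = lambda th: H @ th
theta0 = np.ones(50)

for scale in (1, 10, 100):
    beta = beta_for_scale(scale)
    theta = sgd_momentum(grad_fn, theta0.copy(), lr=0.1, beta=beta, steps=1000)
    final_loss = 0.5 * theta @ H @ theta
    print(f"scale={scale:>4}  beta={beta:.4f}  final loss={final_loss:.3e}")
```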
Katie Everett@_katieeverett·
There were so many great replies to this thread, let's do a Part 2! For scaling laws between loss and compute, where loss = a * flops ^ b + c, which factors change primarily the constant (a) and which factors can actually change the exponent (b)? x.com/_katieeverett/…
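One way to make the constant-vs-exponent question concrete: compare scaling curves on log-log axes after subtracting the irreducible loss c. A change in a alone shifts the fitted line vertically (a fixed multiplicative gain at every scale), while a change in b changes its slope (gains that grow with compute). A sketch with made-up numbers:

```python
# Sketch of how a constant change vs. an exponent change shows up on
# log-log axes, for laws of the form a * flops^b + c. Curve "half a"
# halves the constant; curve "better b" improves the exponent.
# All numbers are illustrative, not taken from any paper.
import numpy as np

flops = np.logspace(15, 21, 50)
c = 1.8  # shared irreducible loss

baseline = 2.0e3 * flops**-0.15 + c
better_a = 1.0e3 * flops**-0.15 + c   # constant halved, same exponent
better_b = 2.0e3 * flops**-0.17 + c   # same constant, better exponent

def loglog_fit(loss):
    # Slope and intercept of log(loss - c) vs log(flops):
    # the slope recovers b, the intercept recovers log(a).
    slope, intercept = np.polyfit(np.log(flops), np.log(loss - c), 1)
    return slope, intercept

for name, curve in [("baseline", baseline), ("half a", better_a), ("better b", better_b)]:
    slope, intercept = loglog_fit(curve)
    print(f"{name:>9}: slope (b) = {slope:+.3f}, intercept (log a) = {intercept:.2f}")
```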