Pinned Tweet

Nicholas Lourie
@NickLourie
Better empirical methods for deep learning. PhD at @nyuniversity (@CILVRatNYU). Advised by @kchonyc and @hhexiy. Prev: @allen_ai. I build things. 🤖
New York, NY · Joined March 2014
818 Following · 1.5K Followers
Nicholas Lourie retweeted

Tuning AI models no longer needs to rely on expensive guesswork.
Courant PhD student @NickLourie, CDS Professor @kchonyc, and CDS Associate Professor @hhexiy reveal an important new statistical tool to estimate a model's best possible performance.
nyudatascience.medium.com/taking-the-gue…
Nicholas Lourie retweeted

Come to Nick's poster if you're at #COLM2025 and learn about how to run LLM experiments the scientific way!
Nicholas Lourie@NickLourie
LLMs are expensive—experiments cost a lot, mistakes even more. How do you make experiments cheap and reliable? By using hyperparameters' empirical structure. @kchonyc, @hhexiy, and I show you how in Hyperparameter Loss Surfaces Are Simple Near their Optima at #COLM2025! 🧵1/9

If you're at #COLM2025, come say hi! We're presenting as Poster 67 at Poster Session 4 this afternoon!

Deep learning is an empirical science; its progress depends on empirical tools.
We hope these tools help you make progress in your research! Get them with just a `pip install opda` (see the sketch below).
paper: arxiv.org/abs/2510.02721…
code: github.com/nicholaslourie…
docs: nicholaslourie.github.io/opda/
🧵9/9
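A minimal sketch of the idea behind opda's tuning-curve estimates, written in plain numpy rather than opda's own API (the library adds confidence bands and much more; the function name below is mine, not opda's):

```python
import numpy as np

def median_tuning_curve(ys, ns):
    """Median of the best score seen in n random-search trials.

    The max of n i.i.d. draws falls below t with probability
    F(t)**n, so its median is the F-quantile at 0.5**(1/n).
    """
    ys = np.sort(np.asarray(ys))
    ns = np.asarray(ns, dtype=float)
    return np.quantile(ys, 0.5 ** (1.0 / ns))

rng = np.random.default_rng(0)
ys = rng.beta(8, 2, size=48)        # stand-in scores from 48 random trials
ns = np.arange(1, 21)               # search budgets to evaluate
print(median_tuning_curve(ys, ns))  # median best score at each budget
```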

@NickLourie @michahu8 @kchonyc I'm maybe a little skeptical we can, since there are works showing non-emergent MMLU 5-shot scaling w.r.t. compute.
One of my own:
arxiv.org/abs/2501.11747
One not my own:
arxiv.org/abs/2503.10061

📢 Today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule.
A quick read about scaling-law fails:
📜arxiv.org/abs/2507.00885
🧵1/5👇
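For context, here's what "predicting downstream performance with a scaling law" usually looks like in practice: fit a simple parametric curve on small-scale runs, then extrapolate to larger scale. This is a generic sketch with made-up numbers, not the paper's exact setup:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(c, a, alpha, e):
    """Task error as a saturating power law in compute."""
    return a * c ** (-alpha) + e

# made-up (compute, task-error) pairs from small-scale runs;
# compute is in units of 1e17 FLOPs:
c_small = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
err_small = np.array([0.72, 0.66, 0.60, 0.55, 0.51])

params, _ = curve_fit(power_law, c_small, err_small, p0=(0.5, 0.2, 0.3))
print(power_law(1e4, *params))  # extrapolate 100x beyond the largest fitted run
```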


Interesting question! Switching from loss to compute would change the curve's shape, but it wouldn't make it linear (and easy to extrapolate). The curve's derivative is discontinuous at the breaks, and smoothly changing the x-axis preserves that discontinuity: by the chain rule, (f∘g)'(x) = f'(g(x)) g'(x), and a discontinuous factor times anything besides zero is still discontinuous at that point.
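A quick numerical check of that argument, with a toy kink and reparametrization of my own choosing (not from the thread): composing a piecewise-linear f with a smooth, strictly increasing g moves the break but can't remove it.

```python
import numpy as np

def g(x):
    return np.exp(x)  # smooth, monotone reparametrization (e.g. loss -> compute)

def f(u):
    return np.where(u < 1.0, u, 3 * u - 2)  # slope jumps from 1 to 3 at u = 1

x0, h = 0.0, 1e-6                      # g(x0) = 1 is the break point
left = (f(g(x0)) - f(g(x0 - h))) / h   # one-sided derivatives of f∘g
right = (f(g(x0 + h)) - f(g(x0))) / h
print(left, right)                     # ~1.0 vs ~3.0: the kink survives
```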

@NickLourie @michahu8 @kchonyc Can we interpret this figure as showing emergence, since the x-axis is pretraining loss and not compute?
Kaplan et al. show pretraining loss scales as a power law in compute, so a small shift on that x-axis could be a large shift in compute space.
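Rough arithmetic behind that point, plugging in a Kaplan-style exponent (alpha ≈ 0.05 is the ballpark from Kaplan et al. 2020; the exact numbers are illustrative):

```python
alpha = 0.05  # rough Kaplan-style exponent for loss vs. compute

def compute_ratio(loss_before, loss_after):
    """Extra compute needed for a given loss drop if L ∝ C**(-alpha)."""
    return (loss_before / loss_after) ** (1.0 / alpha)

print(compute_ratio(1.00, 0.95))  # ~2.8x compute to shave 5% off the loss
print(compute_ratio(1.00, 0.50))  # ~1e6x compute to halve it
```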

@WilliamBarrHeld @michahu8 @kchonyc Even with continuous metrics, there are stubbornly emergent phenomena. For example, this figure from arxiv.org/abs/2411.16035. Scaling shows several structural breaks even when you look at a continuous metric like P(correct answer). It's a tough problem, but we're making progress!


@WilliamBarrHeld @michahu8 @kchonyc Great question. 🙂 We only looked at downstream scaling laws in terms of pretraining loss, and a big conclusion is that we need more work like this! I'd guess it'll take a few tricks to make downstream scaling laws reliable, and intermediate task losses could certainly be one.

