Nicholas Lourie @NickLourie

55 posts

Better empirical methods for deep learning. PhD at @nyuniversity (@CILVRatNYU). Advised by @kchonyc and @hhexiy. Prev: @allen_ai. I build things. 🤖

New York, NY · Joined March 2014
818 Following · 1.5K Followers
Pinned Tweet
Nicholas Lourie @NickLourie
LLMs are expensive—experiments cost a lot, mistakes even more. How do you make experiments cheap and reliable? By using hyperparameters' empirical structure. @kchonyc, @hhexiy, and I show you how in Hyperparameter Loss Surfaces Are Simple Near their Optima at #COLM2025! 🧵1/9
[GIF]
2 replies · 10 reposts · 32 likes · 14.1K views
Nicholas Lourie retweeted
Michael Hu @michahu8
if you truly believe in the bitter lesson, then why hand design scaling laws? introducing: neural neural scaling laws (NeuNeu), a neural network - trained on open-source LM trajectories - that predicts LMs' future downstream task performance 🧵👇
[image]
4 replies · 30 reposts · 205 likes · 19.3K views
Nicholas Lourie @NickLourie
If you're at #COLM2025, come say hi! We're presenting as Poster 67 at Poster Session 4 this afternoon!
0 replies · 0 reposts · 0 likes · 292 views
Nicholas Lourie @NickLourie
Thanks for the references!😁 This gets at the heart of our message: even for a fixed task, sometimes downstream scaling is predictable, other times it isn't, and we don't know why. What factors in your experiment made scaling laws work? It's a question we should try to answer.
0 replies · 0 reposts · 0 likes · 39 views
Michael Hu @michahu8
📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule. a quick read about scaling law fails: 📜arxiv.org/abs/2507.00885 🧵1/5👇
[image]
4 replies · 37 reposts · 276 likes · 30.3K views
Nicholas Lourie @NickLourie
Interesting question! Switching from loss to compute would change the curve's shape, but it wouldn't make it linear (and easy to extrapolate). The curve's derivative is discontinuous at the breaks, and smoothly changing the x-axis preserves that discontinuity, since (f ∘ g)'(x) = f'(g(x)) g'(x), and multiplying a discontinuous function by anything nonzero leaves it discontinuous at that point.
0 replies · 0 reposts · 0 likes · 39 views
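The chain-rule argument in the reply above can be spelled out in a short derivation. Notation here is mine, not from the thread: $F$ is the downstream metric as a function of pretraining loss $x$, and $x = g(c)$ is a smooth, monotone reparametrization of the axis (e.g. loss as a function of compute $c$, with $g'(c_0) \neq 0$ at the break).

```latex
\[
  \frac{d}{dc}\, F(g(c)) \;=\; F'(g(c))\, g'(c).
\]
Suppose $F'$ jumps at a break point $x_0 = g(c_0)$:
\[
  \lim_{x \to x_0^-} F'(x) \;\neq\; \lim_{x \to x_0^+} F'(x).
\]
Since $g'$ is continuous and $g'(c_0) \neq 0$, the one-sided limits of the
reparametrized derivative are
\[
  \lim_{c \to c_0^\pm} F'(g(c))\, g'(c)
  \;=\; g'(c_0) \lim_{x \to x_0^\pm} F'(x),
\]
which still disagree. So the derivative of $F \circ g$ jumps at $c_0$:
the structural break survives any smooth change of the $x$-axis.
```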
Will Held @WilliamBarrHeld
@NickLourie @michahu8 @kchonyc Can we interpret this figure as showing emergence since the X axis is pretraining loss and not compute? Kaplan shows pretraining loss scales as a power law in terms of compute, so a small shift in that X-axis could be a large shift in compute space.
2 replies · 0 reposts · 1 like · 87 views
Nicholas Lourie @NickLourie
If we could understand when and why perplexity captures downstream performance, then it would be a powerful tool indeed. When the context allows it, we could compare language models on perplexity alone, without the need to run difficult downstream evaluations.
1 reply · 0 reposts · 1 like · 142 views
Nicholas Lourie @NickLourie
A standard approach has yet to emerge (a great area for research!). Task-specific losses are interesting; we share a few papers on them in our related work. Still, a task-agnostic loss has one big advantage: it gives one number to compare LLMs, regardless of the downstream task.
1 reply · 0 reposts · 2 likes · 145 views
Nicholas Lourie @NickLourie
@WilliamBarrHeld @michahu8 @kchonyc Even with continuous metrics, there are stubbornly emergent phenomena. For example, this figure from arxiv.org/abs/2411.16035. Scaling shows several structural breaks even when you look at a continuous metric like P(correct answer). It's a tough problem, but we're making progress!
[image]
1 reply · 0 reposts · 0 likes · 180 views
Nicholas Lourie @NickLourie
@WilliamBarrHeld @michahu8 @kchonyc Great question. 🙂 We only looked at downstream scaling laws in terms of pretraining loss, and a big conclusion is that we need more work like this! I'd guess that it'll take a few tricks to make downstream scaling laws reliable, and intermediate task losses could certainly be one.
1 reply · 0 reposts · 0 likes · 206 views