
Jeremy Cohen
@deepcohen
1.2K posts
Research fellow at Flatiron Institute, working on understanding optimization in deep learning. Previously: PhD in machine learning at Carnegie Mellon.
New York, NY · Joined September 2011
987 Following · 6.1K Followers
Pinned Tweet

Part 1: How does gradient descent work?
centralflows.github.io/part1/
Part 2: A simple adaptive optimizer
centralflows.github.io/part2/
Part 3: How does RMSProp work?
centralflows.github.io/part3/
Jeremy Cohen retweeted

I have spent 4 years making LLMs generalize better without more data or compute. I'm looking for a Research role in industry. Here's what I've built:
1/ Early Weight Averaging → First paper (2023) to apply weight averaging during LM pre-training; now widely used in many pre-training pipelines (a rough sketch follows this list). arxiv.org/abs/2306.03241
2/ Attention Collapse → Diagnosed attention collapse in LLMs and proposed a training fix. arxiv.org/abs/2404.08634
3/ Curriculum Finetuning → Upweight easy samples and downweight hard ones during finetuning to reduce forgetting (see the sketch at the end of this thread). arxiv.org/abs/2502.02797
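A minimal sketch of one way to maintain a running weight average during pre-training, as in item 1. This illustrates the general idea only, not the paper's exact recipe; `model`, `avg_model`, and `start_step` are hypothetical names, and the uniform-average rule and schedule are assumptions.

```python
import copy
import torch

def update_weight_average(model, avg_model, step, start_step=1000):
    # Hypothetical sketch: once training passes `start_step`, fold the
    # current weights into a running uniform average held in a separate
    # copy of the model (buffers are ignored for brevity).
    if step < start_step:
        avg_model.load_state_dict(model.state_dict())
        return
    n = step - start_step + 1
    with torch.no_grad():
        for p_avg, p in zip(avg_model.parameters(), model.parameters()):
            p_avg.mul_((n - 1) / n).add_(p, alpha=1.0 / n)

# Usage sketch: keep avg_model alongside the training model and
# evaluate / checkpoint from avg_model rather than model.
# avg_model = copy.deepcopy(model)
# for step, batch in enumerate(loader):
#     ...train model on batch...
#     update_weight_average(model, avg_model, step)
```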
I am a PhD student at UT Austin. I have interned at DeepMind, LightningAI, and Amazon Alexa.
If you're hiring or know someone who is, please DM or email (sanyal.sunny@utexas.edu).
Web: sites.google.com/view/sunnysany…
#MachineLearning #LLM #NLP #PhD #AIJobs #OpenToWork
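For the curriculum-finetuning item in the thread above, a minimal sketch of loss-based sample weighting: one plausible instantiation under my own assumptions, not the paper's method. The softmax weighting and `temperature` parameter are illustrative.

```python
import torch
import torch.nn.functional as F

def weighted_finetune_loss(logits, labels, temperature=1.0):
    # Per-example loss serves as a difficulty proxy: lower-loss ("easy")
    # examples get larger weights, higher-loss ("hard") ones smaller,
    # softening updates on examples far from the model's current behavior.
    per_example = F.cross_entropy(logits, labels, reduction="none")
    weights = torch.softmax(-per_example.detach() / temperature, dim=0)
    return (weights * per_example).sum()

# Usage sketch on a toy batch of 4 examples with 10 classes:
logits = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))
loss = weighted_finetune_loss(logits, labels)
loss.backward()
```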

Jeremy Cohen retweeted

Sharing our recent work on understanding the mechanisms underlying the empirical success of hyperparameter transfer using μP! (1/11)
with Denny Wu and @albertobietti

Jeremy Cohen retweeted

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence
arxiv.org/abs/2601.03220
with an amazing team: @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils


@cjmaddison @tylerfarghly I agree that theory will probably never give us a closed-form expression for the test error of ResNet-50 on ImageNet, or eliminate all hyperparameters from deep learning, if that’s what is meant by “the big things”.

@cjmaddison @tylerfarghly IMO, theory could give us a *language for reasoning* about deep learning. Even with good theory, you’d probably still have to run some experiments, but far fewer than we do now, since you’d learn much more from each one.

@mj_theory We aspired to meet this criterion in the research that we wrote up here: arxiv.org/abs/2410.24206.

@deepcohen Do you have examples of deep learning theory research that satisfy this criterion? If not, what specific directions do you have in mind?
Jeremy Cohen retweeted

I'm hiring a Student Researcher to work on scaling laws at Google DeepMind! Project is for 16 weeks, starting spring/summer '26, in-person in SF (pic from the amazing office). If you're interested, fill out this form: forms.gle/MsgPfJumTLLobN…

Jeremy Cohen retweeted

Presenting our NeurIPS poster today (11am, #4111): a unified view of saddle-to-saddle dynamics and how neural nets learn modular addition
With @giovannimarchet, @FCHEN_AI, Dhruva Karkada, Jamie Simon, Mike DeWeese, @SuryaGanguli, and @ninamiolane
Paper: arxiv.org/abs/2506.06489


This is a cultural issue. Historically, the field of optimization has viewed itself as a subfield of mathematics, and it is only willing to use an idea if it can be justified rigorously from first principles. This is a totally unworkable attitude when it comes to deep learning.
Aaron Defazio @aaron_defazio
It’s still common belief that the loss landscapes of neural networks are too non-convex for averaging of model weights to work. Meanwhile, the empirical evidence is overwhelming. It works! There is a whole tutorial on it this year. Adapt your beliefs or fall behind.
Jeremy Cohen retweeted

In Science of Scaling we will focus on three pillars: understanding LLM training dynamics at scale, the role of real and synthetic data, and the science of RL. I am especially excited to pursue this mission together with @MishaLaskin and @real_ioannis at Reflection.
I am building a small, high-trust team that cares deeply about open research, careful measurement, and engineering excellence. If you are interested in the science of pretraining, data, and RL at scale and want to help push the frontier with a focused, tight-knit group, my DMs are open. I will also be at NeurIPS this week (calendly.com/b-ghorbani-bg/…).

