Jeremy Cohen
@deepcohen
1.2K posts
Research fellow at Flatiron Institute, working on understanding optimization in deep learning. Previously: PhD in machine learning at Carnegie Mellon.

New York, NY · Joined September 2011
987 Following · 6.1K Followers
Jeremy Cohen reposted
Sunny Sanyal @SunnySanyal9
I have spent 4 years making LLMs generalize better without more data or compute. I'm looking for a Research role in industry. Here's what I've built:
1/ Early Weight Averaging → First paper (2023) to apply weight averaging during LM pre-training. Now widely used in many pre-training pipelines. arxiv.org/abs/2306.03241
2/ Attention Collapse → Diagnosed attention collapse in LLMs and proposed a training fix. arxiv.org/abs/2404.08634
3/ Curriculum Finetuning → Upweight easy samples and downweight hard ones during finetuning to reduce forgetting. arxiv.org/abs/2502.02797
I am a PhD student at UT Austin. I have interned at DeepMind, LightningAI, and Amazon Alexa. If you're hiring or know someone who is, please DM or email (sanyal.sunny@utexas.edu). Web: sites.google.com/view/sunnysany…
#MachineLearning #LLM #NLP #PhD #AIJobs #OpenToWork
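For reference, a minimal sketch of what weight averaging during pre-training can look like: keep an exponential moving average (EMA) of the weights alongside the live ones and evaluate or checkpoint the averaged copy. The model, loss, and decay value below are illustrative assumptions, not the setup from the paper linked above.

```python
import copy
import torch
import torch.nn as nn

# Illustrative stand-in model; the real setting would be an LLM.
model = nn.Linear(512, 512)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Keep an exponential moving average (EMA) of the weights alongside training.
ema_model = copy.deepcopy(model)
decay = 0.999  # assumed value; tuned per setup in practice

for step in range(100):
    x = torch.randn(32, 512)
    loss = model(x).pow(2).mean()  # placeholder loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    # EMA update: the averaged weights drift slowly toward the current weights.
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1 - decay)

# Evaluate / checkpoint ema_model rather than the raw training weights.
```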
Jeremy Cohen @deepcohen
Isn’t it a little ironic that this argument for why LLMs aren’t truly intelligent is based on … pattern matching?
Big Brain AI@realBigBrainAI

AMI Labs founder Yann LeCun on why LLMs are fooling us the same way AI has for decades:

He argues that every generation of AI scientists has made the same mistake: confusing task performance with real intelligence.

LeCun's core challenge to the current hype: "We're fooled into thinking those machines are intelligent because they can manipulate language. And we're used to the fact that people who can manipulate language very well are implicitly smart." He's clear that LLMs are useful, but being a useful tool and being intelligent are two very different things.

The real insight is the historical pattern he's lived through. Since the 1950s, wave after wave of AI researchers have claimed their breakthrough was the path to human-level intelligence. Marvin Minsky. Herbert Simon. Frank Rosenblatt, who invented the perceptron, the first learning machine, in the 1950s. All of them predicted machines as smart as humans within a decade. "They were all wrong."

LeCun has personally witnessed three of these cycles of hype and disappointment. And his verdict on the current one is blunt: "This generation with LLMs is also wrong. It's just another example of being fooled."

The pattern: a new technique emerges → machines get good at specific tasks → we assume general intelligence.

The question worth asking: are we impressed by these tools because they're intelligent, or because they sound like they are?

Jeremy Cohen reposted
Samip @industriaalist
Introducing Q Labs, a research lab focused on solving generalization. Alongside others (SSI, Flapping Airplanes), we see data efficiency as the key problem, but we're taking an unconventional approach to solve it: a new learning algorithm approximating Solomonoff induction.
Jeremy Cohen reposted
Nikhil Ghosh @nikhilghosh101
Sharing our recent work on understanding the mechanisms underlying the empirical success of hyperparameter transfer using μP! (1/11) with Denny Wu and @albertobietti
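As background on what hyperparameter transfer with μP means in practice, here is a hedged sketch of the core idea: tune a learning rate once at a small base width, then rescale hidden-layer learning rates with width according to a μP-style rule (shown here for an Adam-like setting). The base width and learning rate are made-up numbers, and the full parametrization also adjusts initialization and output scaling, which this sketch omits.

```python
# Toy illustration of muP-style learning-rate transfer across widths.
# base_width and base_lr are assumed values for illustration only.
base_width = 256
base_lr = 1e-3  # tuned once at the base width

def mup_hidden_lr(width: int) -> float:
    """Hidden-layer Adam learning rate under a muP-style rule:
    scale the base learning rate by base_width / width."""
    return base_lr * base_width / width

for width in (256, 1024, 4096):
    print(width, mup_hidden_lr(width))

# The intent: a learning rate tuned at width 256 stays (approximately)
# optimal at larger widths, so the sweep only runs once at small scale.
```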
Jeremy Cohen @deepcohen
@cjmaddison @tylerfarghly I agree that theory will probably never give us a closed-form expression for the test error of resnet-50 on ImageNet, or eliminate all hyperparameters from deep learning, if that’s what is meant by “the big things”
Jeremy Cohen @deepcohen
@cjmaddison @tylerfarghly IMO, theory could give us a *language for reasoning* about deep learning. Even with good theory, you’d probably still have to run some experiments, but far fewer than we do now, since you’d learn much more from each one.
Jeremy Cohen @deepcohen
The goal of deep learning theory/science is to guide practice. But most practical questions are >1 paper away from being legitimately answered by theory. How, then, can we make progress, without access to the ideal reward signal of “does this theory give us a SOTA algorithm?” …
Min Jae Song @mj_theory
@deepcohen Do you have examples of deep learning theory research that satisfy this criterion? If not, what specific directions do you have in mind?
Jeremy Cohen @deepcohen
So, we should focus on theories that can reliably predict “the small things” about deep learning, and gradually broaden the scope of what we can predict, until we have theory that can reliably predict “the big things” about deep learning too.
Jeremy Cohen @deepcohen
A lot of DL theory work gets rightfully criticized for being “postdictive” — always giving an elegant retroactive explanation for SOTA, while somehow never anticipating it. But the real issue isn’t that such theories can’t predict SOTA, it’s that they can’t predict anything.
Jeremy Cohen reposted
Spencer Frei @sfrei_
I'm hiring a Student Researcher to work on scaling laws at Google DeepMind! Project is for 16 weeks, starting spring/summer '26, in-person in SF (pic from the amazing office). If you're interested, fill out this form: forms.gle/MsgPfJumTLLobN…
Jeremy Cohen @deepcohen
People with this attitude shouldn't be working in deep learning in the first place: in the worst case, SGD could get stuck in a terrible local minimum. The fact that it won't is only known empirically. If you don't trust empirical evidence, you should stick to kernel methods.
Jeremy Cohen @deepcohen
This is a cultural issue. Historically, the field of optimization has viewed itself as a subfield of mathematics, and has been willing to use an idea only if it can be justified rigorously from first principles. This is a totally unworkable attitude when it comes to deep learning.
Aaron Defazio@aaron_defazio

It’s still common belief that the loss landscapes of neural networks are too non-convex for averaging of model weights to work. Meanwhile, the empirical evidence is overwhelming. It works! There is a whole tutorial on it this year. Adapt your beliefs or fall behind.
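For concreteness, a minimal sketch of the kind of weight averaging being referred to here: elementwise averaging of the parameters of two checkpoints with identical architecture. The tiny models and the helper name below are illustrative assumptions; in practice the checkpoints would come from real training runs.

```python
import torch
import torch.nn as nn

def average_state_dicts(sd_a, sd_b):
    """Return a state dict whose tensors are the elementwise mean of two
    checkpoints sharing the same architecture (a two-model 'soup')."""
    return {k: (sd_a[k] + sd_b[k]) / 2 for k in sd_a}

# Illustrative stand-in: two copies of the same small network.
model_a, model_b = nn.Linear(16, 16), nn.Linear(16, 16)

merged = nn.Linear(16, 16)
merged.load_state_dict(
    average_state_dicts(model_a.state_dict(), model_b.state_dict())
)
```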

Jeremy Cohen reposted
Behrooz Ghorbani @_ghorbani
In Science of Scaling we will focus on three pillars: understanding LLM training dynamics at scale, the role of real and synthetic data, and the science of RL. I am especially excited to pursue this mission together with @MishaLaskin and @real_ioannis at Reflection.

I am building a small, high-trust team that cares deeply about open research, careful measurement, and engineering excellence. If you are interested in the science of pretraining, data, and RL at scale and want to help push the frontier with a focused, tight-knit group, my DMs are open. I will also be at NeurIPS this week (calendly.com/b-ghorbani-bg/…).
Jeremy Cohen @deepcohen
I'll be at NeurIPS from Wednesday to Saturday. I'm happy to chat with anyone, regardless of whether we've met before. DM or email me to set up!