Daniel Kunin
@KuninDaniel

81 posts

postdoc @UCB_MillerInst PhD @ICMEStanford creator @SeeingTheory

UC Berkeley · Joined December 2020
272 Following · 776 Followers
Daniel Kunin retweeted
Surya Ganguli@SuryaGanguli·
Our new paper "Deriving neural scaling laws from the statistics of natural language" arxiv.org/abs/2602.07488, led by @Fraccagnetta & @AllanRaventos w/ Matthieu Wyart, makes a breakthrough! For the first time, we can predict data-limited neural scaling law exponents from first principles using the structure of natural language itself.
If you give us two properties of your natural language dataset:
1) How the conditional entropy of the next token decays with conditioning length.
2) How pairwise token correlations decay with time separation.
Then we can give you the exponent of the neural scaling law (loss versus data amount) through a simple formula!
The key idea: as you increase the amount of training data, models can look further back into the past to predict, and as long as they do this well, the conditional entropy of the next token, conditioned on all tokens up to this data-dependent prediction horizon, completely governs the loss. This gives us our simple formula for the neural scaling law!
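The tweet gives the recipe but not the formula itself. As a purely illustrative sketch under my own assumptions (not the paper's result): if the conditional-entropy excess decays as a power law in context length, and the usable context horizon grows as a power law in dataset size, composing the two gives a power-law exponent for the data-limited loss. The names alpha and beta below are hypothetical placeholders for the two measured decay properties.

```python
# Illustrative sketch only; the paper's actual formula is not stated in the tweet.
# Assumption: conditional-entropy excess decays as H(T) - H_inf ~ A * T**(-alpha),
# and the model's usable context horizon grows with dataset size as T*(D) ~ c * D**beta.
# Composing the two would give a data-limited loss exponent of alpha * beta.

def predicted_data_exponent(alpha: float, beta: float) -> float:
    """Hypothetical scaling-law exponent: L(D) - H_inf ~ D**(-alpha * beta)."""
    return alpha * beta

# Example: entropy excess decaying as T^-0.5, horizon growing as D^0.4
print(predicted_data_exponent(alpha=0.5, beta=0.4))  # -> 0.2, i.e. L(D) - H_inf ~ D^-0.2
```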
Daniel Kunin retweeted
Nina Miolane 🦋 @ninamiolane.bsky.social
Arrived #NeurIPS2025 ☀️ If you're interested in the interplay of Geometry, Topology, Algebra w/ Neuroscience & AI, I'll give 2 talks on Sunday: 🌐11:30am: The Algebra of Spatial Navigation (Groups & grid cells) 🍩1:30pm: Topological Deep Learning (Complexes & Graphs) More👇
Daniel Kunin@KuninDaniel·
Bottom line: AGF provides a unified, mechanistic account of how two-layer networks build internal structure, one feature at a time, from vanishing initialization. But it's an ansatz: can we prove a general conjecture? Can we extend to deeper settings? Stay tuned for new work!
Daniel Kunin@KuninDaniel·
Applied to quadratic networks trained on modular addition, AGF predicts the emergence of Fourier features in decreasing Fourier-coefficient order — providing a first-principles derivation of @NeelNanda5’s observations
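Not the authors' code, but a minimal sketch of how one could probe for this kind of structure: take a learned input-embedding matrix from a modular-addition model (the matrix W below is a random placeholder, not a trained network) and inspect its discrete Fourier spectrum over the residue classes. Learned Fourier features show up as a handful of dominant frequencies.

```python
import numpy as np

# Minimal sketch, not the authors' implementation: given an embedding matrix W
# of shape (p, d) for inputs 0..p-1 in a modular-addition task, check how
# concentrated the embedding is in the Fourier basis over the residue classes.
p, d = 113, 128
W = np.random.randn(p, d)  # placeholder for a trained embedding matrix

spectrum = np.abs(np.fft.fft(W, axis=0))   # DFT over the residue classes
power = (spectrum ** 2).sum(axis=1)        # total power per frequency
power = power / power.sum()
top_freqs = np.argsort(power)[::-1][:5]
print("dominant frequencies:", top_freqs)
# A network that has learned Fourier features concentrates most of the power
# in a few frequencies; for random weights the spectrum is roughly flat.
```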
Daniel Kunin retweeted
Clémentine Dominé, Phd 🍊@NeurIPS
🚀 Exciting news! Our paper "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks" has been accepted at ICLR 2025! arxiv.org/abs/2409.14623 A thread on how relative weight initialization shapes learning dynamics in deep networks. 🧵 (1/9)
Daniel Kunin retweeted
TTIC@TTIC_Connect·
Wednesday, April 9th at 11AM: TTIC's Young Researcher Seminar Series presents Daniel Kunin (@KuninDaniel) of @StanfordEng with a talk titled "Learning Mechanics of Neural Networks: Conservation Laws, Implicit Biases, and Feature Learning." Please join us in Room 530, 5th floor.
Daniel Kunin retweeted
FENG CHEN@FCHEN_AI·
1/ Our new paper: “Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning” on how to change training to better exploit test-time compute! co-led by @AllanRaventos, w/ Nan Cheng, @SuryaGanguli & @ShaulDr arxiv.org/abs/2502.07154
Daniel Kunin retweeted
Mason Kamb@MasonKamb·
Excited to finally share this work w/ @SuryaGanguli. Tl;dr: we find the first closed-form analytical theory that replicates the outputs of the very simplest diffusion models, with median pixel-wise r^2 values of 90%+. arxiv.org/abs/2412.20292
Ravid Shwartz Ziv@ziv_ravid·
I agree. I think the weight matrix rank is more global, providing insight into each layer's overall parameter complexity. In contrast, local rank measures the dimensionality of feature manifolds at specific input points, capturing how the network transforms and compresses information locally in the input space. It would be interesting to understand their connection
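A minimal sketch of one way to read the global-versus-local distinction in this reply, under my own assumptions rather than the paper's exact definitions: compare the rank of a layer's weight matrix (global) with the numerical rank of the layer's input-output Jacobian evaluated at one specific input point (local).

```python
import torch

# Sketch: contrast global weight-matrix rank with a "local" rank at one input,
# taken here as the numerical rank of the layer Jacobian at that point.
# This is my reading of the distinction in the thread, not the paper's definition.
torch.manual_seed(0)
layer = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
x = torch.randn(64)

global_rank = torch.linalg.matrix_rank(layer[0].weight).item()

J = torch.autograd.functional.jacobian(layer, x)   # (64, 64) Jacobian at x
local_rank = torch.linalg.matrix_rank(J).item()    # roughly the number of active ReLU units

print(f"global weight rank: {global_rank}, local Jacobian rank at x: {local_rank}")
```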
Ravid Shwartz Ziv@ziv_ravid·
1/5 🚨 Alert! Deep neural networks are secret compression masters 🚨 Our new paper, "Learning to Compress: Local Rank and Information Compression in Deep Neural Networks" reveals how they learn efficient representations during training @NiketPatel91154 arxiv.org/abs/2410.07687
Daniel Kunin@KuninDaniel·
Also, big shoutout to @yasamanbb, @CPehlevan, and @HSompolinsky for coordinating last year's 'Deep Learning from Physics and Neuroscience' program @KITP_UCSB. Our amazing team met there, and this project is a direct result of the conversations we had!
Daniel Kunin@KuninDaniel·
We provide empirical evidence that an unbalanced rich regime drives feature learning in deep networks, promotes interpretability of early layers in CNNs, reduces sample complexity of learning hierarchical data, and decreases time to grokking in modular arithmetic
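As a hedged illustration of what an "unbalanced" initialization of a two-layer network might look like (the architecture and scale factors below are placeholders, not the paper's setup): the two layers start at very different scales instead of matched norms.

```python
import torch

# Sketch of balanced vs. unbalanced initialization for a two-layer network.
# The specific scales (1.0 vs. 1e-3) are illustrative placeholders, not values
# from the paper; "unbalanced" just means the two layers begin at very
# different scales, which the tweet credits with driving richer feature learning.
def two_layer(d_in=100, d_hidden=256, d_out=10, s1=1.0, s2=1.0):
    net = torch.nn.Sequential(
        torch.nn.Linear(d_in, d_hidden, bias=False),
        torch.nn.ReLU(),
        torch.nn.Linear(d_hidden, d_out, bias=False),
    )
    with torch.no_grad():
        net[0].weight.mul_(s1)   # scale first-layer weights
        net[2].weight.mul_(s2)   # scale second-layer weights
    return net

balanced   = two_layer(s1=1.0, s2=1.0)
unbalanced = two_layer(s1=1.0, s2=1e-3)  # second layer starts near zero
```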