Neil Mallinar

115 posts

@nmallinar

PhD student @ UCSD, prior to that: Research Intern @ Google Research & MSR NE, Research Engineer at Pryon Inc & IBM Watson.

New York, NY · Joined June 2009
665 Following · 277 Followers
Neil Mallinar retweeted
Jamie Simon (@learning_mech)
coauthors + I couldn't get a poster printed in Rio in time, so here's me ad-libbing at @iclr_conf
[Image]
Replies 11 · Reposts 14 · Likes 314 · Views 25.5K
Neil Mallinar retweeted
Jamie Simon (@learning_mech)
1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics! We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics. 🔨 arxiv.org/pdf/2604.21691 🔧
[Image]
Replies 53 · Reposts 293 · Likes 1.5K · Views 301.8K
Neil Mallinar (@nmallinar)
Super excited to share that we have an Oral presentation for this paper next week at ICML! It will be on Tuesday at 10am (Oral 1E) in West Ballroom D, I'll be presenting 4th at 10:45am :) Our poster will be on Wednesday at 11am and I encourage you to stop by and chat!
[Image]
Replies 1 · Reposts 3 · Likes 18 · Views 1.2K
Neil Mallinar (@nmallinar)
@matthistory Maybe they are going to do a heist together, or sing karaoke! I cannot wait to find out
Replies 0 · Reposts 0 · Likes 1 · Views 59
Miss Mineragua (@emanouks)
Three's a coven, four's a crowd
Replies 1 · Reposts 0 · Likes 2 · Views 153
Neil Mallinar retweeted
amirhesam abedsoltan (@Amirhesam_A)
Two generalization regimes in ICL: (1) context-scaling, where performance improves with more in-context examples, and (2) task-scaling, where performance improves with more pre-training tasks. While MLPs show task-scaling but not context-scaling, arxiv.org/abs/2410.12783
Replies 1 · Reposts 2 · Likes 3 · Views 338
Neil Mallinar (@nmallinar)
Consider my beautiful day uninterrupted 🥲 Alas the research work calls me back
[Image]
Replies 0 · Reposts 0 · Likes 3 · Views 215
Neil Mallinar (@nmallinar)
@thdbui @pfau Anyway I enjoyed your paper and would love to get a chance to discuss these topics further sometime and hear more about your observations!
Replies 0 · Reposts 0 · Likes 2 · Views 107
Neil Mallinar (@nmallinar)
@thdbui @pfau Another difference we see compared to grokking in low-rank settings like k-parity is that the circulant features we learn for modular arithmetic (MA) are full rank! It wasn't obvious to us that you could do MA with kernels as the MA experiments we see all use neural nets still
Replies 1 · Reposts 0 · Likes 0 · Views 109
Neil Mallinar retweeted
Daniel Beaglehole (@dbeagleholeCS)
Iterating kernel ridgeless regression with AGOP computation groks modular arithmetic… and this grokking is remarkably similar to the phenomenon in neural networks. I found these results very surprising!
Quoted post from Neil Mallinar (@nmallinar):
Grokking modular arithmetic is widely studied for the seemingly unique emergent abilities of neural networks. Instead, we find that iteratively solving a kernel machine and estimating the Average Gradient Outer Product (AGOP) recovers this phenomenon identically:
Replies 1 · Reposts 3 · Likes 31 · Views 3K
Neil Mallinar (@nmallinar)
In our setting, grokking appears to occur solely due to feature learning. We decouple from neural architectures and gradient-descent based optimization by using kernels equipped with feature learning through AGOP and find many of the same phenomena as observed in neural networks.
Replies 1 · Reposts 0 · Likes 3 · Views 235
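
For readers who want a concrete picture of "iteratively solving a kernel machine and estimating the AGOP" as described in the posts above, here is a minimal NumPy sketch. It is not the authors' implementation. The Gaussian kernel with a learned metric `M`, the bandwidth, the tiny ridge term (added for numerical stability, so the solve is only approximately ridgeless), the trace rescaling, and every function name are assumptions chosen for illustration.

```python
import numpy as np


def modular_addition_data(p):
    """All pairs (a, b) as concatenated one-hot vectors, labels one-hot (a + b) mod p."""
    pairs = [(a, b) for a in range(p) for b in range(p)]
    X = np.zeros((len(pairs), 2 * p))
    Y = np.zeros((len(pairs), p))
    for row, (a, b) in enumerate(pairs):
        X[row, a] = 1.0
        X[row, p + b] = 1.0
        Y[row, (a + b) % p] = 1.0
    return X, Y


def gaussian_kernel(X, Z, M, bandwidth):
    """k(x, z) = exp(-(x - z)^T M (x - z) / (2 * bandwidth^2)), M symmetric PSD."""
    XM, ZM = X @ M, Z @ M
    d2 = (XM * X).sum(1)[:, None] + (ZM * Z).sum(1)[None, :] - 2.0 * XM @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth**2))


def fit_kernel(X, Y, M, bandwidth, ridge):
    """Solve (K + ridge * I) alpha = Y; the small ridge is an assumed stabilizer."""
    K = gaussian_kernel(X, X, M, bandwidth)
    return np.linalg.solve(K + ridge * np.eye(len(X)), Y)


def agop(X, alpha, M, bandwidth):
    """Average over training points of J(x)^T J(x), where J is the Jacobian of
    the predictor f(x) = sum_i alpha_i k(x, x_i) with respect to the input x."""
    K = gaussian_kernel(X, X, M, bandwidth)
    n, d = X.shape
    G = np.zeros((d, d))
    for j in range(n):
        diff = (X[j] - X) @ M              # row i holds M (x_j - x_i)
        W = alpha * K[j][:, None]          # row i holds alpha_i * k(x_j, x_i)
        J = -(W.T @ diff) / bandwidth**2   # (outputs, d) Jacobian at x_j
        G += J.T @ J
    return G / n


def rfm_sketch(X, Y, iters=3, bandwidth=2.0, ridge=1e-8):
    """Alternate between fitting the kernel machine and replacing M by the AGOP."""
    d = X.shape[1]
    M = np.eye(d)
    for _ in range(iters):
        alpha = fit_kernel(X, Y, M, bandwidth, ridge)
        G = agop(X, alpha, M, bandwidth)
        G = 0.5 * (G + G.T)                        # remove rounding asymmetry
        M = G * (d / max(np.trace(G), 1e-12))      # assumed rescaling: trace(M) = d
    alpha = fit_kernel(X, Y, M, bandwidth, ridge)  # refit with the final metric
    return M, alpha
```

To probe generalization one would fit on a subset of the `(a, b)` pairs and evaluate on the held-out pairs with `gaussian_kernel(X_test, X_train, M, bandwidth) @ alpha`. This sketch makes no claim to reproduce the grokking behaviour reported in the thread.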