Neil Mallinar

110 posts

@nmallinar

PhD student @ UCSD, prior to that: Research Intern @ Google Research & MSR NE, Research Engineer at Pryon Inc & IBM Watson.

New York, NY · Joined June 2009
659 Following · 275 Followers
Neil Mallinar @nmallinar:
Super excited to share that we have an Oral presentation for this paper next week at ICML! It will be on Tuesday at 10am (Oral 1E) in West Ballroom D; I'll be presenting 4th, at 10:45am :) Our poster will be on Wednesday at 11am and I encourage you to stop by and chat!
Neil Mallinar @nmallinar:
Happy to share that we got a spotlight at ICML for this work. See y'all there!!
Neil Mallinar @nmallinar:
@matthistory Maybe they are going to do a heist together, or sing karaoke! I cannot wait to find out
Miss Mineragua @emanouks:
Three's a coven, four's a crowd
Neil Mallinar retweeted
amirhesam abedsoltan @Amirhesam_A:
Two generalization regimes in ICL: (1) context-scaling, where performance improves with more in-context examples, and (2) task-scaling, where performance improves with more pre-training tasks. While MLPs show task-scaling but not context-scaling, … arxiv.org/abs/2410.12783
Neil Mallinar @nmallinar:
Consider my beautiful day uninterrupted 🥲 Alas the research work calls me back
Neil Mallinar @nmallinar:
@thdbui @pfau Anyway I enjoyed your paper and would love to get a chance to discuss these topics further sometime and hear more about your observations!
Neil Mallinar @nmallinar:
@thdbui @pfau Another difference we see compared to grokking in low-rank settings like k-parity is that the circulant features we learn for modular arithmetic (MA) are full rank! It wasn't obvious to us that you could do MA with kernels, as the MA experiments we've seen all still use neural nets
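A quick illustration of the full-rank point (mine, not from the thread): a generic random circulant matrix is full rank almost surely, in contrast to the low-rank features typical of k-parity settings.

```python
# Illustrative check: a circulant built from a random first column has
# full rank with probability 1 (its eigenvalues are the DFT of that column).
import numpy as np
from scipy.linalg import circulant

p = 61
C = circulant(np.random.default_rng(0).normal(size=p))  # random first column
print(np.linalg.matrix_rank(C))  # prints 61, i.e. full rank
```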
@·
You can do grokking with kernel methods too.
@·
@nmallinar The kernel lore deepens
Neil Mallinar @nmallinar:
Grokking modular arithmetic is widely studied for the seemingly unique emergent abilities of neural networks. Instead, we find that iteratively solving a kernel machine and estimating the Average Gradient Outer Product (AGOP) recovers this phenomenon identically:
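A minimal sketch of the iteration described above, assuming a Gaussian kernel with a learned Mahalanobis metric (the paper's exact kernel, normalization, and stopping rule may differ; all function names here are mine, not the authors'):

```python
# Sketch: alternately solve a kernel regression and replace the kernel's
# Mahalanobis metric with the AGOP of the fitted predictor.
import numpy as np

def kernel(X, Z, M, gamma=1.0):
    """Gaussian kernel with Mahalanobis metric: exp(-gamma (x-z)^T M (x-z))."""
    d = X[:, None, :] - Z[None, :, :]            # (n, m, p) pairwise differences
    sq = np.einsum('nmp,pq,nmq->nm', d, M, d)    # squared Mahalanobis distances
    return np.exp(-gamma * sq)

def kernel_agop_iterate(X, y, T=5, gamma=1.0, reg=1e-8):
    """X: (n, p) inputs, y: (n,) scalar labels. Returns dual coefs and metric."""
    n, p = X.shape
    M = np.eye(p)                                # start from the identity metric
    for _ in range(T):
        K = kernel(X, X, M, gamma)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)  # (near-)ridgeless fit
        # grad f(x_n) = -2 gamma M sum_m alpha_m K(x_n, x_m) (x_n - x_m)
        d = X[:, None, :] - X[None, :, :]
        G = -2.0 * gamma * np.einsum('nm,nmp->np', K * alpha[None, :], d) @ M.T
        M = G.T @ G / n                          # AGOP: average gradient outer product
        M *= p / np.trace(M)                     # rescale so trace(M) = p (my choice)
    return alpha, M
```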
Neil Mallinar retweeted
Daniel Beaglehole @dbeagleholeCS:
Iterating kernel ridgeless regression with AGOP computation groks modular arithmetic… and this grokking is remarkably similar to the phenomenon in neural networks. I found these results very surprising!
Quoting Neil Mallinar @nmallinar's announcement above.
Neil Mallinar @nmallinar:
In our setting, grokking appears to occur solely due to feature learning. We decouple grokking from neural architectures and gradient-descent optimization by using kernels equipped with feature learning through AGOP, and we find many of the same phenomena observed in neural networks.
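For a concrete picture of the task being fit, here is the standard modular-addition setup used in grokking studies (a hedged sketch; the paper's encoding, modulus, and train split may differ):

```python
# Modular-addition dataset: inputs are concatenated one-hot encodings of
# (a, b); labels encode (a + b) mod p. Parameter values are illustrative.
import numpy as np

def modular_addition_data(p=61, train_frac=0.5, seed=0):
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    X = np.zeros((p * p, 2 * p))
    X[np.arange(p * p), pairs[:, 0]] = 1.0          # one-hot encoding of a
    X[np.arange(p * p), p + pairs[:, 1]] = 1.0      # one-hot encoding of b
    y = np.eye(p)[(pairs[:, 0] + pairs[:, 1]) % p]  # one-hot (a + b) mod p
    idx = np.random.default_rng(seed).permutation(p * p)
    n_tr = int(train_frac * p * p)
    return X[idx[:n_tr]], y[idx[:n_tr]], X[idx[n_tr:]], y[idx[n_tr:]]
```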
Neil Mallinar @nmallinar:
By the relation between circulant matrices and the Discrete Fourier Transform, we theoretically show that a quadratic kernel equipped with circulant features implements the same generalizing solution as neural networks: the Fourier Multiplication Algorithm found in prior work.
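For reference, the textbook identity behind this step (my notation, not necessarily the paper's): the DFT diagonalizes every circulant matrix, so multiplication by a circulant acts as pointwise multiplication in frequency space.

```latex
% A circulant matrix C with first column c is diagonalized by the DFT
% matrix F, with F_{jk} = \omega^{jk} and \omega = e^{-2\pi i / p}:
\[
  C \;=\; F^{-1}\,\operatorname{diag}(F c)\,F ,
\]
% so C x computes the circular convolution c \circledast x, i.e. pointwise
% multiplication of \hat{x} = F x by \hat{c} = F c in the Fourier domain --
% the Fourier Multiplication structure referenced above.
```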
Neil Mallinar @nmallinar:
The relationship between the NFM and the neural network AGOP has been noted in prior work: arxiv.org/abs/2212.13881. In settings where weight decay (equivalently, penalizing trace(NFM)) induces grokking, we find that AGOP regularization (penalizing trace(AGOP)) does the same.
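To unpack the two quantities in the tweet above (standard definitions from the cited line of work, with layer indexing suppressed): for a layer with weights W and a predictor f evaluated on points x_1, …, x_n,

```latex
\[
  \mathrm{NFM}(W) = W^{\top} W, \qquad
  \mathrm{AGOP}(f) = \frac{1}{n} \sum_{i=1}^{n}
      \nabla_x f(x_i)\, \nabla_x f(x_i)^{\top}.
\]
% Note trace(NFM) = ||W||_F^2, which is exactly what weight decay penalizes;
% the tweet's point is that penalizing trace(AGOP) plays the analogous role.
```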
Neil Mallinar @nmallinar:
As before, initializing features in the neural network using a random circulant dramatically reduces the time-to-generalization.
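A hedged sketch of what such an initialization could look like (the scale and where the block sits in the network are my assumptions):

```python
# Random circulant initialization: draw a random first column and build the
# circulant from it. The 1/sqrt(p) scale is an illustrative choice.
import numpy as np
from scipy.linalg import circulant

def random_circulant_init(p, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    c = rng.normal(0.0, scale / np.sqrt(p), size=p)  # random first column
    return circulant(c)                              # (p, p) circulant matrix
```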
Neil Mallinar @nmallinar:
We additionally find that instantiating features using random circulant matrices leads to generalization in standard Gaussian and quadratic kernels, suggesting that no additional structure beyond a general, asymmetric circulant is necessary to solve modular arithmetic.
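One natural way to read "instantiating features with a circulant C" in a kernel (my formulation, consistent with the Mahalanobis view above; the paper's parameterization may differ):

```latex
% Fixed circulant features C in Gaussian and quadratic kernels; for the
% Gaussian case this is equivalent to a Mahalanobis metric M = C^T C.
\[
  K^{\mathrm{gauss}}_{C}(x, z) = \exp\!\big(-\gamma \,\lVert C x - C z\rVert^{2}\big),
  \qquad
  K^{\mathrm{quad}}_{C}(x, z) = \big( (C x)^{\top} (C z) + c_0 \big)^{2},
\]
% where C need not be symmetric, matching the "general, asymmetric
% circulant" in the tweet above.
```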
Neil Mallinar @nmallinar:
The progress measures of circulant deviation and AGOP alignment tend to steadily improve in the early iterations of neural networks as well, suggesting that feature learning is taking place in spite of unchanging test loss and accuracy.
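A hedged sketch of an alignment-style progress measure (the paper's exact definitions of circulant deviation and AGOP alignment may differ; this is the generic cosine-similarity version):

```python
# Generic matrix alignment: cosine similarity of flattened matrices, a
# common stand-in for "how close is the learned AGOP to a reference matrix".
import numpy as np

def matrix_alignment(A, B):
    a, b = A.ravel(), B.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```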