Neil Mallinar

110 posts


@nmallinar

PhD student @ UCSD, prior to that: Research Intern @ Google Research & MSR NE, Research Engineer at Pryon Inc & IBM Watson.

New York, NY · Joined June 2009
659 Following · 275 Followers
Neil Mallinar@nmallinar·
Super excited to share that we have an Oral presentation for this paper next week at ICML! It will be on Tuesday at 10am (Oral 1E) in West Ballroom D, I'll be presenting 4th at 10:45am :) Our poster will be on Wednesday at 11am and I encourage you to stop by and chat!
Neil Mallinar@nmallinar·
@matthistory Maybe they are going to do a heist together, or sing karaoke! I cannot wait to find out
Miss Mineragua@emanouks·
Three's a coven, four's a crowd
Neil Mallinar retweeted
amirhesam abedsoltan@Amirhesam_A·
Two generalization regimes in ICL: (1) context-scaling, where performance improves with more in-context examples, and (2) task-scaling, where performance improves with more pre-training tasks. While MLPs show task-scaling but not context-scaling, arxiv.org/abs/2410.12783
Neil Mallinar@nmallinar·
Consider my beautiful day uninterrupted 🥲 Alas the research work calls me back
Neil Mallinar@nmallinar·
@thdbui @pfau Anyway I enjoyed your paper and would love to get a chance to discuss these topics further sometime and hear more about your observations!
Neil Mallinar@nmallinar·
@thdbui @pfau Another difference we see compared to grokking in low-rank settings like k-parity is that the circulant features we learn for modular arithmetic (MA) are full rank! It wasn't obvious to us that you could do MA with kernels, as the MA experiments we've seen all still use neural nets
Neil Mallinar@nmallinar·
@avishvj In a hole in the ground there lived a kernel...
Neil Mallinar retweeted
Daniel Beaglehole@dbeagleholeCS·
Iterating kernel ridgeless regression with AGOP computation groks modular arithmetic… and this grokking is remarkably similar to the phenomenon in neural networks. I found these results very surprising!
Neil Mallinar@nmallinar

Grokking modular arithmetic is widely studied for the seemingly unique emergent abilities of neural networks. Instead, we find that iteratively solving a kernel machine and estimating the Average Gradient Outer Product (AGOP) recovers this phenomenon identically:

Neil Mallinar@nmallinar·
In our setting, grokking appears to occur solely due to feature learning. We decouple from neural architectures and gradient-descent based optimization by using kernels equipped with feature learning through AGOP and find many of the same phenomena as observed in neural networks.
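The iterate-then-estimate loop described above can be sketched in a few lines of numpy: alternately solve a (near-)ridgeless kernel regression, then replace the kernel's feature matrix with the AGOP of the fitted predictor. This toy uses a quadratic kernel K_M(x, z) = (xᵀMz)² and a low-rank regression target rather than the thread's modular-arithmetic setup; the target, hyperparameters, and trace normalization are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
v = np.zeros(d)
v[0] = v[1] = 1.0                    # hidden low-rank direction (toy assumption)
y = (X @ v) ** 2                     # quadratic target depending only on x . v

def quad_kernel(X1, X2, M):
    # K_M(x, z) = (x^T M z)^2
    return (X1 @ M @ X2.T) ** 2

M = np.eye(d)
for _ in range(3):
    # 1) (near-)ridgeless kernel regression with the current feature matrix M
    K = quad_kernel(X, X, M)
    alpha = np.linalg.solve(K + 1e-6 * np.eye(n), y)
    # 2) AGOP of the fitted predictor f(x) = sum_i alpha_i (x^T M x_i)^2,
    #    whose gradient is grad f(x) = 2 * sum_i alpha_i (x^T M x_i) M x_i
    S = X @ M @ X.T                  # S[j, i] = x_j^T M x_i
    G = 2 * (S * alpha) @ X @ M      # row j = grad f(x_j)
    M = G.T @ G / n                  # the AGOP becomes the new feature matrix
    M /= np.trace(M)                 # normalize scale (a common stabilizer)

w = np.linalg.eigh(M)[1][:, -1]      # top eigenvector of the learned features
```

After a few iterations the learned feature matrix M concentrates on the hidden direction v, which is the feature-learning effect the tweet attributes to AGOP.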
Neil Mallinar@nmallinar·
By the relation between circulant matrices and the Discrete Fourier Transform, we theoretically show that a quadratic kernel equipped with circulant features implements the same generalizing solution as neural networks - the Fourier Multiplication Algorithm found in prior work.
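The circulant–DFT relation behind this claim is easy to check numerically: every circulant matrix is diagonalized by the DFT, and circularly convolving two one-hot vectors adds their indices mod p, which is exactly the Fourier-multiplication route to modular addition. A small self-check (the modulus p and the indices a, b are arbitrary choices):

```python
import numpy as np

p = 7
c = np.random.default_rng(0).normal(size=p)
# circulant matrix with first column c: C[j, k] = c[(j - k) % p]
C = np.array([[c[(j - k) % p] for k in range(p)] for j in range(p)])

# the DFT diagonalizes any circulant: F C F^{-1} = diag(fft(c))
F = np.fft.fft(np.eye(p))
D = F @ C @ np.linalg.inv(F)
assert np.allclose(D, np.diag(np.fft.fft(c)))

# modular addition as Fourier multiplication:
# circular convolution of one-hot(a) and one-hot(b) is one-hot((a + b) % p)
a, b = 3, 6
ea, eb = np.eye(p)[a], np.eye(p)[b]
conv = np.fft.ifft(np.fft.fft(ea) * np.fft.fft(eb)).real
# conv peaks at index (3 + 6) % 7 == 2
```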
Neil Mallinar@nmallinar·
The relationship between the NFM and neural network AGOP has been noted in prior work. arxiv.org/abs/2212.13881 In settings where weight decay, or trace(NFM), induces grokking we find that AGOP regularization, or trace(AGOP), does the same.
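For concreteness: for a one-hidden-layer ReLU net f(x) = aᵀσ(Wx), the trace of the first-layer NFM WᵀW is exactly the weight-decay penalty ‖W‖²_F, while the trace of the AGOP is the mean squared input-gradient norm, so penalizing the latter is the feature-space analogue of penalizing the former. A minimal numpy sketch of the two traces (the toy net and shapes are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 8, 16, 100
W = rng.normal(size=(h, d))          # first-layer weights
a = rng.normal(size=h)               # output weights
X = rng.normal(size=(n, d))

# Neural Feature Matrix of the first layer: NFM = W^T W
nfm = W.T @ W
trace_nfm = np.trace(nfm)            # equals ||W||_F^2, the weight-decay penalty

# AGOP of f(x) = a^T relu(W x): grad_x f = W^T (a * 1[Wx > 0])
pre = X @ W.T                        # (n, h) preactivations
grads = ((pre > 0) * a) @ W          # (n, d): row j is grad f(x_j)
agop = grads.T @ grads / n
trace_agop = np.trace(agop)          # mean squared gradient norm of f
```

In a training loop, adding λ·trace_agop to the loss would play the role the tweet describes for λ·trace_nfm (i.e., weight decay).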
Neil Mallinar@nmallinar·
As before, initializing features in the neural network using a random circulant dramatically reduces the time-to-generalization.
Neil Mallinar@nmallinar·
We additionally find that instantiating features using random circulant matrices leads to generalization in standard Gaussian and quadratic kernels, suggesting that no additional structure beyond a general, asymmetric circulant is necessary to solve modular arithmetic.
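A "general, asymmetric circulant" is just a matrix whose rows are cyclic shifts of one random vector; instantiating it as a fixed feature map x → Mx inside a standard kernel is straightforward. A sketch of that construction (the Gaussian-kernel bandwidth and the x → Mx convention here are assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 11
c = rng.normal(size=p)
# random circulant: entry M[j, k] = c[(j - k) % p], so row j+1 is row j shifted by one
M = np.array([[c[(j - k) % p] for k in range(p)] for j in range(p)])
assert not np.allclose(M, M.T)                  # asymmetric in general
assert np.allclose(M[1], np.roll(M[0], 1))      # rows are cyclic shifts

def gaussian_circulant_kernel(X1, X2, M, bw=4.0):
    # standard Gaussian kernel applied to circulant-transformed inputs x -> M x
    XM, ZM = X1 @ M.T, X2 @ M.T
    d2 = ((XM[:, None, :] - ZM[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

X = rng.normal(size=(5, p))
K = gaussian_circulant_kernel(X, X, M)           # symmetric PSD Gram matrix
```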
Neil Mallinar@nmallinar·
The progress measures of circulant deviation and AGOP alignment tend to steadily improve in the early iterations of neural networks as well, suggesting that feature learning is taking place in spite of unchanging test loss and accuracy.
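One of these progress measures is directly computable: the shift matrices form an orthogonal basis for circulants, so the Frobenius projection of any square matrix onto the circulants simply averages each wrapped diagonal, and "circulant deviation" can be read as the relative distance to that projection. A sketch under that reading (the paper's exact definition and normalization may differ):

```python
import numpy as np

def nearest_circulant(A):
    # Frobenius projection onto circulants: average each wrapped diagonal,
    # c[r] = mean_j A[j, (j - r) % p], then rebuild C[j, k] = c[(j - k) % p]
    p = A.shape[0]
    c = np.array([np.mean([A[j, (j - r) % p] for j in range(p)]) for r in range(p)])
    return np.array([[c[(j - k) % p] for k in range(p)] for j in range(p)])

def circulant_deviation(A):
    # relative Frobenius distance from A to its circulant projection:
    # 0 for an exact circulant, close to 1 for an unstructured matrix
    return np.linalg.norm(A - nearest_circulant(A)) / np.linalg.norm(A)
```

Tracked over training iterations, this quantity decreasing while test loss is still flat is the kind of hidden progress the tweet describes.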