Pinned Tweet
Neil Mallinar
110 posts

Neil Mallinar
@nmallinar
PhD student @ UCSD, prior to that: Research Intern @ Google Research & MSR NE, Research Engineer at Pryon Inc & IBM Watson.
New York, NY · Joined June 2009
659 Following · 275 Followers

Shout-out to my amazing coauthors @dbeagleholeCS @BusyZhu @PartheP Adit and Misha!! Check out all of their other papers; they are brilliant researchers

The only grok I’m concerned with
Neil Mallinar@nmallinar
Grokking modular arithmetic is widely studied for the seemingly unique emergent abilities of neural networks. Instead, we find that iteratively solving a kernel machine and estimating the Average Gradient Outer Product (AGOP) recovers this phenomenon identically:
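The loop the tweet describes, alternating kernel ridge regression with estimation of the Average Gradient Outer Product, can be sketched in a few lines of NumPy. This is only an illustrative sketch: the kernel choice, hyperparameters, and the trace normalization below are assumptions for demonstration, not the paper's actual settings or implementation.

```python
import numpy as np

def kernel_agop_iteration(X, y, n_iters=5, reg=1e-3, bandwidth=3.0):
    """Alternate (1) kernel ridge regression with a feature-reweighted
    Laplace kernel and (2) re-estimation of the feature matrix M via the
    AGOP (average outer product of predictor gradients over the data)."""
    n, d = X.shape
    M = np.eye(d)  # start from an isotropic feature matrix
    for _ in range(n_iters):
        # Mahalanobis Laplace kernel: K(x, z) = exp(-||x - z||_M / bandwidth)
        diff = X[:, None, :] - X[None, :, :]                     # (n, n, d)
        dists = np.sqrt(np.einsum('ijd,de,ije->ij', diff, M, diff).clip(0))
        K = np.exp(-dists / bandwidth)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)          # kernel ridge
        # AGOP: average grad f(x) grad f(x)^T over the training points
        G = np.zeros((d, d))
        for i in range(n):
            w = K[i] * alpha / np.maximum(dists[i], 1e-12)       # (n,)
            g = -(M @ (diff[i].T @ w)) / bandwidth               # grad f(X[i])
            G += np.outer(g, g)
        M = G / n
        M = M * d / np.trace(M)  # rescale so trace(M) = d (illustrative)
    return M, alpha

# Toy target depending only on the first coordinate: the estimated AGOP
# should concentrate its mass on that coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.sign(X[:, 0])
M, _ = kernel_agop_iteration(X, y)
print(M[0, 0] > M[1, 1])
```

On this toy problem the learned feature matrix M places most of its diagonal mass on the relevant coordinate, which is the low-rank feature-learning behavior the AGOP is meant to capture.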

@matthistory Maybe they are going to do a heist together, or sing karaoke! I cannot wait to find out

@matthistory I'd like to order one duck riding on top of a horse please

@tacobellhoarder It’s Tito’s turn to pick a movie and he picked minions 3 rise of gru
Neil Mallinar retweeted

Two generalization regimes in ICL: (1) context-scaling, where performance improves with more in-context examples, and (2) task-scaling, where performance improves with more pre-training tasks. While MLPs show task-scaling but not context-scaling,
arxiv.org/abs/2410.12783
Neil Mallinar retweeted

Iterating kernel ridgeless regression with AGOP computation groks modular arithmetic… and this grokking is remarkably similar to the phenomenon in neural networks.
I found these results very surprising!
Neil Mallinar@nmallinar
Grokking modular arithmetic is widely studied for the seemingly unique emergent abilities of neural networks. Instead, we find that iteratively solving a kernel machine and estimating the Average Gradient Outer Product (AGOP) recovers this phenomenon identically:

Please check out our paper here: arxiv.org/abs/2407.20199
This was an amazing collaboration with Daniel Beaglehole (@dbeagleholeCS), Libin Zhu (@BusyZhu), Adit Radhakrishnan, Parthe Pandit (@PartheP), and Misha Belkin.

The relationship between the NFM and neural network AGOP has been noted in prior work. arxiv.org/abs/2212.13881
In settings where weight decay, or trace(NFM), induces grokking, we find that AGOP regularization, or trace(AGOP), does the same.
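The analogy between weight decay and trace(NFM) rests on a simple identity: the Neural Feature Matrix of a layer with weight matrix W is W^T W, so trace(NFM) equals the squared Frobenius norm of W, which is exactly the per-layer quantity weight decay penalizes. A quick NumPy check with hypothetical weights:

```python
import numpy as np

# NFM of a layer with weights W is W^T W, so trace(NFM) = ||W||_F^2,
# i.e. the per-layer weight-decay penalty. W here is a made-up example.
rng = np.random.default_rng(1)
W = rng.normal(size=(16, 8))   # hypothetical first-layer weights
nfm = W.T @ W
print(np.isclose(np.trace(nfm), np.sum(W ** 2)))  # prints True
```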
