Neil Mallinar

110 posts

@nmallinar

PhD student @ UCSD, prior to that: Research Intern @ Google Research & MSR NE, Research Engineer at Pryon Inc & IBM Watson.

New York, NY · Joined June 2009
659 Following · 275 Followers
Neil Mallinar @nmallinar:
Super excited to share that we have an Oral presentation for this paper next week at ICML! It will be on Tuesday at 10am (Oral 1E) in West Ballroom D; I'll be presenting 4th, at 10:45am :) Our poster will be on Wednesday at 11am and I encourage you to stop by and chat!
Neil Mallinar @nmallinar:
Happy to share that we got a spotlight at ICML for this work. See y'all there!!
Neil Mallinar @nmallinar:
@matthistory Maybe they are going to do a heist together, or sing karaoke! I cannot wait to find out
Miss Mineragua @emanouks:
Three's a coven, four's a crowd
Neil Mallinar retweeted
amirhesam abedsoltan @Amirhesam_A:
Two generalization regimes in ICL: (1) context-scaling, where performance improves with more in-context examples, and (2) task-scaling, where performance improves with more pre-training tasks. While MLPs show task-scaling but not context-scaling, … arxiv.org/abs/2410.12783
Neil Mallinar @nmallinar:
Consider my beautiful day uninterrupted 🥲 Alas the research work calls me back
Neil Mallinar @nmallinar:
@thdbui @pfau Anyway I enjoyed your paper and would love to get a chance to discuss these topics further sometime and hear more about your observations!
Neil Mallinar @nmallinar:
@thdbui @pfau Another difference we see compared to grokking in low-rank settings like k-parity is that the circulant features we learn for modular arithmetic (MA) are full rank! It wasn't obvious to us that you could do MA with kernels, as the MA experiments we've seen all still use neural nets
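A quick illustration of the full-rank point (mine, not from the thread): a generic random circulant matrix is full rank almost surely, in contrast to the low-rank features typical of k-parity settings.

```python
# Illustrative check: a circulant built from a random first column has
# full rank with probability 1 (its eigenvalues are the DFT of that column).
import numpy as np
from scipy.linalg import circulant

p = 61
C = circulant(np.random.default_rng(0).normal(size=p))  # random first column
print(np.linalg.matrix_rank(C))  # prints 61, i.e. full rank
```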
@·
You can do grokking with kernel methods too.
@·
@nmallinar The kernel lore deepens
Neil Mallinar @nmallinar:
Grokking modular arithmetic is widely studied for the seemingly unique emergent abilities of neural networks. Instead, we find that iteratively solving a kernel machine and estimating the Average Gradient Outer Product (AGOP) recovers this phenomenon identically:
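A minimal sketch of the iteration described above, assuming a Gaussian kernel with a learned Mahalanobis metric (the paper's exact kernel, normalization, and stopping rule may differ; all function names here are mine, not the authors'):

```python
# Sketch: alternately solve a kernel regression and replace the kernel's
# Mahalanobis metric with the AGOP of the fitted predictor.
import numpy as np

def kernel(X, Z, M, gamma=1.0):
    """Gaussian kernel with Mahalanobis metric: exp(-gamma (x-z)^T M (x-z))."""
    d = X[:, None, :] - Z[None, :, :]            # (n, m, p) pairwise differences
    sq = np.einsum('nmp,pq,nmq->nm', d, M, d)    # squared Mahalanobis distances
    return np.exp(-gamma * sq)

def kernel_agop_iterate(X, y, T=5, gamma=1.0, reg=1e-8):
    """X: (n, p) inputs, y: (n,) scalar labels. Returns dual coefs and metric."""
    n, p = X.shape
    M = np.eye(p)                                # start from the identity metric
    for _ in range(T):
        K = kernel(X, X, M, gamma)
        alpha = np.linalg.solve(K + reg * np.eye(n), y)  # (near-)ridgeless fit
        # grad f(x_n) = -2 gamma M sum_m alpha_m K(x_n, x_m) (x_n - x_m)
        d = X[:, None, :] - X[None, :, :]
        G = -2.0 * gamma * np.einsum('nm,nmp->np', K * alpha[None, :], d) @ M.T
        M = G.T @ G / n                          # AGOP: average gradient outer product
        M *= p / np.trace(M)                     # rescale so trace(M) = p (my choice)
    return alpha, M
```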
Neil Mallinar retweeted
Daniel Beaglehole @dbeagleholeCS:
Iterating kernel ridgeless regression with AGOP computation groks modular arithmetic… and this grokking is remarkably similar to the phenomenon in neural networks. I found these results very surprising!
Quoting Neil Mallinar @nmallinar's announcement above.
Neil Mallinar @nmallinar:
In our setting, grokking appears to occur solely due to feature learning. We decouple grokking from neural architectures and gradient-descent optimization by using kernels equipped with feature learning through AGOP, and we find many of the same phenomena observed in neural networks.
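For a concrete picture of the task being fit, here is the standard modular-addition setup used in grokking studies (a hedged sketch; the paper's encoding, modulus, and train split may differ):

```python
# Modular-addition dataset: inputs are concatenated one-hot encodings of
# (a, b); labels encode (a + b) mod p. Parameter values are illustrative.
import numpy as np

def modular_addition_data(p=61, train_frac=0.5, seed=0):
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    X = np.zeros((p * p, 2 * p))
    X[np.arange(p * p), pairs[:, 0]] = 1.0          # one-hot encoding of a
    X[np.arange(p * p), p + pairs[:, 1]] = 1.0      # one-hot encoding of b
    y = np.eye(p)[(pairs[:, 0] + pairs[:, 1]) % p]  # one-hot (a + b) mod p
    idx = np.random.default_rng(seed).permutation(p * p)
    n_tr = int(train_frac * p * p)
    return X[idx[:n_tr]], y[idx[:n_tr]], X[idx[n_tr:]], y[idx[n_tr:]]
```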
Neil Mallinar @nmallinar:
By the relation between circulant matrices and the Discrete Fourier Transform, we theoretically show that a quadratic kernel equipped with circulant features implements the same generalizing solution as neural networks: the Fourier Multiplication Algorithm found in prior work.
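For reference, the textbook identity behind this step (my notation, not necessarily the paper's): the DFT diagonalizes every circulant matrix, so multiplication by a circulant acts as pointwise multiplication in frequency space.

```latex
% A circulant matrix C with first column c is diagonalized by the DFT
% matrix F, with F_{jk} = \omega^{jk} and \omega = e^{-2\pi i / p}:
\[
  C \;=\; F^{-1}\,\operatorname{diag}(F c)\,F ,
\]
% so C x computes the circular convolution c \circledast x, i.e. pointwise
% multiplication of \hat{x} = F x by \hat{c} = F c in the Fourier domain --
% the Fourier Multiplication structure referenced above.
```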
Neil Mallinar @nmallinar:
The relationship between the NFM and the neural network AGOP has been noted in prior work: arxiv.org/abs/2212.13881. In settings where weight decay (equivalently, penalizing trace(NFM)) induces grokking, we find that AGOP regularization (penalizing trace(AGOP)) does the same.
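To unpack the two quantities in the tweet above (standard definitions from the cited line of work, with layer indexing suppressed): for a layer with weights W and a predictor f evaluated on points x_1, …, x_n,

```latex
\[
  \mathrm{NFM}(W) = W^{\top} W, \qquad
  \mathrm{AGOP}(f) = \frac{1}{n} \sum_{i=1}^{n}
      \nabla_x f(x_i)\, \nabla_x f(x_i)^{\top}.
\]
% Note trace(NFM) = ||W||_F^2, which is exactly what weight decay penalizes;
% the tweet's point is that penalizing trace(AGOP) plays the analogous role.
```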
Neil Mallinar @nmallinar:
As before, initializing features in the neural network using a random circulant dramatically reduces the time-to-generalization.
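A hedged sketch of what such an initialization could look like (the scale and where the block sits in the network are my assumptions):

```python
# Random circulant initialization: draw a random first column and build the
# circulant from it. The 1/sqrt(p) scale is an illustrative choice.
import numpy as np
from scipy.linalg import circulant

def random_circulant_init(p, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    c = rng.normal(0.0, scale / np.sqrt(p), size=p)  # random first column
    return circulant(c)                              # (p, p) circulant matrix
```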
Neil Mallinar @nmallinar:
We additionally find that instantiating features using random circulant matrices leads to generalization in standard Gaussian and quadratic kernels, suggesting that no additional structure beyond a general, asymmetric circulant is necessary to solve modular arithmetic.
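One natural way to read "instantiating features with a circulant C" in a kernel (my formulation, consistent with the Mahalanobis view above; the paper's parameterization may differ):

```latex
% Fixed circulant features C in Gaussian and quadratic kernels; for the
% Gaussian case this is equivalent to a Mahalanobis metric M = C^T C.
\[
  K^{\mathrm{gauss}}_{C}(x, z) = \exp\!\big(-\gamma \,\lVert C x - C z\rVert^{2}\big),
  \qquad
  K^{\mathrm{quad}}_{C}(x, z) = \big( (C x)^{\top} (C z) + c_0 \big)^{2},
\]
% where C need not be symmetric, matching the "general, asymmetric
% circulant" in the tweet above.
```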
Neil Mallinar @nmallinar:
The progress measures of circulant deviation and AGOP alignment tend to steadily improve in the early iterations of neural networks as well, suggesting that feature learning is taking place in spite of unchanging test loss and accuracy.
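A hedged sketch of an alignment-style progress measure (the paper's exact definitions of circulant deviation and AGOP alignment may differ; this is the generic cosine-similarity version):

```python
# Generic matrix alignment: cosine similarity of flattened matrices, a
# common stand-in for "how close is the learned AGOP to a reference matrix".
import numpy as np

def matrix_alignment(A, B):
    a, b = A.ravel(), B.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```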