Daniel Beaglehole

229 posts


@dbeagleholeCS

ML PhD @UCSanDiego. Deep learning from first principles (+ applications) Topics include xRFM, AGOP, Colonel Blotto

Joined August 2021
552 Following · 411 Followers
Daniel Beaglehole retweeted
Shubhendu Trivedi @_onionesque
The idea of using locality has appeared from time to time in the multi-index literature. Here is a nice operationalization, using a "local expected gradient outer product (EGOP)"-based continuously varying index subspace. arxiv.org/abs/2601.07061
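
A minimal sketch of what a local EGOP estimate could look like, assuming the standard definition E[∇f(x)∇f(x)ᵀ] made local with kernel weights around a query point; the Gaussian weighting, bandwidth, and function names here are illustrative, not taken from the linked paper.

```python
import numpy as np

def local_egop(grads, X, x0, bandwidth=1.0, k=2):
    """Sketch of a local expected gradient outer product (EGOP) estimate.

    grads : (n, d) gradients of the prediction function at the samples X
    X     : (n, d) samples
    x0    : (d,) query point at which the local index subspace is estimated
    Returns the top-k eigenvectors of the locally weighted EGOP.
    """
    # Gaussian weights centered at the query point (illustrative choice)
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * bandwidth ** 2))
    w /= w.sum()
    # locally weighted average of gradient outer products: sum_i w_i g_i g_i^T
    M = (grads * w[:, None]).T @ grads
    eigvals, eigvecs = np.linalg.eigh(M)            # ascending order
    return eigvecs[:, ::-1][:, :k]                  # top-k directions = local index subspace
```
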
Damek @damekdavis
New paper studies when spectral gradient methods (e.g., Muon) help in deep learning: 1. We identify a pervasive form of ill-conditioning in DL: post-activation matrices have low stable rank. 2. We then explain why spectral methods can perform well despite this. Long thread
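
The "low stable rank" observation is easy to probe numerically. A minimal sketch, assuming stable rank means the usual ‖A‖_F² / ‖A‖₂²:

```python
import numpy as np

def stable_rank(A):
    """Stable rank ||A||_F^2 / ||A||_2^2: close to 1 for nearly rank-one matrices,
    up to min(A.shape) for well-conditioned ones."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    spec = np.linalg.norm(A, 2)        # largest singular value
    return fro2 / spec ** 2

# e.g., collect a post-activation matrix H (n_samples x width) from a layer and
# inspect stable_rank(H); a value much smaller than the width indicates the
# ill-conditioning described above.
```
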
Daniel Beaglehole @dbeagleholeCS
@damekdavis Seems really satisfying! Does the intuition carry over to Adam? Also have you tried training kernel ridge regression?
Damek @damekdavis
# Improving training by starting SpecGD later

By the way, knowing this nr/st speedup condition suggests a natural idea: run a few steps of gradient descent before switching to SpecGD. In experiments, nr(G) along the GD trajectory is usually large. Testing this, starting SpecGD from iteration 3 or from the peak nr along GD does help.

In random feature models, we can actually justify this strategy. Indeed, we show that although nr(G) can be O(1) at the first step of gradient descent, it becomes Θ(d) after a single step. Moreover, it stays that large for at least Θ(d) steps (as is visible in the figure, where d = 200). This means we can indeed use the two-step strategy:
1. Take gradient descent steps until nr(G) gets large.
2. Once it's large, take a SpecGD (or Muon-style) step and enjoy an Ω(d) speedup over GD.

So the nr > st condition seems useful in random feature models. The question is how to generalize it to MLPs and transformers.
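
A minimal sketch of the two-step strategy described in the thread, under two assumptions that are not spelled out here: that nr(G) denotes the nuclear-to-spectral norm ratio ‖G‖_* / ‖G‖₂, and that the SpecGD step replaces the gradient by UVᵀ from its SVD (as in Muon). The threshold, learning rate, and function names are illustrative.

```python
import numpy as np

def nr(G):
    # rank-like ratio of the gradient matrix; assumed here to be ||G||_* / ||G||_2
    s = np.linalg.svd(G, compute_uv=False)
    return s.sum() / s[0]

def spectral_step(G):
    # Muon-style direction: drop singular values, keep singular vectors (G -> U V^T)
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def train(W, grad_fn, lr=0.1, steps=100, threshold=10.0):
    """Two-phase sketch: plain GD until nr(G) exceeds a threshold, then SpecGD."""
    use_spectral = False
    for _ in range(steps):
        G = grad_fn(W)
        if not use_spectral and nr(G) > threshold:
            use_spectral = True        # switch once the gradient is effectively high-rank
        W = W - lr * (spectral_step(G) if use_spectral else G)
    return W
```
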
Daniel Beaglehole @dbeagleholeCS
Really nice to see independent validation of xRFM on tabular data. On a benchmark of 300 datasets, xRFM outperforms all gradient boosted trees (XGBoost, CatBoost, LightGBM, etc.) and is in line with the top 1-2 neural networks (such as TabPFNv2). arxiv.org/abs/2407.00956
Daniel Beaglehole @dbeagleholeCS
Our method combines tree-based spatial partitions with feature-learning kernel machines (tabular-optimized Recursive Feature Machine models, arxiv.org/abs/2508.10053).
Damek @damekdavis
Update: we were able to close the gap between neural networks and reweighted kernel methods on sparse hierarchical functions with hypercube data. Interestingly the kernel methods outperform carefully tuned networks in our tests.
Damek @damekdavis

we wrote a paper about learning 'sparse' and 'hierarchical' functions with data-dependent kernel methods. you just 'iteratively reweight' the coordinates by the gradients of the prediction function. typically 5 iterations suffice.

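
A minimal sketch of the "iteratively reweight the coordinates by the gradients of the prediction function" recipe quoted above, using a diagonally weighted Gaussian kernel and kernel ridge regression; the kernel choice, ridge value, and normalization are illustrative stand-ins, not the paper's exact procedure.

```python
import numpy as np

def fit_reweighted_krr(X, y, iters=5, sigma=1.0, ridge=1e-3):
    """Iteratively reweighted kernel ridge regression (RFM-style, diagonal AGOP).

    Fit KRR with per-coordinate weights, set the weights from the average squared
    gradient of the fitted predictor, and repeat for a few rounds.
    """
    n, d = X.shape
    w = np.ones(d)
    for _ in range(iters):
        # weighted Gaussian kernel: K_ij = exp(-sum_k w_k (x_ik - x_jk)^2 / (2 sigma^2))
        D = ((X[:, None, :] - X[None, :, :]) ** 2 * w).sum(-1)
        K = np.exp(-D / (2 * sigma ** 2))
        alpha = np.linalg.solve(K + ridge * np.eye(n), y)

        # gradient of f(x) = sum_i alpha_i k(x, x_i), evaluated at each training point
        diff = X[:, None, :] - X[None, :, :]                       # (n, n, d)
        grads = (-(K * alpha[None, :])[:, :, None] * diff * w / sigma ** 2).sum(axis=1)

        # reweight coordinates by their average squared gradient (AGOP diagonal)
        w = (grads ** 2).mean(axis=0)
        w /= w.sum() + 1e-12
        w *= d                     # keep the scale comparable to the unweighted kernel
    return w                       # learned coordinate weights; refit KRR with them for the final predictor
```
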
Daniel Beaglehole @dbeagleholeCS
@damekdavis I'd be curious to see if other kernels from our xRFM paper can help btw. The generalized $L_p^q$ kernels like $k(x,z) = \exp(-\|x-z\|_p^q)$ for $0 < p < q$ improve performance quite a bit. I suspect they help learning sparse coordinates as they break rotational invariance.
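
For concreteness, a minimal sketch of the generalized $L_p^q$ kernel from the tweet; the bandwidth parameter and the dense pairwise computation are illustrative additions.

```python
import numpy as np

def lpq_kernel(X, Z, p, q, bandwidth=1.0):
    """Generalized L_p^q kernel matrix: K_ij = exp(-(||x_i - z_j||_p ** q) / bandwidth).

    Because distances are built coordinate-wise from an L_p norm, the kernel is not
    rotationally invariant, unlike the Gaussian kernel.
    """
    diff = np.abs(X[:, None, :] - Z[None, :, :]) ** p      # (n, m, d) coordinate-wise terms
    dist_p = diff.sum(axis=-1) ** (1.0 / p)                 # (n, m) pairwise p-norm distances
    return np.exp(-(dist_p ** q) / bandwidth)
```
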
Edward Milsom @edward_milsom
Excited to announce I'll be starting this September 2025 as a Lecturer (Assistant Professor) at the University of Bath! I will continue my research on deep learning foundations, and am open to ideas for collaborations. (Pictured: Bath. Not pictured: University of Bath)
Daniel Beaglehole @dbeagleholeCS
Our method combines RFMs with tree-based splits to achieve log-linear scaling in the number of samples (basically linear, but we use a median computation at every tree node).
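
A minimal sketch of why median splits give log-linear scaling: splitting at the median keeps the two halves balanced, so the tree has depth O(log n) and each level touches every sample once, while the kernel model (e.g., an RFM) is only fit on small leaves. The split direction, leaf-size cutoff, and leaf model here are illustrative, not xRFM's exact choices.

```python
import numpy as np

def build_tree(X, y, max_leaf=2000, fit_leaf=None):
    """Recursive median-split tree; each leaf holds a model fit on at most max_leaf points."""
    if len(X) <= max_leaf:
        return {"leaf": True, "model": fit_leaf(X, y) if fit_leaf else None}
    j = np.argmax(X.var(axis=0))          # split along the highest-variance coordinate
    t = np.median(X[:, j])                # median split keeps the two halves balanced
    left = X[:, j] <= t
    if left.all() or not left.any():      # degenerate split: stop and fit a leaf
        return {"leaf": True, "model": fit_leaf(X, y) if fit_leaf else None}
    return {
        "leaf": False, "dim": j, "thresh": t,
        "lo": build_tree(X[left], y[left], max_leaf, fit_leaf),
        "hi": build_tree(X[~left], y[~left], max_leaf, fit_leaf),
    }
```
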
Daniel Beaglehole retweeted
Stat.ML Papers @StatMLPapers
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data ift.tt/P3pDd9f
vik @vikhyatk
ok you guys have convinced me that pip is bad. trying out conda
Daniel Beaglehole retweeted
Neil Mallinar @nmallinar
Super excited to share that we have an Oral presentation for this paper next week at ICML! It will be on Tuesday at 10am (Oral 1E) in West Ballroom D; I'll be presenting 4th at 10:45am :) Our poster will be on Wednesday at 11am and I encourage you to stop by and chat!
Daniel Beaglehole retweeted
Weixuan Wang @WeixuanWang66
🚨 What if you could hijack any LLM's brain using external expert models? ExpertSteer does exactly that! 🧠⚡ Meet ExpertSteer: a breakthrough that lets you inject expert knowledge into any LLM, guiding its responses without updating model parameters.