Felix Dangel

18 posts

Felix Dangel

@f_dangel

Assistant professor at @Concordia and @Mila_Quebec.

Toronto · Joined August 2021
83 Following · 220 Followers
Felix Dangel retweeted
Weight Space Symmetries @ ICML 2026
📢Excited to announce the Workshop on Weight-Space Symmetries @icmlconf! We welcome 4-page submissions analysing symmetries, their effects on training and model structure, and practical methods to utilize them. Submission Deadline: April 24 (23:59 AoE) #ICML2026
Felix Dangel retweeted
Runa Eschenhagen @runame_
1/14 Is Muon “better” than Shampoo? We argue that their relationship parallels Adam's relationship with Signum. Analogous to @lukas_balles and Hennig’s (2018) decomposition of Adam into element-wise scaled Signum, we can decompose Shampoo as left- and right-adapted Muon.
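A hedged sketch of the two decompositions the thread refers to, in my own notation (not necessarily the paper's exact statement). Balles and Hennig (2018) rewrite Adam's step, ignoring bias correction and epsilon, as a sign step with element-wise scaling:

\Delta\theta_t = -\eta \, \frac{m_t}{\sqrt{v_t}} = -\eta \, \operatorname{sign}(m_t) \odot \frac{|m_t|}{\sqrt{v_t}} .

The matrix analogue: for a single gradient with SVD G = U S V^\top, Shampoo's statistics L = G G^\top and R = G^\top G give

L^{-1/4} \, G \, R^{-1/4} = U V^\top ,

which is exactly the orthogonalized update that Muon approximates with Newton--Schulz iterations. With accumulated statistics L_t and R_t, Shampoo applies additional left and right scalings around that direction, hence "left- and right-adapted Muon".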
Felix Dangel retweeted
Wu Lin @LinYorker
Within an information-geometric framework, we reconnect Shampoo/SOAP with both classical quasi-Newton ideas and Gaussian whitening, and develop practical methods that naturally handle tensor-valued weights in language model pre-training. arxiv.org/abs/2509.03378 opt-ml workshop
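For readers unfamiliar with the term, a minimal statement of Gaussian whitening (my gloss, not the paper's formulation): if x \sim \mathcal{N}(0, \Sigma), the whitening map is

x \mapsto \Sigma^{-1/2} x , \qquad \operatorname{Cov}(\Sigma^{-1/2} x) = I .

Applied to gradients, a whitening preconditioner is P = \mathbb{E}[g g^\top]^{-1/2}; Kronecker-factored approximations of that expectation, the route taken by Shampoo/SOAP-style methods, are what keep it tractable for matrix- and tensor-valued weights.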
Felix Dangel @f_dangel
🚀 [NeurIPS 2025] jet-for-pytorch (github.com/f-dangel/torch…) is live! From our paper "Collapsing Taylor Mode AD":
🔹 Implements Taylor mode for PyTorch
🔹 Adds collapsing → speedup and memory reduction for PDE operators like the Laplacian
Talk to me at #NeurIPS or to Tim at #EurIPS!
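For context, here is what computing a Laplacian with plain nested autodiff looks like in PyTorch. This is only a baseline sketch using standard torch.autograd calls, not the jet-for-pytorch API; it is the kind of computation that Taylor mode with collapsing is meant to make faster and lighter on memory.

import torch

# Baseline Laplacian of a scalar field u(x) via nested autograd
# (plain PyTorch, NOT the jet-for-pytorch API).

def u(x: torch.Tensor) -> torch.Tensor:
    # toy scalar field, x has shape (d,)
    return (x ** 3).sum()

def laplacian(f, x: torch.Tensor) -> torch.Tensor:
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x), x, create_graph=True)
    lap = torch.zeros(())
    # sum of second derivatives d^2 f / dx_i^2
    for i in range(x.numel()):
        (second,) = torch.autograd.grad(grad[i], x, retain_graph=True)
        lap = lap + second[i]
    return lap

x = torch.tensor([1.0, 2.0, 3.0])
print(laplacian(u, x))  # 6 * (1 + 2 + 3) = 36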
Felix Dangel @f_dangel
🎓 Looking for MSc or PhD opportunities in Machine Learning for Fall 2026? Join my group at @Concordia and @Mila_Quebec! 🔍 Focus: autodiff, second-order optimization, and Hessian-based methods for LLMs & scientific ML. 📅 Apply by Dec 1: mila.quebec/en/prospective…
Felix Dangel retweeted
Bruno Mlodozeniec @brunorganised
I would highly recommend using this library for any research on influence functions. Implementing scalable IFs (usually ≡ K-FAC) is a massive pain, especially for modern architectures. With curvlinops, getting plots like the below for diffusion models is relatively easy
Runa Eschenhagen@runame_

1/6 Hessian approximations are ubiquitous in deep learning, but working with them can get quite involved. We argue for using a linear operator interface for neural network curvature matrices and implement this in PyTorch in our library curvlinops. arxiv.org/abs/2501.19183/

Felix Dangel retweeted
Runa Eschenhagen @runame_
1/6 Hessian approximations are ubiquitous in deep learning, but working with them can get quite involved. We argue for using a linear operator interface for neural network curvature matrices and implement this in PyTorch in our library curvlinops. arxiv.org/abs/2501.19183/
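A sketch of the linear-operator idea with standard torch and SciPy APIs, to make the interface concrete. Everything below (the hvp helper, LinearOperator, eigsh) is generic PyTorch/SciPy, not curvlinops' own classes or signatures; see the paper and repository for the actual interface.

import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

# Treat the Hessian of a small model's loss as a matrix-free linear operator
# and query its top eigenvalue without ever materializing it.
model = torch.nn.Linear(5, 1)
X, y = torch.randn(16, 5), torch.randn(16, 1)
params = [p for p in model.parameters() if p.requires_grad]
num_params = sum(p.numel() for p in params)

def loss():
    return torch.nn.functional.mse_loss(model(X), y)

def hvp(v: np.ndarray) -> np.ndarray:
    # Hessian-vector product via double backward (Pearlmutter's trick).
    v_t = torch.from_numpy(v).float()
    vs, offset = [], 0
    for p in params:
        vs.append(v_t[offset : offset + p.numel()].view_as(p))
        offset += p.numel()
    grads = torch.autograd.grad(loss(), params, create_graph=True)
    dot = sum((g * v_p).sum() for g, v_p in zip(grads, vs))
    hv = torch.autograd.grad(dot, params)
    return torch.cat([h.reshape(-1) for h in hv]).detach().numpy()

H = LinearOperator((num_params, num_params), matvec=hvp, dtype=np.float32)
top_eigval = eigsh(H, k=1, which="LA", return_eigenvectors=False)
print(top_eigval)  # largest Hessian eigenvalue, matrix-free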
Felix Dangel retweeted
Weronika Ormaniec @wormaniec
Ever wondered how the loss landscape of Transformers differs from that of other architectures? Or which Transformer components make its loss landscape unique? With @unregularized & @f_dangel, we explore this via the Hessian in our #ICLR2025 spotlight paper! Key insights👇 1/8
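As a toy illustration of "exploring the loss landscape via the Hessian" (not the paper's setup, which analyzes Transformer components at scale): compute the full Hessian of a tiny two-layer model's loss with respect to its flattened weights and inspect its spectrum.

import torch
from torch.autograd.functional import hessian

# Spectrum of the loss Hessian of a tiny two-layer network at a random point.
torch.manual_seed(0)
X, y = torch.randn(32, 4), torch.randn(32, 1)

def loss_of_flat_weights(w: torch.Tensor) -> torch.Tensor:
    W1, W2 = w[:8].view(2, 4), w[8:].view(1, 2)
    hidden = torch.tanh(X @ W1.T)
    return torch.nn.functional.mse_loss(hidden @ W2.T, y)

w0 = torch.randn(10)
H = hessian(loss_of_flat_weights, w0)   # (10, 10) Hessian matrix
eigvals = torch.linalg.eigvalsh(H)
print(eigvals)  # eigenvalues of the loss Hessian at w0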
Felix Dangel retweeted
Wu Lin @LinYorker
#ICML2024 Can We Remove the Square-Root in Adaptive Methods? arxiv.org/abs/2402.03496
Root-free (RF) methods are better on CNNs and competitive on Transformers compared to root-based methods (AdamW).
Removing the root makes matrix methods faster: Root-free Shampoo in BFloat16. /1
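A minimal way to state the root-based vs. root-free distinction in the diagonal case (my reading; the paper's actual methods involve more than this):

\text{root-based (AdamW-style):}\quad \Delta\theta_t = -\eta \, \frac{m_t}{\sqrt{v_t} + \epsilon},
\qquad
\text{root-free:}\quad \Delta\theta_t = -\eta \, \frac{m_t}{v_t + \epsilon},

with v_t an exponential moving average of squared gradients. For matrix preconditioners like Shampoo, dropping the root also removes the matrix-root computations, which is what the thread credits for the speedup and for making Root-free Shampoo workable in BFloat16.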
Nikos Bosse @nikosbosse
My personal hero at the Ellis Machine Learning Symposium printed his poster on a beach towel so he could keep using it afterwards. Absolute genius. PI material.
Felix Dangel @f_dangel
Which plane would you board? [#NeurIPS2021] Cockpit: Practical trouble-shooting of DNN training. Empowered by recent advances in autodiff. In collaboration with @frankstefansch1 & @PhilippHennig5.
Frank Schneider@frankstefansch1

📣#NeurIPS2021📄 Why are we still debugging neural nets by staring at loss curves? We present Cockpit, a visual debugger for deep learning. Joint work with @f_dangel & @PhilippHennig5 Paper: arxiv.org/abs/2102.06604 Code: github.com/f-dangel/cockp… Video: youtu.be/wQsjgx3zfkQ 🧵
