Markus Heimerl

11 posts

@markusheimerl

Regensburg, Germany · Joined November 2023
45 Following · 76 Followers
Markus Heimerl retweeted
Jorge Bravo Abad @bravo_abad
An ADMM-based optimizer that outperforms state-of-the-art methods across vision models, LLMs, and GANs.

Nearly every deep learning model trained today relies on some variant of SGD. Adam, AdamW, Muon, Shampoo—they differ in how they estimate gradient moments or precondition updates, but they share the same theoretical baggage: bounded gradient assumptions, bounded variance, unbiased gradient estimation, and the expectation that training data are IID. When data are heterogeneous—the norm in federated and distributed settings—these assumptions break down and convergence guarantees evaporate.

Shenglong Zhou and coauthors take a fundamentally different path. They build PISA, a preconditioned inexact stochastic ADMM framework that decomposes the training objective into parallelizable subproblems linked through Lagrange multipliers, then solves each inexactly using stochastic gradients and adaptive preconditioning matrices. PISA converges at a linear rate under a single assumption—Lipschitz continuity of the gradient on a bounded region—without requiring bounded variance, bounded gradients, or IID data. Among all stochastic optimizers surveyed, only PISA achieves this combination.

The framework supports pluggable preconditioners, yielding two practical variants: SISA (second-moment preconditioning, analogous to Adam-style adaptive rates) and NSISA (Newton–Schulz orthogonalized momentum, inspired by Muon). Both retain the same convergence guarantees.

The empirical breadth is notable. Under extreme label skew in federated learning (each client holding one class), SISA reaches 95% on MNIST where FedAvg, FedProx, and Scaffold plateau around 54%. On CIFAR-10, SISA matches or exceeds ten optimizers across ResNet-34, VGG-11, and DenseNet-121. For LLM training, NSISA opens an increasing validation-loss gap over Adam, Muon, Shampoo, SOAP, and Adam-mini as model size grows from GPT2-Nano to GPT2-XL, with clear wall-clock advantages at the largest scale. On GAN training—notoriously unstable—SISA achieves the lowest FID scores on both WGAN and WGAN-GP.

What makes this compelling beyond any single benchmark is the theoretical clarity. Most optimizer comparisons are purely empirical; here the convergence theory explains why the algorithm handles heterogeneous data gracefully—the ADMM decomposition naturally accommodates non-IID batches without variance reduction over full datasets. The optimization framework itself, not just the learning rate schedule or momentum scheme, can be a decisive design choice for robust training across architectures and data distributions.

Paper: nature.com/articles/s4225…
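For readers who want the mechanics the tweet compresses into one sentence: the decomposition described is, at its core, consensus ADMM. The LaTeX sketch below shows that generic scaffolding, assuming each block i carries a loss f_i and local weights w_i; the symbols (f_i, w_i, \lambda_i, \rho) and the plain Euclidean proximal term are illustrative placeholders, and PISA's adaptive preconditioning matrices and inexactness tolerances are in the paper, not reproduced here.

\begin{aligned}
&\min_{w_1,\dots,w_n,\,w}\ \sum_{i=1}^{n} f_i(w_i)
  \quad \text{s.t.}\quad w_i = w,\ i = 1,\dots,n
  && \text{(split objective, consensus constraint)}\\
&L_\rho = \sum_{i=1}^{n} \Big( f_i(w_i)
  + \langle \lambda_i,\, w_i - w \rangle
  + \tfrac{\rho}{2}\,\lVert w_i - w \rVert^2 \Big)
  && \text{(augmented Lagrangian)}\\
&w_i^{k+1} \approx \operatorname*{arg\,min}_{w_i}\ L_\rho\big(w_i,\, w^k,\, \lambda_i^k\big)
  && \text{(inexact: a few stochastic-gradient steps)}\\
&w^{k+1} = \frac{1}{n} \sum_{i=1}^{n} \Big( w_i^{k+1} + \tfrac{1}{\rho}\,\lambda_i^k \Big)
  && \text{(consensus averaging)}\\
&\lambda_i^{k+1} = \lambda_i^k + \rho\,\big( w_i^{k+1} - w^{k+1} \big)
  && \text{(dual ascent on the multipliers)}
\end{aligned}

Each w_i-subproblem touches only block i's (possibly non-IID) data; the coupling happens purely through the averaging and multiplier steps, which is where the tweet's point about heterogeneous batches becomes visible.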
7 replies · 47 reposts · 330 likes · 94.2K views
Markus Heimerl @markusheimerl
@NickDesnoyer Any chance you'll sell an educational kit with everything you'd need to reproduce this genetic reprogramming of flowers?
1 reply · 0 reposts · 2 likes · 139 views
Nick Desnoyer @NickDesnoyer
"Technology always becomes obsolete, but a good aesthetic is, by definition, timeless" -Sheehan Quirke Preview of my flower design toolkit and its visual language.
6 replies · 36 reposts · 214 likes · 9.9K views
Markus Heimerl @markusheimerl
@iskander Massive data collection; try to make model animals live as long as you can with any combination of interventions you can think of.
0 replies · 0 reposts · 5 likes · 475 views
alex rubinsteyn @iskander
Let's dream a bit. How would you dramatically reorganize biotech research / drug development to make much faster progress towards curative therapies? (Don't say "use AI"; hypothesis/candidate generation under the current structure isn't a bottleneck.)
38 replies · 11 reposts · 95 likes · 19.1K views
Markus Heimerl @markusheimerl
So let's solve aging, and after that I'm inviting everyone to have cheesecake at my place every Tuesday at 3 for the next two thousand years, okay?
5 replies · 10 reposts · 122 likes · 210.6K views
Markus Heimerl retweeted
Physics In History @PhysInHistory
R. Feynman on awards and honors in science
72 replies · 684 reposts · 2.7K likes · 308.9K views