Markus Heimerl

11 posts

@markusheimerl

Regensburg, Germany · Joined November 2023
45 Following · 76 Followers
Markus Heimerl retweeted
Jorge Bravo Abad @bravo_abad
An ADMM-based optimizer that outperforms state-of-the-art methods across vision models, LLMs, and GANs.

Nearly every deep learning model trained today relies on some variant of SGD. Adam, AdamW, Muon, Shampoo—they differ in how they estimate gradient moments or precondition updates, but they share the same theoretical baggage: bounded gradient assumptions, bounded variance, unbiased gradient estimation, and the expectation that training data are IID. When data are heterogeneous—the norm in federated and distributed settings—these assumptions break down and convergence guarantees evaporate.

Shenglong Zhou and coauthors take a fundamentally different path. They build PISA, a preconditioned inexact stochastic ADMM framework that decomposes the training objective into parallelizable subproblems linked through Lagrange multipliers, then solves each inexactly using stochastic gradients and adaptive preconditioning matrices. PISA converges at a linear rate under a single assumption—Lipschitz continuity of the gradient on a bounded region—without requiring bounded variance, bounded gradients, or IID data. Among all stochastic optimizers surveyed, only PISA achieves this combination.

The framework supports pluggable preconditioners, yielding two practical variants: SISA (second-moment preconditioning, analogous to Adam-style adaptive rates) and NSISA (Newton–Schulz orthogonalized momentum, inspired by Muon). Both retain the same convergence guarantees.

The empirical breadth is notable. Under extreme label skew in federated learning (each client holding one class), SISA reaches 95% on MNIST where FedAvg, FedProx, and Scaffold plateau around 54%. On CIFAR-10, SISA matches or exceeds ten optimizers across ResNet-34, VGG-11, and DenseNet-121. For LLM training, NSISA opens an increasing validation-loss gap over Adam, Muon, Shampoo, SOAP, and Adam-mini as model size grows from GPT2-Nano to GPT2-XL, with clear wall-clock advantages at the largest scale. On GAN training—notoriously unstable—SISA achieves the lowest FID scores on both WGAN and WGAN-GP.

What makes this compelling beyond any single benchmark is the theoretical clarity. Most optimizer comparisons are purely empirical; here the convergence theory explains why the algorithm handles heterogeneous data gracefully—the ADMM decomposition naturally accommodates non-IID batches without variance reduction over full datasets. The optimization framework itself, not just the learning rate schedule or momentum scheme, can be a decisive design choice for robust training across architectures and data distributions.

Paper: nature.com/articles/s4225…
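For readers who want the mechanics the tweet compresses into one sentence: the decomposition described is, at its core, consensus ADMM. The LaTeX sketch below shows that generic scaffolding, assuming each block i carries a loss f_i and local weights w_i; the symbols (f_i, w_i, \lambda_i, \rho) and the plain Euclidean proximal term are illustrative placeholders, and PISA's adaptive preconditioning matrices and inexactness tolerances are in the paper, not reproduced here.

\begin{aligned}
&\min_{w_1,\dots,w_n,\,w}\ \sum_{i=1}^{n} f_i(w_i)
  \quad \text{s.t.}\quad w_i = w,\ i = 1,\dots,n
  && \text{(split objective, consensus constraint)}\\
&L_\rho = \sum_{i=1}^{n} \Big( f_i(w_i)
  + \langle \lambda_i,\, w_i - w \rangle
  + \tfrac{\rho}{2}\,\lVert w_i - w \rVert^2 \Big)
  && \text{(augmented Lagrangian)}\\
&w_i^{k+1} \approx \operatorname*{arg\,min}_{w_i}\ L_\rho\big(w_i,\, w^k,\, \lambda_i^k\big)
  && \text{(inexact: a few stochastic-gradient steps)}\\
&w^{k+1} = \frac{1}{n} \sum_{i=1}^{n} \Big( w_i^{k+1} + \tfrac{1}{\rho}\,\lambda_i^k \Big)
  && \text{(consensus averaging)}\\
&\lambda_i^{k+1} = \lambda_i^k + \rho\,\big( w_i^{k+1} - w^{k+1} \big)
  && \text{(dual ascent on the multipliers)}
\end{aligned}

Each w_i-subproblem touches only block i's (possibly non-IID) data; the coupling happens purely through the averaging and multiplier steps, which is where the tweet's point about heterogeneous batches becomes visible.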
7 replies · 47 reposts · 330 likes · 94.2K views
Markus Heimerl @markusheimerl
@NickDesnoyer Any chance you'll sell an educational kit with everything you'd need to reproduce this genetic reprogramming of flowers?
1 reply · 0 reposts · 2 likes · 139 views
Nick Desnoyer @NickDesnoyer
"Technology always becomes obsolete, but a good aesthetic is, by definition, timeless" -Sheehan Quirke Preview of my flower design toolkit and its visual language.
6 replies · 36 reposts · 214 likes · 9.9K views
Markus Heimerl @markusheimerl
@iskander Massive data collection; try to make model animals live as long as you can with any combination of interventions you can think of.
0 replies · 0 reposts · 5 likes · 475 views
alex rubinsteyn @iskander
Let's dream a bit. How would you dramatically reorganize biotech research / drug development to make much faster progress towards curative therapies? (Don't say "use AI"; hypothesis/candidate generation under the current structure isn't a bottleneck.)
38 replies · 11 reposts · 95 likes · 19.1K views
Markus Heimerl @markusheimerl
So let's solve aging, and after that I'm inviting everyone to have cheesecake at my place every Tuesday at 3 for the next two thousand years, okay?
5 replies · 10 reposts · 122 likes · 210.6K views
Markus Heimerl retweeted
Physics In History @PhysInHistory
R. Feynman on awards and honors in science
72 replies · 684 reposts · 2.7K likes · 308.9K views