Christophe Roux
@chrisrx13

PhD Student in Optimization/ML, @ZuseInstitute and @TUBerlin

Joined April 2020
230 Following · 43 Followers

22 posts
Christophe Roux retweeted
Sebastian Pokutta @spokutta
For a decade it was open whether Frank-Wolfe's O(1/√ε) rate on strongly convex sets is tight. We show it is: Ω(1/√ε), even for a simple quadratic on a unit ball. With J. Halbey, D. Deza, @maxzimmerberlin, @chrisrx13, @b_stellato. 1/2
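For readers who want to poke at the setting, here is a minimal NumPy sketch of vanilla Frank-Wolfe on the Euclidean unit ball (a strongly convex set) with a simple quadratic. The objective, step size, and dimensions are illustrative assumptions; this is not the paper's lower-bound construction.

```python
import numpy as np

# Vanilla Frank-Wolfe over the Euclidean unit ball, minimizing
# f(x) = 0.5 * ||x - b||^2 with b outside the ball, so the optimum
# sits on the boundary. Toy setup, not the hard instance from the paper.

def frank_wolfe(b, steps=1000):
    x = np.zeros_like(b)
    for t in range(steps):
        grad = x - b
        s = -grad / np.linalg.norm(grad)   # LMO over the unit ball
        gamma = 2.0 / (t + 2.0)            # standard open-loop step size
        x = x + gamma * (s - x)
    return x

rng = np.random.default_rng(0)
b = rng.normal(size=50)
b = 2.0 * b / np.linalg.norm(b)            # optimizer is b / ||b||
x = frank_wolfe(b)
f_star = 0.5 * (np.linalg.norm(b) - 1.0) ** 2
print("primal gap:", 0.5 * np.linalg.norm(x - b) ** 2 - f_star)
```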
Christophe Roux retweeted
Ben Grimmer @prof_grimmer
There is (at least one) strange thing in accelerated convex optimization theory: Since the 80s, in unconstrained minimization by gradient methods, smoothness is known to allow a fast O(1/T^2) convergence rate by Nesterov. Nemirovski and Yudin give matching lower bounds.
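As a quick reference for the rate being discussed, here is a minimal sketch of Nesterov's accelerated gradient method for L-smooth convex minimization, which attains the O(1/T^2) rate; the quadratic test function below is an arbitrary stand-in.

```python
import numpy as np

# Nesterov's accelerated gradient method for L-smooth convex f,
# achieving O(1/T^2) versus O(1/T) for plain gradient descent.

def nesterov(grad_f, x0, L, steps):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(steps):
        x_next = y - grad_f(y) / L                        # gradient step at y
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

# Toy ill-conditioned quadratic: f(x) = 0.5 x^T A x - b^T x, L = 1.
d = 100
A = np.diag(np.linspace(1e-3, 1.0, d))
b = np.ones(d)
x = nesterov(lambda z: A @ z - b, np.zeros(d), L=1.0, steps=500)
print("residual:", np.linalg.norm(A @ x - b))
```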
Christophe Roux retweeted
Louis Schiekiera @LJS_Berlin
🚨 New preprint available! 🚨 We test how much of an LLM's internal semantic geometry can be recovered from behavior alone. Across 8 LLMs and 17.5M trials, forced-choice tasks align with hidden-state structure much better than free association. Preprint: arxiv.org/pdf/2602.00628
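One simple way to frame such a comparison (purely my sketch; the preprint's actual protocol across 8 LLMs and 17.5M trials is far richer) is representational similarity analysis: correlate a pairwise similarity matrix derived from behavior with one derived from hidden states.

```python
import numpy as np
from scipy.stats import spearmanr

# RSA-style alignment: Spearman correlation between the upper triangles of
# a behavioral similarity matrix and a hidden-state cosine-similarity matrix.
# All data below is synthetic; this is not the preprint's pipeline.

def rsa_alignment(behavior_sim, hidden_states):
    h = hidden_states / np.linalg.norm(hidden_states, axis=1, keepdims=True)
    hidden_sim = h @ h.T
    iu = np.triu_indices_from(hidden_sim, k=1)
    return spearmanr(behavior_sim[iu], hidden_sim[iu]).correlation

rng = np.random.default_rng(0)
states = rng.normal(size=(30, 64))                      # stand-in hidden states
noisy = states + rng.normal(scale=2.0, size=(30, 64))   # noisy behavioral proxy
nb = noisy / np.linalg.norm(noisy, axis=1, keepdims=True)
print(rsa_alignment(nb @ nb.T, states))
```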
Christophe Roux retweeted
Berkant Turan @BerkantTuran_
Join our poster session *today* at #ICML2024 in the TF2M Workshop. Looking forward to the inspiring discussions.
Quoting Berkant Turan @BerkantTuran_:

Excited to be at #ICML2024! Grzegorz Gluch, @SaiGaneshNagar1, and I will present our paper "Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks" at the Workshop on Theoretical Foundations of Foundation Models (TF2M). DM me to grab a coffee!☕️

Christophe Roux retweeted
Max Zimmer @maxzimmerberlin
A good time to share our #ICLR2023 paper: How I Learned to Stop Worrying and Love Retraining We explore sparsity-adaptive LR schedules and show that with proper LR care, simple pruning can outperform complex methods that 'learn' the sparsity. 📜 arxiv.org/abs/2111.00843 🧵1/n
Quoting Lucas Beyer (bl16) @giffmana:

I'm not really an expert on sparsity, but I enjoy using this template, and reminding about learning-rate, whenever I can. So I will:

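A bare-bones illustration of the prune-then-retrain loop with explicit learning-rate care (here a fresh cosine schedule after pruning); this is my toy reading of the theme, not the paper's exact schedule or pruning criterion.

```python
import torch

# Global magnitude pruning followed by retraining with a fresh LR schedule.
# The mask is reapplied after every step so pruned weights stay at zero.

def magnitude_prune(model, sparsity=0.9):
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    thresh = torch.quantile(scores, sparsity)
    masks = [(p.detach().abs() > thresh).float() for p in model.parameters()]
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            p.mul_(m)
    return masks

def retrain(model, masks, loader, epochs=10, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            torch.nn.functional.cross_entropy(model(x), y).backward()
            opt.step()
            with torch.no_grad():          # re-zero the pruned weights
                for p, m in zip(model.parameters(), masks):
                    p.mul_(m)
        sched.step()
```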
Christophe Roux @chrisrx13
Removing the assumption of bounded iterates uncovers a complex landscape of tradeoffs between oracle complexity, bounds on D, efficient computability of updates, and whether prior knowledge of the initial distance to the optimizer is needed. 7/8
Christophe Roux @chrisrx13
In our paper "Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point" at #ICML2024, we examine a blind spot in the Riemannian opt literature: Most works simply 𝘢𝘴𝘴𝘶𝘮𝘦 that the iterates stay in a bounded set. This is a problem because 1/8
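To make the setting concrete, here is a minimal Riemannian gradient descent sketch on the unit sphere (the objective and step size are my assumptions, not from the paper); whether such iterates remain in a bounded region is precisely the assumption the thread questions.

```python
import numpy as np

# Riemannian gradient descent on the unit sphere via the exponential map.
# Toy objective: f(x) = x^T A x on S^{d-1}, minimized by the eigenvector
# of A with smallest eigenvalue.

def exp_map(x, v):
    # Exponential map on the sphere: move from x along tangent vector v.
    n = np.linalg.norm(v)
    if n < 1e-12:
        return x
    return np.cos(n) * x + np.sin(n) * (v / n)

def riemannian_gd(A, x0, lr=0.1, steps=500):
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        egrad = 2.0 * A @ x                  # Euclidean gradient
        rgrad = egrad - (x @ egrad) * x      # project onto tangent space at x
        x = exp_map(x, -lr * rgrad)
    return x

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 20))
A = M @ M.T
A /= np.linalg.norm(A, 2)                    # normalize spectrum to [0, 1]
x = riemannian_gd(A, rng.normal(size=20))
print("f(x) =", x @ A @ x, "vs lambda_min =", np.linalg.eigvalsh(A)[0])
```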
Christophe Roux retweeted
Max Zimmer @maxzimmerberlin
🌟 Join our Team in Berlin 🌟 We are seeking highly motivated PhD students to work on (efficient) Deep Learning, preferably with strong math/CS background and PyTorch experience. Happy to answer questions here, via DM or at #icml2024! Apply at iol.zib.de/openings! Please RT
Christophe Roux retweeted
Max Zimmer @maxzimmerberlin
On my way to Vienna for #ICLR2024 with our paper "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging". We address the challenge of creating Model Soups from Sparse Neural Networks while preserving their sparsity patterns! arXiv: arxiv.org/abs/2306.16788 🧵1/n
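In code, the core idea might look like this hypothetical helper: average checkpoints that share one sparsity mask, then reapply the mask so the soup keeps the sparsity pattern (my simplified reading; see arxiv.org/abs/2306.16788 for the actual recipe).

```python
import torch

# Hypothetical "sparse soup" helper: average several fine-tuned checkpoints
# that were pruned with the same mask, then reapply the mask so the averaged
# model preserves the shared sparsity pattern.

def sparse_soup(state_dicts, masks):
    soup = {}
    for name in state_dicts[0]:
        avg = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
        soup[name] = avg * masks[name] if name in masks else avg
    return soup
```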
Yen-Huan Li @yenhuan_li
Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point
arxiv.org/abs/2403.10429
Reverse em-problem based on Bregman divergence and its application to classical and quantum information theory
arxiv.org/abs/2403.09252 (2/n)