Georgios Vlassis

30 posts

@gvlassis98

Zurich, Switzerland · Joined September 2019
40 Following · 9 Followers
Georgios Vlassis reposted
Amir Joudaki @AmirJoudaki
Neural nets don’t just forget. Sometimes, after long training, they lose the ability to learn at all. In our #ICLR2026 poster, we model Loss of Plasticity as gradient dynamics trapped in invariant manifolds: 🔴 frozen units, 🔵 cloned units. The video makes the traps visible.
Georgios Vlassis reposted
Dan Alistarh @DAlistarh
Speedrunning GPT-2 is now routine thanks to @karpathy. But can we speedrun GPT3-175B? We attempted to match accuracy on a <$10K budget; while we didn't quite reach it, our first results show that quality data, engineering, and native FP4 can get close. Details in 🧵
Saleh Ashkboos @AshkboosSaleh
Happy to share our new study on the interaction between #optimizers and #quantization! We show how optimizer choice affects quantized model quality and why outlier-based metrics (like Kurtosis and MMR) often fail to predict performance. Paper: arxiv.org/pdf/2509.23500 [1/5]
Georgios Vlassis @gvlassis98
@JamesWhate89993 @_arohan_ However, when you use Shampoo, you never actually use (1) or (2). And, in practice, the behavior that you get is very different from Muon, both in terms of loss and in terms of error propagation (e.g., see my figure above).
Georgios Vlassis @gvlassis98
@JamesWhate89993 @_arohan_ E.g., there are two conditions under which Shampoo is exactly the same as Muon: 1) the one-sided version with β2=0, or 2) the two-sided version with an exponent of 1/4 instead of 1/2.
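A short derivation may help make the two conditions concrete. It is only a sketch under assumptions not stated in the post: the gradient G is square and invertible (otherwise read the inverse roots as pseudo-inverse roots), momentum is ignored, "β2=0" is read as "the preconditioner sees only the current gradient", the one-sided version is taken to use exponent 1/2, and Muon's update is idealized as the exact orthogonal factor rather than its iterative approximation.

```latex
% Assumed setup: SVD of the current gradient, no accumulation, no momentum.
G = U \Sigma V^\top
\quad\Rightarrow\quad
G G^\top = U \Sigma^{2} U^\top, \qquad
G^\top G = V \Sigma^{2} V^\top

% (1) One-sided preconditioner with exponent 1/2:
(G G^\top)^{-1/2}\, G
  = U \Sigma^{-1} U^\top \, U \Sigma V^\top
  = U V^\top

% (2) Two-sided preconditioner with exponent 1/4 on each side:
(G G^\top)^{-1/4}\, G \,(G^\top G)^{-1/4}
  = U \Sigma^{-1/2} \, \Sigma \, \Sigma^{-1/2} V^\top
  = U V^\top

% Two-sided with exponent 1/2 instead leaves a residual spectrum:
(G G^\top)^{-1/2}\, G \,(G^\top G)^{-1/2} = U \Sigma^{-1} V^\top
```

In both cases (1) and (2) the singular values cancel and only the orthogonal factor U V^T remains, which is the idealized Muon direction under the assumptions above.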
James MMatrix @JamesWhate89993
Interesting result: AdamW performs strongly in terms of quantized model quality, outperforming SOAP/Scion/Muon etc. It would be interesting to verify whether this observation is correct and whether it holds at larger scale, as I thought Adam-trained models were harder to quantize.
Quoting Saleh Ashkboos @AshkboosSaleh: the announcement of the study on #optimizers and #quantization (arxiv.org/pdf/2509.23500) [1/5]
Georgios Vlassis @gvlassis98
@JamesWhate89993 Which makes sense if you realize that networks trained with different optimizers might propagate noise differently. A nice visualization of this is Figure 3 (X might compress this).
[Attached image: Figure 3 of the paper]
Georgios Vlassis @gvlassis98
@JamesWhate89993 Nevertheless, to me, the most interesting observation is that the max-to-median ratio (MMR) of the activations, which is used in a lot of quantization studies, is a bad predictor of quantization performance when you use different optimizers.
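For readers unfamiliar with the outlier metrics mentioned in this thread, here is a minimal Python sketch. It assumes the max-to-median ratio is max |x| divided by median |x|, and it uses the plain (non-excess) fourth-moment kurtosis; the paper may define either one differently, and the function names are hypothetical.

```python
import statistics


def max_to_median_ratio(values):
    """Assumed definition: largest magnitude divided by the median magnitude."""
    magnitudes = [abs(v) for v in values]
    return max(magnitudes) / statistics.median(magnitudes)


def kurtosis(values):
    """Plain kurtosis: fourth central moment over the squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


# Toy "activations": one list without an outlier, one with a single large value.
well_behaved = [-1.0, 0.5, -0.5, 1.0, 0.75, -0.75]
with_outlier = well_behaved + [20.0]

for name, acts in [("well_behaved", well_behaved), ("with_outlier", with_outlier)]:
    print(name, max_to_median_ratio(acts), kurtosis(acts))
```

Both metrics rise sharply when a single outlier is present. They describe the activation distribution only, which is one way to read the claim that they can miss how error propagates through the network.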
Georgios Vlassis @gvlassis98
@evaninwords @HessianFree Nevertheless, we are meeting Omead later today to see which newer version of PSGD we should try. If you have any input/feedback/suggestions, we would be more than happy to hear them too :).
Georgios Vlassis @gvlassis98
@evaninwords @HessianFree ii) Most of the quantization error actually comes from activation quantization, not weight quantization. In our paper, we find that the spectral norm of the weights (which we link to quantization error propagation) barely changes after INT4 weight quantization for all optimizers.
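To make the measurement concrete, here is a toy Python sketch of comparing a weight matrix's spectral norm before and after INT4 quantization. It does not reproduce the paper's setup: the symmetric per-tensor round-to-nearest scheme, the random 8x8 matrix, and all function names are assumptions chosen for illustration.

```python
import math
import random


def quantize_int4(weights):
    """Symmetric per-tensor round-to-nearest INT4 (assumed scheme)."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 7.0  # the largest magnitude maps to code +/-7
    codes = [[max(-8, min(7, round(w / scale))) for w in row] for row in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point weights."""
    return [[c * scale for c in row] for row in codes]


def spectral_norm(matrix, iterations=200):
    """Largest singular value via power iteration on M^T M."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(m * x for m, x in zip(row, v)) for row in matrix]  # M v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # M^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(m * x for m, x in zip(row, v)) for row in matrix]
    return math.sqrt(sum(x * x for x in u))


rng = random.Random(0)  # fixed seed so the toy example is repeatable
weights = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)

before, after = spectral_norm(weights), spectral_norm(restored)
print("spectral norm before:", before)
print("spectral norm after: ", after)
print("relative change:     ", abs(after - before) / before)
```

The change in the largest singular value can never exceed the norm of the quantization error matrix, so small per-entry rounding errors bound how far the spectral norm can move.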
Evan Walters @evaninwords
Very interesting paper out of ETH! The most interesting takeaway for me is that different optimizers result in distinct error propagation signatures throughout the model after quantization.
Quoting Saleh Ashkboos @AshkboosSaleh: the announcement of the study on #optimizers and #quantization (arxiv.org/pdf/2509.23500) [1/5]
Georgios Vlassis @gvlassis98
@evaninwords IMO the logical next step would be to design an optimizer/architecture/method that explicitly takes that into account. If you go through the maths in section 3.2, you will see that the quantity of interest is the "gain".
Georgios Vlassis @gvlassis98
@evaninwords Hello Evan! Glad you like the idea! I completely agree that the most interesting finding is that the quantization error propagation profiles are different.