Pascal Tikeng (migrant)

10 posts

@PTikeng

Learning something ... at @Mila_Quebec

Québec, Canada · Joined July 2020
87 Following · 31 Followers
Pascal Tikeng (migrant)
Happy to be recognized as an ICML Gold Reviewer this year.
ICML Conference @icmlconf

Reviewers & ACs for #ICML2026 have been recognized for their service! - Reviewers: 4439 Gold (free registration), 4437 Silver. 17749 total reviewers were assigned >= 1 paper - ACs: 1647 receive free registration, out of 1691 who were assigned >= 1 paper TY for your hard work!

Pascal Tikeng (migrant) retweeted
Marawan Gamal @mrremila
If you want to semantically search through ICLR papers this week, check out papers.app
Pascal Tikeng (migrant) retweeted
Marawan Gamal @mrremila
New preprint! Introducing ACTMat: Model Merging via Data-Free Covariance Estimation Work done w/ @dtredsox13, @PTikeng, Colin Raffel and Guillaume Rabusseau TL;DR: It's RegMean, but without needing data for covariance estimation (C ≈ Δᵀ Δ) 📄 arxiv.org/pdf/2604.01329 🧵(1/6)
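To make the TL;DR concrete, here is a minimal sketch of a RegMean-style merge in which each Gram matrix is replaced by ΔᵀΔ. Several things here are assumptions rather than details from the tweet: Δ is taken to be the difference between a fine-tuned weight matrix and the shared pretrained one, weights are stored as (out, in), a small ridge term is added for invertibility, and the function names are hypothetical. RegMean's off-diagonal scaling is omitted, and the estimator in the paper may differ.

```python
import numpy as np

def regmean_merge(weights, grams, ridge=1e-6):
    """RegMean-style merge of linear layers with weights of shape (out, in).

    Minimizes sum_i tr((W - W_i) G_i (W - W_i)^T), whose closed form is
    W = (sum_i W_i G_i) (sum_i G_i)^{-1}.
    """
    d_in = weights[0].shape[1]
    # Assumption: a small ridge keeps the summed Gram matrix invertible.
    gram_sum = sum(grams) + ridge * np.eye(d_in)
    weighted = sum(w @ g for w, g in zip(weights, grams))
    # Solve W @ gram_sum = weighted for W (gram_sum is symmetric).
    return np.linalg.solve(gram_sum, weighted.T).T

def merge_data_free(pretrained, finetuned, ridge=1e-6):
    """Hypothetical data-free variant: approximate each Gram matrix by
    Delta^T Delta, with Delta = W_i - W_pretrained (an assumption)."""
    grams = []
    for w in finetuned:
        delta = w - pretrained
        grams.append(delta.T @ delta)  # shape (in, in), like an input covariance
    return regmean_merge(finetuned, grams, ridge=ridge)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w0 = rng.normal(size=(4, 3))
    w1 = w0 + 0.1 * rng.normal(size=(4, 3))
    w2 = w0 + 0.1 * rng.normal(size=(4, 3))
    print(merge_data_free(w0, [w1, w2]))
```

With identity Gram matrices and no ridge, the merge reduces to a plain average of the weights, which is a quick sanity check on the closed form.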
Julia Kempe @KempeLab
8/ Please go have a look. All comments are welcome — and we especially invite others to check the work carefully.
Pascal Tikeng (migrant) retweeted
Suffiyan Malik @suffiyanmalikk
@dwarkesh_sp 9pm-3am is also incredible focus time
Pascal Tikeng (migrant) @PTikeng
@introspection @Mila_Quebec 2) We also show that the commonly used L2 norm is not a reliable proxy for explaining grokking. 3) Finally, we investigate how factors such as training dataset choice and model overparameterization impact grokking delay. The figure below summarizes our contribution fairly well.
[Figure attached to the tweet; image not preserved]
Pascal Tikeng (migrant) @PTikeng
@introspection @Mila_Quebec 1) We show that grokking time scales proportionally to 1/(α * β) when minimizing composite objectives of the form f = g + βh using gradient descent with learning rate α, where g is the training error and h is any regularizer that enforces an inductive bias toward generalization.
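One way to see where a 1/(α * β) timescale can come from is the toy below. It is not the paper's argument. It assumes the training error g is already minimized, so its gradient vanishes, and it takes h(w) = w²/2 (plain weight decay) on a single scalar weight. The function name and the 1% threshold are hypothetical. Under those assumptions each step multiplies w by (1 - α * β), so the number of steps needed to shrink w by a fixed factor grows roughly like 1/(α * β).

```python
import math

def steps_to_shrink(alpha, beta, factor=0.01, w0=1.0):
    """Count gradient-descent steps until |w| <= factor * |w0|.

    Toy setting (assumption): only the regularizer h(w) = w**2 / 2 drives the
    update, i.e. the gradient of the training error g is zero.
    Requires 0 < alpha * beta < 1 so that the loop terminates.
    """
    w, steps = w0, 0
    while abs(w) > factor * abs(w0):
        w -= alpha * beta * w  # gradient of beta * h(w) is beta * w
        steps += 1
    return steps

if __name__ == "__main__":
    for alpha, beta in [(0.1, 0.01), (0.1, 0.02), (0.2, 0.01)]:
        steps = steps_to_shrink(alpha, beta)
        # steps * alpha * beta stays close to log(1 / factor) across settings.
        print(alpha, beta, steps, steps * alpha * beta, math.log(100))
```

Doubling either α or β roughly halves the step count in this toy, which matches the proportionality stated in the tweet for this restricted case only.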
Guillaume Dumas @introspection
Proud of Pascal Notsawo, presenting at #ICML2025 about our research on #Grokking. He demonstrated how this phase transition to generalization depends on training objectives & regularization, and how standard metrics may overlook key dynamics. 📍Check out Poster W-811 on July 16!
Pascal Tikeng (migrant) retweeted
Guillaume Dumas @introspection
📝New preprint out!🔍 We used dynamical systems theory to predict "Grokking" — the phase transition leading to "Generalization Beyond Overfitting" discovered at @OpenAI last year— Big kudos to our @Mila_Quebec dream team, especially @PTikeng!
AI Safety Papers @safe_paper

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Pascal Jr. Tikeng Notsawo, Hattie Zhou (@oh_that_hat), Mohammad Pezeshki (@mpezeshki91), Irina Rish (@irinarish), Guillaume Dumas (@introspection)
arxiv.org/abs/2306.13253
Tags: grokking, science of DL, mechanistic interpretability, Mila/Meta

If you want to know if your model will eventually grok without training it for a long time, the authors suggest looking at the loss during early epochs. In brief, low-frequency oscillations of the loss can be a good proxy for upcoming grokking. The authors contextualize this with other findings of grokking (specifically the sling-shot mechanism) and loss landscape research.

I think the biggest appeal this paper had for me was how this research fits into the larger "science of DL" research landscape, and I will probably revisit it because it is so helpful to see how a lot of these concepts agree or disagree.
