Pascal Tikeng (migrant)

10 posts

@PTikeng

Learning something ... at @Mila_Quebec

Québec, Canada · Joined July 2020

87 Following · 31 Followers

Pascal Tikeng (migrant)@PTikeng·4d

Happy to be recognized as an ICML Gold Reviewer this year.

ICML Conference@icmlconf

Reviewers & ACs for #ICML2026 have been recognized for their service! - Reviewers: 4439 Gold (free registration), 4437 Silver. 17749 total reviewers were assigned >= 1 paper - ACs: 1647 receive free registration, out of 1691 who were assigned >= 1 paper TY for your hard work!

Pascal Tikeng (migrant) retweeted

Marawan Gamal@mrremila·23 Apr

If you want to semantically search through ICLR papers this week, check out papers.app

Pascal Tikeng (migrant) retweeted

Marawan Gamal@mrremila·16 Apr

New preprint! Introducing ACTMat: Model Merging via Data-Free Covariance Estimation Work done w/ @dtredsox13, @PTikeng, Colin Raffel and Guillaume Rabusseau TL;DR: It's RegMean, but without needing data for covariance estimation (C ≈ Δᵀ Δ) 📄 arxiv.org/pdf/2604.01329 🧵(1/6)
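
A rough illustration of the TL;DR, not the authors' implementation: the post only states that the covariance is approximated as C ≈ Δᵀ Δ. The sketch below assumes that Δ is the difference between a fine-tuned and a pretrained weight matrix of shape (d_out, d_in), and plugs that proxy into the standard RegMean closed form for a single linear layer. The function name `merge_linear_data_free` and the `ridge` term are hypothetical additions for this example.

```python
import numpy as np


def merge_linear_data_free(w_pre, w_finetuned, ridge=1e-6):
    """Merge fine-tuned weights of one linear layer, RegMean-style.

    All weights have shape (d_out, d_in). RegMean would use the input
    Gram matrix of each task; here it is replaced by the data-free proxy
    C_i = delta_i.T @ delta_i with delta_i = W_i - W_pre (assumption).
    """
    d_in = w_pre.shape[1]
    gram_sum = np.zeros((d_in, d_in))
    weighted_sum = np.zeros_like(w_pre, dtype=float)
    for w in w_finetuned:
        delta = w - w_pre
        # Proxy covariance; the small ridge (an assumption of this sketch)
        # keeps the summed matrix invertible when delta is low-rank.
        c = delta.T @ delta + ridge * np.eye(d_in)
        gram_sum += c
        weighted_sum += w @ c
    # Solve W_merged @ gram_sum = weighted_sum; gram_sum is symmetric.
    return np.linalg.solve(gram_sum, weighted_sum.T).T


rng = np.random.default_rng(0)
w_pre = rng.normal(size=(4, 3))
w_a = w_pre + 0.1 * rng.normal(size=(4, 3))
w_b = w_pre + 0.1 * rng.normal(size=(4, 3))
merged = merge_linear_data_free(w_pre, [w_a, w_b])
```

Under these assumptions, merging several copies of the same fine-tuned matrix returns that matrix, which is a quick sanity check on the closed form.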

Pascal Tikeng (migrant)@PTikeng·12 Feb

@KempeLab How can we provide feedback? ("OpenReview" style)

Julia Kempe@KempeLab·10 Feb

8/ Please go have a look. All comments are welcome — and we especially invite others to check the work carefully.

Julia Kempe@KempeLab·10 Feb

1/ #1stProof : Announcing our attempt at Problem 10. Joint with @scottnarmstrong @MunosRemi

Pascal Tikeng (migrant) retweeted

Suffiyan Malik@suffiyanmalikk·6 Dec

@dwarkesh_sp 9pm-3am is also incredible focus time

Pascal Tikeng (migrant)@PTikeng·14 Jul

@introspection @Mila_Quebec 2) We also show that the commonly used L2 norm is not a reliable proxy for explaining grokking. 3) Finally, we investigate how factors such as training dataset choice and model overparameterization impact grokking delay. The figure below summarizes our contribution fairly well.

Pascal Tikeng (migrant)@PTikeng·14 Jul

@introspection @Mila_Quebec 1) We show that grokking time scales proportionally to 1/(α * β) when minimizing composite objectives of the form f = g + βh using gradient descent with learning rate α, where g is the training error and h is any regularizer that enforces an inductive bias toward generalization.
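
Written out, the setting in point 1) can be sketched as follows. The symbols θ (model parameters), t (gradient step index), and t_grok (the step at which generalization appears) are notation assumed here for illustration; the post itself only names f, g, h, α, and β.

```latex
\begin{align*}
f(\theta) &= g(\theta) + \beta\, h(\theta), \\
\theta_{t+1} &= \theta_t - \alpha \nabla f(\theta_t)
             = \theta_t - \alpha \nabla g(\theta_t) - \alpha\beta\, \nabla h(\theta_t), \\
t_{\text{grok}} &\propto \frac{1}{\alpha\beta}.
\end{align*}
```

One informal reading: the regularizer enters each update only through the product αβ, so under the stated scaling, halving either the learning rate or the regularization strength would be expected to roughly double the grokking delay.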

Guillaume Dumas@introspection·14 Jul

Proud of Pascal Notsawo, presenting at #ICML2025 about our research on #Grokking. He demonstrated how this phase transition to generalization depends on training objectives & regularization, and how standard metrics may overlook key dynamics. 📍Check out Poster W-811 on July 16!

Pascal Tikeng (migrant)@PTikeng·11 Mar

@ferezola @tassingremi @arol_ketch I bought mine here: amazon.ca/dp/2492170160?…

BAMITIFUL@ferezola·10 Mar

"Les révélations de Jean Fochivé" @tassingremi @arol_ketch Please, would you happen to have this book available? 🤲🤲

Pascal Tikeng (migrant) retweeted

Guillaume Dumas@introspection·26 Jun

📝New preprint out!🔍 We used dynamical systems theory to predict "Grokking" — the phase transition leading to "Generalization Beyond Overfitting" discovered at @OpenAI last year— Big kudos to our @Mila_Quebec dream team, especially @PTikeng!

AI Safety Papers@safe_paper

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

Pascal Jr. Tikeng Notsawo, Hattie Zhou (@oh_that_hat), Mohammad Pezeshki (@mpezeshki91), Irina Rish (@irinarish), Guillaume Dumas (@introspection)

arxiv.org/abs/2306.13253

Tags: grokking, science of DL, mechanistic interpretability, Mila/Meta

If you want to know if your model will eventually grok without training it for a long time, the authors suggest looking at the loss during early epochs. In brief, low-frequency oscillations of the loss can be a good proxy for upcoming grokking. The authors contextualize this with other findings of grokking (specifically the sling-shot mechanism) and loss landscape research.

I think the biggest appeal this paper had for me was how this research fits into the larger "science of DL" research landscape, and I will probably revisit it because it is so helpful to see how a lot of these concepts agree or disagree.
