Alessandro Favero

133 posts

@alesfav

Physics-AI fellow @Cambridge_Uni explaining the scientific principles behind AI. Formerly @EPFL, @Amazon AI Labs.

Cambridge, London, UK · Joined September 2020
891 Following · 439 Followers
Pinned Tweet
Alessandro Favero @alesfav
Amazed by language diffusion models like Mercury? So are we 🤯 But how do they actually learn to generate coherent (and creative!) text from scratch? We dig into it using simple formal grammars and statistical physics. Just accepted @icmlconf 2025 🎉
1 reply · 4 reposts · 31 likes · 2.3K views
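The thread doesn't spell out the mechanics, but the basic masked-diffusion recipe these models use can be sketched in a few lines. Below is a toy illustration (my construction, not the paper's setup): a trivial grammar, a forward process that masks tokens, and a count-based stand-in for the learned denoiser.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Toy grammar: sentences are "a"-phrases followed by "b"-phrases.
def sample_sentence():
    return random.choice([["a"], ["a", "a"]]) + random.choice([["b"], ["b", "b"]])

MASK = "_"

def forward_mask(tokens, t):
    """Forward process: independently replace each token with MASK w.p. t."""
    return [MASK if random.random() < t else tok for tok in tokens]

# Stand-in "denoiser": positional token statistics from clean samples
# (a real model would learn this with a neural network).
stats = defaultdict(Counter)
for s in (sample_sentence() for _ in range(1000)):
    for i, tok in enumerate(s):
        stats[i][tok] += 1

def denoise_step(tokens):
    """Reverse process: unmask one position with its most likely token."""
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok == MASK:
            out[i] = stats[i].most_common(1)[0][0]
            break
    return out

# Generation: start from all masks and iteratively unmask.
x = [MASK] * 3
while MASK in x:
    x = denoise_step(x)
print(" ".join(x), "| corrupted example:", forward_mask(["a", "b", "b"], 0.5))
```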
Alessandro Favero retweeted
Helen Qu @_helenqu
physical systems (orbits/fluid mechanics) may look complex, but are often governed by simple equations/few parameters. can current self-supervised methods learn the underlying physics? our new paper finds that learning in latent space may be the key! arxiv.org/abs/2603.13227 🧵
25 replies · 97 reposts · 663 likes · 56K views
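A toy rendering of the latent-space idea (my sketch, not the paper's method): a one-parameter system observed as 64 "pixels". In pixel space the next-frame map is nonlinear, but an encoder that recovers the latent angle reduces the dynamics to a single parameter. Here the encoder is hand-crafted where a self-supervised method would learn it.

```python
import numpy as np

N_PIX, OMEGA = 64, 0.17            # OMEGA is unknown to the "learner"
GRID = np.linspace(0, 2 * np.pi, N_PIX, endpoint=False)

def render(theta):
    """High-dimensional observation of a one-parameter system (a bump at theta)."""
    return np.exp(8 * (np.cos(GRID - theta) - 1))

def encode(frame):
    """Stand-in for a learned encoder: recover the latent angle from pixels."""
    return GRID[np.argmax(frame)]

# Observe a trajectory in pixel space.
thetas = 0.3 + OMEGA * np.arange(50)
frames = np.stack([render(th) for th in thetas])

# In latent space the dynamics are a single parameter: the angular step.
z = np.unwrap([encode(f) for f in frames])
omega_hat = np.mean(np.diff(z))

# Extrapolate 100 steps ahead with the one-parameter latent model.
err = (z[-1] + 100 * omega_hat) - (thetas[-1] + 100 * OMEGA)
print(f"estimated omega: {omega_hat:.3f}, 100-step error: {abs(err):.3f} rad")
```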
Alessandro Favero retweeted
Matthieu Wyart @MatthieuWyart
This paper asks: What controls the scaling laws of LLMs? Two key ideas: (i) as the training set size increases, correlations are detected on a longer context scale and (ii) on this scale, LLMs function optimally: the loss is ~ the next-token conditional entropy.
Quoting Surya Ganguli @SuryaGanguli, "Deriving neural scaling laws from the statistics of natural language" (full tweet below)

1 reply · 7 reposts · 39 likes · 7.7K views
Alessandro Favero retweeted
Matthieu Wyart @MatthieuWyart
"Physics" approach to LLMs studied how synthetic languages are parsed after training, but the mechanism of learning how to parse was not known. Which correlations in data are used, and how many data are needed for that? This is answered here for a class of context-free languages.
Quoting Francesco Cagnetta @Fraccagnetta (full tweet below)

1 reply · 10 reposts · 55 likes · 5.6K views
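The quantity this line of work builds on, pairwise token correlations as a function of separation, is easy to estimate from samples. A self-contained sketch with a toy two-level generator (hypothetical, not the paper's datasets):

```python
import random
from collections import Counter

random.seed(0)

# Tiny two-level hierarchical generator (stand-in for the paper's datasets):
# each sentence is three short phrases, each phrase internally ordered.
def sentence():
    def phrase(kind):
        if random.random() < 0.5:
            return {"X": ["a", "b"], "Y": ["c", "d"]}[kind]
        return {"X": ["b", "a"], "Y": ["d", "c"]}[kind]
    return phrase("X") + phrase("Y") + phrase(random.choice("XY"))

corpus = [sentence() for _ in range(20000)]

def correlation(d):
    """Mean |P(u, v at distance d) - P(u)P(v)| over observed token pairs."""
    joint, left, right, n = Counter(), Counter(), Counter(), 0
    for s in corpus:
        for i in range(len(s) - d):
            joint[(s[i], s[i + d])] += 1
            left[s[i]] += 1
            right[s[i + d]] += 1
            n += 1
    return sum(abs(joint[k] / n - left[k[0]] / n * right[k[1]] / n)
               for k in joint) / len(joint)

# Correlations are strong within phrases (d=1) and decay across them.
for d in (1, 2, 4):
    print(f"distance {d}: correlation {correlation(d):.4f}")
```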
Alessandro Favero retweeted
Surya Ganguli @SuryaGanguli
Our new paper "Deriving neural scaling laws from the statistics of natural language" arxiv.org/abs/2602.07488, led by @Fraccagnetta & @AllanRaventos w/ Matthieu Wyart, makes a breakthrough: for the very first time, we can predict data-limited neural scaling law exponents from first principles, using the structure of natural language itself.

If you give us two properties of your natural language dataset:
1) How the conditional entropy of the next token decays with conditioning length.
2) How pairwise token correlations decay with time separation.
Then we can give you the exponent of the neural scaling law (loss versus data amount) through a simple formula!

The key idea is that as you increase the amount of training data, models can look further back into the past to predict, and as long as they do this well, the conditional entropy of the next token, conditioned on all tokens up to this data-dependent prediction time horizon, completely governs the loss. This gives us our simple formula for the neural scaling law!
20 replies · 117 reposts · 576 likes · 60.2K views
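In formulas, one way to read the claimed mechanism. The power-law forms below are my assumption for the sketch; the paper derives the actual relation from the two measured properties.

```latex
% Sketch of the mechanism under assumed power-law forms.
\[
  H(t) - H_\infty \;\propto\; t^{-\alpha}
  \qquad \text{(next-token conditional entropy vs.\ context length } t\text{)},
\]
\[
  t^*(N) \;\propto\; N^{\gamma}
  \qquad \text{(context horizon resolvable from } N \text{ samples, set by the correlation decay)}.
\]
If the model is near-optimal up to its horizon, the data-limited loss is
\[
  L(N) \;\approx\; H\bigl(t^*(N)\bigr) \;=\; H_\infty + c\,N^{-\alpha\gamma},
\]
i.e.\ a power-law scaling with exponent $\alpha\gamma$.
```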
Alessandro Favero retweeted
Francesco Cagnetta @Fraccagnetta
❓ How do LLMs learn hierarchical structure from sentences alone? 🚨 We build PCFG-like synthetic datasets with two knobs, hierarchy and ambiguity, and derive a correlation-based learning mechanism that predicts the sample complexity of deep nets. Results 👇
3 replies · 16 reposts · 105 likes · 16.2K views
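A sketch of what "PCFG-like datasets with hierarchy and ambiguity knobs" could look like (hypothetical construction, not the paper's exact grammars): depth sets the number of hierarchical levels, ambiguity the number of alternative expansions per nonterminal.

```python
import random

random.seed(0)

def make_grammar(depth, ambiguity):
    """Hypothetical PCFG-like grammar with two knobs.
    depth: number of hierarchical levels (each symbol expands into 2 children).
    ambiguity: number of alternative expansions per nonterminal."""
    rules = {}
    for level in range(depth):
        for sym in range(4):                      # 4 nonterminals per level
            rules[(level, sym)] = [
                [(level + 1, random.randrange(4)), (level + 1, random.randrange(4))]
                for _ in range(ambiguity)
            ]
    return rules

def expand(rules, sym, depth):
    level, idx = sym
    if level == depth:                            # leaves become terminal tokens
        return [f"t{idx}"]
    rhs = random.choice(rules[sym])               # ambiguity: random rule choice
    return [tok for child in rhs for tok in expand(rules, child, depth)]

rules = make_grammar(depth=3, ambiguity=2)
for _ in range(3):
    print(" ".join(expand(rules, (0, 0), 3)))     # strings of 2^3 = 8 tokens
```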
Alessandro Favero retweeted
Andrew Gordon Wilson @andrewgwils
This is an annual reminder that the no free lunch theorems are irrelevant. The assumptions they make are completely divorced from the world we live in. They should have no bearing on model construction. Let's make this a monthly mantra.
17 replies · 17 reposts · 301 likes · 50.5K views
Alessandro Favero retweeted
Daniel Korchinski @DanKorchinski
I’m excited to present my work at #NeurIPS with Dhruva Karkada, @yasamanbb and Matthieu Wyart tomorrow. If you want to understand why analogical reasoning emerges geometrically in simple language models, come check out our poster (#3209) Friday afternoon at 16:30!
2 replies · 3 reposts · 26 likes · 3.7K views
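The geometric picture behind "analogical reasoning" is the classic parallelogram of word vectors. A toy illustration with hand-set embeddings (my construction, not the poster's model):

```python
import numpy as np

# Hand-set toy embeddings: each word is a sum of a "gender" direction
# and a shared "royalty" direction.
ROYAL = np.array([0.5, 0.5])
E = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.0, 1.0]),
    "king":  np.array([1.0, 0.0]) + ROYAL,
    "queen": np.array([0.0, 1.0]) + ROYAL,
    "apple": np.array([-0.4, -0.7]),   # distractor
}

def nearest(v, exclude):
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in E if w not in exclude), key=lambda w: cos(E[w], v))

# Analogy as a vector offset: king - man + woman lands on queen.
target = E["king"] - E["man"] + E["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))   # -> queen
```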
Alessandro Favero retweeted
Surya Ganguli @SuryaGanguli
We have 14 survey lectures for our @SimonsFdn Collaboration on the Physics of Learning and Neural Computation! All videos available at: physicsoflearning.org/webinar-series Here is the list:
@zdeborova: Attention-based models and how to solve them using tools from quadratic networks and matrix denoising
@KempeLab: Recent lessons from LLM reasoning
@MBarkeshli: Sharpness dynamics in neural network training
@KrzakalaF: How Do Neural Networks Learn Simple Functions with Gradient Descent?
Michael Douglas: Mathematics, Economics and AI
Yuhai Tu: Towards a Physics-based Theoretical Foundation for Deep Learning: Stochastic Learning Dynamics and Generalization
@SuryaGanguli: An analytic theory of creativity for convolutional diffusion models
Eva Silverstein: Hamiltonian dynamics for stabilizing neural simulation-based inference
@adnarim066: Generation with Unified Diffusion
Bernd Rosenow: Random matrix analysis of neural networks: distinguishing noise from learned information
@jhhalverson: Neural networks and conformal field theory
@KempeLab: Synthetic data: friend or foe in the age of scaling
@WyartMatthieu: Learning hierarchical representations with deep architectures
@CPehlevan: Mean-field theory of deep network learning dynamics and applications to neural scaling laws
2 replies · 57 reposts · 250 likes · 22.1K views
Alessandro Favero retweeted
Stefano Ermon @StefanoErmon
Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!
Quoting Chieh-Hsin (Jesse) Lai @JCJesseLai

Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! 📘 We're excited to release 《The Principles of Diffusion Models》 with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today's models work, why they work, and where they're heading. 🧵 You'll find the link and a few highlights in the thread. We'd love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!

13 replies · 133 reposts · 1.1K likes · 126.7K views
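For orientation, the core equations any such monograph builds on, in the standard DDPM formulation (Ho et al., 2020), written here from memory rather than copied from the book:

```latex
% Standard denoising-diffusion (DDPM) formulation, not quoted from the monograph.
\[
q(x_t \mid x_{t-1}) = \mathcal{N}\!\bigl(\sqrt{1-\beta_t}\,x_{t-1},\; \beta_t I\bigr),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\bigl(\sqrt{\bar\alpha_t}\,x_0,\; (1-\bar\alpha_t) I\bigr),
\]
where $\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)$. The network $\epsilon_\theta$ is
trained with the simple noise-prediction objective
\[
\mathcal{L} = \mathbb{E}_{x_0,\,\epsilon \sim \mathcal{N}(0,I),\,t}
\bigl\| \epsilon - \epsilon_\theta\bigl(\sqrt{\bar\alpha_t}\,x_0
+ \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\bigr) \bigr\|^2 .
\]
```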
Alessandro Favero retweeted
Abdellah Rahmani @arahmani_AR
🎉 Thrilled to share: our paper FANTOM, a flow-based approach for dynamic temporal causal models with non-Gaussian or heteroscedastic noises, with Prof. @pafrossard, has been accepted at NeurIPS 2025! (1/6)
1 reply · 8 reposts · 16 likes · 1.3K views
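To unpack the terms in the announcement, here is a minimal synthetic example (my construction, not FANTOM) of a temporal causal model whose noise is non-Gaussian and heteroscedastic, i.e. its scale depends on the cause:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical temporal causal model: x causes y with a lag, and the noise on y
# is Laplace (non-Gaussian) with a scale that depends on the cause.
T = 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.laplace(scale=1.0)      # non-Gaussian driver
    scale = 0.2 + 0.3 * abs(x[t - 1])                   # heteroscedastic noise
    y[t] = 0.5 * y[t - 1] + 0.9 * x[t - 1] + rng.laplace(scale=scale)

# Heteroscedasticity check: y's residual spread grows with |x|.
resid = y[1:] - 0.5 * y[:-1] - 0.9 * x[:-1]
lo, hi = np.abs(x[:-1]) < 1, np.abs(x[:-1]) > 2
print(f"residual std | small |x|: {resid[lo].std():.2f}, large |x|: {resid[hi].std():.2f}")
```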
Alessandro Favero retweeted
Alex Hägele @haeggee
Long in the making, finally released: Apertus-8B and Apertus-70B, trained on 15T tokens of open data from over 1800 languages. Unique opportunity in academia to work on and train LLMs across the full stack. We managed to pull off a pretraining run with some fun innovations, ...
Quoting CSCS Lugano @cscsch

@EPFL, @ETH_en and #CSCS today released Apertus, Switzerland's first large-scale, multilingual language model (LLM). As a fully open LLM, it serves as a building block for developers and organizations to create their own applications: cscs.ch/science/comput… #Apertus #AI

9 replies · 30 reposts · 231 likes · 78K views