Alessandro Favero (@alesfav) - Twitter Profili | Zamantika Mersobahis Locabet

Sabitlenmiş Tweet

Amazed by language diffusion models like Mercury? So are we 🤯 But how do they actually learn to generate coherent (and creative!) text from scratch? We dig into it using simple formal grammars and statistical physics. Just accepted @icmlconf 2025 🎉

English

1

4

31

2.3K

Alessandro Favero retweetledi

Helen Qu@_helenqu·19 Mar

physical systems (orbits/fluid mechanics) may look complex, but are often governed by simple equations/few parameters. can current self-supervised methods learn the underlying physics? our new paper finds that learning in latent space may be the key! arxiv.org/abs/2603.13227🧵

English

25

97

663

56K

Alessandro Favero retweetledi

Matthieu wyart@MatthieuWyart·17 Şub

What governs the geometry of time and space embeddings in LLMs? We show it follows from translation symmetry in language statistics. With Dhruva Karkada, @DanKorchinski, Andres Nava, @yasamanbb arxiv.org/abs/2602.15029

English

7

60

352

25.8K

Alessandro Favero retweetledi

Matthieu wyart@MatthieuWyart·16 Şub

This paper asks: What controls the scaling laws of LLMs? Two key ideas: (i) as the training set size increases, correlations are detected on a longer context scale and (ii) on this scale, LLMs function optimally: the loss is ~ the next-token conditional entropy.

Surya Ganguli@SuryaGanguli

Our new paper "Deriving neural scaling laws from the statistics of natural language" arxiv.org/abs/2602.07488 lead by @Fraccagnetta & @AllanRaventos w/ Matthieu Wyart makes a breakthrough! We can predict data-limited neural scaling law exponents from first principles using the structure of natural language itself for the very first time! If you give us two properties of your natural language dataset: 1) How conditional entropy of the next token decays with conditioning length. 2) How pairwise token correlations decay with time separation. Then we can give you the exponent of the neural scaling law (loss versus data amount) through a simple formula! The key idea is that as you increase the amount of training data, models can look further back in the past to predict, and as long as they do this well, the conditional entropy of the next token, conditioned on all tokens up to this data-dependent prediction time horizon, completely governs the loss! This gets us our simple formula for the neural scaling law!

English

1

7

39

7.7K

Alessandro Favero retweetledi

Matthieu wyart@MatthieuWyart·16 Şub

"Physics" approach to LLMs studied how synthetic languages are parsed after training, but the mechanism of learning how to parse was not known. Which correlations in data are used, and how many data are needed for that? This is answered here for a class of context-free languages.

Francesco Cagnetta@Fraccagnetta

❓ How do LLMs learn hierarchical structure from sentences alone? 🚨 We build PCFG-like synthetic datasets with two knobs---hierarchy + ambiguity---and derive a correlation-based learning mechanism that predicts the sample complexity of deep nets. Results 👇

English

1

10

55

5.6K

Alessandro Favero retweetledi

Surya Ganguli@SuryaGanguli·10 Şub

Our new paper "Deriving neural scaling laws from the statistics of natural language" arxiv.org/abs/2602.07488 lead by @Fraccagnetta & @AllanRaventos w/ Matthieu Wyart makes a breakthrough! We can predict data-limited neural scaling law exponents from first principles using the structure of natural language itself for the very first time! If you give us two properties of your natural language dataset: 1) How conditional entropy of the next token decays with conditioning length. 2) How pairwise token correlations decay with time separation. Then we can give you the exponent of the neural scaling law (loss versus data amount) through a simple formula! The key idea is that as you increase the amount of training data, models can look further back in the past to predict, and as long as they do this well, the conditional entropy of the next token, conditioned on all tokens up to this data-dependent prediction time horizon, completely governs the loss! This gets us our simple formula for the neural scaling law!

English

20

117

576

60.2K

Alessandro Favero@alesfav·10 Şub

Super cool work predicting the scaling exponents of pre-trained LLMs from language statistics alone.

Francesco Cagnetta@Fraccagnetta

🚨 We derive data-limited neural scaling exponents directly from measurable corpus statistics. No synthetic data models, only two ingredients: -decay of token-token correlations with separation; -decay of next-token conditional entropy with context length.

English

0

5

293

Alessandro Favero retweetledi

Physics of Complex Systems Lab@pcsl_epfl·10 Şub

Check out our latest work on how neural language models learn to parse PCFGs from local stats!

Francesco Cagnetta@Fraccagnetta

❓ How do LLMs learn hierarchical structure from sentences alone? 🚨 We build PCFG-like synthetic datasets with two knobs---hierarchy + ambiguity---and derive a correlation-based learning mechanism that predicts the sample complexity of deep nets. Results 👇

English

0

1

6

549

Alessandro Favero retweetledi

Francesco Cagnetta@Fraccagnetta·9 Şub

❓ How do LLMs learn hierarchical structure from sentences alone? 🚨 We build PCFG-like synthetic datasets with two knobs---hierarchy + ambiguity---and derive a correlation-based learning mechanism that predicts the sample complexity of deep nets. Results 👇

English

3

16

105

16.2K

Alessandro Favero retweetledi

Andrew Gordon Wilson@andrewgwils·23 Ara

This is an annual reminder that the no free lunch theorems are irrelevant. The assumptions they make are completely divorced from the world we live in. They should have no bearing on model construction. Let's make this a monthly mantra.

English

17

301

50.5K

Alessandro Favero retweetledi

Daniel Korchinski@DanKorchinski·5 Ara

I’m excited to present my work at #NeurIPS with Dhruva Karkada, @yasamanbb and Matthieu Wyart tomorrow. If you want to understand why analogical reasoning emerges geometrically in simple language models, come check out our poster (# 3209) Friday afternoon at 16:30!

English

2

3

26

3.7K

Alessandro Favero retweetledi

Surya Ganguli@SuryaGanguli·15 Kas

We have 14 survey lectures for our @SimonsFdn Collaboration on the Physics of Learning and Neural Computation! All videos available at: physicsoflearning.org/webinar-series Here is the list: @zdeborova: Attention-based models and how to solve them using tools from quadratic networks and matrix denoising @KempeLab: Recent lessons from LLM reasoning @MBarkeshli: Sharpness dynamics in neural network training @KrzakalaF: How Do Neural Networks Learn Simple Functions with Gradient Descent? Michael Douglas: Mathematics, Economics and AI Yuhai Tu: Towards a Physics-based Theoretical Foundation for Deep Learning: Stochastic Learning Dynamics and Generalization @SuryaGanguli: An analytic theory of creativity for convolutional diffusion models Eva Silverstein: Hamiltonian dynamics for stabilizing neural simulation-based inference @adnarim066: Generation with Unified Diffusion Bernd Rosenow: Random matrix analysis of neural networks: distinguishing noise from learned information @jhhalverson Nerual networks and conformal field theory @KempeLab Synthetic data: friend or foe in the age of scaling @WyartMatthieu Learning hierarchical representations with deep architectures @CPehlevan Mean-field theory of deep network learning dynamics and applications to neural scaling laws

English

2

57

250

22.1K

Alessandro Favero retweetledi

Stefano Ermon@StefanoErmon·29 Eki

Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!

Chieh-Hsin (Jesse) Lai@JCJesseLai

Tired to go back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core ideas that shaped diffusion modeling and explains how today’s models work, why they work, and where they’re heading. 🧵You’ll find the link and a few highlights in the thread. We’d love to hear your thoughts and join some discussions! ⚡ Stay tuned for our markdown version, where you can drop your comments!

English

13

133

1.1K

126.7K

Alessandro Favero@alesfav·9 Eki

@qinym710 @Cambridge_Uni Thank you :)

English

0

44

Yiming Qin@qinym710·9 Eki

@alesfav @Cambridge_Uni Big congrats!

English

1

0

1

78

Alessandro Favero@alesfav·8 Eki

🎓My PhD thesis is now on arXiv! It follows a thread of compositionality in AI: from locality in CNNs, to the 'grammar' diffusion models learn to be creative, to task composition in foundation models. Officially a Dr 🎉 and starting as a Physics-AI Fellow at DAMTP @Cambridge_Uni

Stat.ML Papers@StatMLPapers

The Physics of Data and Tasks: Theories of Locality and Compositionality in Deep Learning ift.tt/9B0HFnC

English

3

1

22

1.2K

Alessandro Favero@alesfav·8 Eki

@darshilhdoshi1 @Cambridge_Uni Thanks, Darshil!!

English

0

44

Darshil Doshi@darshilhdoshi1·8 Eki

@alesfav @Cambridge_Uni Congratulations Alessandro!

Français

1

0

1

73

Alessandro Favero@alesfav·8 Eki

@SuryaGanguli @pcsl_epfl @pafrossard @ChTheiler Thanks Surya :)

English

0

1

60

Surya Ganguli@SuryaGanguli·8 Eki

@alesfav @pcsl_epfl @pafrossard @ChTheiler Congrats Alessandro!! Excellent thesis!

English

1

0

1

89

Alessandro Favero retweetledi

Ke Wang@wangkeml·3 Eki

Bit late for the announcements but very happy to share that MEMOIR is accepted to Neurips 2025🎉! Great collaboration with @qinym710 @nikdimitriadis, @alesfav, @pafrossard! See you in San diego!

Yiming Qin@qinym710

How can we inject new knowledge into LLMs without full retraining, forgetting, or breaking past edits? We introduce MEMOIR 📖— a scalable framework for lifelong model editing that reliably rewrites thousands of facts sequentially using a residual memory module. 🔥 🧵1/7

English

0

4

8

1.3K

Alessandro Favero retweetledi

Abdellah Rahmani@arahmani_AR·24 Eyl

🎉 Thrilled to share: our paper FANTOM with Prof. @pafrossard, Flow-based approach for Dynamic Temporal Causal models with non-Gaussian or Heteroscedastic Noises, has been accepted at NeurIPS 2025! (1/6)

English

1

8

16

1.3K

Alessandro Favero retweetledi

Alex Hägele@haeggee·2 Eyl

Long in the making, finally released: Apertus-8B and Apertus-70B, trained on 15T tokens of open data from over 1800 languages. Unique opportunity in academia to work on and train LLMs across the full-stack. We managed to pull off a pretraining run with some fun innovations, ...

CSCS Lugano@cscsch

@EPFL , @ETH_en and #CSCS today released Apertus, Switzerland's first large-scale, multilingual language model (LLM). As a fully open LLM, it serves as a building block for developers and organizations to create their own applications: cscs.ch/science/comput… #Apertus #AI

English

9

30

231

78K

Alessandro Favero

Keşfet