
Molei Tao
@MoleiTaoMath
Georgia Tech Prof; Tsinghua, Caltech, NYU Courant · deep learning theory · (diffusion) generative models, probabilistic ML · AI4Science · applied & computational math








Join us this Tuesday for a talk by Ye He: “Diffusion Model’s Generalization via Data-Dependent Ridge Manifolds”. The talk gives a geometric view of what a learned diffusion model generates, why ridge manifolds matter, and how this helps explain inference dynamics.







RL is the engine behind reasoning in AR-LLMs. But for diffusion LLMs? Existing methods mostly port AR algorithms over with some modifications, ignoring what makes dLLMs special and paying the price in speed. We propose 𝗗𝗠𝗣𝗢 (𝗗𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻 𝗠𝗮𝘁𝗰𝗵𝗶𝗻𝗴 𝗣𝗼𝗹𝗶𝗰𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻): an efficient, effective RL method designed for dLLMs from the ground up. Forward-only, off-policy, theoretically grounded ⚡ 🔗 arxiv.org/abs/2510.08233
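
Since the post only names the ingredients (distribution matching, forward-only, off-policy), here is a minimal hedged sketch of what that combination can look like in general: fit the policy to a reward-tilted target distribution by weighted maximum likelihood on samples drawn once from a frozen reference. This is a generic reward-weighted-regression-style toy, not the DMPO objective from the paper; the categorical policy, `beta`, and the reward are illustrative assumptions.

```python
import torch

# Hedged sketch: a forward-only, off-policy distribution-matching update.
# NOT the DMPO loss from the paper; toy policy and reward are ours.

torch.manual_seed(0)
vocab = 10
logits = torch.zeros(vocab, requires_grad=True)   # toy policy parameters

# Off-policy data: sampled once from a frozen reference, then reused
ref = torch.distributions.Categorical(logits=torch.zeros(vocab))
y = ref.sample((256,))                            # fixed rollouts
reward = (y == 3).float()                         # toy reward: prefer token 3

beta = 0.5
# Target distribution ∝ pi_ref(y) * exp(r(y)/beta); with samples from
# pi_ref, self-normalized weights exp(r/beta)/sum realize the tilt
w = torch.softmax(reward / beta, dim=0).detach()

opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    # Forward-only: just log-probs of the fixed samples, no new rollouts
    logp = torch.distributions.Categorical(logits=logits).log_prob(y)
    loss = -(w * logp).sum()      # weighted MLE toward the tilted target
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.softmax(logits, dim=0)[3])  # probability mass shifts to token 3
```

The off-policy, forward-only character comes from reusing the frozen-reference samples and needing only likelihood evaluations of them, never fresh generations from the current policy.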



[1/D] 🤔 What are drifting models really connected to?
📢 Our new paper, A Unified View of Drifting and Score-Based Models, shows that the bridge to score-based models is clear and precise (w/ team and @mittu1204, @StefanoErmon, @MoleiTaoMath)!
✍️ Main takeaway: drifting is more closely connected to score-based (diffusion) modeling than it may first appear!
🔗 arxiv.org/abs/2603.07514
🎯 Here’s why: drifting’s mean-shift moves a sample toward the kernel-weighted average of nearby samples, while the score function points toward regions of higher density. Both describe local directions that push samples toward where data is denser. We show that this link is exact for Gaussian kernels (Section 4.1):
📌 Drifting’s mean-shift = a rescaled score-matching field between the Gaussian-smoothed data and model distributions, i.e., the vector field underlying score matching (Tweedie!).
📌 This also clarifies the bridge to Distribution Matching Distillation (DMD): both use score-based transport directions and differ only in how the score is realized. Drifting realizes it nonparametrically through kernel neighborhoods, whereas DMD relies on a pretrained diffusion teacher.
🤔 So what happens for the default Laplace kernel used in drifting models? Let’s look below 👇
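
The first 📌 is easy to sanity-check numerically on the data side: with a Gaussian kernel of bandwidth h, the mean-shift vector at a point x equals h² times the score of the Gaussian KDE built on the samples (a Tweedie-style identity). A minimal sketch on toy data; all names here are ours:

```python
import numpy as np

# Check: mean-shift with a Gaussian kernel == h^2 * score of the
# Gaussian-smoothed (KDE) sample distribution, evaluated at x.

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))     # toy samples y_i in R^2
h = 0.7                              # kernel bandwidth
x = np.array([0.3, -1.2])            # query point

# Normalized Gaussian kernel weights w_i ∝ exp(-||x - y_i||^2 / (2 h^2))
d = data - x                         # displacements y_i - x
w = np.exp(-np.sum(d**2, axis=1) / (2 * h**2))
w /= w.sum()

# Mean-shift: kernel-weighted average of the samples, minus x
mean_shift = w @ data - x

# Score of p_h(x) = (1/n) sum_i N(x; y_i, h^2 I):
# grad_x log p_h(x) = (1/h^2) sum_i w_i (y_i - x)
score = (w @ d) / h**2

print(np.allclose(mean_shift, h**2 * score))  # True
```

The same weights w_i appear in both quantities, which is why the two directions coincide exactly up to the factor h²; repeating this on the model samples gives the difference-of-scores field described above.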






We proudly present “Rethinking the Design Space of RL for Diffusion Models”, showing that ELBO-based likelihood estimation (from the final sample) is the dominant driver of stable, efficient RL fine-tuning. On SD3.5-Medium, we boost GenEval 0.24 → 0.95 in ~90 GPU hours, beating FlowGRPO (4.6× more efficient) and DiffusionNFT (2× more efficient). Great collab with @YongxinChen1 @YuchenZhu_ZYC @WeiGuo01 @MoleiTaoMath Petr Molodyk, Bo Yuan, Jinbin Bai, Yi Xin. 🎥 Video attached 📍 Link: arxiv.org/abs/2602.04663 #DiffusionModels #ReinforcementLearning #TextToImage #GenAI
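
To make the recipe concrete, here is a hedged sketch of a REINFORCE-style update in which the exact log-likelihood of the final sample is replaced by a single-timestep ELBO estimate (a denoising loss) on x0. The `TinyDenoiser`, the noise schedule, and the toy reward are our assumptions for illustration, not the paper's implementation:

```python
import torch
import torch.nn as nn

# Hedged sketch: policy-gradient step where log p_theta(x0) is
# approximated by a one-sample ELBO (denoising) estimate on the
# final sample. Toy model/schedule/reward, not the paper's code.

class TinyDenoiser(nn.Module):
    def __init__(self, d=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d + 1, 64), nn.SiLU(), nn.Linear(64, d))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))   # predicts the noise eps

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def elbo_estimate(x0):
    # One-sample ELBO surrogate: negative denoising error at a random t
    t = torch.rand(x0.shape[0], 1)
    eps = torch.randn_like(x0)
    a, s = torch.cos(0.5 * torch.pi * t), torch.sin(0.5 * torch.pi * t)
    xt = a * x0 + s * eps                            # noised final sample
    return -((model(xt, t) - eps) ** 2).sum(dim=-1)

# One RL step: raise the ELBO of high-reward final samples
x0 = torch.randn(128, 2)      # stand-in for samples generated by the model
reward = -x0.norm(dim=-1)     # toy reward: prefer samples near the origin
adv = (reward - reward.mean()).detach()              # centered advantage
loss = -(adv * elbo_estimate(x0)).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

The appeal of this kind of design is cost: the ELBO surrogate needs one denoiser forward pass per sample and is differentiable in the model parameters, so high-reward samples get their estimated likelihood pushed up directly.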





