Neta Shaul

37 posts

@shaulneta

PhD Student at @WeizmannScience

Joined June 2023
55 Following · 361 Followers
Neta Shaul @shaulneta
After multiple requests for the code of the visuals from my talk about Transition Matching, I made a notebook that reproduces the DTM vs. FM GIF! This demo is a good way to build intuition on how TM and FM differ. github.com/neta93/visual-… @urielsinger
0 replies · 2 reposts · 15 likes · 1.2K views
Neta Shaul @shaulneta
Incredible atmosphere at the poster session today! Thanks to everyone who visited 🙌 I’ll be at #NeurIPS2025 until Dec 8. If you missed the poster or just want to chat, my DMs are open. Shoutout to @EliahuHorwitz for the pictures!
[image]
Quoting Neta Shaul @shaulneta:

#NeurIPS2025 "Transition Matching" explores generative Markov processes with expressive transition kernels, going beyond the Gaussian kernel used in diffusion and flow models. Interested? Let's chat! 📍 Poster #3609 🕒 Wed at 11am - 2pm 📄 arxiv.org/abs/2506.23589

1 reply · 4 reposts · 27 likes · 3.8K views
Neta Shaul retweeted
Heli Ben-Hamu @helibenhamu
I'll be at NeurIPS on Dec 3-4. Would be happy to meet up and chat about efficient methods for sampling from language models ⚡️ Or, catch me at our EB-Sampler poster on Thursday at 4:30pm. Joint work with @itai_gat, @_dsevero, Niklas Nolte, Brian Karrer
[image]
0 replies · 8 reposts · 39 likes · 3K views
Neta Shaul @shaulneta
#NeurIPS2025 "Transition Matching" explores generative Markov processes with expressive transition kernels, going beyond the Gaussian kernel used in diffusion and flow models. Interested? Let's chat! 📍 Poster #3609 🕒 Wed at 11am - 2pm 📄 arxiv.org/abs/2506.23589
[image]
3 replies · 5 reposts · 26 likes · 5.3K views
Neta Shaul @shaulneta
Had a blast talking about Transition Matching at the HUJI Vision Seminar, big thanks to @EliahuHorwitz for inviting me! 🚀 If you like simple visual illustrations of complex ideas, I made a few in my slides: neta93.github.io/slides/transit…
3 replies · 1 repost · 23 likes · 2.2K views
Neta Shaul retweeted
Heli Ben-Hamu @helibenhamu
Excited to share our work Set Block Decoding! A new paradigm combining next-token prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with an exact KV cache. Arguably one of the simplest ways to accelerate LLMs!
5 replies · 24 reposts · 115 likes · 25.7K views
Neta Shaul @shaulneta
[1/n] New paper alert! 🚀 Excited to introduce 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐓𝐌)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model🤯, achieving SOTA text-2-image generation! @urielsinger @itai_gat @lipmanya
[GIF]
5 replies · 46 reposts · 289 likes · 85.8K views
Neta Shaul @shaulneta
@Fate_10kokoro Here are the r.v. DTM samples: x.com/shaulneta/stat…. Informally: if k is large and k/T is small, then repeatedly sampling the transition kernel k times with step size 1/T is roughly an Euler step of size k/T with the kernel's expectation as the velocity.
Quoting Neta Shaul @shaulneta:

If you're curious to dive deeper into Transition Matching (TM)✨🔍, a great starting point is understanding the similarities and differences between 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐃𝐓𝐌) and Flow Matching (FM)💡.

0 replies · 0 reposts · 3 likes · 200 views
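A minimal numeric check of the informal claim in the tweet above, in a toy 1-D setting chosen purely for illustration (Gaussian source and target, linear path, independent coupling, and a soft tolerance standing in for "lines passing through the current state"); none of this is the paper's exact construction. Holding (x, t) fixed, the sum of k sampled differences of size 1/T should roughly match one Euler step of size k/T taken with the conditional mean.

```python
# Toy check: k stochastic kernel samples of size 1/T vs. one Euler step of size k/T.
# All distributions and the eps-weighted conditional below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, 5000)          # source samples
x1 = rng.normal(3.0, 0.5, 5000)          # target samples

def sample_diff(x, t, n, eps=0.05):
    """Draw n samples of X1 - X0 from pairs whose line passes near x at time t."""
    pts = (1 - t) * x0 + t * x1                       # each pair's position at time t
    w = np.exp(-0.5 * ((pts - x) / eps) ** 2)         # soft "intersects the current state"
    idx = rng.choice(len(x0), size=n, p=w / w.sum())
    return x1[idx] - x0[idx]

T, k = 1000, 50                 # fine discretization; k/T = 0.05 is small
x, t = 0.2, 0.3                 # (x, t) held fixed, the regime the tweet describes
accum = (1 / T) * sample_diff(x, t, k).sum()          # k stochastic steps of size 1/T
euler = (k / T) * sample_diff(x, t, 20000).mean()     # one Euler step with the mean
print(accum, euler)             # the two displacements should be close
```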
Gaopeng Ren @Fate_10kokoro
@shaulneta Does this mean transition matching also learns the distribution of the velocity? And if the number of timesteps is very large, does it converge to the expectation value of the velocity, that is, X_T-X_0?
1 reply · 0 reposts · 0 likes · 231 views
Neta Shaul @shaulneta
DTM vs FM👇 Lots of interest in how Difference Transition Matching (DTM) connects to Flow Matching (FM). Here is a short animation that illustrates Theorem 1 in our paper: For a very small step size (1/T), DTM converges to an Euler step of FM.
[GIF]
Quoting Neta Shaul @shaulneta:

[1/n] New paper alert! 🚀 Excited to introduce 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐓𝐌)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model🤯, achieving SOTA text-2-image generation! @urielsinger @itai_gat @lipmanya

2 replies · 46 reposts · 327 likes · 25K views
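For readers without the GIF, here is a sketch of the same picture in a toy 1-D setting: a DTM-style trajectory that samples the conditional difference at every step, next to the FM Euler trajectory that uses the conditional mean. The bimodal target, the independent coupling, and the eps-weighted empirical conditional are illustrative assumptions, not the paper's setup.

```python
# Toy 1-D comparison: DTM-style stochastic steps vs. FM Euler steps on the same path.
import numpy as np

rng = np.random.default_rng(1)
x0s = rng.normal(0.0, 1.0, 4000)                      # source samples
x1s = np.concatenate([rng.normal(-2, 0.3, 2000),      # bimodal target
                      rng.normal(+2, 0.3, 2000)])

def cond_diffs(x, t, eps=0.05):
    """Empirical conditional of X1 - X0 given X_t near x (linear path, independent coupling)."""
    pts = (1 - t) * x0s + t * x1s                     # where each (X0, X1) pair sits at time t
    w = np.exp(-0.5 * ((pts - x) / eps) ** 2)
    return x1s - x0s, w / w.sum()

T = 400                                               # number of steps; 1/T is the step size
x_dtm = x_fm = 0.7                                    # shared starting point
for i in range(T):
    t = i / T
    d, p = cond_diffs(x_dtm, t)
    x_dtm += (1 / T) * d[rng.choice(len(d), p=p)]     # DTM: sample the difference
    d, p = cond_diffs(x_fm, t)
    x_fm += (1 / T) * (d * p).sum()                   # FM: Euler step with the conditional mean
print(x_dtm, x_fm)   # with T large, both endpoints should land near the same target mode
```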
Neta Shaul @shaulneta
@nico_dufour @urielsinger @itai_gat @lipmanya [2/2] Such an approach doesn’t hurt performance—on the contrary, it may offer a path for improvement. We verified that gains aren’t due to “overfitting” the transition kernel with fewer steps.
0 replies · 0 reposts · 1 like · 114 views
Neta Shaul @shaulneta
@nico_dufour @urielsinger @itai_gat @lipmanya [1/2] DTM is intrinsically a discrete-time process: a different time discretization ➡️ a different process. However, DTM depends only on the current time, which allows learning a continuous-time model (parameters are shared between processes); the discretization can then be selected at inference.
1 reply · 0 reposts · 1 like · 129 views
Neta Shaul @shaulneta
@nico_dufour @urielsinger @itai_gat @lipmanya Thanks Nicolas! FM learns the expected transition, while DTM learns to sample from the full transition distribution (slide). Adding a small MLP to FM didn't help; only when the MLP became a generative model (i.e., DTM) did we see improvements. I'll post more on the DTM–FM connection soon, stay tuned!
[image]
1 reply · 0 reposts · 4 likes · 300 views
Nicolas DUFOUR @nico_dufour
@shaulneta @urielsinger @itai_gat @lipmanya Hey, nice work! Something I struggle to understand is which part of the improvements comes from the framework and which comes from the MAR architecture. Have you tried training DTM without the MAR head, with a vanilla DiT? Or is the FM baseline also using MAR? Thanks!
1 reply · 0 reposts · 1 like · 367 views
Neta Shaul @shaulneta
@CSProfKGD I'm glad you take an interest in our work, Kosta. Mean Flows is indeed exciting work! From a TM perspective, they learn large-step-size transitions with a deterministic kernel, which is very interesting. I don't have a more elaborate answer at the moment, but I plan to look into it.
0 replies · 0 reposts · 4 likes · 144 views
Neta Shaul @shaulneta
If you're curious to dive deeper into Transition Matching (TM)✨🔍, a great starting point is understanding the similarities and differences between 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐃𝐓𝐌) and Flow Matching (FM)💡.
[image]
Quoting Neta Shaul @shaulneta:

[1/n] New paper alert! 🚀 Excited to introduce 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐓𝐌)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model🤯, achieving SOTA text-2-image generation! @urielsinger @itai_gat @lipmanya

2 replies · 15 reposts · 125 likes · 14.5K views
Neta Shaul @shaulneta
@vtaohu Great question! FM learns to approximate the expectation of the transition kernel, whereas DTM learns to sample from the underlying distribution of transitions. Hence, DTM is more expressive. Note that for a very small step size (1/T), FM's approximation is fully expressive!
[image]
0 replies · 0 reposts · 0 likes · 92 views
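A tiny concrete example of the expectation-vs-distribution distinction in the reply above, under toy choices made only for illustration (1-D, linear path, independent coupling, a source around 0 and a bimodal target): at this (x, t) the conditional of X1 - X0 is bimodal, so its mean, which is all FM keeps, falls between the modes, while a DTM-style kernel would return samples from both.

```python
# Toy illustration: FM keeps only the conditional mean; a DTM kernel samples the full conditional.
import numpy as np

rng = np.random.default_rng(2)
x0s = rng.normal(0.0, 0.5, 20000)                      # source around 0
x1s = np.concatenate([rng.normal(-2, 0.1, 10000),      # bimodal target
                      rng.normal(+2, 0.1, 10000)])

x, t, eps = 0.0, 0.2, 0.03                             # a state from which both modes are reachable
pts = (1 - t) * x0s + t * x1s                          # where each (X0, X1) pair sits at time t
w = np.exp(-0.5 * ((pts - x) / eps) ** 2)              # soft "passes through the current state"
p = w / w.sum()
d = x1s - x0s                                          # the transition difference of each pair

print("FM velocity (conditional mean):", (d * p).sum())          # sits between the two modes
print("DTM-style samples:", d[rng.choice(len(d), size=8, p=p)])  # cluster near -2.5 and +2.5
```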
Tao HU @vtaohu
@shaulneta Hi, Neta, could you elaborate on why DTM is "a more expressive kernel"? I am confused here. 😀
1 reply · 0 reposts · 1 like · 79 views
Neta Shaul @shaulneta
The Difference Transition Matching (DTM) process is so simple to illustrate that you can calculate it on a whiteboard! At each step: draw all lines connecting source and target (shaded) ⬇️ list those intersecting the current state (yellow) ⬇️ sample a line from the list (green)
[GIF]
Quoting Neta Shaul @shaulneta:

[1/n] New paper alert! 🚀 Excited to introduce 𝐓𝐫𝐚𝐧𝐬𝐢𝐭𝐢𝐨𝐧 𝐌𝐚𝐭𝐜𝐡𝐢𝐧𝐠 (𝐓𝐌)! We're replacing short-timestep kernels from Flow Matching/Diffusion with... a generative model🤯, achieving SOTA text-2-image generation! @urielsinger @itai_gat @lipmanya

2 replies · 16 reposts · 133 likes · 10K views
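The whiteboard recipe above translates almost line-for-line into code. Below is a literal toy transcription in numpy, where the source and target points, the linear "lines", the tolerance used for "intersecting the current state", and the uniform pick among the hits are all illustrative choices rather than the paper's exact construction.

```python
# Toy transcription of the whiteboard DTM recipe: enumerate lines, keep the ones through x, sample one.
import numpy as np

rng = np.random.default_rng(3)
src = rng.normal(0.0, 1.0, 200)                 # source points
tgt = rng.normal(3.0, 0.5, 200)                 # target points
pairs = [(a, b) for a in src for b in tgt]      # step 1: all source-target lines

def dtm_step(x, t, h, eps=0.05):
    # step 2: keep the lines that pass (approximately) through the current state at time t
    hits = [(a, b) for a, b in pairs if abs((1 - t) * a + t * b - x) < eps]
    # step 3: sample one of them and move along it for a step of size h
    a, b = hits[rng.integers(len(hits))]
    return x + h * (b - a)

# run the process from a source sample up to t = 1
T, x = 100, float(rng.choice(src))
for i in range(T):
    x = dtm_step(x, i / T, 1 / T)
print(x)   # should land in the target cloud around 3
```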
Neta Shaul @shaulneta
@thoma_gu @urielsinger @itai_gat @lipmanya [2/2] DART-FM, however, is not an immediate TM variant according to our formulation. It can be seen as learning a non-Markov diffusion kernel, which in a second stage is composed with a small flow-matching kernel to improve expressiveness.
0 replies · 0 reposts · 3 likes · 218 views
Neta Shaul @shaulneta
@thoma_gu @urielsinger @itai_gat @lipmanya [1/2] Thanks for pointing this out! Indeed, DART-AR is a variant of TM, so I added a small slide showing the connection. In a nutshell, it uses an independent process, but the kernel is Gaussian and hence not fully expressive. (We will add a reference in the next version.)
[image]
2 replies · 0 reposts · 7 likes · 1.4K views