Guy Van den Broeck (@guyvdb) - Twitter profile

@trunghlt @IanLi1118 new tokens depend on already unmasked tokens, but they do not depend on each other when unmasking multiple tokens in a single step
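The point in this reply can be written as a formula. The notation below is an assumption made for illustration and does not come from the thread: U is the set of already unmasked positions, i and j are two positions unmasked in the same step, q_θ is the model's output distribution, and p is the true data distribution.

```latex
% Assumed notation: U = already unmasked positions; i, j = positions unmasked together.
% Each new token is conditioned on x_U, but the two new tokens are predicted independently.
q_\theta(x_i, x_j \mid x_U) \;=\; q_\theta(x_i \mid x_U)\, q_\theta(x_j \mid x_U),
\qquad \text{whereas in general} \qquad
p(x_i, x_j \mid x_U) \;\neq\; p(x_i \mid x_U)\, p(x_j \mid x_U).
```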

Trung H (@trunghlt) · 4 Mar

@IanLi1118 Before yielding tokens, the last layer has access to previous hidden outputs. Why do we assume it generates tokens independently?

Guy Van den Broeck retweeted

Ian Li (@IanLi1118) · 4 Mar

One of the biggest promises of Diffusion LLMs is parallel generation: predicting multiple tokens at once to bypass the sequential bottleneck of autoregressive models. However, parallel generation comes with a price. For example: Should the sentence “He is from [MASK] [MASK]” be filled with [New] [York] or [San] [Diego]? If a diffusion model predicts both at the exact same time, it assumes independence and may produce... [San] [York]. 🤦‍♂️ We argue this arises from a structural misspecification: models are restricted to fully factorized outputs because parameterizing the full joint distribution would require a prohibitively massive output head. This is the Factorization Barrier crippling parallel generation. Here is how we broke it with CoDD.
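The [San] [York] failure can be reproduced with a toy calculation. The sketch below is only an illustration: the two-outcome joint distribution and its probabilities are made-up numbers, not values from any model or from the CoDD paper. It compares the joint distribution with the product of its per-position marginals, which is what a fully factorized output head represents.

```python
from itertools import product

# Hypothetical joint distribution over the two masked positions.
# The probabilities are illustrative assumptions, not model outputs.
joint = {("New", "York"): 0.6, ("San", "Diego"): 0.4}


def marginal(dist, position):
    """Marginal distribution of the token at a single position."""
    out = {}
    for tokens, p in dist.items():
        out[tokens[position]] = out.get(tokens[position], 0.0) + p
    return out


first, second = marginal(joint, 0), marginal(joint, 1)

# Fully factorized prediction: product of the per-position marginals.
factorized = {
    (a, b): pa * pb
    for (a, pa), (b, pb) in product(first.items(), second.items())
}

# Probability mass the factorized model puts on pairs the joint rules out.
invalid_mass = sum(p for pair, p in factorized.items() if pair not in joint)

for pair, p in sorted(factorized.items()):
    print(pair, round(p, 2))
print("mass on incoherent pairs:", round(invalid_mass, 2))
```

Under these assumed numbers, almost half of the factorized probability mass lands on pairs such as (San, York) and (New, Diego), even though each marginal is exactly right.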

Guy Van den Broeck (@guyvdb) · 4 Mar

We put probabilistic circuits into diffusion language models and got a big boost in reasoning performance!

Quoting Ian Li (@IanLi1118)

Guy Van den Broeck retweeted

Zilei Shao (@zileishao) · 4 Mar

Check out our most recent work on dLLM!

Quoting Ian Li (@IanLi1118)

Guy Van den Broeck retweeted

Anji Liu (@liu_anji) · 4 Mar

Check out our recent work offering a principled way to perform parallel prediction (few-step generation) in Diffusion LLMs with minimal performance degradation!

Quoting Ian Li (@IanLi1118)

Guy Van den Broeck retweeted

Zhihan Yang (@zhihanyang_) · 30 Jan

Join our reading group next Monday! Paper: Planned Diffusion. Presenters: Daniel Israel (@danielmisrael), Tian Jin (@jintian)

Quoting Discrete Diffusion Reading Group (@diffusion_llms)

Guy Van den Broeck retweeted

Discrete Diffusion Reading Group (@diffusion_llms) · 30 Jan

📢 Feb 2 (Mon): Planned Diffusion
🙅 Diffusion language models are capable of parallelizing text generation but can struggle with coherence in low time-step regimes.
💡 Planned Diffusion unlocks a new axis of parallelism: token-level parallelism ➡️ semantic parallelism.
✍️ Planned Diffusion first generates a structured plan, then diffuses semantically independent spans of text in parallel according to the plan.
This Monday, Daniel Israel (UCLA) (@danielmisrael) and Tian Jin (MIT) (@jintian) will discuss their exciting Planned Diffusion paper as joint first authors.
Collaborators: Ellie Cheng (elliecheng.com), Guy Van den Broeck (@guyvdb), Aditya Grover (@adityagrover_), Suvinay Subramanian (@suvinay), Michael Carbin (@mcarbin)
Paper link: arxiv.org/abs/2510.18087

Guy Van den Broeck retweeted

Luis Lamb (@luislamb) · 19 Dec

@RealAAAI AAAI2021Conference - Neuro-Symbolic AI Panel, during the COVID-19 crisis. With @kerstingAIML, @guyvdb, @mattbotvinick, Marta Kwiatkowska, Leslie Pack Kaelbling. Just a picture 🙂

Guy Van den Broeck (@guyvdb) · 8 Dec

@YuanqiD Thanks, you can find the paper here: arxiv.org/abs/2508.03587

Yuanqi Du (@YuanqiD) · 7 Dec

Best poster award: "Zero-Variance Gradients for Variational Autoencoders", Zilei Shao and @guyvdb (openreview.net/forum?id=Yy7jm…)

Yuanqi Du (@YuanqiD) · 7 Dec

Kudos to the SPIGM workshop best paper and best poster winners!

Guy Van den Broeck retweeted

NeSy 2026 (@nesyconf) · 3 Nov

Recordings of the NeSy 2025 keynotes are now available! 🎥 Check out insightful talks from @guyvdb, @tkipf and @dlmcguinness on our new YouTube channel. Topics include using symbolic reasoning for LLMs and object-centric representations. youtube.com/@NeSyconference

Guy Van den Broeck (@guyvdb) · 29 Oct

I gave a keynote at @nesyconf on "Symbolic Reasoning in the Age of Large Language Models". Check out the recording if you are curious about neurosymbolic generative AI: youtube.com/watch?v=OtmiJR…

Guy Van den Broeck retweeted

Alex Chen (@itisalex3) · 16 Oct

What happens when we compress the KV cache of prompts with multiple instructions? 🤔 Existing compression methods can lead to some instructions being ignored. 🙀 We propose simple changes to KV cache eviction that fix this problem alongside other pitfalls to be aware of. 💯

Guy Van den Broeck retweeted

Tian Jin (@jintian) · 22 Oct

Plan autoregressively, denoise in parallel!

Quoting Daniel Israel (@danielmisrael)

Guy Van den Broeck retweeted

Ellie Cheng (@ellieyhc) · 22 Oct

Diffusion 🤝 Autoregressive: fast, high-quality generation

Quoting Daniel Israel (@danielmisrael)

Guy Van den Broeck retweeted

Daniel Israel (@danielmisrael) · 22 Oct

"An hour of planning can save you 10 hours of doing." ✨📝 Planned Diffusion 📝✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8× faster than autoregressive generation and an order of magnitude faster than diffusion, while staying within 0.9-5% of AR quality.

Guy Van den Broeck (@guyvdb) · 21 Oct

@aakaran31 Very cool work. You should probably cite proceedings.mlr.press/v202/shih23a/s… though, which makes a lot of the same observations about non-myopic temperature scaling.

Aayush Karan (@aakaran31) · 17 Oct

We found a new way to get language models to reason. 🤯 No RL, no training, no verifiers, no prompting. ❌ With better sampling, base models can achieve single-shot reasoning on par with (or better than!) GRPO while avoiding its characteristic loss in generation diversity.

Guy Van den Broeck retweeted

Daniel Israel (@danielmisrael) · 18 Sep

🔦Adaptive Parallel Decoding (APD) has been accepted as a spotlight paper at @NeurIPSConf ! I thank my collaborators, reviewers, and program organizers for this honor. A thread for those interested 🧵 (1/n)

Guy Van den Broeck retweeted

NeSy 2026 (@nesyconf) · 8 Sep

@e_giunchiglia @guyvdb How can reverend Bayes help us to incorporate constraints? With NeSy of course 👀 With applications in non-toxic LLM generation and safe AI driving! @guyvdb