
Yipeng Zhang



NICE Talk 148🌟 invites @emilianopp_, a PhD student at Mila-Quebec & Université de Montréal, to discuss how LLMs can learn from privileged information during training — without needing it at test time.
📖 Paper: Privileged Information Distillation for Language Models — [arxiv.org/pdf/2602.04942]
⏰ Time: 3.20 (Fri) 9:00 PM - 10:00 PM EDT / 3.20 (Fri) 6:00 PM - 7:00 PM PDT
📌 Register: luma.com/dll9x6f5
📌 Watch live: youtube.com/watch?v=SUb4M7…
✨ This talk is hosted by @Haolun_Wu0203, Ph.D. at Mila & McGill

What if your model could train with a "cheat sheet" — but still ace the test without it? Emiliano presents Privileged Information Distillation, a unified post-training framework that bridges the gap between hinted training and non-privileged inference.

⭐ Key findings:
🧐 Privileged information during training significantly boosts LLM performance — but design choices matter enormously for generalization;
🤠 A variational framework + on-policy distillation outperforms strong baselines, including SFT + GRPO;
🤪 Most surprisingly, not all privileged information is equal — the right hints incentivize generalization, while the wrong ones don't.

#AI #LLM #PrivilegedInformation #Distillation #PostTraining #Reasoning #NICE #NexusForIntelligence
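
To make the training setup concrete, here is a minimal sketch of on-policy distillation from a hint-conditioned teacher into a hint-free student. The tiny model, the toy vocabulary, and the reverse-KL objective on student rollouts are assumptions made purely for illustration; this is not the paper's actual implementation.

```python
# Sketch: a teacher that sees (hint + question) distills into a student that
# sees only the question, scored on the student's own rollouts (on-policy).
# Architecture, shapes, and objective are illustrative assumptions.
import torch
import torch.nn.functional as F

VOCAB, DIM = 100, 32

class TinyLM(torch.nn.Module):
    """Toy autoregressive LM: embed tokens, run a GRU, predict next-token logits."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, DIM)
        self.rnn = torch.nn.GRU(DIM, DIM, batch_first=True)
        self.head = torch.nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                      # tokens: (B, T)
        h, _ = self.rnn(self.emb(tokens))
        return self.head(h)                         # logits: (B, T, VOCAB)

teacher = TinyLM()   # trained/conditioned with access to the privileged hint
student = TinyLM()   # must answer from the question alone at test time
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

question = torch.randint(0, VOCAB, (4, 8))          # (B, Tq)
hint     = torch.randint(0, VOCAB, (4, 4))          # privileged tokens, train-time only

for _ in range(10):
    # 1) On-policy: sample a continuation from the *student*, question-only context.
    with torch.no_grad():
        ctx = question
        sampled = []
        for _ in range(6):
            logits = student(ctx)[:, -1]                       # (B, VOCAB)
            nxt = torch.multinomial(F.softmax(logits, -1), 1)  # (B, 1)
            sampled.append(nxt)
            ctx = torch.cat([ctx, nxt], dim=1)
        rollout = torch.cat(sampled, dim=1)                    # (B, 6)

    # 2) Teacher scores the same rollout, but with the hint prepended to its context.
    T = rollout.size(1)
    with torch.no_grad():
        t_logits = teacher(torch.cat([hint, question, rollout], 1))[:, -(T + 1):-1]
    s_logits = student(torch.cat([question, rollout], 1))[:, -(T + 1):-1]

    # 3) Distill: reverse KL(student || teacher) over the rollout tokens, so the
    #    student moves toward the hint-informed teacher on its own samples.
    s_logp = F.log_softmax(s_logits, -1)
    t_logp = F.log_softmax(t_logits, -1)
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

At inference the hint branch is simply dropped: the student was only ever trained on question-only contexts, so no privileged input is needed at test time.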




What an awesome first day! Thank you all for joining and listening to our amazing speakers: @SchmidhuberAI, @sherryyangML, @cosmo_shirley, @Yoshua_Bengio, @ylecun, @mido_assran World Models have beautiful days ahead. This is just the beginning 🫡




How can we predict multiple plausible targets from a single context in joint-embedding self-supervised learning (SSL)? Check out our paper, “Self-Supervised Learning from Structural Invariance”, accepted at #ICLR2026! Previously won the Best Paper Award at @unireps 2025. arxiv.org/abs/2602.02381

We introduce AdaSSL, which models target uncertainty and relaxes the standard assumption that the positive pair shares the same semantic features. Derived from first principles, we realize @ylecun’s JEPA with a learned latent variable for jointly learning better representations and world models, extending SSL’s utility to a broader range of data types. 1/🧵
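
To make the idea concrete, here is a minimal sketch of a JEPA-style predictor augmented with a learned latent variable that captures which of several plausible targets follows a given context. The MLP modules, the Gaussian latent, and the KL-regularised objective are illustrative assumptions, not AdaSSL's actual architecture or loss.

```python
# Sketch: joint-embedding prediction where a latent z explains which of many
# valid targets occurred, so one context can map to multiple plausible targets.
# All module choices and the variational objective are illustrative assumptions.
import torch
import torch.nn.functional as F

DIM = 64

def mlp(i, o):
    return torch.nn.Sequential(
        torch.nn.Linear(i, DIM), torch.nn.ReLU(), torch.nn.Linear(DIM, o))

context_enc = mlp(DIM, DIM)            # encodes the context view
target_enc  = mlp(DIM, DIM)            # encodes the target view (e.g. an EMA copy)
infer       = mlp(2 * DIM, 2 * DIM)    # q(z | context, target): outputs mean and log-variance
predictor   = mlp(2 * DIM, DIM)        # predicts the target embedding from (context, z)

params = (list(context_enc.parameters()) + list(infer.parameters())
          + list(predictor.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

for _ in range(10):
    x_ctx = torch.randn(32, DIM)       # context view (stand-in for a crop / past frame)
    x_tgt = torch.randn(32, DIM)       # one of possibly many valid targets for that context

    s_ctx = context_enc(x_ctx)
    with torch.no_grad():              # stop-gradient on the target branch
        s_tgt = target_enc(x_tgt)

    # Infer a latent that explains *which* target occurred, then reparameterise.
    mu, logvar = infer(torch.cat([s_ctx, s_tgt], -1)).chunk(2, -1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    # Predict the target embedding from the context embedding and the latent.
    pred = predictor(torch.cat([s_ctx, z], -1))

    recon = F.mse_loss(pred, s_tgt)                               # prediction error in latent space
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).mean()     # keep the latent cheap to encode
    loss = recon + 0.1 * kl
    opt.zero_grad(); loss.backward(); opt.step()
```

The latent is what relaxes the "same semantic features" assumption: whatever the positive pair does not share is absorbed by z rather than forcing the two embeddings to collapse onto each other.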








