Yipeng Zhang

57 posts

@yipengzz

phd student @mila_quebec

Joined October 2021
58 Following · 60 Followers
Pinned Tweet
Yipeng Zhang@yipengzz·
How can we predict multiple plausible targets from a single context in joint-embedding self-supervised learning (SSL)? Check out our paper titled “Self-Supervised Learning from Structural Invariance” accepted at #ICLR2026! Previously Best Paper Award at @unireps 2025. arxiv.org/abs/2602.02381 We introduce AdaSSL, which models the target uncertainty and relaxes the standard assumption that the positive pair shares the same semantic features. Derived from first principles, we realize @ylecun’s JEPA with a learned latent variable for jointly learning better representations and world models, extending SSL’s utility to a broader range of data types. 1/🧵
2 replies · 23 reposts · 80 likes · 9.2K views
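The core idea in the thread, a JEPA-style predictor conditioned on a latent variable z so that one context can explain several plausible targets, can be sketched as a toy. All names here (encode, predict, the candidate search over z) are illustrative assumptions, not the AdaSSL implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    # toy shared encoder for both views
    return np.tanh(x @ W)

def predict(s_x, z, Wp):
    # predictor conditioned on a latent z: one context, many possible predictions
    return np.tanh(np.concatenate([s_x, z]) @ Wp)

d_in, d_emb, d_z = 8, 4, 2
W = rng.normal(size=(d_in, d_emb))
Wp = rng.normal(size=(d_emb + d_z, d_emb))

x = rng.normal(size=d_in)  # context view
y = rng.normal(size=d_in)  # target view (need not share all semantic features)
s_x, s_y = encode(x, W), encode(y, W)

# a single deterministic prediction must average over all plausible targets;
# with z, we can select the latent that best explains this particular target
candidates = [rng.normal(size=d_z) for _ in range(16)]
errs = [np.sum((predict(s_x, z, Wp) - s_y) ** 2) for z in candidates]
best = min(errs)
print(best <= errs[0])
```

In practice the latent would be inferred or learned rather than searched over a random pool; the point of the sketch is only that conditioning on z lets the predictor resolve target ambiguity.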
Yipeng Zhang reposted
Emiliano Penaloza@emilianopp_·
Come check out the talk for a deep breakdown of my recent work/blog :)
NICE AI Talk@academic_nice

NICE Talk 148 🌟 invites @emilianopp_, a PhD student at Mila-Quebec & Université de Montréal, to discuss how LLMs can learn from privileged information during training — without needing it at test time.
📖 Paper: Privileged Information Distillation for Language Models — [arxiv.org/pdf/2602.04942]
⏰ Time: 3.20 (Fri) 9:00 PM - 10:00 PM EDT / 6:00 PM - 7:00 PM PDT
📌 Register: luma.com/dll9x6f5
📌 Watch live: youtube.com/watch?v=SUb4M7…
✨ This talk is hosted by @Haolun_Wu0203, Ph.D. at Mila & McGill
What if your model could train with a "cheat sheet" — but still ace the test without it? Emiliano presents Privileged Information Distillation, a unified post-training framework that bridges the gap between hinted training and non-privileged inference.
⭐ Key findings:
🧐 Privileged information during training significantly boosts LLM performance — but design choices matter enormously for generalization;
🤠 A variational framework + on-policy distillation outperforms strong baselines including SFT + GRPO;
🤪 Most surprisingly, not all privileged information is equal — the right hints incentivize generalization, while the wrong ones don't.
#AI #LLM #PrivilegedInformation #Distillation #PostTraining #Reasoning #NICE #NexusForIntelligence

2 replies · 4 reposts · 12 likes · 895 views
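The setup the talk describes, training with a hint the model will never see at test time, can be mimicked with a tiny distillation loop. Everything below (the linear teacher and student, the KL gradient step) is a generic sketch of hint-conditioned distillation, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL(p || q) for discrete distributions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

d, k = 4, 3
x = rng.normal(size=d)     # input, available at train AND test time
hint = rng.normal(size=d)  # privileged information, train time only

W_teacher = rng.normal(size=(2 * d, k))
W_student = np.zeros((d, k))

# teacher conditions on (input, hint); student must match it from the input alone
p_teacher = softmax(np.concatenate([x, hint]) @ W_teacher)

kl_before = kl(p_teacher, softmax(x @ W_student))
for _ in range(500):
    p_student = softmax(x @ W_student)
    grad_logits = p_student - p_teacher           # dKL/dlogits under softmax
    W_student -= 0.1 * np.outer(x, grad_logits)   # chain rule through x @ W_student
kl_after = kl(p_teacher, softmax(x @ W_student))
print(kl_after < kl_before)
```

At test time the student runs on x alone, which is the whole point: the hint shaped the target distribution during training but is no longer needed for inference.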
Yipeng Zhang reposted
Benjamin Thérien@benjamintherien·
Are frontier LLMs trained across datacenters? One thing is certain: if the pre-training optimizer’s critical batch size is too small, they are NOT! Excited to announce MuLoCo, a pre-training optimizer that can efficiently pre-train across datacenters while having large enough batch sizes to warrant doing so. 🧵1/N
[image attached]
3 replies · 34 reposts · 95 likes · 16.6K views
Yipeng Zhang reposted
Randall Balestriero@randall_balestr·
World Modeling research needs fast iteration, reproducibility, optimized baselines, open-source, and precise zero-shot stress testing. Here comes stable-worldmodel! Paper: arxiv.org/abs/2602.08968 Code: github.com/galilai-group/… Come stress-test your model/idea! DINO-WM results ⬇️
21 replies · 48 reposts · 252 likes · 41.1K views
Yipeng Zhang reposted
Sébastien Lachapelle@seblachap·
I had a lot of fun meeting all the smart people at this workshop and presenting my work "On the Identifiability of Latent Action Policies" as an oral! A huge thanks to the organizers! Paper: arxiv.org/abs/2510.01337
World Modeling Workshop@worldmodel_conf

What an awesome first day! Thank you all for joining and listening to our amazing speakers: @SchmidhuberAI, @sherryyangML, @cosmo_shirley, @Yoshua_Bengio, @ylecun, @mido_assran World Models have beautiful days ahead. This is just the beginning 🫡

1 reply · 4 reposts · 25 likes · 2.3K views
Yipeng Zhang reposted
Emiliano Penaloza@emilianopp_·
Remember all the self-distillation papers that came out last week? Well, we also propose it 😅, but alongside something better 😎: π-Distill. We show that with this method, you can distill closed-source frontier models even though their traces are hidden 🔒. Both our methods can reach and even surpass the performance of the industry-standard SFT + RL with access to reasoning traces 🤯. 🔬 And we spent ~100,000 GPU hours on a comprehensive analysis, not because the method is finicky, but because we wanted to understand why it works so well. 🧵 1/10
11 replies · 78 reposts · 428 likes · 45.2K views
Yipeng Zhang@yipengzz·
I'm at @worldmodel_26 now through Friday. Lmk if you want to chat!
Yipeng Zhang@yipengzz
[Quoted tweet: the pinned AdaSSL #ICLR2026 announcement]
0 replies · 2 reposts · 11 likes · 1.5K views
Yipeng Zhang@yipengzz·
Across experiments, AdaSSL-V consistently improves both contrastive and distillation-based SSL. AdaSSL-S reliably improves contrastive SSL, but less so with distillation. Why? Here are plots of r-space usage (sparsity and diversity of the gated modules) for InfoNCE vs. BYOL. With BYOL, AdaSSL-S often underutilizes r, reflected by lower diversity. Hypothesis: (sample- or dimension-)contrastive objectives explicitly regularize the information content of the embeddings, which forces the model to use r. Distillation lacks this direct pressure, so r may need extra regularization. Curious to see how this finding affects practical use cases… 10/🧵
[image attached]
1 reply · 0 reposts · 1 like · 117 views
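The sparsity/diversity diagnostics mentioned in this tweet can be made concrete with toy metrics over gate activations. The specific definitions below (near-zero fraction for sparsity, entropy of mean per-module usage for diversity) are illustrative assumptions, not the paper's exact measures:

```python
import numpy as np

rng = np.random.default_rng(0)

def usage_stats(gates):
    """Toy usage metrics for a batch of gate activations.
    gates: (batch, modules) array of values in [0, 1]."""
    sparsity = float(np.mean(gates < 0.05))            # fraction of (near-)closed gates
    p = gates.mean(axis=0)
    p = p / p.sum()                                    # normalized per-module usage
    diversity = float(-(p * np.log(p + 1e-12)).sum())  # entropy: high = many modules used
    return sparsity, diversity

# a collapsed pattern (one module does everything) vs a spread-out one
collapsed = np.zeros((32, 8)); collapsed[:, 0] = 1.0
spread = rng.uniform(0.2, 1.0, size=(32, 8))

s1, d1 = usage_stats(collapsed)
s2, d2 = usage_stats(spread)
print(d1 < d2)  # collapsed gating has lower diversity
```

Under metrics like these, the tweet's observation reads as: BYOL-style distillation leaves the gates looking like `collapsed`, while contrastive objectives push them toward `spread`.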
Yipeng Zhang reposted
Lichen Zhang@LichenZlichenz·
The attention mechanism usually requires time quadratic in the sequence length to compute exactly, and many works give nearly linear-time algorithms to approximate the attention matrix. Can we design a *sublinear*-time algorithm on a quantum computer? 1/12
1 reply · 4 reposts · 5 likes · 261 views
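The quadratic cost the thread refers to comes from materializing the n-by-n attention matrix. A minimal dense-attention sketch (generic scaled dot-product attention, nothing specific to the quantum algorithm) makes the shapes explicit:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # forming A exactly costs O(n^2 d) time and O(n^2) memory for sequence length n
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (n, n) attention matrix
    return A @ V

rng = np.random.default_rng(0)
n, d = 16, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (16, 4): output per position

A = softmax(Q @ K.T / np.sqrt(d))
print(A.shape)    # (16, 16): the quadratic object approximation methods avoid forming
```

Linear-time methods avoid materializing A at all (via kernels, sampling, or low-rank structure); the question in the thread is whether a quantum computer can go below even that.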
Yipeng Zhang reposted
Emiliano Penaloza@emilianopp_·
This work solves one of the biggest bottlenecks in SSL/world models.
Yipeng Zhang@yipengzz
[Quoted tweet: the pinned AdaSSL #ICLR2026 announcement]
0 replies · 1 repost · 5 likes · 233 views
Yipeng Zhang reposted
Hafez Ghaemi@hafezghm·
Check out our recent work, accepted to #ICLR2026! We address the challenge of handling uncertainty in world modeling with joint-embedding SSL.
Yipeng Zhang@yipengzz
[Quoted tweet: the pinned AdaSSL #ICLR2026 announcement]
0 replies · 3 reposts · 14 likes · 1.6K views