Leitian Tao@NeurIPS 2025

55 posts


Leitian Tao@NeurIPS 2025

@LeitianT

3rd-year Machine Learning PhD student at @WisconsinCS | Research Scientist Intern @AIatMeta FAIR | ex-Research Intern @Adobe | BS '23 @WHU_1893

Joined November 2021
656 Following · 206 Followers
Pinned Tweet
Leitian Tao@NeurIPS 2025 @LeitianT
Excited to share our new work on Hybrid Reinforcement (HERO) — combining verifiable and reward-model signals for reasoning RL. Verifiers are precise but brittle. Reward models are rich but noisy. In our new paper HERO, we show how to combine both!
Jason Weston@jaseweston

Hybrid Reinforcement (HERO): When Reward Is Sparse, It’s Better to Be Dense 🦸‍♂️ 💪
📝: arxiv.org/abs/2510.07242
- HERO bridges 0–1 verifiable rewards and dense reward models into one 'hybrid' RL method
- Tackles the brittleness of binary signals and the noise of pure reward models -> better results!
✔️ Stratified normalization anchors dense scores within verifier groups
✔️ Variance-aware weighting emphasizes harder, high-variance prompts
✔️ Stable + informative rewards, no drift
📈 Results:
🔥 +11.7 pts vs RM-only, +9.2 pts vs verifier-only on hard-to-verify reasoning tasks
🔥 Generalizes across Qwen and OctoThinker models
🔥 Works well when training with easy-to-verify/hard-to-verify/mixed samples.
Hybrid reward → stable, dense, reliable supervision, advancing reasoning RL 🧵(1/5)

4 replies · 7 reposts · 101 likes · 10.9K views
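The stratified-normalization idea above can be sketched in a few lines. This is an illustrative reading, not HERO's exact formulation: the min-max normalization within each verifier group and the `[0, band]` / `[1 - band, 1]` anchoring are assumptions chosen so that every verified-correct response still outranks every incorrect one.

```python
def hybrid_reward(verifier, rm_scores, band=0.2):
    """Sketch of a stratified normalization: dense reward-model scores are
    min-max normalized *within* each verifier group (correct vs. incorrect),
    then anchored into disjoint bands so verifier correctness always dominates.
    Band width and anchor values are illustrative assumptions."""
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [0.5 if hi == lo else (x - lo) / (hi - lo) for x in xs]

    out = [0.0] * len(rm_scores)
    for flag, base in ((1, 1.0 - band), (0, 0.0)):
        idx = [i for i, v in enumerate(verifier) if v == flag]
        if idx:
            for i, n in zip(idx, minmax([rm_scores[i] for i in idx])):
                out[i] = base + band * n
    return out

# Verified-correct responses land in [0.8, 1.0], incorrect ones in [0.0, 0.2],
# so dense RM scores only break ties *inside* a verifier group.
rewards = hybrid_reward([1, 1, 0, 0], [2.0, 1.0, 3.0, 0.5])
```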
Leitian Tao@NeurIPS 2025 reposted
Sharon Li @SharonYixuanLi
Your LLM agent just mass-deleted a production database because it was confident it understood the task. It didn't. Avoiding these irreversible mistakes requires uncertainty quantification, a pressing open problem in the era of LLM agents. Check out our #ACL2026 paper: "Uncertainty Quantification in LLM Agents: Foundations, Emerging Challenges, and Opportunities"

🔍 Why this matters: LLM agents now book flights, modify databases, and execute code autonomously. Yet most UQ research still assumes a single-turn QA setup. In contrast, agents follow multi-turn trajectories in which they interact with users, call tools, and receive environmental feedback. The gap between how we study UQ and how agents actually operate is enormous.

⚙️ A unified formulation: We present the first unified formulation of Agent UQ. It models the full trajectory (actions, observations, states) and decomposes uncertainty per turn via the chain rule. Under this formulation, single-step LLM UQ and multi-step reasoning UQ fall out as special cases.

🚧 Challenges: We identify four core challenges: from selecting the right UQ estimator when existing methods break down in agentic settings, to handling heterogeneous uncertainty sources (user, tools, environment), to the near-total lack of fine-grained agent benchmarks (we survey 44 and find that turn-level evaluation is extremely rare).

🌍 Implications and open problems: Agent UQ is the missing safety layer for healthcare agents triaging patients, SWE agents pushing code to prod, and agents controlling cyber-physical systems. We also surface open problems around solution multiplicity, multi-agent UQ, and self-evolving systems. We release code and data to help the community build on this.

📄 Paper: arxiv.org/abs/2602.05073
🌐 Project: agentuq.github.io
💻 Code: github.com/deeplearning-w…

Huge shoutout to @changdaeoh, who spearheaded this effort. When we started the work, agent UQ was a loosely defined space with scattered ideas; Changdae brought the clarity, structure, and rigor that the field needed to move forward. Also thanks to all the collaborators: @seongheon_96, To Eun Kim, @JiatongLi0418, @Wendi_Li_, @Samuel861025, @xuefeng_du, Hamed Hassani, Paul Bogdan, Dawn Song
[image attached]
4 replies · 41 reposts · 243 likes · 14.5K views
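The per-turn chain-rule decomposition mentioned in the thread is the standard one for sequence models; a generic sketch (not the paper's specific estimator) turns trajectory-level uncertainty into a sum of per-turn surprisals, where `turn_probs` are the model's probabilities for each action given the history:

```python
import math

def trajectory_uncertainty(turn_probs):
    """Chain-rule decomposition of trajectory uncertainty:
    -log p(trajectory) = sum_t -log p(a_t | history_<t).
    Returns the per-turn surprisal terms and their total."""
    per_turn = [-math.log(p) for p in turn_probs]
    return per_turn, sum(per_turn)

# A 3-turn trajectory: the middle turn (p=0.5) contributes the most uncertainty.
per_turn, total = trajectory_uncertainty([0.9, 0.5, 0.8])
```

Turn-level terms like these are what a fine-grained benchmark would need to evaluate, rather than a single trajectory-level score.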
Leitian Tao@NeurIPS 2025 reposted
Sharon Li @SharonYixuanLi
We've been in GRPO-tweaking mode for months (entropy bonuses, clipping hacks, length penalties). But what if the entire objective is wrong? Today, we're releasing LAD (Learning Advantage Distributions), the most elegant rethink of RL for LLM reasoning I've seen this year. #ACL2026 Here's the idea, how it works, and why we think it changes things. 🧵

The problem we kept hitting
GRPO, DAPO, RLOO, and many other variants do the same thing at their core: maximize expected reward. And when you do that, your policy can collapse onto a single dominant reasoning path. Entropy regularization can be bolted onto the framework, but it doesn't fundamentally fix it from the ground up.

The key insight 💡 Stop maximizing. Start matching.
We reframe the policy update as a distribution matching problem. Instead of pushing toward the single best response, we make the policy's output distribution match the full advantage-weighted target distribution by minimizing an f-divergence between the two (see our theory in Section 3.1). When you match the full advantage distribution, you naturally preserve probability mass across multiple valid reasoning paths. High-advantage responses get upweighted, yes, but the objective also suppresses overconfident probability growth on any single mode. Collapse prevention isn't an afterthought.

What validated the theory
We tested six divergence families. The results that convinced us we were on the right track:
- Strict divergences (Total Variation, Hellinger, Jensen-Shannon) that enforce exact distributional matching consistently outperform weaker ones (such as KL).
- The more faithfully you learn the full advantage distribution, the better the reasoning. This is exactly what the framework predicts.

The results
- In a controlled bandit setting, LAD recovers multi-mode advantage distributions (see plot below). GRPO fundamentally cannot. This is the clearest demonstration that the paradigm difference is real, not just theoretical.
- In math and code reasoning tasks across multiple LLM backbones, LAD consistently outperforms GRPO on both accuracy AND generative diversity across benchmarks.

Why this matters beyond benchmarks
Pass@k scaling: If your model knows 5 valid reasoning paths instead of 1, sampling at inference becomes massively more effective.
Simplicity: Instead of stacking "GRPO + entropy hack," you get one principled objective. Diversity preservation comes by design.

Paper: arxiv.org/abs/2602.20132
Code is available; link in the paper.
Huge credit to my amazing student @Wendi_Li_, who drove this work, thinks boldly, and made things happen.
[image attached]
7 replies · 48 reposts · 373 likes · 30.8K views
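The matching idea can be made concrete in the bandit setting the thread mentions. A minimal sketch, with assumptions labeled: the exponential (softmax) weighting used to build the advantage-weighted target and the temperature `beta` are illustrative choices, not LAD's exact construction, and Total Variation stands in for the paper's family of f-divergences.

```python
import math

def advantage_target(advantages, beta=1.0):
    """Advantage-weighted target distribution over candidate responses:
    a softmax of advantages (the exponential weighting is an assumption)."""
    w = [math.exp(beta * a) for a in advantages]
    z = sum(w)
    return [x / z for x in w]

def total_variation(p, q):
    """Total Variation distance, one of the strict divergences tested:
    TV(p, q) = 0.5 * sum_i |p_i - q_i|, ranging over [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Two equally good reasoning paths keep equal target mass, so a policy
# minimizing TV against this target cannot collapse onto one of them.
target = advantage_target([2.0, 2.0, 0.0])
collapsed_policy = [1.0, 0.0, 0.0]
loss = total_variation(collapsed_policy, target)
```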
Leitian Tao@NeurIPS 2025 reposted
Jason Weston @jaseweston
🧮 Principia: Training LLMs to Reason over Mathematical Objects 📐
We release:
- PrincipiaBench, a new eval for *mathematical objects* (not just numerical values or MCQ)
- Principia Collection: training data that improves reasoning across the board.
For models to help with scientific and mathematical work, you need to train on such data & test whether they can derive things like equations, sets, matrices, intervals, and piecewise functions. We show that this ends up improving the overall reasoning ability of your model for all tasks.
Read more in the blog post: facebookresearch.github.io/RAM/blogs/prin…
[image attached]
0 replies · 34 reposts · 127 likes · 12.3K views
Leitian Tao@NeurIPS 2025 reposted
Sharon Li @SharonYixuanLi
HERO has been accepted by #ICLR2026 - congratulations to @LeitianT and all co-authors!
Jason Weston@jaseweston

[Quoted tweet: Jason Weston's HERO announcement, quoted in full above]

0 replies · 9 reposts · 94 likes · 12.9K views
Leitian Tao@NeurIPS 2025 reposted
Sharon Li @SharonYixuanLi
When evaluating LVLMs, should we really be asking: “Did the model get the right answer?” or rather “Did the model truly integrate the visual input?” LVLMs can rely on shortcuts learned from the underlying language model, aka language prior. In our #ICLR2026 paper, we attempt to understand this phenomenon at a deeper, representation level.
📄 “Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding”
arxiv.org/abs/2509.23050

1/ Problem: LVLMs often ignore visual evidence
While LVLMs perform well on many benchmarks, they sometimes rely on language patterns rather than actual images. A simple example: show a model a green banana, and it may confidently describe it as “ripe and yellow”, because that’s the most common linguistic pattern it has learned. 🍌
This raises a central question: Where inside the model does visual information begin to influence its reasoning?

2/ Motivation: Output-level probes fall short
Most analyses inspect outputs, e.g., by removing the image or comparing predictions. But these methods cannot reveal when the model starts integrating vision and how strongly visual signals affect internal states. To address this, we need a representation-driven perspective. 🔍

3/ Approach: Contrasting Chain-of-Embedding (CoE)
We trace hidden representations across the model’s depth for the same prompt:
• once with the image
• once without the image
By comparing these trajectories layer by layer, we identify the exact point where visual input begins shaping the model’s internal computation. This leads to the discovery of the Visual Integration Point (VIP) ✨, the layer at which the model “starts seeing.” We then define Total Visual Integration (TVI), a metric that quantifies how much visual influence accumulates after the VIP.

4/ Findings across 10 LVLMs and 6 benchmarks
Across 60 evaluation settings, we observe:
• VIP consistently appears across diverse architectures
• Pre-VIP → representations behave like a language-only model
• Post-VIP → visual signals increasingly reshape the embedding pathway
• TVI correlates strongly with actual visual reasoning performance
• TVI outperforms attention- and output-based proxies at identifying language prior
TVI thus offers a more principled indicator of whether a model actually uses the image.

5/ Impact: A new lens on multimodal behavior
Our framework has a few practical benefits. It enables (1) diagnosing over-reliance on language prior, (2) comparing LVLM architectures more rigorously, (3) informing better training and alignment strategies, and (4) improving robustness and grounding in real-world tasks.
Shout out to my students for this insightful work: Lin Long, @Changdae_Oh, @seongheon_96 🌻 Please check out our paper for more details!
[image attached]
2 replies · 34 reposts · 227 likes · 14.6K views
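The contrasting-trajectories step above lends itself to a short sketch. Assumptions, loudly labeled: the paper's exact divergence measure and VIP criterion are not reproduced here; cosine distance between with-image and without-image embeddings per layer, and a fixed threshold to pick the first diverging layer, are stand-ins for illustration.

```python
import math

def cosine_dist(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def vip_and_tvi(with_img, without_img, threshold=0.1):
    """VIP (illustrative): first layer whose with-image / without-image
    embeddings diverge beyond `threshold`. TVI (illustrative): divergence
    accumulated from the VIP onward. Inputs: per-layer embedding lists."""
    dists = [cosine_dist(u, v) for u, v in zip(with_img, without_img)]
    vip = next((i for i, d in enumerate(dists) if d > threshold), None)
    tvi = sum(dists[vip:]) if vip is not None else 0.0
    return vip, tvi

# Layers 0-1 match the language-only trajectory; layer 2 diverges -> VIP = 2.
vip, tvi = vip_and_tvi([[1, 0], [1, 0], [0, 1]], [[1, 0], [1, 0], [1, 0]])
```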
Leitian Tao@NeurIPS 2025 reposted
Minghao Yan @Minghao__Yan
🚀 Thrilled to introduce PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution. We show how to push LLM self-evolution beyond short, unstable improvements and into consistent, long-horizon gains. 🧵👇
[image attached]
1 reply · 8 reposts · 26 likes · 2.5K views
Leitian Tao@NeurIPS 2025 reposted
Shawn Im @shawnim00
Excited to share our recent work selected as an ICLR Oral! 
We work towards answering how models learn to associate tokens and build semantic concepts. We find that early-stage features in attention-based models can be written as compositions of three basis features.
[image attached]
2 replies · 29 reposts · 162 likes · 54.3K views
Leitian Tao@NeurIPS 2025 reposted
Yuxi Xiao @YuxiXiaohenry
Are spatial abilities in MLLMs flat — or hierarchical? 🤔 And how do we systematically scale them across pre-training, SFT, and RL?🤔 🎯 We propose SpatialTree: How Spatial Abilities Branch Out in MLLMs 🔗 spatialtree.github.io
1 reply · 6 reposts · 25 likes · 1.7K views
Leitian Tao@NeurIPS 2025 reposted
Jason Weston @jaseweston
Our co-improvement position paper is now on arXiv! (We've updated it, covering more existing work.)
📝: arxiv.org/abs/2512.05356
After >27 years of research, my first position paper! Short 🧵 (1/5) follows 👇
Synopsis: it's about building AI that collaborates on AI research *with us* to solve AI faster, and to help fix the alignment problem together. How? Build the AI with those collab skills (i.e., we create benchmarks, training data, methods, etc. for that).
I've been personally inspired by @Yoshua_Bengio's recent talks on safety & AI research, and also by seeing Nicholas Carlini's COLM keynote where he said we researchers can all do our bit to help (paraphrased). So – hope this helps! 🙏
[image attached]
7 replies · 40 reposts · 245 likes · 28.1K views
Leitian Tao@NeurIPS 2025 @LeitianT
I’ll be at #NeurIPS @ San Diego next week to present my paper!
🚩 Exhibit Hall C,D,E #115
🕟 Wed 3 Dec, 11 a.m.–2 p.m. PST
My research focuses on LLM alignment, reasoning, and agents. If you’d like to chat about research, please feel free to reach out; let's connect!
Sharon Li@SharonYixuanLi

Collecting large human preference data is expensive—the biggest bottleneck in reward modeling. In our #NeurIPS2025 paper, we introduce latent-space synthesis for preference data, which is 18× faster and uses a network that’s 16,000× smaller (0.5M vs 8B parameters) than text-based synthesis methods.
📄 arxiv.org/abs/2509.26074
🧵 Thread below

Instead of generating and annotating new text, our approach — LENS (Latent EmbeddiNg Synthesis) — learns to expand preference datasets directly in embedding space. We train a variational autoencoder on existing preference data, then sample new latent vectors to synthesize diverse, high-quality preference pairs — all without text generation or extra human labeling. This simple shift from text space → latent space makes reward modeling dramatically more efficient while preserving semantic consistency.

📊 Results:
- 18× faster data generation
- 16,000× smaller augmentation model (0.5M vs 8B parameters)
- On HH-RLHF and TL;DR benchmarks, LENS achieves large improvements over text-based augmentation baselines.
- Generalizability: Works across different LLM backbones and shows the strongest gains in low-data regimes.

📚 Theoretical insights
We show that latent-space synthetic pairs preserve preference ordering within a provable bound, and that augmentation with LENS improves the generalization error of reward models. In other words — it’s not just faster; it is theoretically grounded.

While LENS is motivated by efficiency in preference data synthesis, its potential extends beyond. For example, LENS opens doors for:
- Low-resource alignment: For languages, domains, or communities with limited human preference data, LENS can expand training sets in embedding space, helping close the data gap in pluralistic alignment efforts.
- Personalized reward modeling: Generating synthetic preferences in latent space tailored to individual user preferences or styles.

Massive credit to @LeitianT for leading the effort and @xuefeng_du for mentorship. Couldn’t have done this without this stellar team!

0 replies · 2 reposts · 13 likes · 2.2K views
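The "sample new latent vectors" step can be illustrated with a deliberately simplified stand-in: here a diagonal Gaussian fitted to existing pair embeddings takes the place of LENS's trained VAE (purely an assumption for illustration; the real method learns an encoder/decoder, not per-dimension statistics).

```python
import random

def fit_gaussian(embs):
    """Per-dimension mean/std of existing preference-pair embeddings.
    A diagonal Gaussian stands in for the paper's VAE; illustrative only."""
    dim, n = len(embs[0]), len(embs)
    means = [sum(e[d] for e in embs) / n for d in range(dim)]
    stds = [max(1e-6, (sum((e[d] - means[d]) ** 2 for e in embs) / n) ** 0.5)
            for d in range(dim)]
    return means, stds

def synthesize(means, stds, n, seed=0):
    """Sample n new latent vectors: the synthetic 'preference embeddings'
    that would be fed to reward-model training, with no text generation."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]

means, stds = fit_gaussian([[0.0, 1.0], [2.0, 3.0]])
synthetic = synthesize(means, stds, 5)
```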
Leitian Tao@NeurIPS 2025 reposted
Sean Du @xuefeng_du
🎉 Honored to be selected for the @RealAAAI 26 New Faculty Highlights program! I’ll present my research on 🤖 AI reliability (OOD detection, LLM hallucination & alignment) in person. See you at #AAAI26 in Singapore in January next year!
[image attached]
5 replies · 7 reposts · 57 likes · 19.6K views
Leitian Tao@NeurIPS 2025 reposted
Jason Weston @jaseweston
🌶️ SPICE: Self-Play in Corpus Environments 🌶️
📝: arxiv.org/abs/2510.24684
- Challenger creates tasks based on *corpora*
- Reasoner solves them
- Both trained together ⚔️ -> automatic curriculum!
🔥 Outperforms standard (ungrounded) self-play
Grounding fixes hallucination & lack of diversity
🧵1/6
[image attached]
8 replies · 54 reposts · 333 likes · 79.9K views
Leitian Tao@NeurIPS 2025 reposted
Sharon Li @SharonYixuanLi
Deception is one of the most concerning behaviors that advanced AI systems can display. If you are not concerned yet, this paper might change your view. We built a multi-agent framework to study:
👉 How do deceptive behaviors emerge and evolve in LLM agents during realistic long-horizon interactions?

🎯 Motivation
When we talk about AI deception, we mean an AI can produce outputs that mislead someone — deliberately or strategically — to achieve a goal or avoid a consequence. Examples:
🕵️ Intentionally hiding part of the truth (to make itself look more successful)
🤐 Giving vague answers so it can’t be blamed later
🧢 Saying something false to pass a “test” or finish a task
Most evaluations look at one-shot prompts. But deception doesn’t always show up in a single exchange. It can develop gradually over time — as the model plans, reacts to pressure, or tries to “look good” under supervision. That’s the gap we wanted to study.

🧪 Our framework
We built a multi-agent simulation with three key roles:
1. Performer agent — the agent completing complex, interdependent tasks.
2. Supervisor agent — tracking progress and forming trust judgments as the interaction unfolds.
3. Deception auditor — independently reviewing the entire trajectory to detect deceptive behaviors.
This setup enables us to observe not only whether deception occurs, but also how it emerges, escalates, and erodes trust over extended periods.

📊 What we found
- Deception is model-dependent — some models are more prone to engage in deceptive strategies than others (see Table for more).
- Deceptive behaviors are more likely under event pressure (when the performer faces setbacks or high-stakes conditions).
- Deception systematically erodes supervisor trust across long horizons.

🤝 Closing thoughts
Our work doesn’t claim to solve deception — it’s a step toward understanding it in more realistic, dynamic settings. We hope this simulation framework becomes a foundation for the field: a practical way to evaluate long-horizon deception, a strong baseline that future safety research can build on, and a concrete tool to help guide governance discussions around responsible AI systems.

Huge shout-out to Yang Xu, @xuanmingzhangai, @Samuel861025 for driving this work forward over the last few months. We’re also grateful to our collaborators (@jwaladhamala, Ousmane Dia, @rahul1987iit) at the @amazon AGI team for supporting and contributing to this work.
📝 “Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions”
📄 arxiv.org/abs/2510.03999
💻 Code: github.com/deeplearning-w…
#AI #LLM #Deception #Trust #AIethics #AgenticAI #AIResearch
[image attached]
16 replies · 49 reposts · 224 likes · 27.2K views
Leitian Tao@NeurIPS 2025 reposted
Sharon Li @SharonYixuanLi
Human preference data is noisy: inconsistent labels, annotator bias, etc. No matter how fancy the post-training algorithm is, bad data can sink your model.
🔥 @Samuel861025 and I are thrilled to release PrefCleanBench — a systematic benchmark for evaluating data cleaning strategies in preference data. It has been recently accepted to the #NeurIPS2025 DB track.
💡 We benchmarked 13 cleaning methods across datasets, backbones, and preference optimization algorithms — and found:
🧼 Cleaning consistently boosts alignment.
👥 Committees of reward models are especially robust.
✂️ Removing bad data works better than simply flipping labels.
📉 Smaller but cleaner datasets can outperform larger noisy ones.

This work also echoes the concerns we put forward in the #ICML2025 position paper "Challenges and Future Directions of Data-Centric AI Alignment". Back then, we argued that data quality was the key underexplored piece in alignment. PrefCleanBench is a concrete step toward tackling those challenges.

With PrefCleanBench, we hope to make data hygiene a first-class consideration in the alignment pipeline — something measured, benchmarked, and improved systematically, not just assumed away. We also see this as an invitation: there’s so much room for innovation in cleaning methods, data auditing, annotation protocols, and feedback design. Getting this layer right is critical for building aligned, reliable systems at scale.

📄 Paper: arxiv.org/pdf/2509.23564
📝 Position paper: arxiv.org/abs/2410.01957
💻 Code: github.com/deeplearning-w…
#LLM #AIAlignment #MachineLearning #DataQuality
[image attached]
7 replies · 44 reposts · 238 likes · 16K views
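Two of the findings above (reward-model committees are robust; removing bad data beats flipping labels) combine into a simple cleaning rule. A minimal sketch under stated assumptions: the majority-vote rule and the `min_agree` parameter are illustrative, not a specific method from the benchmark.

```python
def committee_filter(pairs, reward_models, min_agree=2):
    """Keep a (chosen, rejected) preference pair only if at least
    `min_agree` reward models agree with its label (chosen scored higher).
    Disagreed-upon pairs are *removed*, not label-flipped, echoing the
    benchmark's finding that deletion works better than flipping."""
    kept = []
    for chosen, rejected in pairs:
        votes = sum(1 for rm in reward_models if rm(chosen) > rm(rejected))
        if votes >= min_agree:
            kept.append((chosen, rejected))
    return kept

# Toy committee: two length-based scorers plus one uninformative scorer.
committee = [len, len, lambda s: 0]
clean = committee_filter(
    [("longer answer", "short"), ("a", "bbbb")], committee)
```

In practice each `rm` would be a trained reward model scoring full responses; the resulting smaller-but-cleaner set is what the benchmark shows can outperform the larger noisy one.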
Leitian Tao@NeurIPS 2025 reposted
Jason Weston @jaseweston
💃 New Multi-Agent RL Method: WaltzRL 💃
📝: arxiv.org/abs/2510.08240
- Makes LLM safety a positive-sum game between a conversation & feedback agent
- At inference, feedback is adaptive, used when needed
-> Improves safety & reduces overrefusals without degrading capabilities!
🧵1/5
[image attached]
5 replies · 33 reposts · 151 likes · 24.3K views
Leitian Tao@NeurIPS 2025 reposted
Sharon Li @SharonYixuanLi
Your LVLM says: “There’s a cat on the table.” But… there’s no cat in the image. Not even a whisker. This is object hallucination — one of the most persistent reliability failures in multi-modal language models.
Our new #NeurIPS2025 paper introduces GLSim, a simple but powerful training-free method to catch these phantom objects by combining global and local embedding similarity signals.
Unlike previous detectors that rely on either global or local signals alone, GLSim fuses both:
- Global similarity: how well an object semantically fits the overall scene.
- Local similarity: whether any specific region in the image actually supports the object’s presence.
→ Together, they catch hallucinations with higher accuracy.

🔸 Results: Across multiple benchmarks and backbones, GLSim significantly outperforms previous methods, showing better accuracy and generalization — all while being lightweight and model-agnostic.
✅ Tested on LLaVA, MiniGPT-4, Shikra, InternVL, Qwen2.5-VL, InstructBLIP, and Cambrian.

🔸 Why it’s exciting:
- Boosts LVLM reliability out of the box — no fine-tuning required
- Easy to integrate into evaluation or deployment pipelines
- Provides interpretable signals for auditing model outputs

📄 Paper: arxiv.org/abs/2508.19972
💻 Code: github.com/deeplearning-w…
Huge thanks to @seongheon_96 for contributing such a practical toolbox as we push LVLMs toward safer, more reliable vision-language understanding.
#AI #Multimodal #HallucinationDetection #LVLM
[image attached]
3 replies · 45 reposts · 229 likes · 20.6K views
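The global/local fusion described above can be sketched with plain cosine similarities. Assumptions labeled: taking the max over region similarities and blending with a convex weight `alpha` is an illustrative fusion rule, not GLSim's exact scoring function, and the embeddings would come from the LVLM's own encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def glsim_score(obj_emb, global_emb, region_embs, alpha=0.5):
    """Illustrative fusion: global fit of the object to the whole scene,
    plus the best-supporting local region. A low score flags a likely
    hallucinated object; `alpha` (assumed) weights the two signals."""
    global_sim = cosine(obj_emb, global_emb)
    local_sim = max(cosine(obj_emb, r) for r in region_embs)
    return alpha * global_sim + (1.0 - alpha) * local_sim

# Object embedding matches both scene and one region -> high score;
# object unsupported by scene and all regions -> low score.
supported = glsim_score([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
phantom = glsim_score([1.0, 0.0], [0.0, 1.0], [[0.0, 1.0]])
```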
Sharon Li @SharonYixuanLi
[LENS thread: identical to the tweet quoted above]
6 replies · 57 reposts · 324 likes · 31.2K views
Leitian Tao@NeurIPS 2025 @LeitianT
Thrilled to announce our #NeurIPS2025 paper on LENS (Latent EmbeddiNg Synthesis)! LENS learns to synthesize preference pairs directly in latent space —
🚀 18× faster data generation
⚙️ 16,000× smaller model
Efficient, theoretically grounded, and scalable reward modeling.
Sharon Li@SharonYixuanLi

[Quoted tweet: Sharon Li's LENS thread, above]

0 replies · 1 repost · 8 likes · 1.4K views
Leitian Tao@NeurIPS 2025 reposted
Sharon Li @SharonYixuanLi
Excited to share our #NeurIPS2025 paper: Visual Instruction Bottleneck Tuning (Vittle)
Multimodal LLMs do great in-distribution, but often break in the wild. Scaling data or models helps, but it’s costly.
💡 Our work is inspired by the Information Bottleneck (IB) principle, which promotes representations that discard non-essential features tied to the input modality while preserving those critical for solving the task. This is ideal for robust instruction tuning because it facilitates invariance to low-level superficial features, enabling generalization.
⚠️ But here’s the catch: integrating IB into MLLMs is highly non-trivial. Why?
- Mutual information estimation is intractable at scale
- Autoregressive & multimodal architectures make standard IB formulations break down
✅ Our contributions:
(1) Derive a new variational lower bound of the IB objective tailored to MLLMs
(2) Provide a practical implementation (Vittle) as a lightweight, scalable module
(3) Show via extensive evaluations (45 datasets, 30 distribution shift scenarios) that Vittle consistently boosts robustness across open QA, closed QA, and hallucination detection.
📄 Paper: arxiv.org/abs/2505.13946
Huge kudos to the team: @Changdae_Oh @JiatongLi0418 @shawnim00
A personal note: This project was particularly challenging — deriving new theory, building a practical implementation, and running massive training + evaluation experiments, all within a resource-constrained academic setting. The fact that the team made all of this happen is nothing short of a miracle to me.
[image attached]
2 replies · 37 reposts · 241 likes · 20.5K views
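For reference, the generic Information Bottleneck objective the thread builds on (the paper's MLLM-specific variational lower bound is in the linked preprint and is not reproduced here): learn a stochastic encoding $Z$ of the input $X$ that stays predictive of the target $Y$,

```latex
\min_{p(z \mid x)} \; I(X; Z) \; - \; \beta \, I(Z; Y)
```

where $I(\cdot\,;\cdot)$ is mutual information and $\beta > 0$ trades compression of input-specific detail against predictive sufficiency for the task; the intractability of these mutual-information terms at scale is exactly the first obstacle the tweet lists.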