Chongyu Fan reposted

🎯 Our EMNLP 2025 Main paper
“Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills” goes live soon!
Catch us on Wednesday in Suzhou at #EMNLP2025 🇨🇳
🔗 Paper Link: arxiv.org/pdf/2506.12963
🏡 Project Page Link: r2mu.netlify.app
🗓 November 5, 11:00–12:30 CST (UTC+8)
📍 Hall C, Section 2, 500-Main
🧍 I won’t be there in person — but feel free to chat with my co-authors!
🧠 The Problem
You’ve erased sensitive answers from your LRM.
But the reasoning traces, the step-by-step “thoughts” that led there, still remain.
Even after unlearning, the model can reconstruct or re-infer forgotten answers through these traces.
So the question is:
👉 Can we truly forget reasoning traces, while preserving the model’s reasoning ability?
🎯 Our Solution: R²MU (Reasoning-aware Representation Misdirection for Unlearning)
We go beyond answer-level forgetting and target the reasoning process itself.
R²MU suppresses sensitive reasoning traces while maintaining general reasoning competence.
Through representation misdirection, the model "unthinks" unsafe reasoning paths, while CoT supervision preserves valid reasoning skills.
⚙️ How it Works
🔄 Unthinking Loss: misaligns hidden representations of sensitive reasoning traces with randomized features.
💡 Reasoning Preservation: uses CoT datasets (like LIMO) to retain problem-solving ability.
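The two losses above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, shapes, and the plain MSE-to-a-random-vector objective are assumptions based on the thread's description of misaligning trace representations with randomized features.

```python
import numpy as np

def unthinking_loss(hidden, trace_mask, rand_target):
    """Representation-misdirection term (illustrative):
    pull hidden states of sensitive reasoning-trace tokens
    toward a fixed random feature vector.

    hidden:      (seq_len, d) activations from a chosen layer
    trace_mask:  (seq_len,) bool, True on sensitive trace tokens
    rand_target: (d,) fixed random vector, drawn once and reused
    """
    h = hidden[trace_mask]                    # select trace tokens only
    return float(np.mean((h - rand_target) ** 2))

def total_loss(unthink, retain_ce, lam=1.0):
    # Combined objective (sketch): forget traces while a standard
    # cross-entropy retain loss on a CoT dataset (e.g., LIMO)
    # anchors general reasoning ability. `lam` is a hypothetical
    # trade-off weight.
    return unthink + lam * retain_ce
```

In this toy form, minimizing `unthinking_loss` drives trace-token representations toward an uninformative random direction, while the retain term keeps the model's general chain-of-thought behavior intact.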
✅ R²MU erases reasoning traces — not just answers.
✅ Preserves general reasoning and utility across diverse benchmarks.
✅ Achieves the lowest reasoning-trace leakage (RT-UA ↓) on the unlearning benchmark WMDP and the LRM safety benchmark STAR-1, while maintaining top reasoning accuracy on AIME, MATH-500, and GPQA.
👥 With amazing collaborators from MSU: @ChongyuFan, @zyh2022, @jia_jinghan, and my advisor @sijialiu17.
🙏 Grateful to our IBM collaborators from @MITIBMLab : @NathalieBaraca1 , Dennis Wei, @p_ram_p.
