Chongyu Fan
@ChongyuFan

PhD student @ Michigan State University

8 posts
Joined May 2024
373 Following · 30 Followers
Chongyu Fan retweeted
Changsheng Wang @ NeurIPS (@wcsa23187)·
🎯 Our EMNLP 2025 Main paper “Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills” goes live soon! Catch us on Wednesday in Suzhou at #EMNLP2025 🇨🇳
🔗 Paper: arxiv.org/pdf/2506.12963
🏡 Project page: r2mu.netlify.app
🗓 November 5, 11:00–12:30 CST (UTC+8)
📍 Hall C, Section 2, 500-Main
🧍 I won’t be there in person — but feel free to chat with my co-authors!

🧠 The Problem
You’ve erased sensitive answers from your LRM. But the reasoning traces — the step-by-step “thoughts” that led there — still remain. Even after unlearning, the model can reconstruct or re-infer forgotten answers through these traces. So the question is:
👉 Can we truly forget reasoning traces, while preserving the model’s reasoning ability?

🎯 Our Solution: R²MU (Reasoning-aware Representation Misdirection for Unlearning)
We go beyond answer-level forgetting and target the reasoning process itself. R²MU suppresses sensitive reasoning traces while maintaining general reasoning competence. Through representation misdirection, the model unthinks unsafe reasoning paths, while CoT supervision preserves valid reasoning skills.

⚙️ How it Works
🔄 Unthinking loss: misaligns hidden representations of sensitive reasoning traces with randomized features.
💡 Reasoning preservation: uses CoT datasets (like LIMO) to retain problem-solving ability.

✅ R²MU erases reasoning traces — not just answers.
✅ Preserves general reasoning and utility across diverse benchmarks.
✅ Achieves the lowest reasoning-trace leakage (RT-UA ↓) on the unlearning benchmark WMDP and the LRM safety benchmark STAR-1, while maintaining top reasoning accuracy on AIME, MATH-500, and GPQA.

👥 With amazing collaborators from MSU: @ChongyuFan, @zyh2022, @jia_jinghan, and my advisor @sijialiu17.
🙏 Grateful to our IBM collaborators from @MITIBMLab: @NathalieBaraca1, Dennis Wei, @p_ram_p.
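The unthinking loss described above can be illustrated with a minimal sketch. This is not the paper’s implementation — it assumes an RMU-style objective in which hidden states on forget data are pushed toward a fixed, scaled random direction, while hidden states on retain (CoT) data are anchored to those of a frozen reference model. All names (`make_control_vector`, `unthinking_loss`, the scale `c`, weight `alpha`) are illustrative, not from the paper:

```python
import numpy as np

def make_control_vector(d, c=6.5, seed=0):
    """Fixed random direction, scaled by c: the 'misdirection' target that
    sensitive-trace activations are pushed toward. (Illustrative values.)"""
    u = np.random.default_rng(seed).standard_normal(d)
    return c * u / np.linalg.norm(u)

def unthinking_loss(h_forget, h_retain, h_retain_frozen, u, alpha=1.0):
    """Sketch of a representation-misdirection objective.

    h_forget: updated model's hidden states on sensitive reasoning traces.
    h_retain / h_retain_frozen: updated vs. frozen model's hidden states on
    retain (CoT) data, used to preserve general reasoning ability.
    """
    forget_term = np.mean((h_forget - u) ** 2)                # push forget reps toward noise
    retain_term = np.mean((h_retain - h_retain_frozen) ** 2)  # keep retain reps in place
    return forget_term + alpha * retain_term
```

The forget term destroys the informative structure of sensitive-trace representations, while the retain term (here against a frozen copy of the model) keeps representations of benign reasoning intact.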
Chongyu Fan (@ChongyuFan)·
Although I won’t be attending in person, feel free to stop by our poster on Wed, July 16 @ 4:30 pm PT (East Exhibition Hall A-B, E-2803) to chat with my co-authors about the work!
Chongyu Fan (@ChongyuFan)·
3. We integrate unlearning methods with smoothness optimization and validate their effectiveness through extensive experiments (e.g., on WMDP and MUSE) against both relearning and jailbreaking attacks.
Chongyu Fan (@ChongyuFan)·
2. Beyond SAM, we explore a suite of smoothness optimization techniques, including gradient penalty, curvature regularization, randomized smoothing, and weight averaging.
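Two of the smoothness-optimization alternatives mentioned above can be sketched generically (these are textbook forms of the techniques, not the paper’s exact objectives; `lam`, `sigma`, and `n` are illustrative hyperparameters):

```python
import numpy as np

def grad_penalty_loss(w, loss_fn, grad_fn, lam=0.1):
    """Unlearning loss plus a gradient-norm penalty: penalizing ||grad L(w)||^2
    discourages sharp minima from which forgotten knowledge is easily relearned."""
    return loss_fn(w) + lam * np.sum(grad_fn(w) ** 2)

def randomized_smoothing_loss(w, loss_fn, sigma=0.01, n=8, seed=0):
    """Monte-Carlo estimate of E[L(w + noise)]: optimizing the smoothed loss
    implicitly flattens the landscape around the unlearned weights."""
    rng = np.random.default_rng(seed)
    return np.mean([loss_fn(w + sigma * rng.standard_normal(w.shape))
                    for _ in range(n)])
```

Both serve the same goal as SAM — flatness of the loss around the unlearned weights — but trade the inner maximization for a penalty term or noise averaging.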
Chongyu Fan (@ChongyuFan)·
🔬Our key insight: 1. We frame relearning as a min–max adversarial game over model weights, and connect it to Sharpness-Aware Minimization (SAM)—which can flatten the loss landscape to make forgetting stick.
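The min–max view above can be sketched as a single SAM update: the inner maximization finds a first-order worst-case weight perturbation (the stand-in for a relearning attacker), and the outer minimization descends using the gradient at that perturbed point. This is the standard SAM update rule, shown here with NumPy and illustrative step sizes, not the paper’s training code:

```python
import numpy as np

def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    """One Sharpness-Aware Minimization step (sketch of the min-max game).

    Inner max: perturb weights by radius rho along the normalized gradient
    (first-order approximation of the worst-case perturbation).
    Outer min: descend with the gradient taken at the perturbed point,
    which flattens the loss landscape around the solution.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # inner max: ascent direction
    g_sharp = grad_fn(w + eps)                   # gradient at adversarial weights
    return w - lr * g_sharp                      # outer min: descend
```

Because the update always descends from the adversarially perturbed weights, minima that survive it are flat, which is exactly what makes forgetting robust to small relearning perturbations.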
Chongyu Fan (@ChongyuFan)·
Excited to share our ICML '25 work on robust LLM unlearning—“Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond”. 🔗 Paper: arxiv.org/abs/2502.05374 🗓️ Poster: Wed, July 16 @ 4:30 pm PT | E‑2803 🧵
Chongyu Fan (@ChongyuFan)·
🎯 What problem do we tackle? Current LLM unlearning methods aim to erase sensitive data—but remain vulnerable to relearning attacks, where even a few examples can resurrect forgotten knowledge. So how can we make unlearning truly robust?