Chongyu Fan reposted

🎯 Our EMNLP 2025 Main paper
“Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills” goes live soon!
Catch us on Wednesday in Suzhou at #EMNLP2025 🇨🇳
🔗 Paper Link: arxiv.org/pdf/2506.12963
🏡 Project Page Link: r2mu.netlify.app
🗓 November 5, 11:00–12:30 CST (UTC+8)
📍 Hall C, Section 2, 500-Main
🧍 I won’t be there in person — but feel free to chat with my co-authors!
🧠 The Problem
You’ve erased sensitive answers from your LRM.
But the reasoning traces, the step-by-step “thoughts” that led there, still remain.
Even after unlearning, the model can reconstruct or re-infer forgotten answers through these traces.
So the question is:
👉 Can we truly forget reasoning traces, while preserving the model’s reasoning ability?
🎯 Our Solution: R²MU (Reasoning-aware Representation Misdirection for Unlearning)
We go beyond answer-level forgetting and target the reasoning process itself.
R²MU suppresses sensitive reasoning traces while maintaining general reasoning competence.
Through representation misdirection, the model "unthinks" unsafe reasoning paths, while CoT supervision preserves valid reasoning skills.
⚙️ How it Works
🔄 Unthinking Loss: misaligns hidden representations of sensitive reasoning traces with randomized features.
💡 Reasoning Preservation: uses CoT datasets (like LIMO) to retain problem-solving ability.
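The two losses above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, shapes, and the plain MSE-to-a-random-vector objective are assumptions based on the thread's description of misaligning trace representations with randomized features.

```python
import numpy as np

def unthinking_loss(hidden, trace_mask, rand_target):
    """Representation-misdirection term (illustrative):
    pull hidden states of sensitive reasoning-trace tokens
    toward a fixed random feature vector.

    hidden:      (seq_len, d) activations from a chosen layer
    trace_mask:  (seq_len,) bool, True on sensitive trace tokens
    rand_target: (d,) fixed random vector, drawn once and reused
    """
    h = hidden[trace_mask]                    # select trace tokens only
    return float(np.mean((h - rand_target) ** 2))

def total_loss(unthink, retain_ce, lam=1.0):
    # Combined objective (sketch): forget traces while a standard
    # cross-entropy retain loss on a CoT dataset (e.g., LIMO)
    # anchors general reasoning ability. `lam` is a hypothetical
    # trade-off weight.
    return unthink + lam * retain_ce
```

In this toy form, minimizing `unthinking_loss` drives trace-token representations toward an uninformative random direction, while the retain term keeps the model's general chain-of-thought behavior intact.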
✅ R²MU erases reasoning traces — not just answers.
✅ Preserves general reasoning and utility across diverse benchmarks.
✅ Achieves the lowest reasoning-trace leakage (RT-UA ↓) on the unlearning benchmark WMDP and the LRM safety benchmark STAR-1, while maintaining top reasoning accuracy on AIME, MATH-500, and GPQA.
👥 With amazing collaborators from MSU: @ChongyuFan, @zyh2022, @jia_jinghan, and my advisor @sijialiu17.
🙏 Grateful to our IBM collaborators from @MITIBMLab : @NathalieBaraca1 , Dennis Wei, @p_ram_p.
