Andrea | 🇸🇪🇪🇸🇻🇪 retweeted

Introducing Critique Fine-Tuning (CFT): a more effective SFT method for enhancing LLMs' reasoning abilities.
📄 Paper: arxiv.org/pdf/2501.17703
CFT is simple: instead of training models to directly answer questions, we train them to critique noisy answers.
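A minimal sketch of how a CFT training pair differs from a standard SFT pair. It assumes a simple prompt/target text format. The `TrainingExample` type, the helper names, and the prompt templates are hypothetical illustrations, not the paper's actual data format. The critique text is hand-written for this example.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    prompt: str  # text the model conditions on
    target: str  # text the model is trained to generate


def make_sft_example(question: str, reference_answer: str) -> TrainingExample:
    # Standard SFT: the model learns to produce the answer directly.
    return TrainingExample(prompt=f"Question: {question}\nAnswer:", target=reference_answer)


def make_cft_example(question: str, noisy_answer: str, critique: str) -> TrainingExample:
    # CFT-style (hypothetical template): the input holds the question plus a
    # possibly wrong candidate answer, and the target is a critique of that answer.
    prompt = (
        f"Question: {question}\n"
        f"Candidate answer: {noisy_answer}\n"
        "Critique the candidate answer:"
    )
    return TrainingExample(prompt=prompt, target=critique)


question = "What is 12 * 13?"
noisy = "12 * 13 = 146"
critique = (
    "The multiplication is wrong: 12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156, "
    "not 146. Conclusion: incorrect."
)

sft = make_sft_example(question, "12 * 13 = 156")
cft = make_cft_example(question, noisy, critique)
print(cft.prompt)
print(cft.target)
```

Both pairs use the same question. Only the supervision target differs: SFT imitates a reference answer, while CFT teaches the model to evaluate a given answer.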
What's fascinating is that while most approaches focus on using generative critique or reward models to provide feedback to policy models, these critique models can themselves serve as policy models, directly answering questions with stronger reasoning.
Interestingly, we also found that CFT saturates quickly: overtraining on critiques can even degrade problem-solving performance.
Work led by @YuboWang726, in collaboration with @WenhuChen.
