
I will be at #NeurIPS2025 today (Dec 4, 11:00 am to 2:00 pm, #2503) presenting our work on data-efficient RL fine-tuning for LLMs!
🚀 We propose difficulty-targeted online data selection and rollout replay, which reduce RL training time by 23–62% while matching the performance of the original GRPO.


