
SNR-adaptive filtering (RAGEN-v2) is lightweight, algorithm-agnostic, and easy to plug into RL pipelines.
We see consistent gains across algorithms, model scales, and environments, while training faster from fewer, high-quality trajectories.
Sometimes, learning less is learning more.
Zihan "Zenus" Wang (@wzenus)
In Agent RL, models suffer from Template Collapse. They generate vast, diverse outputs (High Entropy) that lose all meaningful connection to the input prompt (Low Mutual Information). In other words, agents learn different ways to say nothing. 🚀 Introducing RAGEN-v2: here's how we define and fix such silent failure modes in Agent RL. 🧵
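To make the "high entropy, low mutual information" picture concrete, here is a toy sketch in Python. It is not the RAGEN-v2 metric or code; the prompts, outputs, and helper names (`entropy`, `mutual_information`) are made up for illustration, and the quantities are plain plug-in estimates over discrete (prompt, output) samples.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from (prompt, output) samples."""
    joint = Counter(pairs)
    x_marginal = Counter(p for p, _ in pairs)
    y_marginal = Counter(o for _, o in pairs)
    return entropy(x_marginal) + entropy(y_marginal) - entropy(joint)

# Hypothetical toy data: 4 prompts, 4 possible outputs.
prompts = ["p0", "p1", "p2", "p3"]
outputs = ["a", "b", "c", "d"]

# "Collapsed" policy: every output is equally likely regardless of the prompt.
collapsed = [(p, o) for p in prompts for o in outputs]

# "Grounded" policy: each prompt always gets its own output (sampled 4 times each).
grounded = list(zip(prompts, outputs)) * 4

collapsed_entropy = entropy(Counter(o for _, o in collapsed))
grounded_entropy = entropy(Counter(o for _, o in grounded))
collapsed_mi = mutual_information(collapsed)
grounded_mi = mutual_information(grounded)

print(f"collapsed: H(Y)={collapsed_entropy:.2f} bits, I(X;Y)={collapsed_mi:.2f} bits")
print(f"grounded:  H(Y)={grounded_entropy:.2f} bits, I(X;Y)={grounded_mi:.2f} bits")
```

Both toy policies have the same output entropy, so entropy alone cannot tell them apart; only the mutual information with the prompt separates the collapsed one from the grounded one.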
