Chi Gui

4 posts

@chi_gui_1

MS in Computer Science @ UIUC | Interested in LLM, Reinforcement Learning, Agent

Chicago · Joined August 2024
55 Following · 15 Followers
Chi Gui @chi_gui_1
SNR-adaptive filtering (RAGEN-v2) is lightweight, algorithm-agnostic, and easy to plug into RL pipelines. We see consistent gains across algorithms, model scales, and envs—while training faster from fewer high-quality trajectories. Sometimes, learning less is learning more.
Zihan "Zenus" Wang @wzenus

In Agent RL, models suffer from Template Collapse. They generate vast, diverse outputs (High Entropy) that lose all meaningful connection to the input prompt (Low Mutual Information). In other words, agents learn different ways to say nothing. 🚀 Introducing RAGEN-v2 -- Here's how we define and fix such silent failure modes in Agent RL. 🧵

1 reply · 2 reposts · 14 likes · 2K views
Chi Gui @chi_gui_1
@wzenus By filtering low-SNR samples, the agent learns from fewer but higher-quality trajectories. This makes training faster and more stable — sometimes learning less is learning more!
0 replies · 0 reposts · 1 like · 228 views
Chi Gui reposted
Zihan "Zenus" Wang @wzenus
In Agent RL, models suffer from Template Collapse. They generate vast, diverse outputs (High Entropy) that lose all meaningful connection to the input prompt (Low Mutual Information). In other words, agents learn different ways to say nothing. 🚀 Introducing RAGEN-v2 -- Here's how we define and fix such silent failure modes in Agent RL. 🧵
13 replies · 60 reposts · 270 likes · 188.7K views
Chi Gui @chi_gui_1
@wzenus A nice property of SNR-adaptive filtering (RAGEN-v2): it’s lightweight, algorithm-agnostic, and easy to plug into RL pipelines. We observe consistent improvements across RL algorithms, model scales, and diverse environments!
0 replies · 0 reposts · 1 like · 251 views