Chi Gui

@chi_gui_1

MS in Computer Science @ UIUC | Interested in LLM, Reinforcement Learning, Agent

Chicago · Joined August 2024

55 Following · 15 Followers

Chi Gui@chi_gui_1·13 Mar

SNR-adaptive filtering (RAGEN-v2) is lightweight, algorithm-agnostic, and easy to plug into RL pipelines. We see consistent gains across algorithms, model scales, and envs—while training faster from fewer high-quality trajectories. Sometimes, learning less is learning more.

Zihan "Zenus" Wang@wzenus

In Agent RL, models suffer from Template Collapse. They generate vast, diverse outputs (High Entropy) that lose all meaningful connection to the input prompt (Low Mutual Information). In other words, agents learn different ways to say nothing. 🚀 Introducing RAGEN-v2 -- Here's how we define and fix such silent failure modes in Agent RL. 🧵

Chi Gui@chi_gui_1·13 Mar

@wzenus By filtering low-SNR samples, the agent learns from fewer but higher-quality trajectories. This makes training faster and more stable; sometimes learning less is learning more!

Chi Gui retweeted

Zihan "Zenus" Wang@wzenus·13 Mar

Chi Gui@chi_gui_1·13 Mar

@wzenus A nice property of SNR-adaptive filtering (RAGEN-v2): it’s lightweight, algorithm-agnostic, and easy to plug into RL pipelines. We observe consistent improvements across RL algorithms, model scales, and diverse environments!
