Ilze Amanda Auzina
@AmandaIlze
@ELLISforEurope PhD Student in @bethgelab | Intern @AIatMeta | LM RL Post-training | Reasoning and Exploration | AI Oversight

How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? 👀 🚨 Enter ∆Belief-RL: we show how to use the agent’s own belief updates as a dense reward for turn-level credit assignment. The result? Surprisingly strong generalization. (1/8) 🧵⬇️

💡 Key idea: 👉 Use the change in the agent’s belief about the correct answer as a dense intrinsic reward. If an action increases log p(target | history), reward it. We call this ∆Belief-RL. No critic. No process reward model. Just the agent judging its own progress. (2/8)
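The turn-level reward described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes you can query the agent's log-belief log p(target | history) after each turn, and defines the reward at turn t as the difference between successive log-beliefs.

```python
import math

def delta_belief_rewards(log_p_target):
    """Turn-level rewards as successive differences in the agent's
    log-belief in the target answer (hypothetical ∆Belief-RL sketch).

    log_p_target[t] = log p(target | history up to turn t);
    log_p_target[0] is the prior belief before any action.
    """
    return [log_p_target[t] - log_p_target[t - 1]
            for t in range(1, len(log_p_target))]

# Toy trajectory: belief in the correct answer moves 0.10 -> 0.25 ->
# 0.20 -> 0.50; informative turns earn positive reward, a turn that
# lowers the belief earns a negative one.
beliefs = [0.10, 0.25, 0.20, 0.50]
log_beliefs = [math.log(p) for p in beliefs]
rewards = delta_belief_rewards(log_beliefs)

# The rewards telescope: their sum equals the total log-belief change,
# so dense per-turn credit stays consistent with the episode outcome.
assert abs(sum(rewards) - (log_beliefs[-1] - log_beliefs[0])) < 1e-9
```

One appealing property of this difference form is the telescoping sum checked above: per-turn rewards add up to the overall change in belief, so shaping the reward this way does not distort which full trajectories score highest.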

🚨Great Models Think Alike and this Undermines AI Oversight🚨 New paper quantifies LM similarity: (1) LLM-as-a-judge favors more similar models🤥 (2) Complementary knowledge benefits Weak-to-Strong Generalization☯️ (3) More capable models have more correlated failures 📈🙀 🧵👇
