Ilze Amanda Auzina

38 posts

@AmandaIlze

@ELLISforEurope PhD Student in @bethgelab | Intern @AIatMeta | LM RL Post-training | Reasoning and Exploration | AI Oversight

Tübingen, Germany · Joined October 2020
185 Following · 353 Followers
Pinned Tweet
Ilze Amanda Auzina@AmandaIlze·
How can agents learn in long, open-ended tasks where success is rare and rewards are sparse? 👀 🚨 Enter ∆Belief-RL: we show how to use the agent’s own belief updates as a dense reward for turn-level credit assignment. The result? Surprisingly strong generalization. (1/8) 🧵⬇️
[image]
Ilze Amanda Auzina@AmandaIlze·
Happy to announce that our paper "Intrinsic Credit Assignment for Long Horizon Interaction" is accepted at ICML 🥳 If you're around 🇰🇷 and are *curious*, I would love to chat about dense credit assignment for agentic behaviors beyond token generation x.com/AmandaIlze/sta…
Ilze Amanda Auzina retweeted
Lukas Thede@lukas_thede·
🚨 New paper! arxiv.org/abs/2603.06610 Are you sure your post-trained LLM isn’t forgetting something? Adapting LLMs is known to cause forgetting. We usually measure it via general-knowledge benchmarks. But if MMLU doesn’t drop… are you really fine?
[image]
Ilze Amanda Auzina@AmandaIlze·
@pounds_98 Good question. Both methods use log-probabilities as a signal. The key difference is that in our setting the environment provides no correctness feedback. We therefore reward belief change directly, whereas SDPO updates likelihood based on error information.
Ilze Amanda Auzina@AmandaIlze·
@YouJiacheng Cool suggestion. Potentially yes; in this case the belief elicitation should capture the agent's belief update w.r.t. whether its edit allowed it to improve towards the final deliverable.
You Jiacheng@YouJiacheng·
Interesting. Can we extend it to agents with the "edit" action? (i.e., the agent can progressively improve its final deliverable -- not only internal belief matters; external files also matter)
Ilze Amanda Auzina@AmandaIlze

💡 Key idea: 👉 Use the change in the agent’s belief about the correct answer as a dense intrinsic reward. If an action increases log p(target | history), reward it. We call this ∆Belief-RL. No critic. No process reward model. Just the agent judging its own progress. (2/8)

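The rule in this tweet can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: an explicit belief table stands in for the agent's language model, and the function name and example beliefs are invented for the sketch.

```python
import math

def delta_belief_reward(belief_prev, belief_next, target):
    """Dense turn-level reward: the change in log p(target | history)
    before vs. after the agent's action, as described in the thread."""
    return math.log(belief_next[target]) - math.log(belief_prev[target])

# Hypothetical beliefs over candidate answers: the agent's action
# (e.g. asking a question) doubles its belief in the true answer "red".
before = {"red": 0.25, "blue": 0.75}   # p(answer | history) before the turn
after = {"red": 0.50, "blue": 0.50}    # p(answer | history) after the turn

reward = delta_belief_reward(before, after, target="red")
print(round(reward, 4))  # → 0.6931 (log 2: belief doubled, positive reward)
```

An uninformative turn leaves the belief unchanged and earns reward 0, and a misleading one earns a negative reward, which is what makes the signal dense at the turn level without any external verifier.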
Ilze Amanda Auzina@AmandaIlze·
@pannous @grok Not quite: entropy-based loss modifications act at the trajectory level (the signal is still sparse), whereas what we suggest is a turn-level solution, with no extra verifier or process model needed.
Pannous@pannous·
@AmandaIlze @grok Isn't that the main idea behind all entropy-based loss modifications?
Ilze Amanda Auzina@AmandaIlze·
This suggests a shift in how we train agents: Instead of external critics or verifiers, 👉 let agents learn by tracking their own uncertainty reduction. A step toward agents that reason about what they don’t know. (7/8)
Ilze Amanda Auzina retweeted
Adhiraj Ghosh@adhiraj_ghosh98·
🚨Current data curation produces static datasets and relies on model-based filters that induce many biases. Can we fix this? We propose ✨CABS✨, a flexible concept-aware online batch-curation method that improves CLIP pretraining! arxiv.org/abs/2511.20643 🧵👇
[image]
Ilze Amanda Auzina retweeted
Ameya P.@AmyPrb·
New paper: arxiv.org/abs/2511.16655 We revisit Cambrian‑S & show: 1️⃣ Current VSI‑Super benchmarks do not yet reliably measure spatial supersensing. 2️⃣ Cambrian‑S’s inference pipelines improve by inadvertently exploiting shortcuts we discover -- finding these is valuable! 🧵👇
[image]
Ilze Amanda Auzina retweeted
Vishaal Udandarao@vishaal_urao·
🚀New Paper arxiv.org/abs/2510.20860 We conduct a systematic data-centric study for speech-language pretraining, to improve end-to-end spoken-QA! 🎙️🤖 Using our data-centric insights, we pretrain a 3.8B SpeechLM (called SpeLangy) outperforming 3x larger models! 🧵👇
[image]
Ilze Amanda Auzina retweeted
Shashwat Goel @ ICLR'26@ShashwatGoel7·
Presenting today at #ICML2025. To learn how to measure language model similarity, and its effects on LLM-as-a-Judge and weak-to-strong distillation, join our poster session: today 11 am–1:30 pm, East Exhibition Hall A-B, E-2411 w/ @AmyPrb @JoschkaStrueber @AmandaIlze
Shashwat Goel @ ICLR'26@ShashwatGoel7

🚨Great Models Think Alike and this Undermines AI Oversight🚨 New paper quantifies LM similarity: (1) LLM-as-a-judge favors more similar models🤥 (2) Complementary knowledge benefits weak-to-strong generalization☯️ (3) More capable models have more correlated failures 📈🙀 🧵👇

Ilze Amanda Auzina@AmandaIlze·
Ever wondered how an agent's own beliefs evolve as it reasons across multiple timesteps? Come by — we’ve been thinking about that too 🤖🧠 🔹 Spotlight + Oral, World Models Workshop Measuring Belief Updates in Curious Agents 📍 Fri, July 18 | 10:00 📍 West Ballroom B
[image]
Ilze Amanda Auzina@AmandaIlze·
Excited to be heading to ICML this year to present two projects, both as spotlights! 🎉 Big thanks to my collaborators — come say hi if you're around! #ICML2025 #ML