Raphael Avalos

56 posts

@raphael_avalos

Writing the PhD thesis @aibrussels | ex Cohere and FWO Fellow

Belgium · Joined May 2019
357 Following · 200 Followers
Pinned Tweet
Raphael Avalos @raphael_avalos ·
Last week, I wrapped up my internship @cohere, where I had the chance to work with fantastic people on RL for LLMs. It was an amazing 6 months, and I'm excited to share one of the outcomes: ShiQ, a Q-value based RL algorithm for fine-tuning LLMs 🚀 🧵Details in @irombie's post!
Irem Ergün @irombie

I'm excited to share our new pre-print ShiQ: Bringing back Bellman to LLMs! arxiv.org/abs/2505.11081 In this work, we propose a new, Q-learning inspired RL algorithm for finetuning LLMs 🎉 (1/n)

Raphael Avalos @raphael_avalos ·
🚀 Excited to share the 3rd outcome of my internship at @CohereAI: a new RL algo for agentic LLMs that combines policy learning and world modeling, letting agents verify actions before executing them. Check out the 🧵 and 📄! Big thanks to my co-authors and Cohere’s RL team 🙏
Shangmin Guo @ShangminGuo

📢After months of work, I can finally share our latest research, couldn’t be more thrilled and excited. 🎉 We unify a policy 🤖 and a world model 🌍 into a single LLM, thus no external dynamics model needed! Why does this matter? Because now, the policy can plan based on its internal world model! And this planning boosts tool-use success rates to >90%, on top of SFT + RL. 📄: arxiv.org/abs/2506.02918 🧵[1/8]

Raphael Avalos retweeted
Willem Röpke @willem_ropke ·
Exciting news! My paper on multi-objective reinforcement learning was accepted at AAMAS 2025! We introduce IPRO (Iterated Pareto Referent Optimisation)—a principled approach to solving multi-objective problems. 🔗 Paper: arxiv.org/abs/2402.07182 💻 Code: github.com/wilrop/ipro
Raphael Avalos @raphael_avalos ·
Starting my internship at @cohere today to work on LLMs! I'll be in Paris a couple of days a week, so if anyone wants to meet up, let me know!
Raphael Avalos retweeted
Florent Delgrange @f_delgrange ·
Two weeks ago, I publicly defended my PhD thesis, entitled "Activating Formal Verification of Deep Reinforcement Learning Policies by Model Checking Bisimilar Latent Space Models". 📚 The full dissertation is available here: tinyurl.com/formarl (1/n)
Raphael Avalos @raphael_avalos ·
Looking forward to the next edition, and in the meantime, see you all at EWRL in Toulouse this October! 🚀 3/3
Raphael Avalos retweeted
Willem Röpke @willem_ropke ·
Okay people, I need some help. We've been stuck on a project for a while, and my best guess is that gradients are not flowing as we want them to. Does anyone have an intuitive visualisation/debugging tool for gradient flows in JAX?
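The question above asks how to debug gradient flow in JAX. A minimal sketch of one common approach, assuming a toy two-layer model (the model, parameter names, and shapes here are illustrative, not from the project in the tweet): reduce each leaf of the gradient pytree to its L2 norm; a near-zero norm flags a parameter that gradients are not reaching.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy two-layer MLP with a mean-squared-error loss.
    h = jnp.tanh(x @ params["w1"])
    pred = h @ params["w2"]
    return jnp.mean((pred - y) ** 2)

# Illustrative parameters and data.
params = {
    "w1": jnp.ones((3, 4)) * 0.1,
    "w2": jnp.ones((4, 1)) * 0.1,
}
x = jnp.ones((8, 3))
y = jnp.zeros((8, 1))

grads = jax.grad(loss_fn)(params, x, y)

# Map every leaf of the gradient pytree to its L2 norm. A leaf that is
# (near) zero means gradients are not flowing through that parameter.
norms = jax.tree_util.tree_map(jnp.linalg.norm, grads)
for name, n in norms.items():
    print(name, float(n))
```

Logging these norms per training step (e.g. to TensorBoard or Weights & Biases) gives a simple picture of gradient flow over time without any extra tooling.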
Raphael Avalos retweeted
Alizée Pace @AlizeePace ·
Presenting work on synthetic preference generation at two #ICLR2024 workshops today: DPFM & GenAI4DM @genai4dm. Come say hi to find out how to improve your reward model without collecting additional human feedback!
Alizée Pace @AlizeePace

RLHF gains are largely determined by the quality of the underlying reward model. How can we improve reward model quality without collecting more data? Introducing a novel approach to augmenting human feedback data with synthetic preferences! 🧵 arxiv.org/abs/2401.12086
