
Raphael Avalos
@raphael_avalos
Writing the PhD thesis @aibrussels | ex-Cohere and FWO Fellow

I'm excited to share our new pre-print ShiQ: Bringing back Bellman to LLMs! arxiv.org/abs/2505.11081 In this work, we propose a new, Q-learning inspired RL algorithm for finetuning LLMs 🎉 (1/n)
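For readers unfamiliar with the reference, a minimal sketch of the classic tabular Q-learning (Bellman) update the title alludes to. This is generic textbook Q-learning, not the ShiQ algorithm from the paper; in the LLM setting one would think of states as token prefixes and actions as next tokens.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, terminal=False):
    """One Bellman backup: Q(s,a) += alpha * (target - Q(s,a)).

    The target bootstraps from the best action in the next state,
    unless the transition is terminal.
    """
    target = r if terminal else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy 2-state, 2-action example: action 1 in state 0 yields reward 1
# and ends the episode, so Q[0, 1] should converge toward 1.0.
Q = np.zeros((2, 2))
for _ in range(100):
    Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, terminal=True)
print(Q[0, 1])
```

With a constant step size of 0.1, each update closes a tenth of the remaining gap to the target, so the estimate approaches 1.0 geometrically.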

📢 After months of work, I can finally share our latest research. Couldn’t be more thrilled! 🎉 We unify a policy 🤖 and a world model 🌍 into a single LLM, so no external dynamics model is needed. Why does this matter? Because the policy can now plan based on its internal world model, and this planning boosts tool-use success rates to >90%, on top of SFT + RL. 📄: arxiv.org/abs/2506.02918 🧵[1/8]

How come people don’t do Q-learning on LLMs?

Today (two weeks after model launch 🔥) we're releasing a technical report on how we made Command A and R7B 🚀! It has detailed breakdowns of our training process and evaluations per capability (tools, multilingual, code, reasoning, safety, enterprise, long context) 🧵 1/3.

Still 8 days to submit your work to the ALA workshop at AAMAS! We welcome full papers, work in progress, and 2-page abstracts of recently published journal papers. All the info is available at ala-workshop.github.io.

Excited to announce the 17th Adaptive and Learning Agents (ALA) workshop at @AAMASconf in May! We welcome full papers, work in progress, and 2-page abstracts of recently published journal papers. Find out more at our website: ala-workshop.github.io. Deadline for submissions: February 4th.

RLHF gains are largely determined by the quality of the underlying reward model. How can we improve reward model quality without collecting more data? Introducing a novel approach to augmenting human feedback data with synthetic preferences! 🧵 arxiv.org/abs/2401.12086
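As background for this post, a minimal sketch of the standard Bradley-Terry preference loss used to train RLHF reward models; this is the generic objective, not the paper's synthetic-preference augmentation method.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).

    Small when the reward model scores the preferred response higher,
    large when it misranks the pair.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ranked pair incurs a smaller loss than a misranked one.
print(preference_loss(2.0, 0.0))
print(preference_loss(0.0, 2.0))
```

At zero margin the loss is log 2, and it decreases monotonically as the reward gap in favor of the chosen response grows, which is what pushes the model to separate preferred from rejected completions.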

In clinical early warning systems (EWS), can we go beyond the model's estimate of event occurrence and leverage its belief about the time to the event to improve our alarm policy? Introducing “Dynamic Survival Analysis for Early Event Prediction” with @ToManuelBurger and @gxr. 🧶

Arrived at #ICLR2024 with @f_delgrange to present our work "The Wasserstein Believer: Learning Belief Updates for Partially Observable MDPs through Reliable Latent Space Models".