R. Alessio @ ETH | RL, Bandits, Exploration

153 posts

@rssalessio

Postdoc at @BU_CDS with @aldopacchiano (https://t.co/Ekl0jGFIbd Lab). Interested in RL, Bandit problems and Adaptive Control.

Boston, MA · Joined November 2010
335 Following · 156 Followers
R. Alessio @ ETH | RL, Bandits, Exploration
@SAS So, if your crew spills wine on its customers onboard, is this the best you can do? Is it my fault if I had to throw away the clothes and the crew did not make any report? Literally the worst customer service #SAS
Shashwat Goel @ ICLR'26 @ShashwatGoel7
Great paper showing self-distillation internalizes environment feedback, but also breaks the ability to navigate uncertainty, since the "supervisor" already knows the outcome and doesn't share the student's uncertainty. To teach uncertainty navigation, we proposed ∆Belief-RL: we reward actions based on whether they lead to "progress", estimated from the update in the model's own belief of achieving success. We show this improves both interaction efficiency and scaling in guessing environments, and parallel work like iGPO and TIPS shows it works for search agents.

arxiv.org/abs/2602.12342 - Intrinsic Credit Assignment for Long Horizon Interaction
arxiv.org/abs/2510.14967 - Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents
arxiv.org/abs/2603.22293 - TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs

The idea has rich roots in the 1999 paper on potential-based reward shaping people.eecs.berkeley.edu/~pabbeel/cs287…, and a 2018 paper showing the potential can be estimated using the agent's own beliefs cdn.aaai.org/ojs/11741/1174….

Lots of interesting future work here: how to measure beliefs over long-form answers, where logprobs might reward style over substance; beliefs over arbitrary rewards and goals instead of answers; and incorporating beliefs of other agents in the environment, similar to ReBeL for multi-agent imperfect-information games github.com/facebookresear….
Rosinality @rosinality

Analysis of self-distillation: it works by increasing confidence, and it does not generalize well. We can't assume the distribution conditioned on the solution behaves well; it may be similar to unsupervised model-based verification.
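The mechanism the thread describes is classical potential-based reward shaping with the agent's own success belief as the potential. A minimal sketch, assuming a scalar belief estimate per turn (the names `shaped_reward` and `belief_*` are illustrative, not from any of the cited papers):

```python
# Potential-based reward shaping (Ng et al., 1999): r' = r + gamma*Phi(s') - Phi(s).
# Here the potential Phi is the agent's own estimated probability of eventual
# success, so turns that make "progress" (raise the belief) earn a bonus,
# while the optimal policy is provably unchanged by the shaping term.

GAMMA = 0.99  # discount factor


def shaped_reward(env_reward: float,
                  belief_before: float,
                  belief_after: float,
                  gamma: float = GAMMA) -> float:
    """Shape the environment reward with the change in success belief."""
    return env_reward + gamma * belief_after - belief_before


# A turn that raises the success belief from 0.2 to 0.6 earns a positive
# bonus even before any environment reward arrives.
bonus = shaped_reward(env_reward=0.0, belief_before=0.2, belief_after=0.6)
```

In a multi-turn setting the belief would come from the model itself (e.g. its probability of producing the correct final answer), which is where the thread's caveats about logprobs rewarding style over substance come in.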

R. Alessio @ ETH | RL, Bandits, Exploration
A neat result: the Complete Class Theorem. ➡️ Pick any non-Bayes decision rule; there is always a Bayes rule that is at least as good. When we talk about "good" procedures, we never really need to leave the Bayes world, at least for compact parameter spaces.
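In standard decision-theoretic notation (a sketch using the usual symbols, which are not spelled out in the tweet), the claim is that for any decision rule $\delta$ there exist a prior $\pi$ and a Bayes rule $\delta_\pi$ that weakly dominate it in frequentist risk:

```latex
R(\theta, \delta_\pi) \;\le\; R(\theta, \delta)
\qquad \text{for all } \theta \in \Theta,
```

where $R(\theta, \delta) = \mathbb{E}_\theta\, L(\theta, \delta(X))$ is the risk under loss $L$. Under compactness of $\Theta$ (plus suitable regularity conditions on the loss), the Bayes rules form a complete class, which is the precise sense of "never needing to leave the Bayes world".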
R. Alessio @ ETH | RL, Bandits, Exploration
Adversarial Diffusion for Robust #RL! Tomorrow @ #NeurIPS2025, afternoon poster session #313
R. Alessio @ ETH | RL, Bandits, Exploration @rssalessio

Excited to be in San Diego next week for #NeurIPS2025 🎉! Will present Adversarial Diffusion for Robust RL together with @DanieleFoffano. Poster session on Fri 5 Dec 7:30 p.m. EST, Exhibit Hall C,D,E. AD-RRL uses diffusion models to train Robust RL policies. #RL #Diffusion

R. Alessio @ ETH | RL, Bandits, Exploration retweeted
R. Alessio @ ETH | RL, Bandits, Exploration
Excited to be in San Diego next week for #NeurIPS2025 🎉! Will present Adversarial Diffusion for Robust RL together with @DanieleFoffano. Poster session on Fri 5 Dec 7:30 p.m. EST, Exhibit Hall C,D,E. AD-RRL uses diffusion models to train Robust RL policies. #RL #Diffusion
R. Alessio @ ETH | RL, Bandits, Exploration
Glhf to all ACs
Egor Shulgin @egor_shulg

@iclr_conf reverted all reviews to pre-discussion state after the OpenReview bug. Result: one paper I’m reviewing has the authors' rebuttal responding point-by-point to concerns that were edited out and no longer exist in the system. New ACs: good luck making sense of this.
