Haoran Xu✈️ICLR26
@ryanxhr

136 posts
PhD student @UTAustin | @AmazonScience AI PhD Fellow | Towards super-human AGI using RL🚀

Austin, TX · Joined December 2016
339 Following · 338 Followers

Pinned Tweet
Haoran Xu✈️ICLR26 @ryanxhr ·
Both offline RL and LLM RL fine-tuning can be formulated as behavior-regularized RL problems. We propose Value Gradient Flow (VGF), a new scalable and sample-efficient paradigm that treats behavior-regularized RL as an optimal transport problem. arxiv.org/abs/2604.14265 🧵[1/7]
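For readers unfamiliar with the term, behavior-regularized RL usually refers to an objective of the following standard form (general background, not the exact regularizer used in the VGF paper):

```latex
% Behavior-regularized RL: maximize value while staying close to a
% behavior policy \pi_\beta -- the dataset policy in offline RL, or the
% reference/SFT model in LLM fine-tuning. A common KL-regularized form:
\[
  \max_{\pi}\;
  \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi(\cdot \mid s)}
  \bigl[ Q(s, a) \bigr]
  \;-\;
  \alpha\, \mathbb{E}_{s \sim \mathcal{D}}
  \bigl[ D_{\mathrm{KL}}\!\bigl( \pi(\cdot \mid s) \,\Vert\, \pi_\beta(\cdot \mid s) \bigr) \bigr]
\]
```

In the RLHF instance of this template, $Q$ is typically a learned reward model and $\pi_\beta$ is the supervised fine-tuned reference model, which is what makes the same formulation cover both settings.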
Haoran Xu✈️ICLR26 @ryanxhr ·
@LucaAmb Yes, actually VGF could be thought of as doing conditional flow matching with known velocity and steps.
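For context on the flow-matching remark above, conditional flow matching with a linear interpolation path has a closed-form target velocity, which is presumably what "known velocity" refers to (standard background, not specific to VGF):

```latex
% Linear interpolant between a source sample x_0 and a target sample x_1:
\[
  x_t = (1 - t)\, x_0 + t\, x_1,
  \qquad t \in [0, 1],
  \qquad
  \frac{d x_t}{d t} = x_1 - x_0,
\]
% so the velocity field v_\theta is trained by simple regression
% against this known target:
\[
  \mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{t,\, x_0,\, x_1}
  \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2 .
\]
```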
Haoran Xu✈️ICLR26 @ryanxhr ·
@linghui35877581 In our paper we only tried the offline setting, i.e., the RLHF setup. In general, offline RL could still be used here as more advanced off-policy LLM RL algorithms are developed.
linghui @linghui35877581 ·
@ryanxhr Very curious how offline RL could be applied to make use of offline rollout data during the LLM online RL phase.
Haoran Xu✈️ICLR26 retweeted
Amy Zhang @yayitsamyzhang ·
@ryanxhr has developed this very nice work framing offline RL as an optimal transport problem, with SOTA results on offline RL benchmarks and LLM RL tasks. Check it out, and chat with him at ICLR!
Haoran Xu✈️ICLR26 @ryanxhr

Both offline RL and LLM RL fine-tuning can be formulated as behavior-regularized RL problems. We propose Value Gradient Flow (VGF), a new scalable and sample-efficient paradigm that treats behavior-regularized RL as an optimal transport problem. arxiv.org/abs/2604.14265 🧵[1/7]

Haoran Xu✈️ICLR26 @ryanxhr ·
For online RL fine-tuning, VGF solves several hard tasks that previous methods could not. 🧵[6/7]
Haoran Xu✈️ICLR26 @ryanxhr ·
2️⃣ Information-Theoretic Reward Decomposition for Generalizable RLHF 🗓️ Thu, Dec 4, 4:30 PM – 7:30 PM, Exhibit Hall C, D, E #5413
Haoran Xu✈️ICLR26 @ryanxhr ·
I will be at #NeurIPS2025 from 12/3 to 12/7 to present two papers. Come chat about everything RL! 1️⃣ Unifying Online and Offline RL via Implicit Value Regularization 🗓️ Thu, Dec 4, 11:00 AM – 2:00 PM, Exhibit Hall C, D, E #303
Haoran Xu✈️ICLR26 retweeted
RL Beyond Rewards Workshop @RLBRew_RLC ·
⚠️ Reminder! Submissions for @RL_Conference's RL beyond Reward Workshop are due May 30 (AoE)! We are brewing an interesting program and seeking innovative research work in reward-free RL. All papers are welcome, from exploratory abstracts to complete research papers.
Haoran Xu✈️ICLR26 @ryanxhr ·
I will miss #ICLR2025, but come check out our work on a new perspective: solving reinforcement learning via discriminator-weighted imitation learning. @ShuozheL and @yayitsamyzhang will present it during today's poster session.