Haoran Xu✈️ICLR26
@ryanxhr

136 posts
PhD student @UTAustin | @AmazonScience AI PhD Fellow | Towards super-human AGI using RL🚀

Austin, TX · Joined December 2016
339 Following · 338 Followers

Pinned Tweet
Haoran Xu✈️ICLR26 @ryanxhr ·
Both offline RL and LLM RL fine-tuning can be formulated as behavior-regularized RL problems. We propose Value Gradient Flow (VGF), a new scalable and sample-efficient paradigm that treats behavior-regularized RL as an optimal transport problem. arxiv.org/abs/2604.14265 🧵[1/7]
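For readers unfamiliar with the term, behavior-regularized RL usually refers to an objective of the following standard form (general background, not the exact regularizer used in the VGF paper):

```latex
% Behavior-regularized RL: maximize value while staying close to a
% behavior policy \pi_\beta -- the dataset policy in offline RL, or the
% reference/SFT model in LLM fine-tuning. A common KL-regularized form:
\[
  \max_{\pi}\;
  \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi(\cdot \mid s)}
  \bigl[ Q(s, a) \bigr]
  \;-\;
  \alpha\, \mathbb{E}_{s \sim \mathcal{D}}
  \bigl[ D_{\mathrm{KL}}\!\bigl( \pi(\cdot \mid s) \,\Vert\, \pi_\beta(\cdot \mid s) \bigr) \bigr]
\]
```

In the RLHF instance of this template, $Q$ is typically a learned reward model and $\pi_\beta$ is the supervised fine-tuned reference model, which is what makes the same formulation cover both settings.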
Haoran Xu✈️ICLR26 @ryanxhr ·
@LucaAmb Yes, actually VGF could be thought of as doing conditional flow matching with known velocity and steps.
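For context on the flow-matching remark above, conditional flow matching with a linear interpolation path has a closed-form target velocity, which is presumably what "known velocity" refers to (standard background, not specific to VGF):

```latex
% Linear interpolant between a source sample x_0 and a target sample x_1:
\[
  x_t = (1 - t)\, x_0 + t\, x_1,
  \qquad t \in [0, 1],
  \qquad
  \frac{d x_t}{d t} = x_1 - x_0,
\]
% so the velocity field v_\theta is trained by simple regression
% against this known target:
\[
  \mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{t,\, x_0,\, x_1}
  \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2 .
\]
```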
Haoran Xu✈️ICLR26 @ryanxhr ·
@linghui35877581 In our paper we only tried the offline setting, i.e., the RLHF setup. In general, offline RL could still be used here as more advanced off-policy LLM RL algorithms are developed.
linghui @linghui35877581 ·
@ryanxhr Very curious how offline RL could be applied to make use of offline rollout data during the LLM online RL phase.
Haoran Xu✈️ICLR26 retweeted
Amy Zhang @yayitsamyzhang ·
@ryanxhr has developed this very nice work framing offline RL as an optimal transport problem, with SOTA results on offline RL benchmarks and LLM RL tasks. Check it out, and chat with him at ICLR!
Haoran Xu✈️ICLR26 @ryanxhr

Both offline RL and LLM RL fine-tuning can be formulated as behavior-regularized RL problems. We propose Value Gradient Flow (VGF), a new scalable and sample-efficient paradigm that treats behavior-regularized RL as an optimal transport problem. arxiv.org/abs/2604.14265 🧵[1/7]

Haoran Xu✈️ICLR26 @ryanxhr ·
For online RL fine-tuning, VGF solves several hard tasks that previous methods could not. 🧵[6/7]
Haoran Xu✈️ICLR26 @ryanxhr ·
2️⃣ Information-Theoretic Reward Decomposition for Generalizable RLHF 🗓️ Thu, Dec 4, 4:30 PM – 7:30 PM, Exhibit Hall C, D, E #5413
Haoran Xu✈️ICLR26 @ryanxhr ·
I will be at #NeurIPS2025 from 12/3 to 12/7 to present two papers. Come chat about everything RL! 1️⃣ Unifying Online and Offline RL via Implicit Value Regularization 🗓️ Thu, Dec 4, 11:00 AM – 2:00 PM, Exhibit Hall C, D, E #303
Haoran Xu✈️ICLR26 retweeted
RL Beyond Rewards Workshop @RLBRew_RLC ·
⚠️ Reminder! Submissions for @RL_Conference's RL beyond Reward Workshop are due May 30 (AoE)! We are brewing an interesting program and seeking innovative research work in reward-free RL. All papers are welcome, from exploratory abstracts to complete research papers.
Haoran Xu✈️ICLR26 @ryanxhr ·
I will miss #ICLR2025, but come check out our work on a new perspective: solving reinforcement learning via discriminator-weighted imitation learning. @ShuozheL and @yayitsamyzhang will present it during today's poster session.