Jason Liu

47 posts

@JasonLiu106968

Emancipate your mind and seek truth from facts

Hong Kong · Joined August 2025

115 Following · 104 Followers

Pinned Tweet

Jason Liu@JasonLiu106968·13 Aug

Excited to share our #RL_for_LLM paper: "Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning" We conducted a comprehensive analysis of RL techniques in the LLM domain!🥳 Surprisingly, we found that using only 2 techniques can unlock the learning capability of LLMs.😮

Jason Liu@JasonLiu106968·19 Mar

Thread (5/5) ⚙️ The Infrastructure: Fast & Scalable! Won't co-evolving two models bottleneck training? Not with our fully asynchronous framework! ⚡ We decoupled the process into a Primary Loop (actor rolls out) and a Background Track (extractor distills). It is orchestrated by a Centralized **Experience Manager** that handles query batching, caching, and lock-free concurrent retrieval—meaning low blocking latency for actor training! 🏎️💨
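
A minimal sketch of the Primary Loop / Background Track split described above, using Python's `asyncio`. It is illustrative only: the class and function names (`ExperienceManager`, `toy_extractor`, `primary_loop`) are hypothetical, the "models" are stubs, and the cache is a plain dict that needs no lock because a single event loop runs everything. The real framework's batching and concurrency design is not shown in the thread.

```python
import asyncio


class ExperienceManager:
    """Hypothetical stand-in for the thread's centralized Experience Manager."""

    def __init__(self):
        self._cache = {}                  # query -> latest distilled experience
        self._pending = asyncio.Queue()   # trajectories waiting for the extractor

    def retrieve(self, query):
        # Non-blocking read: return the cached experience, or None if absent.
        return self._cache.get(query)

    def submit(self, query, trajectory):
        # Non-blocking hand-off to the background track.
        self._pending.put_nowait((query, trajectory))

    async def background_track(self, extractor):
        # Distill experiences as trajectories arrive, independent of the actor.
        while True:
            query, trajectory = await self._pending.get()
            self._cache[query] = await extractor(query, trajectory)
            self._pending.task_done()

    async def drain(self):
        # Wait until every submitted trajectory has been distilled.
        await self._pending.join()


async def toy_extractor(query, trajectory):
    await asyncio.sleep(0)  # placeholder for a slow extractor model call
    return f"hint from {len(trajectory)} steps"


async def primary_loop(manager, queries):
    seen = []
    for q in queries:
        hint = manager.retrieve(q)    # the actor never waits on the extractor
        trajectory = [q, hint]        # toy rollout conditioned on the hint
        manager.submit(q, trajectory)
        seen.append(hint)
        await asyncio.sleep(0)        # yield so the background track can run
    return seen


async def demo():
    manager = ExperienceManager()
    worker = asyncio.create_task(manager.background_track(toy_extractor))
    seen = await primary_loop(manager, ["q1", "q1", "q1"])
    await manager.drain()
    worker.cancel()
    return seen, manager.retrieve("q1")
```

Running `asyncio.run(demo())` shows the actor starting without any experience and picking up a distilled hint once the background track has caught up.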

Jason Liu@JasonLiu106968·19 Mar

Thread (4/5) 💡 The Algorithm: A simple and pure dual-RL paradigm! We form a closed co-evolutionary loop between two models: 🔹 **Policy Actor:** Optimized via sparse outcome rewards. We show that the mainstream **GRPO** algorithm works perfectly here to drive the actor's task mastery! 🎯 🔹 **Experience Extractor:** Optimized based on whether its distilled experience actually helped the actor succeed. To ensure highly stable optimization and prevent abrupt distribution shifts during co-evolution, we use a REINFORCE + IS clip variant! ⚖️ No messy heuristics—just pure, mutually beneficial RL! 🔄🤝
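
The two objectives named above can be sketched in plain Python. The group-relative advantage is the standard GRPO normalization (reward minus group mean, divided by group standard deviation). The second function is only one plausible reading of "REINFORCE + IS clip": a REINFORCE term weighted by an importance ratio truncated at `clip`. The function names, the `clip=2.0` default, and the scalar `baseline` are assumptions, not details from the thread.

```python
import math
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one prompt's group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_is_reinforce_loss(logp_new, logp_old, rewards, baseline, clip=2.0):
    """Assumed surrogate: REINFORCE with a truncated importance ratio.

    logp_new / logp_old are sequence log-probabilities under the current
    and the behavior policy; the ratio is capped at `clip` to limit the
    effect of large distribution shifts.
    """
    total = 0.0
    for ln, lo, r in zip(logp_new, logp_old, rewards):
        ratio = min(math.exp(ln - lo), clip)
        total += -ratio * (r - baseline)
    return total / len(rewards)
```

In a real trainer the log-probabilities would be differentiable tensors; plain floats are used here so the arithmetic is easy to follow.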

Jason Liu@JasonLiu106968·19 Mar

#Agentic #OpenClaw #RL #Code_agent #LLM 🚀 **The secret to breaking the efficiency bottleneck of agentic RL training? From self-evolution to co-evolution!** 🧠 Excellent project led by @pumpkinnnnne and supported by Alibaba ROLL. It's my honor to collaborate with them!

Jason Liu@JasonLiu106968·6 Jan

4.00001/4 📕4. Core Innovation #3: Chunk-Initialized Resampling

In long-horizon tasks, sampling a full successful trajectory from scratch is nearly impossible. Success often hinges on a few crucial forks: decision points where choosing the right chunk determines the final outcome.

IPA's insight: Why start from the beginning every time? Instead, it uses expert-like trajectories (from self-play or teacher models) and resamples only the suffix starting from a selected chunk. This dramatically reduces the exploration burden.

Two strategies are proposed:
- Sequential Rollback: Start resampling from the last chunk and progressively move backward, enabling curriculum-style learning.
- Parallelized Initialization: Launch rollouts in parallel from multiple anchor chunks across the trajectory, improving efficiency across diverse task structures.

To prevent learning stalls when no positive samples are found, IPA blends imitation learning (IL) and RL:
- Apply IL loss to expert prefill chunks (to preserve reliable subroutines).
- Apply chunk-level RL to resampled suffixes (to enable adaptive credit assignment).

> 🌟 This hybrid approach allows agents to solve tasks previously deemed intractable, even when initial success rates are zero.
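
A rough sketch of the two start-point strategies and the prefix/suffix split described above. Everything here is illustrative: the helper names, the evenly spaced anchors, and the fixed `max_chunks` stopping rule are assumptions, and `sample_chunk` stands in for a policy rollout step.

```python
def sequential_rollback_starts(n_chunks):
    """Start indices from the last chunk backward (curriculum-style)."""
    return list(range(n_chunks - 1, -1, -1))


def parallel_anchor_starts(n_chunks, n_anchors):
    """Evenly spaced anchor chunks to launch rollouts from in parallel."""
    step = n_chunks / n_anchors
    return sorted({int(i * step) for i in range(n_anchors)})


def resample_from(expert_chunks, start, sample_chunk, max_chunks):
    """Keep the expert prefix before `start`, resample the rest."""
    prefix = list(expert_chunks[:start])
    suffix = []
    # Assumption: the episode ends after max_chunks chunks in total.
    while len(prefix) + len(suffix) < max_chunks:
        suffix.append(sample_chunk(prefix + suffix))
    return prefix, suffix


def loss_labels(prefix, suffix):
    """IL loss on expert prefill chunks, chunk-level RL on the resampled suffix."""
    return ["IL"] * len(prefix) + ["RL"] * len(suffix)
```

For a four-chunk expert trajectory, `sequential_rollback_starts(4)` first resamples only the final chunk, then the last two, and so on until the whole trajectory is sampled from scratch.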

Jason Liu@JasonLiu106968·6 Jan

4/4💡3. Core Innovation #2: Chunk-Level Policy Optimization IPA lifts the entire training objective to the chunk level

Jason Liu@JasonLiu106968·6 Jan

#rl4LLM #AI #rl 🔥Don't miss our new algorithm, Interaction Perceptive Agentic Policy Optimization (IPA)! It enables small models to match the performance of very large LMs! TL;DR: REINFORCE is surprisingly powerful, if properly enhanced. By shifting the optimization unit from tokens to interaction chunks, introducing chunk-level discounted returns, and designing a smart resampling strategy, IPA enables stable, efficient, and scalable reinforcement learning for agentic systems. In the following thread, I will introduce the design of IPA in more detail 👇 Paper link 🔗: arxiv.org/abs/2512.24873

Brady Long@thisguyknowsai

🚨 Chinese researchers just published a paper that destroys every AI agent startup pitch deck. It's called ROME + ALE, and it exposes why every "AI agent company" you've heard of is building on quicksand. Here's what nobody's talking about:
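
The "chunk-level discounted returns" mentioned in the IPA announcement can be illustrated with the usual discounted-return recursion applied per interaction chunk instead of per token. This is a generic sketch, assuming one scalar reward per chunk and a single discount factor `gamma`; the paper's exact definition may differ.

```python
def chunk_discounted_returns(chunk_rewards, gamma=0.99):
    """Return G_k = r_k + gamma * G_{k+1} for each chunk k."""
    returns = [0.0] * len(chunk_rewards)
    running = 0.0
    # Walk backward so each chunk's return reuses the next chunk's return.
    for k in reversed(range(len(chunk_rewards))):
        running = chunk_rewards[k] + gamma * running
        returns[k] = running
    return returns
```

With a sparse outcome reward on the final chunk, earlier chunks receive geometrically smaller credit: `chunk_discounted_returns([0, 0, 1], gamma=0.5)` gives `[0.25, 0.5, 1.0]`.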

Jason Liu retweeted

Pablo Samuel Castro@pcastr·8 Dec

This #NeurIPS2025 was tiring, but it was fantastic to connect with so many friends and colleagues! I was so busy I didn't get a chance to tweemote our papers at the conference, so I'll remedy that with this post-hoc thread: 👇🏾

Jason Liu@JasonLiu106968·27 Nov

@johanobandoc @AaronCourville @pcastr congrats mate!🥳

Johan Obando-Ceron 👍🏽@johanobandoc·26 Nov

🥳After two amazing years of my PhD, I’m very happy to share that I have officially passed my PhD candidacy exam! 🎉 I want to thank my colleagues and my PhD supervisors (@AaronCourville, @pcastr ) for their tremendous support and for helping shape my research thinking. I’m also deeply grateful to the jury committee for their valuable feedback during the examination (@GlenBerseth, @apsarathchandar ). 🙏

Montréal, Québec 🇨🇦

Jason Liu retweeted

wang@weixunwang·7 Nov

@_lewtun @alexpiche_ ROLL: github.com/alibaba/ROLL

Jason Liu retweeted

Pablo Samuel Castro@pcastr·4 Nov

going beyond dormancy and into gradient activity for identifying neuron activity. check out our work led by @JasonLiu106968 , zihao wu, and @johanobandoc . and if you'll be in san diego for #neurips2025 come by our poster to chat!

Jason Liu@JasonLiu106968

#NeurIPS2025 #AgenticAI 🚀 We're thrilled to share our #NeurIPS2025 paper: Measuring Gradients, Not Activations! Enhancing Neuronal Activity in Deep Reinforcement Learning We identify a flaw in measuring "inactive" neurons in various complex networks and propose focusing on how neurons learn, not their outputs. Paper: arxiv.org/abs/2505.24061 Code: github.com/torressliu/gra… Thread on why gradients reveal true neuronal health in RL agents👇:
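
To make the contrast between measuring outputs and measuring gradients concrete, here is a small sketch. It scores each neuron by its mean absolute value normalized by the layer average, the same normalization used by activation-based dormancy scores, and applies it once to activations and once to gradients. The function names and the threshold `tau` are hypothetical, and the paper's actual metric may be defined differently.

```python
def normalized_scores(per_sample_values):
    """Per-neuron mean |value|, divided by the layer-wide average.

    per_sample_values: list of samples, each a list with one value per neuron.
    """
    n_neurons = len(per_sample_values[0])
    n_samples = len(per_sample_values)
    mean_abs = [
        sum(abs(sample[i]) for sample in per_sample_values) / n_samples
        for i in range(n_neurons)
    ]
    layer_mean = sum(mean_abs) / n_neurons
    return [m / (layer_mean + 1e-12) for m in mean_abs]


def inactive_neurons(per_sample_values, tau):
    """Indices of neurons whose normalized score is at or below tau."""
    scores = normalized_scores(per_sample_values)
    return [i for i, s in enumerate(scores) if s <= tau]


# Toy layer with two neurons: neuron 1 has large outputs but receives no gradient.
activations = [[1.0, 2.0], [1.0, 2.0]]
gradients = [[0.5, 0.0], [0.3, 0.0]]

by_activation = inactive_neurons(activations, tau=0.1)  # flags nothing
by_gradient = inactive_neurons(gradients, tau=0.1)      # flags neuron 1
```

In this toy case the activation-based check misses neuron 1 because its outputs are large, while the gradient-based check flags it because it has stopped learning.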

Jason Liu@JasonLiu106968·4 Nov

📕5/5 Bigger picture: This isn’t just about resetting neurons. It’s about redefining neuronal health in deep learning: ⭐️From expressiveness → adaptability. 🌟GraMa offers a general, scalable lens to diagnose and fix representational collapse in RL—and beyond. 💫We hope GraMa inspires more learning-aware design in deep RL! 🥳Thanks to our brilliant co-authors: @johanobandoc, @pcastr, @AaronCourville, @lingpan_hkust, and Zihao Wu. It has been a wonderful academic journey!

Jason Liu@JasonLiu106968·4 Nov

🌞4/5 Built on GraMa, we propose ReGraMa: A targeted neuron reset that only revives neurons truly stuck in learning limbo. 💪Result? >15% performance gains in hard DM Control tasks. Restores learning in non-ReLU SAC & BRO-net & DACER (where ReDo fails). Mitigates catastrophic forgetting. 📶Stays robust and efficient as the number of network parameters scales up. ⏩And it is even more lightweight!
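
A targeted reset of this kind can be sketched as follows. The sketch assumes a ReDo-style reset (re-initialize a flagged neuron's incoming weights and zero its outgoing weights) driven by per-neuron gradient scores. The function names, the threshold `tau`, and the uniform re-initialization are assumptions; ReGraMa's actual reset rule may differ.

```python
import random


def select_stuck_neurons(grad_scores, tau):
    """Neurons whose gradient-based score is at or below the threshold."""
    return [j for j, s in enumerate(grad_scores) if s <= tau]


def reset_neurons(w_in, w_out, neuron_ids, rng, scale=0.1):
    """Reset only the selected hidden neurons, in place.

    w_in[j]     : incoming weight vector of hidden neuron j
    w_out[k][j] : weight from hidden neuron j to output unit k
    """
    for j in neuron_ids:
        # Fresh incoming weights give the neuron a chance to learn again.
        w_in[j] = [rng.uniform(-scale, scale) for _ in w_in[j]]
        # Zero outgoing weights so the reset does not perturb current outputs.
        for row in w_out:
            row[j] = 0.0


rng = random.Random(0)
w_in = [[1.0, 1.0], [2.0, 2.0]]
w_out = [[3.0, 4.0]]
stuck = select_stuck_neurons([2.0, 0.0], tau=0.1)
reset_neurons(w_in, w_out, stuck, rng)
```

Only neuron 1 is touched here; neuron 0 keeps its weights, which is the point of a targeted reset as opposed to re-initializing the whole layer.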

Jason Liu@JasonLiu106968·4 Nov
