Tianwei Ni

208 posts

@twni2016

On the job market for research scientist roles | Final-year PhD student @Mila_Quebec on RL | Prev @AmazonScience on LLMs | @continual_learn

Montreal, Canada · Joined July 2017
875 Following · 569 Followers
Pinned Tweet
Tianwei Ni @twni2016
Excited to share my recent blog post on offline RL as policy prior learning. Offline RL is often framed as learning a safe, deploy-as-is policy from static data. I argue it should instead learn a policy prior optimized for future adaptation. 🧵 Blog: twni2016.github.io/blogs/policypr…
1 reply · 20 reposts · 116 likes · 7.7K views
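As a rough illustration of the framing (a minimal sketch with assumed names: PolicyNet, learn_prior, adapt_online, and the loss functions are placeholders, not the blog's code): phase 1 fits a prior to static data, and phase 2 treats that prior as an initialization and anchor for online adaptation, rather than as a deploy-as-is policy.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of "offline RL as policy prior learning";
# all names here are placeholders, not the blog's actual code.

class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def learn_prior(policy, offline_dataset, epochs=10, lr=1e-3):
    """Phase 1: fit a prior to static data (behavior cloning stands
    in here for any offline RL objective)."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, act in offline_dataset:  # iterable of (obs, action) tensors
            loss = nn.functional.mse_loss(policy(obs), act)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy

def adapt_online(policy, online_loss_fn, steps=1000, lr=3e-4, beta=0.1):
    """Phase 2: the prior is a starting point, not the final product.
    An L2 anchor to the prior's weights keeps early online updates
    from erasing it; beta trades stability for plasticity."""
    prior_params = [p.detach().clone() for p in policy.parameters()]
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        loss = online_loss_fn(policy)  # any online RL loss (assumed)
        anchor = sum(((p - q) ** 2).sum()
                     for p, q in zip(policy.parameters(), prior_params))
        opt.zero_grad()
        (loss + beta * anchor).backward()
        opt.step()
    return policy
```

Here beta interpolates between pure online RL (beta = 0) and freezing the prior; under the tweet's framing, the offline phase should be judged by how well it sets up this second phase.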
Tianwei Ni retweeted
Sony AI @SonyAI_global
For 40+ years, building a robot that could rally with an elite human table tennis player at full speed was an unsolved problem. Sony AI's Ace research project set out to change that—and the results are now accepted for publication in @Nature and featured on the cover.
7 replies · 89 reposts · 350 likes · 127.9K views
Tianwei Ni retweeted
Yihao Sun @Tobealegend24
Most VLA-RL frameworks inherit the complexity of LLM-RL infra, but we found that none of it is necessary. We therefore introduce VLARLKit: a simple yet fast VLA RL framework. Code link: github.com/VLARLKit/VLARL…
4 replies · 20 reposts · 115 likes · 8.9K views
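This is not VLARLKit's actual API (the repo defines that); below is only a hypothetical single-process sketch of the point being made: a basic VLA-RL loop (rollout, discounted returns, REINFORCE update) needs none of the heavy LLM-RL infrastructure. The env and policy interfaces are assumptions.

```python
import torch

# Hypothetical minimal VLA-RL loop; interfaces are assumed, this is
# NOT VLARLKit's API. `policy(obs, instruction)` is assumed to return
# a torch Distribution over actions; `env` follows a simple reset/step
# protocol with an image observation and a language instruction.

def train(policy, env, optimizer, iterations=100, gamma=0.99):
    for _ in range(iterations):
        obs, instruction = env.reset()  # image + language goal (assumed)
        log_probs, rewards = [], []
        done = False
        while not done:
            dist = policy(obs, instruction)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, done = env.step(action)
            rewards.append(reward)
        # Discounted returns, computed backwards over the episode.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std(unbiased=False) + 1e-8)
        # REINFORCE: raise log-prob of actions in high-return episodes.
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```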
Tianwei Ni retweeted
Ziyan "Ray" Luo
Ziyan "Ray" Luo@RayZiyan41307·
Our workshop @continual_learn announcement post was removed unexpectedly with no reason given; it had significant reach (including RTs from our founding father, which were also removed). @X @Support Given its prior engagement, we hope to recover it. x.com/continual_lear…
0 replies · 1 repost · 9 likes · 706 views
Tianwei Ni retweeted
RL Beyond Rewards Workshop @RLBRew_RLC
Scalar, well-defined, easy-to-optimize rewards aren't always available in real-world interaction data, yet that data is crucial for scaling general-purpose agents. Excited to announce the 3rd edition of RLBRew: Towards Scalable General-Purpose Agents at @RL_Conference 2026!
1 reply · 11 reposts · 31 likes · 5.4K views
Tianwei Ni retweeted
Rasool Fakoor @rasoolfa
Are you working on RL, principled ways to build RL envs for agent training, or effective evaluation for agents? Want to showcase your NeurIPS submission, or just discuss research more broadly? Then consider submitting to and attending our first-ever workshop on Methods and RL Environments for Evaluating AI Agents. Deadline: May 11 rl-eval.github.io
Jonas Mueller @jomulr

📢 Call for papers: Workshop on Methods and Reinforcement Learning Environments for Evaluating AI Agents @ ACM CAIS 2026 (inaugural edition!) Topics include:
- Design principles for effective RL Environments
- Methods to evaluate Agents, esp. causal/interventional techniques

0 replies · 1 repost · 7 likes · 981 views
Tianwei Ni retweeted
RL in Big Worlds @rlc_bigworlds
RL in Big Worlds is a workshop at @RL_Conference about ideas that enable agents to achieve goals in environments vastly more complex than themselves. This requires giving agents the ability to learn continually and to use approximate value functions, models, and policies effectively.
3 replies · 26 reposts · 179 likes · 84K views
Tianwei Ni @twni2016
🔥Thrilled to announce the Continual Reinforcement Learning (CRL) Workshop @RL_Conference 2026 in Montreal, Canada! We welcome submissions on broad topics in continual RL. Interested in submitting or reviewing? Check out our website for more details!
0 replies · 1 repost · 13 likes · 581 views
Tianwei Ni retweeted
Shenao Zhang @ShenaoZhang
Earlier this month, I wrote up a few thoughts on how to train LLMs as strong continual learners that can do test-time exploration and self-improve. Examples include emergent agentic capabilities such as error recovery and dynamic tool learning. Figured I’d share it here in case it’s useful to others :) notion.so/Towards-Contin…
1 reply · 12 reposts · 109 likes · 7.9K views
Tianwei Ni retweeted
Max Schwarzer @max_a_schwarzer
I've decided to leave OpenAI. I'm incredibly proud of all the work I've been part of here, from helping create the reasoning paradigm with @MillionInt, scaling up test-time compute with @polynoamial, working on RL algorithms with my fellow strawberries, shipping o1-preview (which started life as one of my derisking runs), to post-training o1 and o3 with @ericmitchellai, @yanndubs and many others. I'm most proud of having led the post-training team here for the last year; the team has done incredible work and shipped some really smart models, including GPT-5, 5.1, 5.2, and 5.3-Codex.

OpenAI has genuinely some of the most talented researchers I have ever met, and I have learned more than I could have imagined since I joined as a new grad. I want to thank @markchen90 @FidjiSimo @sama @merettm for all their support over my time here, and too many collaborators to name for the insights, ideas, and just plain fun we have had working together.

After leading post-training for a year, though, I'm longing to start fresh and return to IC research work. I've been thinking about going back to technical research for quite some time, and I genuinely believe my colleagues and team here are set up to succeed going forward without me.

I'm personally very excited for my next chapter: I'm proud to be joining @AnthropicAI to get back into the weeds in RL research, and I'm looking forward to supporting my friends there at this important time. Many of the people I most trust and respect have joined Anthropic over the last couple of years, and I'm excited to work with them again. I have also been very impressed with Anthropic's talent, research taste, and values, and I'm excited to be part of what the company does next!
610 replies · 1.2K reposts · 21.2K likes · 3.2M views
Tianwei Ni retweeted
Guozheng Ma @Guozheng_Ma
O2O RL is complex because you have access to both a static dataset and a live environment: two priors that don't always agree. 🤔 Most prior work asks: what's the best algorithm? This paper asks: what's the real principle behind O2O RL? The answer is stability-plasticity balance. With that lens, algorithm design becomes adaptation, not guesswork. 💡 Under this framework, O2O RL becomes a series of quantifiable decisions:
- Which prior is stronger?
- Where to anchor stability?
- Is extra plasticity needed?
...
Tianwei Ni @twni2016

Offline-to-online RL fine-tuning feels unpredictable: methods that work in one task can collapse in another. In work led by @luli_airl, we argue this isn’t noise — it’s a stability–plasticity mismatch driven by where prior knowledge lives. Paper: arxiv.org/abs/2510.01460 🧵

0 replies · 2 reposts · 10 likes · 1.2K views
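A concrete, hypothetical instance of "where to anchor stability" (names and interfaces assumed, not the paper's code): a KL penalty toward the frozen offline policy during online updates, with the coefficient beta as the plasticity knob.

```python
import torch
from torch.distributions import kl_divergence

# Hypothetical sketch of the stability-plasticity lens on O2O RL;
# names and interfaces are assumptions, not the paper's code. Both
# policies are assumed to map a batch of observations to a torch
# Distribution over actions.

def o2o_step(policy, offline_policy, batch_obs, online_loss_fn,
             optimizer, beta: float):
    loss = online_loss_fn(policy)  # any online RL objective (assumed)
    with torch.no_grad():
        prior_dist = offline_policy(batch_obs)  # frozen offline prior
    online_dist = policy(batch_obs)
    # "Where to anchor stability": here, on the action distribution.
    stability = kl_divergence(online_dist, prior_dist).mean()
    # "Is extra plasticity needed?": beta -> 0 recovers pure online
    # RL; large beta pins the policy to the offline prior.
    total = loss + beta * stability
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
```

Moving the anchor (e.g., to a value function instead of the policy) and scheduling beta over training are exactly the kind of quantifiable decisions this framework makes explicit.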
Tianwei Ni retweeted
Lu Li @luli_airl
Excited to share our new paper (arxiv.org/abs/2510.01460) and blog (twni2016.github.io/blogs/policyfi…) on offline-to-online reinforcement learning. Grateful to my amazing coauthor for their insight and key contributions.🙏
Tianwei Ni @twni2016

Offline-to-online RL fine-tuning feels unpredictable: methods that work in one task can collapse in another. In work led by @luli_airl, we argue this isn’t noise — it’s a stability–plasticity mismatch driven by where prior knowledge lives. Paper: arxiv.org/abs/2510.01460 🧵

0 replies · 7 reposts · 55 likes · 5.2K views
Tianwei Ni @twni2016
Offline-to-online RL fine-tuning feels unpredictable: methods that work in one task can collapse in another. In work led by @luli_airl, we argue this isn’t noise — it’s a stability–plasticity mismatch driven by where prior knowledge lives. Paper: arxiv.org/abs/2510.01460 🧵
3 replies · 16 reposts · 142 likes · 12.6K views