Tianwei Ni

208 posts

@twni2016

On the job market for research scientist roles | Final-year PhD student @Mila_Quebec on RL | Prev @AmazonScience on LLMs | @continual_learn

Montreal, Canada · Joined July 2017
875 Following · 569 Followers
Pinned Tweet
Tianwei Ni @twni2016
Excited to share my recent blog post on offline RL as policy prior learning. Offline RL is often framed as learning a safe, deploy-as-is policy from static data. I argue it should instead learn a policy prior optimized for future adaptation. 🧵 Blog: twni2016.github.io/blogs/policypr…
1 reply · 20 reposts · 116 likes · 7.7K views
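As a rough illustration of the framing (a minimal sketch with assumed names: PolicyNet, learn_prior, adapt_online, and the loss functions are placeholders, not the blog's code): phase 1 fits a prior to static data, and phase 2 treats that prior as an initialization and anchor for online adaptation, rather than as a deploy-as-is policy.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of "offline RL as policy prior learning";
# all names here are placeholders, not the blog's actual code.

class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def learn_prior(policy, offline_dataset, epochs=10, lr=1e-3):
    """Phase 1: fit a prior to static data (behavior cloning stands
    in here for any offline RL objective)."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, act in offline_dataset:  # iterable of (obs, action) tensors
            loss = nn.functional.mse_loss(policy(obs), act)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy

def adapt_online(policy, online_loss_fn, steps=1000, lr=3e-4, beta=0.1):
    """Phase 2: the prior is a starting point, not the final product.
    An L2 anchor to the prior's weights keeps early online updates
    from erasing it; beta trades stability for plasticity."""
    prior_params = [p.detach().clone() for p in policy.parameters()]
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        loss = online_loss_fn(policy)  # any online RL loss (assumed)
        anchor = sum(((p - q) ** 2).sum()
                     for p, q in zip(policy.parameters(), prior_params))
        opt.zero_grad()
        (loss + beta * anchor).backward()
        opt.step()
    return policy
```

Here beta interpolates between pure online RL (beta = 0) and freezing the prior; under the tweet's framing, the offline phase should be judged by how well it sets up this second phase.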
Tianwei Ni retweeted
Sony AI @SonyAI_global
For 40+ years, building a robot that could rally with an elite human table tennis player at full speed was an unsolved problem. Sony AI's Ace research project set out to change that—and the results are now accepted for publication in @Nature and featured on the cover.
7 replies · 89 reposts · 350 likes · 127.9K views
Tianwei Ni retweeted
Yihao Sun @Tobealegend24
Most VLA-RL frameworks inherit the complexity of LLM-RL infra, but we found that none of it is necessary. We therefore introduce VLARLKit: a simple yet fast VLA RL framework. Code link: github.com/VLARLKit/VLARL…
4 replies · 20 reposts · 115 likes · 8.9K views
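This is not VLARLKit's actual API (the repo defines that); below is only a hypothetical single-process sketch of the point being made: a basic VLA-RL loop (rollout, discounted returns, REINFORCE update) needs none of the heavy LLM-RL infrastructure. The env and policy interfaces are assumptions.

```python
import torch

# Hypothetical minimal VLA-RL loop; interfaces are assumed, this is
# NOT VLARLKit's API. `policy(obs, instruction)` is assumed to return
# a torch Distribution over actions; `env` follows a simple reset/step
# protocol with an image observation and a language instruction.

def train(policy, env, optimizer, iterations=100, gamma=0.99):
    for _ in range(iterations):
        obs, instruction = env.reset()  # image + language goal (assumed)
        log_probs, rewards = [], []
        done = False
        while not done:
            dist = policy(obs, instruction)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, done = env.step(action)
            rewards.append(reward)
        # Discounted returns, computed backwards over the episode.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std(unbiased=False) + 1e-8)
        # REINFORCE: raise log-prob of actions in high-return episodes.
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```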
Tianwei Ni retweeted
Ziyan "Ray" Luo
Ziyan "Ray" Luo@RayZiyan41307·
Our workshop @continual_learn announcement post was removed unexpectedly with no reason given; it had significant reach (including RTs from our founding father, which were also removed). @X @Support Given its prior engagement, we hope to recover it. x.com/continual_lear…
0 replies · 1 repost · 9 likes · 706 views
Tianwei Ni retweeted
RL Beyond Rewards Workshop @RLBRew_RLC
Scalar, well-defined, easy-to-optimize rewards aren't always available in real-world interaction data, yet that data is crucial for scaling general-purpose agents. Excited to announce the 3rd edition of RLBRew: Towards Scalable General-Purpose Agents at @RL_Conference 2026!
1 reply · 11 reposts · 31 likes · 5.4K views
Tianwei Ni retweeted
Rasool Fakoor @rasoolfa
Are you working on RL, principled ways to build RL envs for agent training, or effective evaluation for agents? Want to showcase your NeurIPS submission, or just discuss research more broadly? Then consider submitting to and attending our first-ever workshop on Methods and RL Environments for Evaluating AI Agents. Deadline: May 11 rl-eval.github.io
Jonas Mueller @jomulr

📢 Call for papers: Workshop on Methods and Reinforcement Learning Environments for Evaluating AI Agents @ ACM CAIS 2026 (inaugural edition!) Topics include:
- Design principles for effective RL Environments
- Methods to evaluate Agents, esp. causal/interventional techniques

0 replies · 1 repost · 7 likes · 981 views
Tianwei Ni retweeted
RL in Big Worlds @rlc_bigworlds
RL in Big Worlds is a workshop at @RL_Conference about ideas that enable agents to achieve goals in environments vastly more complex than themselves. This requires giving agents the ability to learn continually and to use approximate value functions, models, and policies effectively.
3 replies · 26 reposts · 179 likes · 84K views
Tianwei Ni @twni2016
🔥Thrilled to announce the Continual Reinforcement Learning (CRL) Workshop @RL_Conference 2026 in Montreal, Canada! We welcome submissions on broad topics in continual RL. Interested in submitting or reviewing? Check out our website for more details!
0 replies · 1 repost · 13 likes · 581 views
Tianwei Ni retweeted
Shenao Zhang @ShenaoZhang
Earlier this month, I wrote up a few thoughts on how to train LLMs as strong continual learners that can do test-time exploration and self-improve. Examples include emergent agentic capabilities such as error recovery and dynamic tool learning. Figured I’d share it here in case it’s useful to others :) notion.so/Towards-Contin…
1 reply · 12 reposts · 109 likes · 7.9K views
Tianwei Ni retweeted
Max Schwarzer @max_a_schwarzer
I've decided to leave OpenAI. I'm incredibly proud of all the work I've been part of here, from helping create the reasoning paradigm with @MillionInt, scaling up test-time compute with @polynoamial, working on RL algorithms with my fellow strawberries, shipping o1-preview (which started life as one of my derisking runs), to post-training o1 and o3 with @ericmitchellai, @yanndubs and many others. I'm most proud of having led the post-training team here for the last year; the team has done incredible work and shipped some really smart models, including GPT-5, 5.1, 5.2, and 5.3-Codex.

OpenAI has genuinely some of the most talented researchers I have ever met, and I have learned more than I could have imagined since I joined as a new grad. I want to thank @markchen90 @FidjiSimo @sama @merettm for all their support over my time here, and too many collaborators to name for the insights, ideas, and just plain fun we have had working together.

After leading post-training for a year, though, I'm longing to start fresh and return to IC research work. I've been thinking about going back to technical research for quite some time, and I genuinely believe my colleagues and team here are set up to succeed going forward without me.

I'm personally very excited for my next chapter: I'm proud to be joining @AnthropicAI to get back into the weeds in RL research, and I'm looking forward to supporting my friends there at this important time. Many of the people I most trust and respect have joined Anthropic over the last couple of years, and I'm excited to work with them again. I have also been very impressed with Anthropic's talent, research taste, and values, and I'm excited to be part of what the company does next!
610 replies · 1.2K reposts · 21.2K likes · 3.2M views
Tianwei Ni retweeted
Guozheng Ma @Guozheng_Ma
O2O RL is complex because you have access to both a static dataset and a live environment: two priors that don't always agree. 🤔 Most prior work asks: what's the best algorithm? This paper asks: what's the real principle behind O2O RL? The answer is stability-plasticity balance. With that lens, algorithm design becomes adaptation, not guesswork. 💡 Under this framework, O2O RL becomes a series of quantifiable decisions:
- Which prior is stronger?
- Where to anchor stability?
- Is extra plasticity needed?
...
Tianwei Ni @twni2016

Offline-to-online RL fine-tuning feels unpredictable: methods that work in one task can collapse in another. In work led by @luli_airl, we argue this isn’t noise — it’s a stability–plasticity mismatch driven by where prior knowledge lives. Paper: arxiv.org/abs/2510.01460 🧵

0 replies · 2 reposts · 10 likes · 1.2K views
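A concrete, hypothetical instance of "where to anchor stability" (names and interfaces assumed, not the paper's code): a KL penalty toward the frozen offline policy during online updates, with the coefficient beta as the plasticity knob.

```python
import torch
from torch.distributions import kl_divergence

# Hypothetical sketch of the stability-plasticity lens on O2O RL;
# names and interfaces are assumptions, not the paper's code. Both
# policies are assumed to map a batch of observations to a torch
# Distribution over actions.

def o2o_step(policy, offline_policy, batch_obs, online_loss_fn,
             optimizer, beta: float):
    loss = online_loss_fn(policy)  # any online RL objective (assumed)
    with torch.no_grad():
        prior_dist = offline_policy(batch_obs)  # frozen offline prior
    online_dist = policy(batch_obs)
    # "Where to anchor stability": here, on the action distribution.
    stability = kl_divergence(online_dist, prior_dist).mean()
    # "Is extra plasticity needed?": beta -> 0 recovers pure online
    # RL; large beta pins the policy to the offline prior.
    total = loss + beta * stability
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
```

Moving the anchor (e.g., to a value function instead of the policy) and scheduling beta over training are exactly the kind of quantifiable decisions this framework makes explicit.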
Tianwei Ni retweeted
Lu Li @luli_airl
Excited to share our new paper (arxiv.org/abs/2510.01460) and blog (twni2016.github.io/blogs/policyfi…) on offline-to-online reinforcement learning. Grateful to my amazing coauthor for their insight and key contributions.🙏
Tianwei Ni @twni2016

Offline-to-online RL fine-tuning feels unpredictable: methods that work in one task can collapse in another. In work led by @luli_airl, we argue this isn’t noise — it’s a stability–plasticity mismatch driven by where prior knowledge lives. Paper: arxiv.org/abs/2510.01460 🧵

0 replies · 7 reposts · 55 likes · 5.2K views
Tianwei Ni @twni2016
Offline-to-online RL fine-tuning feels unpredictable: methods that work in one task can collapse in another. In work led by @luli_airl, we argue this isn’t noise — it’s a stability–plasticity mismatch driven by where prior knowledge lives. Paper: arxiv.org/abs/2510.01460 🧵
3 replies · 16 reposts · 142 likes · 12.6K views