Alexandre L.-Piché
@alexpiche_
146 posts

taking REINFORCE seriously

Montreal, QC · Joined October 2011
4.8K Following · 1.4K Followers

Pinned Tweet
Alexandre L.-Piché @alexpiche_ ·
It’s my last week at @ServiceNowRSRCH after joining ElementAI as an intern in 2018. Grateful for the incredible mentors and collaborators over the years. Ending things on a high note: PipelineRL won Best Paper at nowAI earlier this month! Hi @lawrennd 👋
Alexandre L.-Piché retweeted
finbarr @finbarrtimbers ·
With in-flight updates (PipelineRL, from @alexpiche_, @DBahdanau et al.), we update our actors in the middle of generation. The system is much faster because we don't have to drain the generation queues to update the weights (the same problem as static batching).
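The mid-generation update described above can be sketched in a few lines. This is a toy model, not the actual PipelineRL or open-instruct code; all class and method names here are illustrative assumptions. The point is that a generator keeps decoding while the trainer swaps its weights, so a single sequence can contain tokens produced by several policy versions.

```python
# Toy sketch of an in-flight weight update (illustrative, not PipelineRL's API).
from dataclasses import dataclass, field

@dataclass
class ToyGenerator:
    version: int = 0                               # currently loaded policy version
    token_versions: list = field(default_factory=list)

    def load_weights(self, new_version: int) -> None:
        # In-flight update: swap weights without draining ongoing sequences.
        self.version = new_version

    def step(self) -> None:
        # Decode one token under whatever weights are currently loaded.
        self.token_versions.append(self.version)

gen = ToyGenerator()
for t in range(6):
    if t == 3:
        gen.load_weights(1)    # trainer pushes new weights mid-generation
    gen.step()

# The finished sequence mixes tokens from policy v0 and v1:
print(gen.token_versions)      # [0, 0, 0, 1, 1, 1]
```

In the drain-based alternative, `load_weights` could only run between sequences, so every generator slot would sit idle until the longest sequence in the batch finished.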
Alexandre L.-Piché retweeted
Hamish Ivison @hamishivi ·
to continue the PipelineRL glazing, @finbarrtimbers implemented PipelineRL for open-instruct a little bit ago and it ended up being probably the single biggest speedup to our overall pipeline. We went from 2-week long RL runs to 5-day runs, without sacrificing performance (combined with some other threading etc. updates). Here's IFEval perf for an internal model (same data, same starting model, same bsz). Same number of training steps, same end perf, but PipelineRL is much faster.
Quoting Rishabh Agarwal (@agarwl_): "Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper…" (full tweet below)

Alexandre L.-Piché retweeted
Rishabh Agarwal @agarwl_ ·
Don't sleep on PipelineRL -- this is one of the biggest jumps in compute efficiency of RL setups that we found in the ScaleRL paper (also validated by Magistral & others before)! What's the problem PipelineRL solves? In RL for LLMs, we need to send weight updates from the trainer to the generator (to generate data from the latest policy being trained).

(Conventional PPO-off-policy) A naive approach is to start generators on a batch, wait for all sequences to complete, update the model weights for both trainers and generators, and repeat. Unfortunately, this leads to idle generators and low pipeline efficiency due to heterogeneous completion times.

(PipelineRL) Instead, we simply let the generators keep generating tokens, without discarding or finishing ongoing generations, whenever we need to do a weight update -- an "in-flight" weight update. The KV caches for these generations become stale, since they were computed under earlier copies of the weights, but this is OK (see below).
Quoting Alexandre L.-Piché (@alexpiche_): "In-flight weight updates have gone from a 'weird trick' to a must…" (full tweet below)

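The two regimes contrasted above can be compared with a back-of-the-envelope timing model. The numbers and functions below are illustrative assumptions, not measurements from the ScaleRL paper; they only show why heterogeneous completion lengths hurt the drain-based approach.

```python
# Toy timing model: drain-based updates vs. in-flight updates.

def drain_based_time(batch_lengths, num_updates):
    # Each update waits for the longest sequence in the batch; generators
    # holding shorter sequences sit idle until it finishes.
    return num_updates * max(batch_lengths)

def in_flight_time(batch_lengths, num_updates):
    # Generators never stall for weight updates, so total time is roughly
    # total tokens divided by the number of generator slots.
    slots = len(batch_lengths)
    total_tokens = num_updates * sum(batch_lengths)
    return total_tokens / slots

lengths = [100, 400, 1000]   # heterogeneous completion lengths (tokens)
print(drain_based_time(lengths, 10))  # 10000 decode steps
print(in_flight_time(lengths, 10))    # 5000.0 decode steps
```

The more skewed the length distribution, the larger the gap: drain-based time is set entirely by the longest sequence, while in-flight time depends only on the average.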
Lewis Tunstall @_lewtun ·
In the Smol Training Playbook, I tried to survey the state of popular post-training frameworks. Let me know if I missed any and I'll add them to the list!
Alexandre L.-Piché @alexpiche_ ·
In-flight weight updates have gone from a “weird trick” to a must to train LLMs with RL in the last few weeks. If you want to understand the on-policy and throughput benefits here’s the CoLM talk @DBahdanau and I gave: youtu.be/Z1uEuRKACRs
Alexandre L.-Piché retweeted
🇺🇦 Dzmitry Bahdanau @DBahdanau ·
We did lots of good work since PipelineRL release in May: ⚙️ higher throughput, seq parallel training, multimodal, agentic RL 📜 white paper with great explanations and results: arxiv.org/pdf/2509.19128… We'll present today at CoLM EXPO, room 524C, 1pm!
Alexandre L.-Piché retweeted
Torsten Scholak @tscholak ·
🧠 Call for Interns – ServiceNow AI Research (Montreal) Our Foundation Models Lab is recruiting interns for 2026! We train & optimize LLMs, from diffusion-based generation to state-space hybrids. If you care about efficient LLMs, diffusion or reasoning → this is for you. 🧵👇
Alexandre L.-Piché retweeted
vLLM @vllm_project ·
🚀 The RL community keeps pushing boundaries — from better on-policy data and partial rollouts to in-flight weight updates that mix KV caches across models during inference. Continuing inference while weights change and KV states stay stale sounds wild — but that’s exactly what PipelineRL makes work. vLLM is proud to power this kind of modular, cutting-edge RL innovation. Give it a try and share your thoughts!
🇺🇦 Dzmitry Bahdanau@DBahdanau

I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference! Code: github.com/ServiceNow/Pip… Blog: huggingface.co/blog/ServiceNo…

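One way to reason about the stale KV states the vLLM post mentions: if each token is tagged with the policy version that produced it, the trainer can measure how off-policy every token is. This is a sketch under that assumption; `staleness` is a hypothetical helper for illustration, not part of PipelineRL's or vLLM's API.

```python
# Per-token staleness tracking (hypothetical helper, illustrative only).

def staleness(token_versions, trainer_version):
    # Lag between the trainer's current policy and the policy that
    # generated each token. 0 means fully on-policy; larger means staler.
    return [trainer_version - v for v in token_versions]

# Tokens of one sequence, produced across two in-flight updates (v3 -> v4 -> v5):
tags = [3, 3, 4, 4, 5]
print(staleness(tags, 5))    # [2, 2, 1, 1, 0]
```

With in-flight updates the maximum lag stays bounded by how often weights are pushed, which is one way to see why the stale KV caches remain tolerable in practice.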
Alexandre L.-Piché retweeted
Sai Rajeswar @RajeswarSai ·
💡So far, I have been sharing our multimodal AI research at @ServiceNow focused on reasoning over pixels. Today, we share a new chapter with an open-source release of our big initiative in the voice and speech domain.🚀 🎧 AU-Harness: Holistic Evaluation of Audio LLM Responses
Alexandre L.-Piché @alexpiche_ ·
Glad to see OpenAI prioritizing abstention responses in their paper! That's a great intro to our TMLR paper, in which we developed an iterative self-reflection method for LLMs to know when to abstain, without ground truth and at no additional cost at test time. openreview.net/pdf?id=SvKPfch…
Adam Tauman Kalai@adamfungi

New research explains why LLMs hallucinate, through a connection between supervised and self-supervised learning. We also describe a key obstacle that can be removed to reduce them. 🧵openai.com/index/why-lang…
