Rishabh Tiwari (@rish2k1) - Twitter Profile

Pinned Tweet

Very excited about this line of research on fast-slow learning: 1) the potential to solve a lot of issues with current RL (e.g., entropy collapse, sparse rewards); 2) an intuitive way of incorporating rich feedback into RL; 3) a way to transfer knowledge from text-only learning into the model; 4) a great candidate for model-harness co-evolution (I've been seeing a lot of discussion on X lately about future models developing their own harnesses); 5) most importantly, I can imagine these kinds of algorithms being better candidates for discovery, which requires extreme exploration while at the same time improving the underlying model's capabilities. And much more ...

Kusha Sareen@KushaSareen

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

Rishabh Tiwari retweeted

Arnav Chavan@ArnavChavan6·6d

🚀 Organizing the Efficient Qwen Competition @icmlconf! Goal: minimize LLM inference latency on a single GPU without breaking model quality. Prizes: $3K / $2K / $1K + presenting at ICML 2026, Seoul. Getting started: adaptfm.gitlab.io/call-for-compe… Leaderboard: d1krc5fcnf73gi.cloudfront.net

Rishabh Tiwari retweeted

Haocheng Xi@HaochengXiUCB·6d

Proud to be part of StreamDiffusionV2! Streaming video generation opens up a very different algorithm-systems co-design problem: low latency, continuous interaction, and maintaining quality over time. Excited to see this direction recognized at #MLSys26!

Chenfeng_X@Chenfeng_X

Excited that our paper StreamDiffusionV2 received the Best Research Paper Award at #MLSys26! 🚀 Video generation is quickly moving from demos to production-facing workloads. It is no longer a turn-based pipeline; it should be a streaming pipeline that interacts with users. 📖 Our project page: streamdiffusionv2.github.io and paper: arxiv.org/pdf/2511.07399 👂 Come join the talk if you are interested in streaming video generation. Our talk will be in the Research Track Oral Presentation: Best Paper Session on Tue at 8:45AM at #MLSys26, where I will talk about how we attacked the efficiency and quality challenges. Hope to see you there! ❤️ Huge thanks to all authors! This work would not have been possible without the incredible effort from the entire team. Big shout-out to Tianrui Feng, Zhi Li, @Andy_ShuoYang, @HaochengXiUCB, @lmxyy1999, @lvminzhang, @xiuyu_l, Keting Yang, @ZiqiPeng, @songhan_mit, @magrawala, @KurtKeutzer, and @cumulo_autumn

Rishabh Tiwari@rish2k1·6d

@adityastomar_ @nvidia @NVIDIAAI GPU rich 🔥, congrats 🙌

Aditya Tomar@adityastomar_·19 May

Excited to begin my summer research internship at @nvidia today. I’ll be working in the Applied Deep Learning Research team in the Santa Clara HQ office. Let me know if you are around and would like to meet!

Rishabh Tiwari retweeted

Alex Dimakis@AlexGDimakis·19 May

Learning in prompts: fast learning. Learning in weights: slow learning. How to combine them iteratively!

Lakshya A Agrawal@LakshyAAAgrawal

Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization. GEPA demonstrated this for context-space optimization (prompts and agent harnesses), delivering frontier results at a fraction of the cost of RL. But context-only optimization is bounded by the base model's capability ceiling; weight updates can reach further. Very excited about this new line of work on Fast-Slow Training (FST), which interleaves context and model weight optimization! The idea is a clean division of labor between two interleaved loops:
🔹 Fast loop (context): GEPA reads rich rollout feedback and updates the context layer. The context becomes a fast-updating scratchpad of what the model needs to know about this task, right now.
🔹 Slow loop (model parameters): RL updates the model's parameters conditioned on the evolving context. Because the prompt already carries task-specific nuances, the model parameters are freed from absorbing them and can focus on what actually generalizes across tasks and pushes the frontier.
⦁ 3× more sample-efficient than RL on math, code, and physics reasoning
⦁ ~70% lower KL divergence from base at matched accuracy
⦁ Plasticity preserved: FST checkpoints respond better to additional RL on new tasks than RL-only ones
⦁ Continual learning across changing tasks (HoVer → CodeIO → Physics), where RL stalls the moment the task switches
FST is a direction towards:
⦁ Addressing RL's pain points: entropy collapse, sparse rewards, long-horizon exploration
⦁ Providing a clean channel for rich feedback into weight updates
⦁ Demonstrating model-harness co-evolution
⦁ Discovery: using fast context updates for broad exploration while leveraging a continually improving model
Check out the full thread below:
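The two interleaved loops described above can be illustrated with a deliberately tiny toy. This is a hypothetical sketch, not the FST or GEPA implementation: the "model" predicts y = w * x + c, the slow weight `w` takes small gradient steps (standing in for the RL update), and the fast "context" `c` is replaced by the best of a few random candidate edits (standing in for GEPA's reflective prompt proposals). Tasks share a slope but differ in offset, so the fast variable has a task-specific part to absorb. All function names and hyperparameters are made up for illustration.

```python
import random

# Hypothetical toy model: prediction = w * x + c.
#   w -> "slow weights" (model parameters), moved by small gradient steps
#   c -> "fast weights" (the context), replaced wholesale by the best candidate edit


def loss(w, c, data):
    """Mean squared error of the toy model on a list of (x, y) pairs."""
    return sum((w * x + c - y) ** 2 for x, y in data) / len(data)


def fast_update(w, c, data, candidates):
    """Fast loop: keep whichever context scores best, including the current one.
    The jump from c to the chosen value is not bounded (stand-in for GEPA edits)."""
    return min([c] + candidates, key=lambda cand: loss(w, cand, data))


def slow_update(w, c, data, lr=0.01):
    """Slow loop: one small gradient step on w, conditioned on the current
    context c (stand-in for an RL update)."""
    grad = sum(2 * (w * x + c - y) * x for x, y in data) / len(data)
    return w - lr * grad


def fast_slow_train(tasks, steps_per_task=300, seed=0):
    """Interleave the two loops over a sequence of (slope, offset) tasks."""
    rng = random.Random(seed)
    w, c = 0.0, 0.0
    for slope, offset in tasks:
        xs = [rng.uniform(-1.0, 1.0) for _ in range(32)]
        data = [(x, slope * x + offset) for x in xs]
        for _ in range(steps_per_task):
            proposals = [c + rng.gauss(0.0, 1.0) for _ in range(4)]
            c = fast_update(w, c, data, proposals)
            w = slow_update(w, c, data)
    return w, c
```

Calling `fast_slow_train([(2.0, -1.0), (2.0, 3.0), (2.0, 0.5)])` should leave `w` close to the shared slope 2.0 while `c` tracks the offset of the most recent task: a toy version of the division of labor in the thread, where the fast variable absorbs what is task-specific and the slow one keeps what is shared.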

Rishabh Tiwari retweeted

Chenfeng_X@Chenfeng_X·19 May

Excited that our paper StreamDiffusionV2 received the Best Research Paper Award at #MLSys26! 🚀 Video generation is quickly moving from demos to production-facing workloads. It is no longer a turn-based pipeline; it should be a streaming pipeline that interacts with users. 📖 Our project page: streamdiffusionv2.github.io and paper: arxiv.org/pdf/2511.07399 👂 Come join the talk if you are interested in streaming video generation. Our talk will be in the Research Track Oral Presentation: Best Paper Session on Tue at 8:45AM at #MLSys26, where I will talk about how we attacked the efficiency and quality challenges. Hope to see you there! ❤️ Huge thanks to all authors! This work would not have been possible without the incredible effort from the entire team. Big shout-out to Tianrui Feng, Zhi Li, @Andy_ShuoYang, @HaochengXiUCB, @lmxyy1999, @lvminzhang, @xiuyu_l, Keting Yang, @ZiqiPeng, @songhan_mit, @magrawala, @KurtKeutzer, and @cumulo_autumn

Rishabh Tiwari@rish2k1·16 May

Still, nothing stops us from making heavy edits to the whole context, so we can expect the model's response to change considerably, whereas we can't do the same in weight space. In short, we can make large changes to the context in a single step (no matter how long it takes to generate that step or how long the context grows).

Nilesh Gupta@nileshgupta2797·16 May

@rish2k1 I see, I see! In this view too, doesn't the context become "slow" at some context length? I.e., you need to update the context a lot to make the LLM's output meaningfully differ?

Nilesh Gupta@nileshgupta2797·15 May

Very cool work. Food for thought: at what context length does context become slow and weight updates become fast? 🤔

Rishabh Agarwal@agarwl_

Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights. So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA. Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models). I think this idea of learning both fast-slow weights would be a good foundation for continual learning. PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea. See more details here: gepa-ai.github.io/gepa/blog/2026…

Rishabh Tiwari retweeted

Rishabh Tiwari@rish2k1·15 May

Interesting question; I'd love to add some nerdy comments. As RA suggests in his post, the inspiration for "fast" and "slow" comes from Hinton and Plaut's 1987 work. We therefore define "fast" weights as fast-moving parameters (which can make huge jumps in each update) and "slow" weights as gradually improving parameters (local changes). But one can arbitrarily scale the compute spent on calculating the update for both kinds of parameters.
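The Hinton and Plaut framing above can be made concrete on a single scalar connection: each connection carries a slow weight with a small learning rate and a fast weight with a large learning rate that decays toward zero, and the effective weight is their sum. This is a minimal sketch in that spirit, not the 1987 model or the FST implementation; the learning rates, the 0.9 decay factor, and the function name `train_fast_slow` are illustrative assumptions.

```python
def train_fast_slow(targets, steps_per_target=30, slow_lr=0.02, fast_lr=0.5, fast_decay=0.9):
    """Fit one scalar weight to a sequence of targets under squared loss.

    Effective weight = slow + fast. The slow weight takes small steps and never
    decays; the fast weight takes large steps and decays toward zero each update
    (assumed hyperparameters). Returns (slow, fast, effective) after every update.
    """
    slow, fast = 0.0, 0.0
    history = []
    for target in targets:
        for _ in range(steps_per_target):
            grad = 2.0 * ((slow + fast) - target)  # d/dw of (w - target)^2
            slow -= slow_lr * grad                 # gradual, local change
            fast = fast_decay * fast - fast_lr * grad  # large jump, then decay
            history.append((slow, fast, slow + fast))
    return history
```

Training on the target sequence `[1.0, -1.0]` shows the split: right after the target flips, the fast weight changes by far more than the slow weight in a single update, and the effective weight ends close to -1 while the slow weight stays small.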

Rishabh Tiwari retweeted

//TODO: fix later 🐳@enjoyingthewind·14 May

@nicbstme This alone is not enough; you also need to periodically push stuff into the weights by retraining on the rollouts and the accumulated context. Something like: x.com/KushaSareen/st…

Kusha Sareen@KushaSareen

(Quoted post: Kusha Sareen's Fast-Slow Training announcement; full text reproduced earlier on this page.)

Rishabh Tiwari retweeted

Nikhil Parthasarathy@nikparth1·15 May

Nice! I've always thought this was one of the foundational principles of biological learning that we've never managed to get right in AI models. This is a nice iteration on how to instantiate the "fast and slow" idea in the modern LLM world that seems to work pretty well!

Rishabh Agarwal@agarwl_

(Quoted post: Rishabh Agarwal's "Learning, Fast and Slow" post; full text reproduced earlier on this page.)

Rishabh Tiwari@rish2k1·15 May

Thanks for sharing! I agree with the motivations and ideas you mentioned. For better understanding, it can be seen as an FST instantiation where:
*slow weights* update rule = self-distillation
*fast weights* update rule = GEPA
We did try one experiment in the same spirit, in which we distilled the FST fast weights (a GEPA-style prompt) back into the model using on-policy reverse KL (similar to the SDFT paper). It leads to some learning but performs worse than FST w/ GEPA+RL (@LakshyAAAgrawal explained this result in more detail here: x.com/LakshyAAAgrawa…). The idea of combining the RLVR signal with a self-distillation signal is also very interesting; we tried that some time back in a related project and are planning to release it soon as well.
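The on-policy reverse-KL distillation mentioned above can be sketched at the level of the objective. In this hypothetical setup the "teacher" is the same model with the optimized (GEPA-style) prompt in its context and the "student" is the model without it; the response is assumed to have been sampled from the student, and only the loss value is computed (no gradients, plain Python lists instead of tensors). It illustrates reverse KL, KL(student || teacher); it is not the authors' or the SDFT paper's code, and the function names are made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one token: sum_v p(v) * (log p(v) - log q(v)).
    The expectation is under the student distribution p, hence "reverse"."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def on_policy_distillation_loss(student_logits_per_pos, teacher_logits_per_pos):
    """Mean per-token reverse KL over a response sampled from the student.

    student_logits_per_pos[t]: student logits at position t (no optimized prompt)
    teacher_logits_per_pos[t]: same model's logits at position t with the
    optimized prompt prepended (hypothetical teacher).
    """
    per_token = [reverse_kl(s, t) for s, t in zip(student_logits_per_pos, teacher_logits_per_pos)]
    return sum(per_token) / len(per_token)
```

Because the expectation is taken under the student, tokens the student rarely produces contribute little, which is the usual "mode-seeking" behavior of reverse KL compared with the forward direction.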

Andreas Kirsch 🇺🇦@BlackHC·15 May

@agarwl_ Btw what are your thoughts on x.com/BlackHC/status…

Andreas Kirsch 🇺🇦@BlackHC

Amazing and timely paper! Congrats to the authors! 👏👏👏 This opens up exciting prospects for sample-efficient continual learning in LLMs. But, more importantly, this is a perfect complement to training-efficient gradient-free optimization methods (GEPA, AlphaEvolve, etc.) 🤞 With this result, we could use GEPA on critique feedback to find Pareto-optimal in-context changes that improve performance empirically while creating useful sample trajectories at the same time. Together with context distillation (here called self-distillation), we can move these learnings into the model parameters without catastrophic forgetting 🎊 And, if you want some RL-like reward shaping, you can just multiply the distillation loss by a per-sample or per-token advantage, too 😊 I've been thinking about GEPA+context distillation before, and the big open research question for me was catastrophic forgetting. I was hoping that, like RL, it's more on-policy, and other works have shown that RL suffers less from catastrophic forgetting. This work already answers this question in the affirmative. Really exciting! This would probably be quite interesting and easy to try in, e.g., Tinker 😊
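The suggestion of multiplying the distillation loss by a per-sample or per-token advantage can be written down in a few lines. The sketch below is a hypothetical illustration of the per-sample variant on plain Python lists; the group-mean baseline and both function names are assumptions, not anything specified in the thread or in FST.

```python
def group_advantages(rewards):
    """Advantage of each sampled response as its reward minus the group mean
    (an assumed baseline choice, similar to group-relative RL methods)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def advantage_weighted_distillation_loss(per_token_losses, advantages):
    """Scale each sample's per-token distillation losses by that sample's
    advantage, then average over all tokens.

    per_token_losses[i][t]: distillation loss (e.g. a per-token KL) of sample i at token t
    advantages[i]: scalar advantage of sample i, applied to every token of that sample
    """
    total, count = 0.0, 0
    for token_losses, adv in zip(per_token_losses, advantages):
        for token_loss in token_losses:
            total += adv * token_loss
            count += 1
    return total / count
```

With this weighting, samples whose advantage is negative have their distillation term flipped in sign, so minimizing the loss pushes the model away from the teacher on those samples; clipping advantages at zero is an alternative if that is undesirable, and which choice is preferable is an open design question.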

Rishabh Tiwari retweeted

Rishabh Agarwal@agarwl_·15 May

Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights. So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA. Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models). I think this idea of learning both fast-slow weights would be a good foundation for continual learning. PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea. See more details here: gepa-ai.github.io/gepa/blog/2026…

Rishabh Tiwari@rish2k1·15 May

@zhongwen2009 @KushaSareen @agarwl_ @Devvrit_Khatri @LakshyAAAgrawal @inderjit_ml @profjoeyg @KurtKeutzer Thanks for sharing

Zhongwen Xu@zhongwen2009·15 May

@KushaSareen @rish2k1 @agarwl_ @Devvrit_Khatri @LakshyAAAgrawal @inderjit_ml @profjoeyg @KurtKeutzer You may be interested in our work, which optimizes both fast weights in-episode, and "slow weights" (playbook, game rules in our paper) across episodes. zhongwenxu.notion.site/Cogito-Ergo-Lu…

Kusha Sareen@KushaSareen·13 May

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

Rishabh Tiwari retweeted

lovish@louvishh·13 May

very cool work!!

Rishabh Tiwari@rish2k1

(Quoted post: Rishabh Tiwari's pinned tweet on fast-slow learning; full text reproduced earlier on this page.)

Rishabh Tiwari retweeted

Devvrit@Devvrit_Khatri·15 May

ICL lets models adapt rapidly to changing tasks (✅), but the weights stay frozen - leaving performance gains on the table (⚠️). Fine-tuning (like SFT, RL) reaches a higher perf ceiling (✅), but is slow, can hurt OOD performance, and often reduces plasticity (⚠️). Why not combine the strengths (✅) of both? We introduce Fast-Slow Training (FST): fast weights (prompts) quickly capture task-specific nuances, while slow weights (model parameters) internalize the more general, task-agnostic reasoning patterns that should persist across tasks. FST reaches a higher perf asymptote while being more efficient. Since prompts absorb more of the task-specific information, the parameters do not need to move as much. As a result, the model stays closer to the base model, and preserves more plasticity for learning new tasks!

Rishabh Agarwal@agarwl_

(Quoted post: Rishabh Agarwal's "Learning, Fast and Slow" post; full text reproduced earlier on this page.)

Rishabh Tiwari@rish2k1·15 May

@RajaPatnaik Thank you :)

Raja Patnaik@RajaPatnaik·15 May

@rish2k1 Big fan of this work!

Rishabh Tiwari@rish2k1·13 May

Very excited about this line of research on fast-slow learning: 1) the potential to solve a lot of issues with current RL (e.g., entropy collapse, sparse rewards); 2) an intuitive way of incorporating rich feedback into RL; 3) a way to transfer knowledge from text-only learning into the model; 4) a great candidate for model-harness co-evolution (I've been seeing a lot of discussion on X lately about future models developing their own harnesses); 5) most importantly, I can imagine these kinds of algorithms being better candidates for discovery, which requires extreme exploration while at the same time improving the underlying model's capabilities. And much more ...

Kusha Sareen@KushaSareen

(Quoted post: Kusha Sareen's Fast-Slow Training announcement; full text reproduced earlier on this page.)

Rishabh Tiwari retweeted

Raja Patnaik@RajaPatnaik·14 May

Your RL post-training should co-evolve with prompt optimization, not run before it. New paper out of Berkeley — Fast-Slow Training (FST). 3× more sample efficient than RL alone. 70% less KL drift. And the first continual learning result that actually holds:

Rishabh Tiwari@rish2k1·14 May

Great article! I see a future where learning algorithms co-evolve the model parameters and the harness around them for continuous improvement. Just as prompt engineering is better handled by a principled algorithm like GEPA, harness engineering will soon be handled by a class of algorithms like FST (fast-slow training). x.com/KushaSareen/st…

Kangwook Lee@Kangwook_Lee·9 May

x.com/i/article/2052…

Rishabh Tiwari retweeted

Lakshya A Agrawal@LakshyAAAgrawal·13 May

Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization. GEPA demonstrated this for context-space optimization (prompts and agent harnesses), delivering frontier results at a fraction of the cost of RL. But context-only optimization is bounded by the base model's capability ceiling; weight updates can reach further. Very excited about this new line of work on Fast-Slow Training (FST), which interleaves context and model weight optimization! The idea is a clean division of labor between two interleaved loops:
🔹 Fast loop (context): GEPA reads rich rollout feedback and updates the context layer. The context becomes a fast-updating scratchpad of what the model needs to know about this task, right now.
🔹 Slow loop (model parameters): RL updates the model's parameters conditioned on the evolving context. Because the prompt already carries task-specific nuances, the model parameters are freed from absorbing them and can focus on what actually generalizes across tasks and pushes the frontier.
⦁ 3× more sample-efficient than RL on math, code, and physics reasoning
⦁ ~70% lower KL divergence from base at matched accuracy
⦁ Plasticity preserved: FST checkpoints respond better to additional RL on new tasks than RL-only ones
⦁ Continual learning across changing tasks (HoVer → CodeIO → Physics), where RL stalls the moment the task switches
FST is a direction towards:
⦁ Addressing RL's pain points: entropy collapse, sparse rewards, long-horizon exploration
⦁ Providing a clean channel for rich feedback into weight updates
⦁ Demonstrating model-harness co-evolution
⦁ Discovery: using fast context updates for broad exploration while leveraging a continually improving model
Check out the full thread below:

Kusha Sareen@KushaSareen

(Quoted post: Kusha Sareen's Fast-Slow Training announcement; full text reproduced earlier on this page.)

Rishabh Tiwari retweeted

Michael Griffiths@msjgriffiths·13 May

Now, this is a great framing that focuses on the duality of updates in discrete space (prompts) versus continuous space (weights).

Kusha Sareen@KushaSareen

(Quoted post: Kusha Sareen's Fast-Slow Training announcement; full text reproduced earlier on this page.)
