Jyo Pari

157 posts

Jyo Pari

@jyo_pari

Working on continual learning | PhD @MIT

Boston · Joined December 2021
903 Following · 2.8K Followers
Pinned Tweet
Jyo Pari @jyo_pari
What if an LLM could update its own weights? Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs. Self-editing is learned via RL, using the updated model’s downstream performance as reward.
[media]
133 replies · 511 reposts · 3.2K likes · 664.1K views
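For concreteness, here is a minimal sketch of the loop the tweet describes. The interface is assumed throughout: `generate_self_edit`, `finetune`, and `evaluate` are hypothetical helpers for illustration, not SEAL's actual API.

```python
import copy

def seal_step(model, new_input, eval_task, finetune, evaluate, n_candidates=4):
    """One outer RL step of self-editing (hypothetical API throughout)."""
    scored_edits = []
    for _ in range(n_candidates):
        # The model writes its own training data for the new input.
        self_edit = model.generate_self_edit(new_input)
        # Apply that self-edit as a weight update on a copy of the model.
        updated = finetune(copy.deepcopy(model), self_edit)
        # Reward = the updated model's downstream performance.
        scored_edits.append((self_edit, evaluate(updated, eval_task)))
    # Reinforce edit generation toward high-reward edits, e.g. by
    # fine-tuning `model` on the best-scoring self-edits.
    return scored_edits
```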
Jyo Pari @jyo_pari
Hard problems require more than bigger models; they require effective exploration at test time. 💡 @aviral_kumar2 will present new approaches for training LMs to scale test-time exploration, including solving IMO-level math problems. 🏅 🗓️ March 19, 4pm ET @scaleml
[media]
2 replies · 5 reposts · 94 likes · 8.1K views
Jyo Pari @jyo_pari
As context windows grow 📈, continual learning matters more! @tianyuanzhang99 will present how to scale test-time training for effectively infinite context ♾ 🗓️ Feb 19, 3pm ET @scaleml
[media]
8 replies · 15 reposts · 177 likes · 26K views
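As background on the idea (not necessarily the talk's exact method): test-time training absorbs a long context into the weights via gradient steps, so memory stays constant as the context grows. A sketch assuming an HF-style causal LM with a `.logits` output:

```python
import torch
import torch.nn.functional as F

def absorb_context(model, context_tokens, lr=1e-4, chunk=2048):
    """Absorb a long context into the weights at test time (sketch).

    Chunks the context and takes next-token-prediction gradient steps,
    so information lives in (fast) weights instead of an unbounded
    attention window. Assumes `model(x).logits` (HF-style).
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for start in range(0, context_tokens.numel() - 1, chunk):
        x = context_tokens[start : start + chunk + 1].unsqueeze(0)
        logits = model(x[:, :-1]).logits
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), x[:, 1:].reshape(-1)
        )
        opt.zero_grad()
        loss.backward()
        opt.step()  # constant memory regardless of total context length
    return model
```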
Jyo Pari retweeted
Locke Cai @couplefire12
RL for reasoning often relies on verifiers — great for math, but tricky for creative writing or open-ended research. Meet RARO: a new paradigm that teaches LLMs to reason via adversarial games instead of verification. No verifiers. No environments. Just demonstrations. 🧵👇
[media]
24 replies · 78 reposts · 611 likes · 177K views
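The thread doesn't spell out the training loop, but the shape of "adversarial games instead of verification" is GAIL-like: a critic learns to distinguish demonstrations from policy outputs, and its score becomes the reward. A sketch with hypothetical `generate`, `score`, and `log_prob` interfaces; RARO's actual formulation may differ:

```python
import torch
import torch.nn.functional as F

def adversarial_step(policy, critic, prompts, demos, policy_opt, critic_opt):
    """One adversarial update: no verifier, only demonstrations (sketch)."""
    samples = [policy.generate(p) for p in prompts]  # hypothetical API

    # Critic step: push demonstrations toward 1, policy samples toward 0.
    d_loss = (
        F.binary_cross_entropy(critic.score(demos), torch.ones(len(demos)))
        + F.binary_cross_entropy(critic.score(samples), torch.zeros(len(samples)))
    )
    critic_opt.zero_grad()
    d_loss.backward()
    critic_opt.step()

    # Policy step: the critic's score is the reward (policy-gradient surrogate).
    rewards = critic.score(samples).detach()
    logps = torch.stack([policy.log_prob(p, s) for p, s in zip(prompts, samples)])
    p_loss = -(rewards * logps).mean()
    policy_opt.zero_grad()
    p_loss.backward()
    policy_opt.step()
```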
Jyo Pari @jyo_pari
Next Tuesday, @shannonzshen will present hybrid chain-of-thought, a method that mixes latent and discrete tokens during decoding 🔥 🗓️ Nov 25, 3pm ET @scaleml
[media]
1 reply · 7 reposts · 51 likes · 6.4K views
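One plausible reading of "mixing latent and discrete tokens" (in the spirit of continuous chain-of-thought work; the talk's method may differ): on latent steps, feed the last hidden state straight back in as the next input embedding instead of sampling a token. A sketch assuming an HF-style model whose hidden size matches its embedding size:

```python
import torch

@torch.no_grad()
def hybrid_decode(model, input_embeds, steps, latent_mask):
    """Alternate latent and discrete steps during decoding (sketch)."""
    embeds, tokens = input_embeds, []
    for t in range(steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        if latent_mask[t]:
            # Latent step: recycle the final hidden state as a continuous token.
            nxt = out.hidden_states[-1][:, -1:]
        else:
            # Discrete step: pick a token and embed it as usual.
            tok = out.logits[:, -1].argmax(-1)
            tokens.append(tok)
            nxt = model.get_input_embeddings()(tok).unsqueeze(1)
        embeds = torch.cat([embeds, nxt], dim=1)
    return tokens
```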
Jyo Pari @jyo_pari
Why do deep learning optimizers make progress even in the edge-of-stability regime? 🤔 @alex_damian_ will present theory that can describe the dynamics of optimization in this regime! 🗓️ Nov 17, 3pm ET @scaleml
[media]
0 replies · 10 reposts · 73 likes · 7.8K views
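The quantity at stake here is the sharpness: the top Hessian eigenvalue of the loss, which in the edge-of-stability regime rises to roughly 2/η (η the step size) and hovers there while the loss keeps decreasing. A standard way to measure it, via power iteration on Hessian-vector products (illustrative, not the speaker's code):

```python
import torch

def sharpness(loss_fn, params, iters=20):
    """Estimate |lambda_max| of the loss Hessian by power iteration."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / norm for x in v]
        # Hessian-vector product via double backprop.
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=v)
        v = [h.detach() for h in hv]
    # v now holds H @ (unit vector), so its norm approximates |lambda_max|.
    return torch.sqrt(sum((x * x).sum() for x in v))
```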
Jyo Pari retweeted
idan shenfeld @IdanShenfeld
Everyone’s talking about Kimi K2 Thinking and its impressive performance. No full report yet, but judging from the Kimi K2 and K1.5 reports, it likely uses Policy Mirror Descent, an RL trick that’s quietly becoming standard in frontier labs. Let’s break down what it is:
[media]
12 replies · 46 reposts · 477 likes · 58.8K views
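For reference, Policy Mirror Descent in its simplest discrete form: each step maximizes expected value minus a KL penalty to the previous policy, which has the closed-form multiplicative update below. This is the generic textbook version, not Kimi's implementation.

```python
import numpy as np

def pmd_update(pi, q, eta):
    """One PMD step: argmax_p <p, q> - (1/eta) * KL(p || pi).

    Closed form: pi_new(a) ∝ pi(a) * exp(eta * q(a)).  The KL anchor
    keeps each policy update conservative. Assumes pi > 0 everywhere.
    """
    logits = np.log(pi) + eta * q
    logits -= logits.max()          # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

# Example: a uniform 4-action policy nudged toward action 0.
print(pmd_update(np.ones(4) / 4, np.array([1.0, 0.0, 0.0, 0.0]), eta=0.5))
```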
Jyo Pari retweeted
Kevin Lu @_kevinlu
In our new post, we walk through great prior work from @agarwl_ and the @Alibaba_Qwen team exploring on-policy distillation using an open-source recipe: you can run our experiments on Tinker today! github.com/thinking-machi… I'm especially excited by the use of on-policy distillation to enable new "test-time training" personalization methods, allowing the model to learn new domain knowledge without regressing on post-training capabilities.
Thinking Machines @thinkymachines

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches at a fraction of the cost. thinkingmachines.ai/blog/on-policy…

14 replies · 29 reposts · 370 likes · 95.3K views
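The core mechanic, per the linked post: the student samples its own sequences (on-policy, like RL) and the teacher grades every token (dense, like SFT). A common instantiation is a per-token reverse KL on the student's samples; sketch below, with a hypothetical `generate` call and an HF-style `.logits` output:

```python
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompts):
    """Per-token reverse KL on student-sampled sequences (sketch)."""
    seqs = student.generate(prompts)                 # hypothetical API
    s_logits = student(seqs).logits[:, :-1]
    with torch.no_grad():
        t_logits = teacher(seqs).logits[:, :-1]     # teacher grades each token
    s_logp = F.log_softmax(s_logits, dim=-1)
    t_logp = F.log_softmax(t_logits, dim=-1)
    # KL(student || teacher), evaluated on states the student actually visits.
    return (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()
```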
Jyo Pari retweeted
Moritz Reuss @moritz_reuss
VLAs have become the fastest-growing subfield in robot learning. So where are we now? After reviewing ICLR 2026 submissions and conversations at CoRL, I wrote an overview of the current state of VLA research with some personal takes: is.gd/1pqw9w
11 replies · 106 reposts · 533 likes · 53K views
Jyo Pari @jyo_pari
After weeks of learning about systems at @scaleml, we’re shifting gears to video foundation models. Thrilled to have @cloneofsimo sharing how to train them from scratch next Tuesday — no better person to learn from 🔥
[media]
5 replies · 11 reposts · 127 likes · 30.5K views
Jyo Pari @jyo_pari
Next Tuesday, @scaleml hosts @kavnwang & Kristine Lu for a tutorial based on jax-ml.github.io/scaling-book/ 🚀 They'll cover distributed training/inference of large models, plus the math & tradeoffs of latency, throughput, and model size in GPU comms!
[media]
2 replies · 14 reposts · 127 likes · 12.2K views
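A taste of the kind of back-of-envelope math the tutorial covers. The hardware numbers below are illustrative (roughly an H100-class HBM bandwidth), not measurements:

```python
# Batch-1 decode must stream every weight from HBM once per token,
# so it is memory-bandwidth-bound rather than compute-bound.
params = 70e9                   # 70B-parameter model
bytes_per_param = 2             # bf16 weights
hbm_bandwidth = 3.35e12         # bytes/s, illustrative H100-class figure

t_per_token = params * bytes_per_param / hbm_bandwidth
print(f"{t_per_token * 1e3:.1f} ms/token, {1 / t_per_token:.0f} tok/s")
# ~41.8 ms/token, ~24 tok/s. Batching amortizes the weight reads,
# which is exactly the latency/throughput tradeoff in question.
```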
Jyo Pari @jyo_pari
@BlackHC @abeirami @IdanShenfeld This is a great question: we find that simply adding KL regularization to SFT isn’t enough. This is likely because the two objectives oppose each other, and we posit that there should be more principled ways of incorporating the KL regularization.
0 replies · 0 reposts · 4 likes · 99 views
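For clarity, the naive combination being discussed looks something like the sketch below (HF-style model assumed; an illustration, not the paper's code): cross-entropy pulls the model toward the new data while the KL term pulls it back toward the reference on the same tokens, so the two gradients can directly oppose each other.

```python
import torch
import torch.nn.functional as F

def sft_with_kl_loss(model, ref_model, input_ids, beta=0.1):
    """SFT cross-entropy plus a KL anchor to the pre-finetuning model."""
    logits = model(input_ids).logits[:, :-1]
    with torch.no_grad():
        ref_logits = ref_model(input_ids).logits[:, :-1]
    labels = input_ids[:, 1:]
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
    logp = F.log_softmax(logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)
    kl = (logp.exp() * (logp - ref_logp)).sum(-1).mean()  # KL(model || ref)
    return ce + beta * kl                                 # opposing pulls
```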
Ahmad Beirami @abeirami
This great work co-led by @IdanShenfeld and @jyo_pari shows that online RL leads to less forgetting because it inherently converges to a solution with a small reverse KL divergence! I'll try to discuss the significance of the result: 🧵
Jyo Pari @jyo_pari

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇

2 replies · 4 reposts · 32 likes · 7.5K views
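For readers outside the thread, these are the two divergences at play (π the fine-tuned policy, π₀ the base): SFT's cross-entropy objective fits a forward KL from the data, while the result says online RL implicitly lands at a small reverse KL to the base model.

```latex
\text{SFT: } \min_{\pi} \; D_{\mathrm{KL}}\!\left(\pi_{\text{data}} \,\middle\|\, \pi\right)
\qquad
\text{online RL: small } D_{\mathrm{KL}}\!\left(\pi \,\middle\|\, \pi_{0}\right)
= \mathbb{E}_{x \sim \pi}\!\left[\log \tfrac{\pi(x)}{\pi_{0}(x)}\right]
```

Because the reverse KL's expectation is taken under π itself, the model is only penalized for drifting on outputs it actually produces, which is, roughly, the mechanism for the reduced forgetting described above.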