Kevin Lu

72 posts

@_kevinlu

Research @thinkymachines

SF 🏳️‍🌈 Joined October 2020
298 Following · 10.5K Followers
Kevin Lu reposted
Mira Murati @miramurati
Grateful to Jensen and @nvidia team for their support. Together, we’re working to deploy at least 1GW of Vera Rubin systems, bringing adaptable collaborative AI to everyone. thinkingmachines.ai/nvidia-partner…
[image]
168 replies · 288 reposts · 3.8K likes · 536.3K views
Kevin Lu @_kevinlu
encouraging progress on continued test-time adaptation beyond model deployment! very excited about the future of personalized models, and about developing reliable, easy-to-use pipelines that enable robust & personalized intelligence. i think the "no TTT" baseline from Section 4.5 is particularly neat, justifying training with gradient steps at test time
[image]
Mert Yuksekgonul @mertyuksekgonul

How to get AI to make discoveries on open scientific problems? Most methods just improve the prompt with more attempts. But the AI itself doesn't improve. With test-time training, AI can continue to learn on the problem it’s trying to solve: test-time-training.github.io/discover.pdf

0 replies · 11 reposts · 128 likes · 15.9K views
Kevin Lu reposted
Boyuan Chen @BoyuanChen0
Introducing Large Video Planner (LVP-14B) — a robot foundation model that actually generalizes. LVP is built on video gen, not VLA. As my final work at @MIT, LVP has all its eval tasks proposed by third parties as a maximum stress test, but it excels!🤗 boyuan.space/large-video-pl…
21 replies · 96 reposts · 577 likes · 94.1K views
Kevin Lu @_kevinlu
in the past couple of months of closed beta, Tinker has been used to solve Putnam problems, has powered our blog posts, and has been accelerating internal research! excited to see the innovation from making trillion-parameter RL broadly available -- Tinker is a dream for multi-agent setups, personalization, and continual adaptation
Thinking Machines @thinkymachines

Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. thinkingmachines.ai/blog/tinker-ge…

2 replies · 10 reposts · 184 likes · 26.5K views
Kevin Lu reposted
Muyu He @HeMuyu0327
On-policy distillation would revolutionize multi-turn tool-use training beyond RL, but neither Tinker nor TRL (which implements on-policy distillation) supports anything beyond single-turn distillation. We have therefore taken this upon ourselves and implemented the feature in native Tinker.

Specifically, with a trainable Tinker client, a model can now call a list of tools, interact with tool results over multiple turns, and return the tokens, logprobs, and reward masks needed for a distillation training job (p1-2). Our engineering contribution is tool calling and parsing for Tinker models, which sits on @thinkymachines's TODO list in their tinker_cookbook code (p3). Beyond that, we also created a dedicated inference stream that spins up a robust, multi-turn tool loop that can run alongside a training job and sync weights in real time. It then becomes easy to write a simple training loop with a KL loss to run on-policy distillation with tool use.

This opens the door to a new domain of agentic LLM applications, because small/medium models now have access to dense, on-policy rewards from a swarm of SOTA large models (DeepSeek, gpt-oss). Next up, we will begin our training runs and see how they compare with traditional RL/SFT on multi-turn tool use.
[3 images]
11 replies · 33 reposts · 282 likes · 19.9K views
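The "reward masks" mentioned above can be sketched minimally: in a multi-turn tool trajectory, the distillation loss should only cover tokens the model itself sampled, never environment-injected tool output. The segment roles and 0/1 mask convention below are illustrative assumptions, not the actual Tinker client interface:

```python
# Hedged sketch of loss masking for multi-turn tool-use distillation.
# Tool-result tokens were not sampled by the student, so they get mask 0
# and contribute nothing to the KL objective.

def build_loss_mask(segments):
    """segments: list of (role, tokens), where role is 'model' or 'tool'.
    Returns the flat token list and a 0/1 mask selecting model tokens."""
    tokens, mask = [], []
    for role, seg in segments:
        tokens.extend(seg)
        mask.extend([1 if role == "model" else 0] * len(seg))
    return tokens, mask

# One trajectory: the model issues a tool call, reads the result, then answers.
trajectory = [
    ("model", [101, 7, 42]),   # assistant turn issuing a tool call
    ("tool",  [900, 901]),     # tool result: injected, not sampled
    ("model", [55, 56, 102]),  # final answer turn
]
tokens, mask = build_loss_mask(trajectory)
print(mask)  # → [1, 1, 1, 0, 0, 1, 1, 1]
```

Multiplying the per-token KL terms by this mask is what lets a tool loop run inside an otherwise ordinary distillation training job.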
Kevin Lu @_kevinlu
will be at #NeurIPS2025 — excited to chat about synthetic data, tinker, research, or just catching up :) we are also hiring and will be giving out tinker credits, so if you are interested in working with us, please reach out!
5 replies · 5 reposts · 160 likes · 12.8K views
Kevin Lu reposted
Astropulse @RealAstropulse
Man being able to trick nano banana into making real pixels opens SO many doors
[3 images]
68 replies · 121 reposts · 3.8K likes · 224.9K views
Kevin Lu reposted
Soumith Chintala @soumithchintala
thinking machines....the people are incredible
148 replies · 74 reposts · 3.3K likes · 801.8K views
Kevin Lu reposted
Carlos Miguel Patiño @cmpatino_
We also replicate the "Distillation for personalization" results from @_kevinlu and @thinkymachines by improving the code performance of a model with SFT and then recovering its IFEval scores with distillation.
[image]
1 reply · 3 reposts · 11 likes · 2.3K views
Kevin Lu reposted
Kevin Lu @_kevinlu
thanks! i think that's probably more of a feature -- you are basically distilling uncertainty / the teacher's value function into the student in this case. insofar as the signal afterwards is less informative, i think @WendaXu2 @agarwl_ & co have some interesting work: arxiv.org/abs/2410.11325
0 replies · 0 reposts · 7 likes · 390 views
Igor Mordatch @IMordatch
@_kevinlu @agarwl_ @Alibaba_Qwen Excellent post! +1 that on-policy distillation + continual learning is very interesting. Though with teachers that use backtracking, do you not often get pulled to first token of "...wait, that's not right" after student starts faltering (with signal after that less informative)?
1 reply · 0 reposts · 6 likes · 526 views
Kevin Lu @_kevinlu
in our new post, we walk through great prior work from @agarwl_ & the @Alibaba_Qwen team exploring on-policy distillation using an open-source recipe: you can run our experiments on Tinker today! github.com/thinking-machi… i'm especially excited by the use of on-policy distillation to enable new "test-time training" personalization methods, allowing the model to learn new domain knowledge without regressing on post-training capabilities
Thinking Machines @thinkymachines

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-policy…

14 replies · 29 reposts · 370 likes · 95.2K views
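The per-token training signal behind on-policy distillation can be sketched as follows. The student samples a trajectory, the teacher scores those same tokens, and each sampled token is penalized by how much more likely the student found it than the teacher did — a sampled estimate of the reverse KL. The estimator, the mask convention, and the numbers below are illustrative assumptions, not the code from the post:

```python
import numpy as np

def on_policy_distill_loss(student_logprobs, teacher_logprobs, mask):
    """Mean over unmasked tokens of the (student - teacher) log-prob gap on
    student-sampled tokens. Positive where the student is overconfident
    relative to the teacher; zero when they agree exactly."""
    s = np.asarray(student_logprobs, dtype=float)
    t = np.asarray(teacher_logprobs, dtype=float)
    m = np.asarray(mask, dtype=float)
    return float(((s - t) * m).sum() / m.sum())

# Student agrees with the teacher on the first two tokens but is badly
# overconfident on the third; the masked fourth token contributes nothing.
student = [-0.1, -0.5, -0.2, -3.0]
teacher = [-0.1, -0.6, -2.5, -0.1]
mask    = [1, 1, 1, 0]
loss = on_policy_distill_loss(student, teacher, mask)
```

Because the tokens come from the student's own rollouts (RL-style relevance) but every unmasked token carries a teacher signal (SFT-style density), this loss combines the two properties the post describes.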
strongsignal @strong_signal1
@_kevinlu @agarwl_ @Alibaba_Qwen Awesome paper - wondering if you happened to do an ablation where you retrained the now-distilled model with RL again, e.g. train Qwen 8B to improve AIME, then distill back to the original, then take that model and do RL on it again (continuous continual learning?)
1 reply · 0 reposts · 1 like · 590 views
duccio @ducciolvp
@_kevinlu @agarwl_ @Alibaba_Qwen will it work if the "teacher" M is the same size as the student, and on-policy distillation is used to recover some of the IF abilities M lost during OOD SFT?
1 reply · 0 reposts · 1 like · 381 views
Thomas Ip @_thomasip
@_kevinlu @agarwl_ @Alibaba_Qwen Can on-policy distillation somehow be applied when the teacher and student models use different vocabularies? Distilling a large Qwen to a small Qwen is good in theory, but what if I want to distill another model family into Qwen?
1 reply · 0 reposts · 2 likes · 678 views
Kevin Lu @_kevinlu
@agarwl_ @Alibaba_Qwen this "continual learning" problem was previously identified by @IdanShenfeld @jyo_pari @__howardchen, who have shown that on-policy methods regress significantly less than SFT when performing domain adaptation x.com/jyo_pari/statu…
Jyo Pari @jyo_pari

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇

2 replies · 1 repost · 34 likes · 4.5K views