Kevin Lu

72 posts

Kevin Lu

@_kevinlu

Research @thinkymachines

SF 🏳️‍🌈 Katılım Ekim 2020

298 Takip Edilen10.5K Takipçiler

Kevin Lu retweetledi

Mira Murati@miramurati·10 Mar

Grateful to Jensen and @nvidia team for their support. Together, we’re working to deploy at least 1GW of Vera Rubin systems, bringing adaptable collaborative AI to everyone. thinkingmachines.ai/nvidia-partner…

English

168

288

3.8K

536.3K

Kevin Lu@_kevinlu·23 Oca

encouraging progress on continued test-time adaptation beyond model deployment! very excited about the future of personalized models, and developing reliable, easy-to-use pipelines to enable robust & personalized intelligence i think the "no TTT" baseline from Section 4.5 is particularly neat, justifying training with gradient steps at test time

Mert Yuksekgonul@mertyuksekgonul

How to get AI to make discoveries on open scientific problems? Most methods just improve the prompt with more attempts. But the AI itself doesn't improve. With test-time training, AI can continue to learn on the problem it’s trying to solve: test-time-training.github.io/discover.pdf

English

128

15.9K

Kevin Lu retweetledi

Boyuan Chen@BoyuanChen0·5 Oca

Introducing Large Video Planner (LVP-14B) — a robot foundation model that actually generalizes. LVP is built on video gen, not VLA. As my final work at @MIT, LVP has all its eval tasks proposed by third parties as a maximum stress test, but it excels!🤗 boyuan.space/large-video-pl…

English

577

94.1K

Kevin Lu@_kevinlu·12 Ara

in the past couple months of closed beta, Tinker has been used to solve Putnam, has powered our blog posts, and has been accelerating internal research! excited to see the innovation from making trillion-parameter RL broadly available -- Tinker is a dream for multi-agent setups, personalization, and continual adaptation

Thinking Machines@thinkymachines

Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. thinkingmachines.ai/blog/tinker-ge…

English

184

26.5K

Kevin Lu retweetledi

Muyu He@HeMuyu0327·12 Ara

On-policy distillation would revolutionize multi-turn tool-use training beyond RL, but neither Tinker nor TRL which implements on-policy supports anything other than single-turn distillation. We therefore have taken this upon ourselves and implemented this feature in native Tinker. Specifically, with a trainable Tinker client, a model can now call a list of tools, interact with tool results for multiple turns, and return tokens, logprobs, and reward masks sufficient for a distillation training job (p1-2). The engineering we have achieved is to implement tool calling and parsing for Tinker models, which lies in @thinkymachines 's TODO list in their tinker_cookbook code (p3). Apart from that, we also create a dedicated inference stream that spins up robust, multi-turn tool loop that can run alongside a training job and sync the weights in real time. It becomes easy to write a simple training loop with KL loss to run on-policy distillation with tool use. This opens the door for a new domain of application in agentic LLM because small/medium models now have access to dense, on-policy rewards from a swarm of SOTA large models (deepseek, gpt-oss). We will next up begin our training runs and see how they compare with traditional RL/SFT on multi-turn tool use.

English

282

19.9K

Kevin Lu retweetledi

Yangjun Ruan@YangjunR·3 Ara

I’ll be attending #NeurIPS starting Wednesday as part of @thinkymachines! Feel free to DM me if you’d like to catch up, chat about research, or learn more about Thinky (we have openings!)🤝 job-boards.greenhouse.io/thinkingmachin…

English

165

16.6K

Kevin Lu@_kevinlu·3 Ara

will be at #NeurIPS2025 — excited to chat about synthetic data, tinker, research, or just catching up :) we are also hiring and will be giving out tinker credits, so if you are interested in working with us, please reach out!

English

160

12.8K

Kevin Lu retweetledi

Astropulse@RealAstropulse·22 Kas

Man being able to trick nano banana into making real pixels opens SO many doors

English

121

3.8K

224.9K

Kevin Lu retweetledi

Soumith Chintala@soumithchintala·18 Kas

thinking machines....the people are incredible

English

148

3.3K

801.8K

Kevin Lu retweetledi

Thinking Machines@thinkymachines·7 Kas

Science is best shared! Tell us about what you’ve built or discovered with Tinker, so we can tell the world about it on our blog. More details at thinkingmachines.ai/blog/call-for-…

English

450

143.8K

Kevin Lu retweetledi

Carlos Miguel Patiño@cmpatino_·29 Eki

We also replicate the "Distillation for personalization" results from @_kevinlu and @thinkymachines by improving the code performance of a model with SFT and then recovering it's IFEval scores with distillation.

English

2.3K

Kevin Lu@_kevinlu·29 Eki

thanks to multi-tenancy and the incredible engineering effort of the team, tinker is now both a joy to use, and super cheap! hope to see you try it out 🙂

Thinking Machines@thinkymachines

Starting Monday, November 3rd, Tinker is switching to a pricing plan that reflects compute usage. This will ensure we have sufficient capacity to clear our waitlist by the end of the year, allowing anyone to sign up and start Tinkering. tinker-console.thinkingmachines.ai/rate-card

English

15.5K

Kevin Lu@_kevinlu·29 Eki

excited to see what academics build using tinker!

Thinking Machines@thinkymachines

Today we’re announcing research and teaching grants for Tinker: credits for scholars and students to fine-tune and experiment with open-weight LLMs. Read more and apply at: thinkingmachines.ai/blog/tinker-re…

English

17.8K

Kevin Lu retweetledi

Li Dong@donglixp·28 Eki

On-policy + Reverse KLD = MiniLLM (arxiv.org/abs/2306.08543). Really nice blog by @thinkymachines. Exciting to see it being offered as a service!

Thinking Machines@thinkymachines

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other approaches for a fraction of the cost. thinkingmachines.ai/blog/on-policy…

English

163

19.7K

Kevin Lu retweetledi

Thinking Machines@thinkymachines·28 Eki

We just added 4 new models to Tinker from the gpt-oss and DeepSeek-V3.1 families. Sign up for the waitlist: thinkingmachines.ai/tinker/

English

556

425.5K

Kevin Lu@_kevinlu·28 Eki

thanks! i think that's probably more of a feature -- you are basically distilling uncertainty / the teacher's value function into the student in this case. insofar as the signal afterwards is less informative, i think @WendaXu2 @agarwl_ & co have some interesting work: arxiv.org/abs/2410.11325

English

390

Igor Mordatch@IMordatch·28 Eki

@_kevinlu @agarwl_ @Alibaba_Qwen Excellent post! +1 that on-policy distillation + continual learning is very interesting. Though with teachers that use backtracking, do you not often get pulled to first token of "...wait, that's not right" after student starts faltering (with signal after that less informative)?

English

526

Kevin Lu@_kevinlu·27 Eki

in our new post, we walk through great prior work from @agarwl_ & the @Alibaba_Qwen team exploring on-policy distillation using an open source recipe: you can run our experiments on Tinker today! github.com/thinking-machi… i'm especially excited by the use of on-policy distillation to enable new "test-time training" personalization methods, allow the model to learn new domain knowledge without regressing on post-training capabilities

Thinking Machines@thinkymachines

English

370

95.2K

Kevin Lu@_kevinlu·28 Eki

@strong_signal1 @agarwl_ @Alibaba_Qwen I think this can be interesting in specific setups

English

370

strongsignal@strong_signal1·28 Eki

@_kevinlu @agarwl_ @Alibaba_Qwen Awesome paper - wondering if you happened to do an ablation where you retrained the now distilled model with RL again e.g train Qwen 8B to improve AIME, then distill back to original, then take that model and did RL on it again (continuous continuous learning?)

English

590

Kevin Lu@_kevinlu·28 Eki

@ducciolvp @agarwl_ @Alibaba_Qwen i think this is the setting we study in the personalization section!

English

308

duccio@ducciolvp·28 Eki

@_kevinlu @agarwl_ @Alibaba_Qwen will it work if the "teacher" M is the same size as the student and on-policy distillation is used to recover some of the IF abilities lost by M during OOD SFT?

English

381

Kevin Lu@_kevinlu·28 Eki

@_thomasip @agarwl_ @Alibaba_Qwen there are some details you have to get right, but it should be possible! you can create a map between tokenizers

English

482

Thomas Ip@_thomasip·28 Eki

@_kevinlu @agarwl_ @Alibaba_Qwen Can on-policy distillation somehow be applied when the teacher and student model uses a different vocabulary? Distilling a large qwen to a small qwen is good in theory but what if I want to distill another model family into qwen?

English

678

Kevin Lu@_kevinlu·27 Eki

@agarwl_ @Alibaba_Qwen @IdanShenfeld @jyo_pari @__howardchen ...it also happens to still work effectively using only a single prompt, and can be 10-100x cheaper compared to running SFT or RL.

English

2.3K

Kevin Lu@_kevinlu·27 Eki

@agarwl_ @Alibaba_Qwen this "continual learning" problem was previously identified by @IdanShenfeld @jyo_pari @__howardchen, who have shown that on-policy methods regress significantly less than SFT when performing domain adaptation x.com/jyo_pari/statu…

Jyo Pari@jyo_pari

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇

English

4.5K

Keşfet

@nvidia @MIT @thinkymachines @WendaXu2 @agarwl_ @Alibaba_Qwen @strong_signal1 @elonmusk