Shuyao Tim Xu
122 posts

@TimXu222575
long horizon @Kimi_Moonshot





😢RLVR is powerful but expensive 🤯Imagine using <20% RLVR training while achieving 100% performance? Sounds surprising? We show that minimal RLVR training is enough to know where training is going, and predict future ckpts at no training cost! 📃tinyurl.com/minimal-rlvr 🧵[1/n]



Been working on text feedback / OPSD in Composer. Really interesting space, and much more to be explored.




Hot take: OPSD/SDFT/SDPO works if and only if off-policy context distillation works in the same setup. In both algorithms, what the hint is and where it is placed matter the most. For example, placing the hint in the system prompt or a system reminder is better than placing it in the user prompt, as it leaks less

Karpathy has been quiet on X recently, now I finally understand why. He's been prepping hard for the Anthropic interview!








🚀 We’re hiring! DeepSeek is forming a new Harness team to build Code Harness from the ground up. Maybe you can call it DeepSeek Code or something like this hhh🤣🤣🤣 📍 Based in Beijing. Two roles open: 🧠 Harness Product Manager → app.mokahr.com/social-recruit… 👨‍💻 Harness R&D Engineer → app.mokahr.com/social-recruit… Research meets product, so let's build it together. Hit the links and apply directly! 🔥 #DeepSeek #CodeHarness #AI #Hiring #ProductManager #Engineering #Beijing #Referral

The next step toward automating AI is automating RL environments. Introducing General-Agent: a fully synthetic environment whose task corpus self-evolves and grows harder over time. 4,504 tool-use tasks · 1,040 domains · 8,159 unique tools

Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model.









