Yulun Du

159 posts

@Yulun_Du

Scaling @Kimi_Moonshot prev @LTIatCMU Opinions are my own.

Pittsburgh, PA Joined October 2012
830 Following, 2.9K Followers
Pinned Tweet
Yulun Du retweeted
熊师傅 weight decay 了吗
AttnRes is not just a typical "novelty paper". it stems from a much bigger project, co-designed by both model research and infra teams, with considerations that go way beyond just "lower loss" or "better expressivity". here is the "ultra think pro xhigh" part from inference infra team: zhuanlan.zhihu.com/p/201752829528… translation from k2.5: 5wvb5ya5wncq4.ok.kimi.link you can always trust the kimi solidness.

Lucas Beyer (bl16) @giffmana
Love this place. Just noticed someone I'm following is called: "[Chinese] weight decay [Chinese]" lol

Yulun Du @Yulun_Du
@YizhouLiu0 Not sure what you mean by bad. This component is data dependent.

Yulun Du @Yulun_Du
So insightful :) SGD is residual on weights. Now I’m wondering where else attention might be useful 🤓
Andrej Karpathy @karpathy

@Yulun_Du @ilyasut SGD is a ResNet too (the blocks of it are fwd+bwd), the residual stream is the weights so... 🤔 We're not taking the Attention is All You Need part literally enough? :D
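
One way to read the analogy above, as a rough sketch in standard notation: a residual block adds an update to a running state, and an SGD step has the same shape with the weights playing the role of that state. The symbols $x_\ell$, $f_\ell$, $w_t$, $\eta$, and $L$ are assumptions introduced here for illustration and do not appear in the tweets.

```latex
\begin{align}
x_{\ell+1} &= x_\ell + f_\ell(x_\ell) && \text{(residual block: state } x_\ell \text{, update } f_\ell\text{)} \\
w_{t+1} &= w_t - \eta \, \nabla_w L(w_t) && \text{(SGD step: state } w_t \text{, update } -\eta \, \nabla_w L\text{)}
\end{align}
```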


Yulun Du @Yulun_Du
I personally love this blog. Two of my colleagues had already recommended it to me. It has great taste and deep technical merit.
xjdr @_xjdr

Noumena.com/research


Yulun Du @Yulun_Du
Really cool work from Xinyu and the MetaClaw team. It treats continuous agent improvement as both a product loop and a model training problem — skill injection for immediate gains, RL/OPD for longer-term evolution, and a setup simple enough for real use. And it uses Kimi K2.5 :) imo this is the kind of systems thinking that agent products need. 🫡
Xinyu Yang @Xinyu2ML

With one click, you can launch your own Kimi-2.5 as a production-level personalized agent.


JingyuanLiu @JingyuanLiu123
Some updates: I've always been bullish on TML, and I actually joined TML this Monday. Looking back, I feel so lucky to have had the privilege of working closely with the best optimization experts on the Muon optimizer (@Jianlin_S from Kimi and @clu_cheng from Meta). Now I am so excited to be able to work with @jxbz and build new cool things! (On the other hand, there have always been some bad rumors about Meta TBD's potential failure. That's not true! From my personal experience, it really has the best talent in the field, and I really enjoyed learning from the lab. The avocado model will for sure be great!)
JingyuanLiu @JingyuanLiu123

hmm I sort of disagree and I am bullish for TML. I think they really really have the top talents that I admire in the field, e.g. Jeremy and Sam for optimization, Songlin for Attn, Lia for MoE, Andrew for FSDPv2, and a bunch more folks
it's just natural that it takes a while to publish good models:
- dpsk started to publish papers in 2023, even published dpskv2 (which I think is already amazing) in mid 2024 and nobody cared, until dpskv3 and r1
- msh took 10+ months to deliver a first not bad long ctx model in 2023, was silent for the whole of 2024, and started to catch up gradually in 2025
- qwen did not become a much better model than llama until qwen2.5, mid or late 2024, while the lab has been there forever
it takes time to get infra and data done, but as long as you have good folks, and principled ways of doing science and experiments, sooner or later, scaling laws will pay back


Yulun Du @Yulun_Du
@Xinyu2ML Bro, you're seriously awesome, the Xin(yu)formation

Xinyu Yang @Xinyu2ML
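I'll write up what was said at today's Tongyi meeting in Chinese; it feels like there is no turning this around.
1. The chief HR officer claimed this round of adjustments is meant to bring in more talent and provide more resources.
2. Alibaba is a model company; Qwen is a group-level matter, not just a matter for the base-model team. The group will build the larger closed loop and needs to move fast; the organizational setup was not communicated well.
3. Qwen is the group's most important effort. They want to bring in talent to expand, which inevitably involves changes to the lineup; whatever the changes, they hope everyone does their job well. Nothing comes without a cost. Running everything through junyang's head alone would certainly be efficient, but from jingren's standpoint, the question was where placing zhouhao would be most effective; political factors were never considered at any point (btw, what senior leadership said yesterday was that zhouhao was worried about not fitting into the Qwen team at first, so he asked to be placed under jingren initially, and leadership agreed).
4. What we are doing is very ambitious; 100-plus people is certainly not enough, we need to expand, and it is hard to accommodate everyone's views.
5. Wu Ma said China's circumstances are special and it is hard to allocate resources so that everyone is satisfied, and apologized for not learning about the resource problem sooner. Wu Ma claimed to be the CEO in China most aggressively pursuing compute, with Qwen as the first priority, and to have made the greatest effort a Chinese CEO could.
6. On resources being choked off by the group, Wu Ma claimed to have been unaware of any blocking, said Qwen had always been the top priority in mind, and said the problem lay in how information was passed along.
7. jingren said resources have always been tight and that overall planning is underway, then said he too had been sidelined. He then said that Alibaba Cloud being hard to use internally is due to historical reasons.
8. Then someone in the audience asked whether junyang could come back. The chief HR officer said that no one should be put on a pedestal and that the company cannot accept irrational demands to retain someone at any cost, and asked the audience what cost they thought they themselves represented.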

Zengzhi Wang @SinclairWang1
Launch something big

Yulun Du @Yulun_Du
The gap between developers using AI coding agents (Codex, Claude Code, Kimi CLI) and those who aren't is indeed widening fast. Don't sleep on this and try Kimi CLI (as well as others) now. :)
Greg Brockman @gdb

Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do. Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model.

Every company faces the same opportunity now, and navigating it well (just like with cloud computing or the Internet) requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now:

As a first step, by March 31st, we're aiming that:
(1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal.
(2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions.

In order to get there, here's what we recommended to the team a few weeks ago:

1. Take the time to try out the tools. The tools do sell themselves: many people have had amazing experiences with 5.2 in Codex, after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet or got stuck thinking "is there any way it could do X" rather than just trying.
- Designate an "agents captain" for your team: the primary person responsible for thinking about how agents can be brought into the teams' workflow.
- Share experiences or questions in a few designated internal channels
- Take a day for a company-wide Codex hackathon
2. Create skills and AGENTS[.md].
- Create and maintain an AGENTS[.md] for any project you work on; update the AGENTS[.md] whenever the agent does something wrong or struggles with a task.
- Write skills for anything that you get Codex to do, and commit it to the skills directory in a shared repository
3. Inventory and make accessible any internal tools.
- Maintain a list of tools that your team relies on, and make sure someone takes point on making it agent-accessible (such as via a CLI or MCP server).
4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration.
- Write tests which are quick to run, and create high-quality interfaces between components.
5. Say no to slop. Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high
- Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting.
6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently go around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to them, and central management of the tools that agents are able to use.

Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items; for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.
