Yuandong Tian

1.1K posts

@tydsh

Co-founder of a stealth startup. Ex-Meta FAIR Director. Ex-Google. Reasoning, optimization, and understanding LLMs. Novelist in spare time. PhD from @CMU_Robotics.

California, USA · Joined December 2009
922 Following · 41.3K Followers
Yuandong Tian@tydsh·
At GTC in person next week. Come meet and chat!
5 replies · 2 reposts · 79 likes · 10.3K views
Hieu Pham@hyhieu226·
I have made the difficult decision to leave @OpenAI. Working here, and at @xai before, was a once-in-a-lifetime experience. I have met the best people. Not the best people in AI. Not the best people in tech. Simply the best people. At these companies, I have helped create extremely intelligent entities that will meaningfully improve our lives. The work makes me proud. But the intense work came with a price. I cannot believe I would say this one day, but I am burnt out. All the mental health deterioration that I used to scoff at is real, miserable, scary, and dangerous. I am going to take a break from frontier AI labs, and will take my family to my home country, Vietnam. There, I will try something new, and also search for a cure for my conditions. I hope I will heal. Until then.
1.1K replies · 416 reposts · 14.1K likes · 1.2M views
Yuandong Tian@tydsh·
Instead of getting clawdbot/moltbot/openclaw to work by exposing *all* my API keys to it, why not use AI coding agents to write the tools myself? It doesn't take long, I know what's going on, the result can be highly customized, and it's way safer.
21 replies · 7 reposts · 138 likes · 22.3K views
Ilyass Moummad@MoummadIlyass·
@tydsh could you share the preprint link for the feature emergence paper, if it's out? 😄
1 reply · 0 reposts · 0 likes · 328 views
Yuandong Tian@tydsh·
ICLR + 8 (including one solo paper on understanding feature emergence). Thanks to all the collaborators (plus some good luck)! But I guess papers don't matter much anymore these days. We should do something bigger 😃
8 replies · 6 reposts · 263 likes · 26.1K views
Yuandong Tian@tydsh·
Got a request from a professor who wants to cite my Zhihu New Year blog post in his paper, about my theory of a "Fermi level" for human society under AI's impact. So I translated it and built a personal blog site along the way: yuandong-tian.com/blog/posts/ It only took a few hours to nail down all the details, and it was just one of several concurrent workstreams. AI coding agents are incredible nowadays!
6 replies · 11 reposts · 201 likes · 19.8K views
Yuandong Tian@tydsh·
btw, the paper is super popular on alphaXiv:
[attached image]
0 replies · 2 reposts · 18 likes · 3.2K views
Yuandong Tian@tydsh·
🚨 New work: layer-specific token embeddings. Simple idea: we replace the up-projection output in the FFN with a lookup table indexed by token id.
🎯 Substantially reduces GPU memory load in the decoding phase
🎯 Better performance with fewer training FLOPs
🎯 Improves interpretability and facilitates knowledge editing
It all stems from a spark of a thought: "the residual added from each layer must encode different semantics of the tokens." Great work from @RJ_Sadhukhan, @zechunliu, @BeidiChen, and many others!
Infini-AI-Lab@InfiniAILab

Lookup memories are having a moment 😄 The whale 🐋 #deepseek dropped engram… and we dropped up-projections from our FFNs… perfect timing 😅 🥳
Introducing STEM: Scaling Transformers with Embedding Modules 🌱 A scalable way to boost parametric memory with extra perks:
✅ Stable training even at extreme sparsity
✅ Better quality for fewer training FLOPs (knowledge + reasoning + long-context gains)
✅ Efficient inference: ~33% FFN params removed + CPU offload & async prefetch
✅ More interpretable → seamless knowledge editing 🔧🧠
Looking forward to DeepSeek v4… feels like we've only scratched the surface of embedding-lookup scaling 👀
📄 Paper: arxiv.org/abs/2601.10639
🌐 Website: infini-ai-lab.github.io/STEM
🔗 GitHub: github.com/Infini-AI-Lab/…
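The core idea in the tweet above — each layer fetching a token's residual contribution from a per-layer table instead of computing an up-projection — can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation: the table shapes, the function name `stem_ffn_output`, and the random initialization are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_layers, vocab = 16, 2, 100

# Standard FFN: each layer computes a projection of the hidden state per token.
# STEM-style replacement (toy version): each layer l stores a lookup table
# E_l of shape (vocab, d_model); the residual contribution for a token is
# fetched by its id — no matmul needed at decode time.
layer_tables = [rng.standard_normal((vocab, d_model)) for _ in range(n_layers)]

def stem_ffn_output(token_ids, layer):
    """Return the layer-specific residual for each token by table lookup."""
    return layer_tables[layer][token_ids]  # shape: (len(token_ids), d_model)

tokens = np.array([3, 41, 7])
out0 = stem_ffn_output(tokens, layer=0)
out1 = stem_ffn_output(tokens, layer=1)
```

Because each layer has its own table, the same token contributes a different vector at each depth — matching the tweet's motivating intuition that "the residual added from each layer must encode different semantics of the tokens."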

6 replies · 34 reposts · 236 likes · 24.7K views
Yuandong Tian@tydsh·
Great work from my former postdoc @arreqe_ai! Graduated from UC Merced, did a postdoc at Meta FAIR, and is now a research scientist at FAIR working on cutting-edge safety alignment combined with game theory 😀
Arman Zharmagambetov@arreqe_ai

🚨 New Paper: Safety Alignment of LMs via Non-cooperative Games 🚨 arxiv.org/abs/2512.20806 tl;dr: We train an Attacker LM and a Defender LM to play against each other. This leads to a Defender with a much better utility-safety tradeoff, and an Attacker that is quite useful for downstream red-teaming tasks. With @AnselmPaulus, @uralik1, @brandondamos, Remi Munos, Ivan Evtimov, and @kamalikac

1 reply · 3 reposts · 29 likes · 9.4K views
Yuandong Tian@tydsh·
For research, "(3) crushing SoTA" is often the most visible to the community, but "(1) working on interesting but useless things" and "(2) insisting on the right way to do things" are the underlying forces that can lead to massive paradigm shifts in the end.
Jing Yu Koh@kohjingyu

I've observed 3 types of ways that great AI researchers work:

1) Working on whatever they find interesting, even if it's "useless"

Whether something will be publishable, fundable, or obviously impactful is irrelevant to what these people work on. They simply choose something that feels interesting, weird, beautiful, or off in a way they can't ignore. For many of these people, "interestingness" is also often strong research intuition for an important problem that hasn't fully materialized yet, and their ideas often end up being meaningful during the process of exploration.

The canonical example of this in physics is Richard Feynman, who got intrigued by the way that plates wobbled. He followed this curiosity on something that seemed like a useless endeavor, and it ended up feeding into deeper physics (and eventually won him a Nobel prize): "It was effortless. It was easy to play with these things. It was like uncorking a bottle: Everything flowed out effortlessly. I almost tried to resist it! There was no importance to what I was doing, but ultimately there was."

The AI version of this that I've observed is when someone obsesses over a "minor" failure case, a weird training dynamic, a small theoretical mismatch, or just something that most people think is pointless to chase down. These threads end up becoming interesting and impactful more often than you'd expect. The risk is that one can spend a long time in a pointless rabbit hole, but I've observed that the best researchers often have a very good sense for when an idea is a dead end vs. when it's promising given more effort.

2) Working on what they feel extremely strongly is the "right" way to do something

These people have a clear picture of how the field *should* progress, and they're willing to work on unpopular things to prove their vision. They'll commit to something that others think is wrong, premature, or not worth it.

An interesting quantitative way of measuring this is the citation graph of a paper. If you see a paper that has been around for many years but only started getting cited a lot more recently, that means its authors were early (and right!). An obvious example is diffusion, the first paper on which appeared as early as 2015 (Sohl-Dickstein et al., 2015), but the ideas only started getting real traction in 2021 or later.

The failure mode here is getting stuck defending a pet theory long after it's been falsified, and there are obviously many examples in our community of people who shift goalposts or beat a dead horse for decades. But when these ideas are legitimately undervalued, they result in paradigm shifts instead of incremental progress.

3) Crushing SOTA

There's a type of researcher who isn't necessarily the most "philosophically original" or creative, but who is extremely effective at pushing a system to its limits. You can give these people a pre-existing task and benchmarks, check in on them in a month, and they will have crushed SOTA. Obviously this is not about benchmark hacking or short-term wins: it's a real skill to take a combinatorial space of noisy research ideas and papers and conduct a rigorous search and ablation process. I've also found that this type of researcher has great intuition about the field: a sense for which ideas will scale, which tweaks are meaningful, good values for hyperparameters, and which papers are worth paying attention to.

—————

I think that these archetypes are all concrete expressions of good "research taste": (1) is a taste for interesting questions, (2) is a taste for long-term worldviews, and (3) is a taste for careful execution and science. The best researchers I know often have a preference for operating in one of these modes, but frequently weave in and out of each depending on the stage of the project.

7 replies · 15 reposts · 249 likes · 42.6K views