Victoria X Lin

1.4K posts

@VictoriaLinML

MTS @thinkymachines | MoMa/MoT🖼 • RA-DIT🔍 • Llama4🦙 Prev: @AIatMeta @SFResearch • PhD @uwcse

San Francisco Bay Area · Joined December 2010
1K Following · 3.9K Followers
Victoria X Lin reposted
Mira Murati@miramurati·
Grateful to Jensen and @nvidia team for their support. Together, we’re working to deploy at least 1GW of Vera Rubin systems, bringing adaptable collaborative AI to everyone. thinkingmachines.ai/nvidia-partner…
164 replies · 287 reposts · 3.8K likes · 532.9K views
Victoria X Lin@VictoriaLinML·
☕ Society will reward tremendously those who can effortlessly spot mistakes made by autonomous agents.
1 reply · 4 reposts · 28 likes · 3.3K views
Victoria X Lin reposted
Tri Dao@tri_dao·
This was a wild bug hunt, weeks of effort from @MayankMish98 to track down. The wrong init of Mamba-2 in many reimplementations causes the layer to decay its states too quickly, focusing on short context instead. Pretraining is mostly about getting these little things right.
Mayank Mishra@MayankMish98

We identified an issue with the Mamba-2 🐍 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized). This bug relates to 2 main issues:
1. The init is incorrect (torch.ones) if Mamba-2 layers are used in isolation without the Mamba2ForCausalLM model class (this has already been fixed: github.com/fla-org/flash-…).
2. Initialization is skipped due to meta-device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this issue upon merging).
The difference is substantial. Mamba-2 seems to be quite sensitive to the initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma…
Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏 Also thanks to @SonglinYang4 for quickly helping merge the PR.

2 replies · 20 reposts · 374 likes · 31.5K views
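For illustration, here is a minimal sketch of a reference-style dt_bias initialization, contrasted with the buggy torch.ones default the bug hunt above describes. It follows the widely used mamba_ssm convention (log-uniform dt, stored as its inverse softplus); the function name and default ranges here are assumptions, not taken from the linked PRs.

```python
import math
import torch

def init_dt_bias(n_heads, dt_min=0.001, dt_max=0.1):
    # Sample the step size dt log-uniformly in [dt_min, dt_max].
    dt = torch.exp(
        torch.rand(n_heads) * (math.log(dt_max) - math.log(dt_min))
        + math.log(dt_min)
    )
    dt = torch.clamp(dt, min=1e-4)
    # Store the inverse softplus, so softplus(dt_bias) == dt at init.
    # The buggy path instead leaves dt_bias = torch.ones(n_heads), i.e.
    # dt = softplus(1) ≈ 1.31 for every head: states decay far too fast,
    # which is exactly the short-context bias described above.
    return dt + torch.log(-torch.expm1(-dt))
```

Re-running an init like this on the materialized parameters is also what a meta-device/FSDP-2 fix would need to ensure, since meta-device construction skips the in-place initialization.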
Victoria X Lin reposted
Boris Cherny@bcherny·
I'm Boris and I created Claude Code. I wanted to quickly share a few tips for using Claude Code, sourced directly from the Claude Code team. The way the team uses Claude is different from how I use it. Remember: there is no one right way to use Claude Code; everyone's setup is different. You should experiment to see what works for you!
916 replies · 5.9K reposts · 50.9K likes · 9.1M views
Victoria X Lin reposted
Long Lian@LongTonyLian·
Love seeing parallel thinking & subagents pushing efficiency and performance on Kimi K2.5! 🚀 Also nice to see shared takeaways with our parallel reasoning work ThreadWeaver: 1️⃣ an auxiliary parallelization reward prevents collapse, and 2️⃣ the critical path is the key🔑
Kimi.ai@Kimi_Moonshot

🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
🔹 Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
🔹 Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
🔹 Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.
🔹 Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with a single-agent setup.
🥝 K2.5 is now live on kimi.com in chat mode and agent mode.
🥝 K2.5 Agent Swarm is in beta for high-tier users.
🥝 For production-grade coding, you can pair K2.5 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blogs/kimi-k2-…
🔗 Weights & code: huggingface.co/moonshotai/Kim…

0 replies · 2 reposts · 27 likes · 5.1K views
Victoria X Lin reposted
Lilian Weng@lilianweng·
I’ve been telling people this a lot today: I so enjoy working with people who care about what they are building and about craftsmanship. It is a privilege to have a chance to work on something I’m passionate about, beyond making a living. I cherish it and don’t take it for granted.
63 replies · 63 reposts · 1.6K likes · 167.7K views
Victoria X Lin reposted
Astropics@astropics·
Tonight is the first full Moon of the year, the Wolf Moon
94 replies · 5K reposts · 22.6K likes · 348K views
Victoria X Lin reposted
Boris Cherny@bcherny·
I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to use Claude Code: we intentionally build it in a way that you can use it, customize it, and hack it however you like. Each person on the Claude Code team uses it very differently. So, here goes.
1.3K replies · 7K reposts · 54.2K likes · 8M views
Victoria X Lin@VictoriaLinML·
The LLM needs to plan when and how to spawn the threads (via the outline section in the parallel CoT); optionally, it may also generate a summary when merging the threads back together. The outlines and thread summaries introduce some token overhead. In our paper, the LLM only generates outlines.
0 replies · 0 reposts · 0 likes · 97 views
kaolin fire@kaolinfire·
@professor_b_ At the very least you have to have the extra QPS available. But I'm also curious: any token overhead?
1 reply · 0 reposts · 0 likes · 98 views
Victoria X Lin@VictoriaLinML·
✨ Introducing ThreadWeaver 🧵⚡ — an approach that significantly reduces LLM reasoning latency on challenging problems by enabling models to adaptively spawn parallel reasoning threads and merge them later in the process. (An off-the-shelf reasoning LLM can be retrofitted to perform adaptive parallel reasoning with this approach, too!)

ThreadWeaver was led by the amazing @LongTonyLian. It was developed based on the paradigm of adaptive parallel reasoning (arxiv.org/abs/2504.15466). For the first time, we show that adaptive parallel reasoning can achieve accuracy comparable to equally sized cutting-edge sequential reasoning models (e.g., 79.9% for ThreadWeaver vs. 78.3% for Qwen3-8B on AIME24, and 71.9% vs. 72.2% on average across six math reasoning benchmarks) while delivering substantial reductions in token latency (1.14× speedup on AIME24 and up to 1.53× across datasets).

ThreadWeaver was designed to be fully compatible with standard LLM inference engines that support text-completion APIs (i.e., it does not introduce architectural changes or modify the context representation to support adaptive parallelization). The approach can be directly applied to retrofit off-the-shelf reasoning LLMs through supervised fine-tuning (inspired by Multiverse) and a novel on-policy reinforcement learning method incorporating thread-wise advantage broadcast and a parallelization-aware reward design.

ThreadWeaver opens the door to a future where models understand both the problem structure and the execution environment well enough to adaptively leverage available resources in the most efficient way to solve complex tasks.

This is also the last research project I contributed to during my time at Meta 👩‍💻 For more details, be sure to check out @LongTonyLian's thread:
Long Lian@LongTonyLian

LLMs are getting crazily good at reasoning — but also crazily slow. Hard problems can make them think for hours. Why? Even with tons of GPUs, they still decode one. token. at. a. time.⏳ More GPUs ≠ faster answers Our ThreadWeaver🧵⚡asks: “Why not make LLMs think in parallel?” 🧵1/N👇

2 replies · 6 reposts · 26 likes · 9.6K views
Sarah Wooders@sarahwooders·
The real AI danger is Waymos blocking all the intersections
1 reply · 0 reposts · 2 likes · 668 views
Victoria X Lin reposted
Thinking Machines@thinkymachines·
Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. thinkingmachines.ai/blog/tinker-ge…
48 replies · 173 reposts · 1.7K likes · 1.1M views
Victoria X Lin reposted
Ying Sheng@ying11231·
We've been running @radixark for a few months, started by many core developers of SGLang @lmsysorg and its extended ecosystem (slime @slime_framework, AReaL @jxwuyi).

I left @xai in August — a place where I built deep emotions and countless beautiful memories. It was the best place I've ever worked, the place I watched grow from a few dozen people to hundreds, and it truly felt like home. What pushed me to make such a hard decision is the momentum of building SGLang open source and the mission of creating an ambitious future, within the open spirit that I learnt in my first job at @databricks after my PhD.

We started SGLang in the summer of 2023 and made it public in January 2024. Over the past 2 years, hundreds of people have made great efforts to get it to where it is today. We experienced several waves of growth after its first release. I still remember the many dark nights in the summer of 2024 that I spent debugging with @lm_zheng, @lsyincs, and @zhyncs42, while @ispobaoke single-handedly took on DeepSeek inference optimizations and @GenAI_is_real and the community strike team tag-teamed on-call shifts non-stop. There are so many more who have joined that I'm out of space to call out, but they're recorded on the GitHub contributor list forever.

The demands grew exponentially, and we were pushed to make it a dedicated effort supported by RadixArk. It's the step-by-step journey of a thousand miles that has carried us here today, and the same relentless Long March that will lead us into the tens of thousands of miles yet to come. The story never stops growing.

Over the past year, we've seen something very clear: the world is full of people eager to build AI, but the infrastructure that makes it possible is not shared. The most advanced inference and training stacks live inside a few companies. Everyone else is forced to rebuild the same schedulers, compilers, serving engines, and training pipelines again and again — often under enormous pressure, with lots of duplicated effort and wasted insight.

RadixArk was born to change that. Today, we're building an infrastructure-first, deep-tech company with a simple and ambitious mission: "Make frontier-level AI infrastructure open and accessible to everyone."

If the two values below resonate with you, come talk to us:
(1) Engineering as an art. Infrastructure is a first-class citizen at RadixArk. We care about elegant design and code that lasts. Beneath every line of code lies the soul of the engineer who wrote it.
(2) A belief in openness. We share what we build. We bet on long-term compounding through community, contribution, and giving more than we take.

A product is defined by its users, yet it truly comes alive the moment functionality transcends mere utility and begins to embody aesthetics. Thanks to all the miles (the name of our first released RL framework; see below). radixark.ai
112 replies · 128 reposts · 1.1K likes · 538.5K views
Victoria X Lin reposted
Azalia Mirhoseini@Azaliamirh·
Thrilled to share that @annadgoldie and I are launching @RicursiveAI, a frontier lab enabling recursive self-improvement through AIs that design their own chips. Our vision for transforming chip design began with AlphaChip, an AI for layout optimization used to design four generations of TPUs, data center CPUs, and smartphones. AlphaChip offered a glimpse into a future where AI designs the silicon that fuels it. Ricursive extends this vision to the entire chip stack, building AI that architects, verifies, and implements silicon, enabling models and chips to co-evolve in a tight loop. We sat down with WSJ’s @berber_jin1 to discuss Ricursive: wsj.com/tech/this-ai-s…
Ricursive Intelligence@RicursiveAI

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com

125 replies · 137 reposts · 1.5K likes · 225.8K views
Victoria X Lin reposted
Liwei Jiang@liweijianglw·
Super happy to receive the Best Paper Award at #NeurIPS2025 for our Artificial Hivemind paper!! (Really enjoyed giving the oral talk at NeurIPS as well!)
Liwei Jiang@liweijianglw

⚠️Different models. Same thoughts.⚠️ Today’s AI models converge into an 𝐀𝐫𝐭𝐢𝐟𝐢𝐜𝐢𝐚𝐥 𝐇𝐢𝐯𝐞𝐦𝐢𝐧𝐝 🐝, a striking case of mode collapse that persists even across heterogeneous ensembles.

Our #neurips2025 𝐃&𝐁 𝐎𝐫𝐚𝐥 𝐩𝐚𝐩𝐞𝐫 (✨𝐭𝐨𝐩 𝟎.𝟑𝟓%✨) dives deep into this phenomenon, introducing 𝐈𝐧𝐟𝐢𝐧𝐢𝐭𝐲-𝐂𝐡𝐚𝐭, a real-world dataset of 26K open-ended user queries spanning 17 categories + 31K dense human annotations (𝟐𝟓 𝐢𝐧𝐝𝐞𝐩𝐞𝐧𝐝𝐞𝐧𝐭 𝐚𝐧𝐧𝐨𝐭𝐚𝐭𝐨𝐫𝐬 𝐩𝐞𝐫 𝐞𝐱𝐚𝐦𝐩𝐥𝐞) to push AI’s creative and discovery potential forward. Now you can build your favorite models to be truly original, diverse, and impactful in the open-ended real world.

📍Paper: arxiv.org/abs/2510.22954
📍Data: huggingface.co/collections/li…

We also systematically reveal the Artificial Hivemind across:
💥 𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: not only do individual LLMs repeat themselves, but different models produce strikingly similar content, even when asked fully open-ended questions.
💥 𝐃𝐢𝐬𝐜𝐫𝐢𝐦𝐢𝐧𝐚𝐭𝐢𝐯𝐞 𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬: LLMs, LM judges, and reward models are systematically miscalibrated when rating alternative responses to open-ended queries. (1/N)

37 replies · 69 reposts · 781 likes · 80.1K views
Victoria X Lin reposted
Long Lian@LongTonyLian·
Reinforcement learning is the key to test-time scaling. However, standard GRPO does not apply to parallel rollouts. We propose Parallel-aware GRPO (P-GRPO), which:
📡 Broadcasts advantages across all threads
🧘 Removes variance normalization (big win for stability; see Table 5 in the paper)
⏱️ Adds a parallelization-aware reward that encourages speed only when the answer is correct
This is what teaches the model when and how to parallelize.
1 reply · 3 reposts · 6 likes · 2.3K views
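To make the three P-GRPO ingredients above concrete, here is a small sketch. Only the three bullet points are from the thread; the function names, the exact reward shape (correctness plus a correctness-gated latency term), and the coefficient `lam` are my assumptions.

```python
import numpy as np

def pgrpo_advantage(correct, latency_saving, lam=0.1):
    """Group-relative advantages without variance normalization (sketch).
    correct: 1/0 per rollout; latency_saving: normalized speed gain per rollout.
    The speed term is gated on correctness, so parallelizing is only
    rewarded when the final answer is right."""
    correct = np.asarray(correct, dtype=float)
    saving = np.asarray(latency_saving, dtype=float)
    reward = correct + lam * saving * correct   # parallelization-aware reward
    return reward - reward.mean()               # mean-centered, NOT divided by std

def broadcast_to_threads(adv, threads_per_rollout):
    # Every parallel thread of rollout i receives the same advantage adv[i],
    # i.e. the advantage is broadcast across all threads of that rollout.
    return np.repeat(adv, threads_per_rollout)
```

Dropping the division by the group standard deviation is the stability change the thread highlights; the centering alone keeps the group baseline while avoiding blow-ups when within-group rewards are nearly identical.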
Victoria X Lin reposted
Long Lian@LongTonyLian·
Best part? You don’t need to patch your inference engine for parallel reasoning 🙅‍♂️🛠️. ThreadWeaver runs on a standard AR inference engine: just a lightweight state machine on top of any LLM server that supports text-completion APIs (e.g., vLLM/SGLang without any patching). The state machine directly implements the Fork-Join reasoning paradigm and is very simple:
1️⃣ Decode until the fork point
2️⃣ Spawn N async completions
3️⃣ Stop each at its end marker
4️⃣ Join + continue
Zero model hacks. Zero KV cache mods. Max compatibility with existing inference optimizations ✨.
1 reply · 4 reposts · 3 likes · 2.2K views
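The four-step state machine above can be sketched as a tiny driver loop over a generic text-completion function. Everything here is an illustration under stated assumptions: the tag strings, the `[thread i]` markers, and the `complete()` signature are stand-ins, not ThreadWeaver's actual tokens or API.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in control tags -- ThreadWeaver's real tokens are not shown in the thread.
FORK_TAG, THREAD_END = "<fork>", "</thread>"

def fork_join_decode(complete, prompt, n_threads=2, max_rounds=4):
    """Fork-join reasoning over a plain text-completion API.
    `complete(prompt, stop)` returns (generated_text, stop_token_was_hit)."""
    ctx = prompt
    for _ in range(max_rounds):
        # 1) Decode until the model emits the fork tag (or simply finishes).
        text, hit_fork = complete(ctx, stop=FORK_TAG)
        ctx += text
        if not hit_fork:
            return ctx  # no fork requested: the sequential answer is complete
        ctx += FORK_TAG
        # 2) Spawn N async completions from the shared prefix,
        # 3) each stopping at its thread-end tag.
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            branches = list(pool.map(
                lambda i: complete(ctx + f"[thread {i}] ", stop=THREAD_END)[0],
                range(n_threads)))
        # 4) Join: splice every thread's output back into one context, continue.
        ctx += "".join(f"[thread {i}] {b}{THREAD_END}"
                       for i, b in enumerate(branches))
    return ctx
```

In a real deployment, `complete()` would wrap an ordinary `/v1/completions`-style request with the tag passed as a stop sequence, which is exactly why no engine or KV-cache patching is needed.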
Victoria X Lin reposted
Xinyu Yang@Xinyu2ML·
Really exciting to see new, high-quality work emerging in parallel reasoning. It's also great to see some of the ideas from our Multiverse framework being adopted in this line of research. It is really interesting that parallel reasoning can be achieved without structural modification, which indicates strong robustness across different attention masks. We also see improved performance and speedups on the challenging benchmarks. With the recent progress, it feels like parallel reasoning is transitioning from pure SFT into the RL era, as demonstrated by both ThreadWeaver and Parallel-R1. And now we're even seeing new work pushing parallel reasoning into agentic tasks as well. Given all this momentum, it feels like the right moment to start thinking seriously about building efficient systems for these dynamic workflows. We might have something new coming this month that could be useful for anyone exploring this direction.
Long Lian@LongTonyLian

LLMs are getting crazily good at reasoning — but also crazily slow. Hard problems can make them think for hours. Why? Even with tons of GPUs, they still decode one. token. at. a. time.⏳ More GPUs ≠ faster answers Our ThreadWeaver🧵⚡asks: “Why not make LLMs think in parallel?” 🧵1/N👇

0 replies · 3 reposts · 17 likes · 3.5K views