Victoria X Lin

1.4K posts

@VictoriaLinML

MTS @thinkymachines | MoMa/MoT🖼 • RA-DIT🔍 • Llama4🦙 Prev: @AIatMeta @SFResearch • PhD @uwcse

San Francisco Bay Area · Joined December 2010
1K Following · 3.9K Followers
Victoria X Lin reposted
Mira Murati @miramurati
Grateful to Jensen and @nvidia team for their support. Together, we're working to deploy at least 1GW of Vera Rubin systems, bringing adaptable collaborative AI to everyone. thinkingmachines.ai/nvidia-partner…
[image]
164 replies · 286 reposts · 3.8K likes · 533.2K views
Victoria X Lin @VictoriaLinML
☕ Society will reward tremendously those who can effortlessly spot mistakes made by autonomous agents.
1 reply · 4 reposts · 28 likes · 3.3K views
Victoria X Lin reposted
Tri Dao @tri_dao
This was a wild bug hunt: weeks of effort from @MayankMish98 to track down. The wrong init of Mamba-2 in many reimplementations causes the layer to decay its states too quickly, focusing on short contexts instead. Pretraining is mostly about getting these little things right.
Mayank Mishra @MayankMish98

We identified an issue with the Mamba-2 🐍 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized). The bug stems from two main issues:
1. The init is incorrect (torch.ones) if Mamba-2 layers are used in isolation, without the Mamba2ForCausalLM model class (this has already been fixed: github.com/fla-org/flash-…).
2. Initialization is skipped due to meta-device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this issue upon merging).
The difference is substantial: Mamba-2 seems to be quite sensitive to the initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma… Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏 Also thanks to @SonglinYang4 for quickly helping merge the PR.
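For reference, a minimal pure-Python sketch of the initialization scheme under discussion, assuming the reference recipe (dt_bias is stored as the inverse softplus of a log-uniformly sampled timestep, so that softplus(dt_bias) lands in [dt_min, dt_max] at the start of training); the function names and the dt_min/dt_max defaults here are illustrative, not taken from either repository:

```python
import math
import random

def softplus(x):
    # softplus(x) = log(1 + e^x)
    return math.log1p(math.exp(x))

def inv_softplus(dt):
    # Inverse of softplus for dt > 0: softplus(inv_softplus(dt)) == dt.
    # Written with expm1 for numerical stability at small dt.
    return dt + math.log(-math.expm1(-dt))

def init_dt_bias(num_heads, dt_min=1e-3, dt_max=0.1, seed=0):
    # Sample the initial timestep dt log-uniformly in [dt_min, dt_max],
    # then store its inverse softplus so the forward pass recovers dt.
    rng = random.Random(seed)
    biases = []
    for _ in range(num_heads):
        dt = math.exp(rng.uniform(math.log(dt_min), math.log(dt_max)))
        biases.append(inv_softplus(dt))
    return biases
```

By contrast, a torch.ones-style init gives every head a timestep of softplus(1.0) ≈ 1.31, far above this range, which is consistent with the overly fast state decay described above.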

2 replies · 20 reposts · 374 likes · 31.5K views
Victoria X Lin reposted
Boris Cherny @bcherny
I'm Boris and I created Claude Code. I wanted to quickly share a few tips for using Claude Code, sourced directly from the Claude Code team. The way the team uses Claude is different than how I use it. Remember: there is no one right way to use Claude Code -- everyone's setup is different. You should experiment to see what works for you!
916 replies · 5.9K reposts · 50.9K likes · 9.1M views
Victoria X Lin reposted
Long Lian @LongTonyLian
Love seeing parallel thinking & subagents pushing efficiency and performance on Kimi K2.5! 🚀 Also nice to see shared takeaways with our parallel reasoning work ThreadWeaver: 1️⃣ an auxiliary parallelization reward prevents collapse, and 2️⃣ the critical path is the key 🔑
[2 images]
Kimi.ai @Kimi_Moonshot

๐Ÿฅ Meet Kimi K2.5, Open-Source Visual Agentic Intelligence. ๐Ÿ”น Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%) ๐Ÿ”น Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%) ๐Ÿ”น Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion. ๐Ÿ”น Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5ร— faster compared with single-agent setup. - ๐Ÿฅ K2.5 is now live on kimi.com in chat mode and agent mode. ๐Ÿฅ K2.5 Agent Swarm in beta for high-tier users. ๐Ÿฅ For production-grade coding, you can pair K2.5 with Kimi Code: kimi.com/code - ๐Ÿ”— API: platform.moonshot.ai ๐Ÿ”— Tech blog: kimi.com/blogs/kimi-k2-โ€ฆ ๐Ÿ”— Weights & code: huggingface.co/moonshotai/Kimโ€ฆ

0 replies · 2 reposts · 27 likes · 5.1K views
Victoria X Lin reposted
Lilian Weng @lilianweng
I've been telling people this a lot today: I enjoy so much working with people who care about what they are building and craftsmanship. It is a privilege to have a chance to work on something I'm passionate about, beyond making a living. I cherish it and don't take it for granted.
63 replies · 63 reposts · 1.6K likes · 167.8K views
Victoria X Lin reposted
Astropics @astropics
Tonight is the first full Moon of the year, the Wolf Moon
[image]
94 replies · 5K reposts · 22.6K likes · 348.1K views
Victoria X Lin reposted
Boris Cherny @bcherny
I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to use Claude Code: we intentionally build it in a way that you can use it, customize it, and hack it however you like. Each person on the Claude Code team uses it very differently. So, here goes.
1.3K replies · 7K reposts · 54.2K likes · 8M views
Victoria X Lin @VictoriaLinML
The LLM needs to plan when and how to spawn the threads (the outline section in the parallel CoT); optionally, the LLM may also generate a summary when merging the threads back together. The outlines and thread summaries add token overhead. In our paper, the LLM only generates outlines.
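As a rough illustration of that overhead accounting (the function and the token counts below are hypothetical, not figures from the paper): the extra tokens are whatever the model emits as outlines at each fork, plus optional per-thread summaries at each join.

```python
def parallel_overhead(num_forks, threads_per_fork, outline_tokens_per_thread,
                      summary_tokens_per_thread=0):
    # Outline overhead: one short outline per spawned thread at each fork.
    outline_cost = num_forks * threads_per_fork * outline_tokens_per_thread
    # Summary overhead: an optional per-thread summary emitted at the join
    # (zero in the outlines-only setup described above).
    summary_cost = num_forks * threads_per_fork * summary_tokens_per_thread
    return outline_cost + summary_cost

# e.g. 2 forks of 3 threads with ~15-token outlines and no summaries:
# parallel_overhead(2, 3, 15) == 90 extra tokens
```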
0 replies · 0 reposts · 0 likes · 97 views
kaolin fire @kaolinfire
@professor_b_ At the very least you have to have the extra qps available. But am also curious. Any token overhead?
1 reply · 0 reposts · 0 likes · 98 views
Victoria X Lin @VictoriaLinML
✨ Introducing ThreadWeaver 🧵⚡, an approach that significantly reduces LLM reasoning latency on challenging problems by enabling models to adaptively spawn parallel reasoning threads and merge them later in the process. (An off-the-shelf reasoning LLM can be retrofitted to perform adaptive parallel reasoning with this approach, too!)

ThreadWeaver was led by the amazing @LongTonyLian. It was developed based on the paradigm of adaptive parallel reasoning (arxiv.org/abs/2504.15466). For the first time, we show that adaptive parallel reasoning can achieve accuracy comparable to equally sized cutting-edge sequential reasoning models (e.g., 79.9% for ThreadWeaver vs. 78.3% for Qwen3-8B on AIME24, and 71.9% vs. 72.2% on average across six math reasoning benchmarks) while delivering substantial reductions in token latency (1.14× speedup on AIME24 and up to 1.53× across datasets).

ThreadWeaver was designed to be fully compatible with standard LLM inference engines that support text-completion APIs (i.e., it does not introduce architectural changes or modify context representation to support adaptive parallelization). The approach can be directly applied to retrofit off-the-shelf reasoning LLMs through supervised fine-tuning (inspired by Multiverse) and a novel on-policy reinforcement learning method incorporating thread-wise advantage broadcast and a parallelization-aware reward design.

ThreadWeaver opens the door to a future where models can understand both the problem structure and execution environment well enough to adaptively leverage available resources in the most efficient way to solve complex tasks.

This is also the last research project I contributed to during my time at Meta 👩‍💻 For more details, be sure to check out @LongTonyLian's thread:
Long Lian @LongTonyLian

LLMs are getting crazily good at reasoning, but also crazily slow. Hard problems can make them think for hours. Why? Even with tons of GPUs, they still decode one. token. at. a. time. ⏳ More GPUs ≠ faster answers. Our ThreadWeaver 🧵⚡ asks: "Why not make LLMs think in parallel?" 🧵 1/N 👇

2 replies · 6 reposts · 26 likes · 9.6K views
Sarah Wooders @sarahwooders
The real AI danger is Waymos blocking all the intersections
1 reply · 0 reposts · 2 likes · 668 views
Victoria X Lin reposted
Ying Sheng @ying11231
We've been running @radixark for a few months, started by many core developers of SGLang @lmsysorg and its extended ecosystem (slime @slime_framework, AReaL @jxwuyi).

I left @xai in August, a place where I built deep emotions and countless beautiful memories. It was the best place I've ever worked, the place I watched grow from a few dozen people to hundreds, and it truly felt like home. What pushed me to make such a hard decision is the momentum of building SGLang open source and the mission of creating an ambitious future, within an open spirit that I learned from my first job at @databricks after my PhD.

We started SGLang in the summer of 2023 and made it public in January 2024. Over the past two years, hundreds of people have made great efforts to get it to where it is today. We experienced several waves of growth after its first release. I still remember the many dark nights in the summer of 2024 that I spent debugging with @lm_zheng, @lsyincs, and @zhyncs42, while @ispobaoke single-handedly took on DeepSeek inference optimizations, and @GenAI_is_real and the community strike team tag-teamed on-call shifts non-stop. There are so many more who have joined that I'm out of space to call out, but they're recorded on the GitHub contributor list forever.

The demands grow exponentially, and we have been pushed to make it a dedicated effort supported by RadixArk. It's the step-by-step journey of a thousand miles that has carried us here today, and the same relentless Long March that will lead us into the tens of thousands of miles yet to come. The story never stops growing.

Over the past year, we've seen something very clear: the world is full of people eager to build AI, but the infrastructure that makes it possible is not shared. The most advanced inference and training stacks live inside a few companies. Everyone else is forced to rebuild the same schedulers, compilers, serving engines, and training pipelines again and again, often under enormous pressure, with lots of duplicated effort and wasted insight.

RadixArk was born to change that. Today, we're building an infrastructure-first, deep-tech company with a simple and ambitious mission: "Make frontier-level AI infrastructure open and accessible to everyone."

If the two values below resonate with you, come talk to us:
(1) Engineering as an art. Infrastructure is a first-class citizen at RadixArk. We care about elegant design and code that lasts. Beneath every line of code lies the soul of the engineer who wrote it.
(2) A belief in openness. We share what we build. We bet on long-term compounding through community, contribution, and giving more than we take. A product is defined by its users, yet it truly comes alive the moment functionality transcends mere utility and begins to embody aesthetics.

Thanks to all the miles (the name of our first released RL framework; see below). radixark.ai
112 replies · 128 reposts · 1.1K likes · 538.5K views
Victoria X Lin reposted
Azalia Mirhoseini @Azaliamirh
Thrilled to share that @annadgoldie and I are launching @RicursiveAI, a frontier lab enabling recursive self-improvement through AIs that design their own chips. Our vision for transforming chip design began with AlphaChip, an AI for layout optimization used to design four generations of TPUs, data center CPUs, and smartphones. AlphaChip offered a glimpse into a future where AI designs the silicon that fuels it. Ricursive extends this vision to the entire chip stack, building AI that architects, verifies, and implements silicon, enabling models and chips to co-evolve in a tight loop. We sat down with WSJ's @berber_jin1 to discuss Ricursive: wsj.com/tech/this-ai-s…
Ricursive Intelligence @RicursiveAI

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com

125 replies · 137 reposts · 1.5K likes · 225.8K views
Victoria X Lin reposted
Liwei Jiang @liweijianglw
Super happy to receive the Best Paper Award at #NeurIPS2025 for our Artificial Hivemind paper!! (Really enjoyed giving the oral talk at NeurIPS as well!)
[2 images]
Liwei Jiang @liweijianglw

โš ๏ธDifferent models. Same thoughts.โš ๏ธ Todayโ€™s AI models converge into an ๐€๐ซ๐ญ๐ข๐Ÿ๐ข๐œ๐ข๐š๐ฅ ๐‡๐ข๐ฏ๐ž๐ฆ๐ข๐ง๐ ๐Ÿ, a striking case of mode collapse that persists even across heterogeneous ensembles. Our #neurips2025 ๐ƒ&๐ ๐Ž๐ซ๐š๐ฅ ๐ฉ๐š๐ฉ๐ž๐ซ (โœจ๐ญ๐จ๐ฉ ๐ŸŽ.๐Ÿ‘๐Ÿ“%โœจ) dives deep into this phenomenon, introducing ๐ˆ๐ง๐Ÿ๐ข๐ง๐ข๐ญ๐ฒ-๐‚๐ก๐š๐ญ, a real-world dataset of 26K real-world open-ended user queries spanning 17 open-ended categories + 31K dense human annotations (๐Ÿ๐Ÿ“ ๐ข๐ง๐๐ž๐ฉ๐ž๐ง๐๐ž๐ง๐ญ ๐š๐ง๐ง๐จ๐ญ๐š๐ญ๐จ๐ซ๐ฌ ๐ฉ๐ž๐ซ ๐ž๐ฑ๐š๐ฆ๐ฉ๐ฅ๐ž) to push AIโ€™s creative and discovery potential forward. Now you can build your favorite models to be truly original, diverse, and impactful in the open-ended real world. ๐Ÿ“Paper: arxiv.org/abs/2510.22954 ๐Ÿ“Data: huggingface.co/collections/liโ€ฆ We also systematically reveal Artificial Hivemind across: ๐Ÿ’ฅ ๐†๐ž๐ง๐ž๐ซ๐š๐ญ๐ข๐ฏ๐ž ๐š๐›๐ข๐ฅ๐ข๐ญ๐ข๐ž๐ฌ: not only do individual LLMs repeat themselves, but different models produce strikingly similar content, even when asked fully open-ended questions. ๐Ÿ’ฅ ๐ƒ๐ข๐ฌ๐œ๐ซ๐ข๐ฆ๐ข๐ง๐š๐ญ๐ข๐ฏ๐ž ๐š๐›๐ข๐ฅ๐ข๐ญ๐ข๐ž๐ฌ: LLMs, LM judges, and reward models are systematically miscalibrated when rating alternative responses to open-ended queries. (1/N)

37 replies · 69 reposts · 781 likes · 80.1K views
Victoria X Lin reposted
Long Lian @LongTonyLian
Reinforcement learning is the key to test-time scaling. However, standard GRPO does not apply to parallel rollouts. We propose Parallel-aware GRPO (P-GRPO), which:
📡 Broadcasts advantages across all threads
🧘 Removes variance normalization (big win for stability; see Table 5 in the paper)
⏱️ Adds a parallelization-aware reward that encourages speed only when the answer is correct
This is what teaches the model when and how to parallelize.
[image]
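A minimal sketch of what "advantage broadcast without variance normalization" could look like, assuming GRPO's usual group-mean baseline; the function and variable names are illustrative, not from the ThreadWeaver code:

```python
def pgrpo_advantages(group_rewards, threads_per_rollout):
    # Group-relative baseline: subtract the group mean reward, but skip
    # GRPO's usual division by the group std (removed here for stability).
    mean = sum(group_rewards) / len(group_rewards)
    advantages = [r - mean for r in group_rewards]
    # Thread-wise broadcast: every parallel thread inside rollout i is
    # trained with that rollout's single scalar advantage.
    return [[a] * n for a, n in zip(advantages, threads_per_rollout)]

# e.g. two rollouts with rewards 1.0 and 0.0, spawning 3 and 2 threads:
# pgrpo_advantages([1.0, 0.0], [3, 2]) == [[0.5, 0.5, 0.5], [-0.5, -0.5]]
```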
1 reply · 3 reposts · 6 likes · 2.3K views
Victoria X Lin reposted
Long Lian @LongTonyLian
Best part? You don't need to patch your inference engine for parallel reasoning 🙅‍♂️🛠️. ThreadWeaver runs on standard autoregressive inference engines: only a lightweight state machine on top of any LLM server that supports text-completion APIs (e.g., vLLM/SGLang without any patching). The state machine directly implements the fork-join reasoning paradigm and is also very simple:
1️⃣ Decode until the fork point
2️⃣ Spawn N async completions
3️⃣ Stop each at the join point
4️⃣ Join + continue
Zero model hacks. Zero KV-cache mods. Max compatibility with existing inference optimizations ✨.
[image]
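A minimal sketch of such a state machine over a plain text-completion call; complete() stands in for the server request, and the <fork>/<join> strings are placeholder markers, not ThreadWeaver's actual tokens:

```python
FORK, JOIN = "<fork>", "<join>"

def fork_join_decode(complete, prompt, num_threads):
    # complete(prefix, stop) -> generated text that halts before `stop`;
    # any text-completion API (e.g. a vLLM/SGLang completions endpoint)
    # fits this shape without server modifications.
    # 1. Decode sequentially until the model emits the fork marker.
    context = prompt + complete(prompt, stop=FORK) + FORK
    # 2. Spawn N completions from the shared prefix (sequential here;
    #    in practice these are issued as concurrent async requests).
    threads = [complete(context, stop=JOIN) for _ in range(num_threads)]
    # 3. Join: splice all thread outputs back into one context...
    context += "".join(t + JOIN for t in threads)
    # 4. ...and continue decoding to the final answer.
    return context + complete(context, stop=None)
```

Because the driver only concatenates strings and issues ordinary completion requests, the underlying engine's KV cache, batching, and other optimizations apply unchanged.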
1 reply · 4 reposts · 3 likes · 2.2K views
Victoria X Lin reposted
Xinyu Yang @Xinyu2ML
Really exciting to see new, high-quality work emerging in parallel reasoning. It's also great to see some of the ideas from our Multiverse framework being adopted in this line of research. It is really interesting that parallel reasoning can be achieved without structural modification, which indicates strong robustness across different attention masks. The improved performance and speedups on challenging benchmarks are also encouraging. With the recent progress, it feels like parallel reasoning is transitioning from pure SFT into the RL era, as demonstrated by both ThreadWeaver and Parallel-R1. And now we're even seeing new work pushing parallel reasoning into agentic tasks as well. Given all this momentum, it feels like the right moment to start thinking seriously about building efficient systems for these dynamic workflows. We might have something new coming this month that could be useful for anyone exploring this direction.
Long Lian @LongTonyLian

LLMs are getting crazily good at reasoning, but also crazily slow. Hard problems can make them think for hours. Why? Even with tons of GPUs, they still decode one. token. at. a. time. ⏳ More GPUs ≠ faster answers. Our ThreadWeaver 🧵⚡ asks: "Why not make LLMs think in parallel?" 🧵 1/N 👇

0 replies · 3 reposts · 17 likes · 3.5K views