Victoria X Lin

1.4K posts

@VictoriaLinML

MTS @thinkymachines | Native Multimodal Intelligence Prev: @AIatMeta @SFResearch • PhD @uwcse

San Francisco Bay Area · Joined December 2010
1K Following · 4K Followers
Pinned Tweet
Victoria X Lin @VictoriaLinML
✨We are showing some experiments with interaction models @ThinkyMachines: models that could see and hear continuously while processing tasks in the background and generating responses in real-time. Interaction models offer a glimpse into a future where people collaborate with AI the same way we do with other people. Read our announcement post to explore the capabilities this model unlocks.
Thinking Machines @thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

Replies: 1 · Retweets: 4 · Likes: 42 · Views: 5.6K
Victoria X Lin retweeted
Steven Feng @stevenyfeng
We’re bringing back Stanford’s CS25 Transformers course tomorrow! 🤖 It’s open to everyone (in-person + online). Weekly talks (every Thursday) from top AI researchers. One of Stanford’s most popular AI seminar courses. Don’t miss out! More info below 👇 (1/7)
Replies: 10 · Retweets: 93 · Likes: 645 · Views: 58.2K
Victoria X Lin retweeted
Thinking Machines @thinkymachines
We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th! thinkingmachines.ai/news/interacti…
Replies: 48 · Retweets: 192 · Likes: 1.6K · Views: 564.3K
Victoria X Lin retweeted
Thinking Machines @thinkymachines
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…
Replies: 459 · Retweets: 1.9K · Likes: 15.6K · Views: 7.6M
Victoria X Lin retweeted
Alexander Kirillov @_alex_kirillov_
Working on the interaction models is a lot of fun at TML! I can't imagine doing that in a turn-based world. Building it from scratch makes a lot of things so much easier. I am very excited about the future of natively multi-modal, multi-stream, multi-task models.
Replies: 4 · Retweets: 7 · Likes: 178 · Views: 21.4K
Victoria X Lin @VictoriaLinML
Could @Waymo roll out a “working pod” series where you can comfortably set up your laptop and get work done during the ride?
Replies: 1 · Retweets: 0 · Likes: 2 · Views: 1.6K
Victoria X Lin retweeted
Yuandong Tian @tydsh
My solo paper was accepted to ICLR'26 in Brazil. It discovers the training dynamics of grokking behaviors (the phase transition from memorization to generalization) in basic settings, and derives provable scaling laws that enable such dynamics to happen. Unfortunately I won't be able to come to Brazil and present. Here is the poster I made, if people are interested in checking it out: yuandong-tian.com/posters/poster… I will mention the paper in the upcoming invited workshop talks as well. Enjoy~
Yuandong Tian @tydsh

🚨 New work: Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking (arxiv.org/abs/2509.21519) In this work we propose a mathematical framework, named Li2, that explains the dynamics of grokking (i.e., delayed generalization) in 2-layer nonlinear networks. Specifically, it 1️⃣ Tells exactly what features will emerge during training. 2️⃣ Gives provable scaling laws of generalization/memorization, i.e. O(M log M) data samples suffice for generalization on group arithmetic tasks over a group of order M. 3️⃣ Provides a more fundamental explanation for the popular empirical hypothesis that "generalization circuits learn slower but are more efficient than memorization circuits". So how?

Replies: 6 · Retweets: 23 · Likes: 417 · Views: 65.1K
weiyaow @weiyaow1
After 8 years at Meta (FAIR/MSL) working on multi-modal perception and generation (Gradient-Blending, UVO, SAM3D), I've joined @thinkymachines this week to keep working on multi-modal. Excited for what's ahead.
Replies: 22 · Retweets: 12 · Likes: 337 · Views: 37.7K
Yu Su @ysu_nlp
Introducing @NeoCognition, the agent lab for specialized intelligence. Everyone needs experts, but human expertise does not scale. Backed by $40M seed funding, we build self-learning agents that specialize across domains to make expertise abundant.
Replies: 92 · Retweets: 135 · Likes: 875 · Views: 182.8K
Victoria X Lin retweeted
Akari Asai @AkariAsai
Not many PhD students know about compute grants, but they can make a huge difference. During my PhD, I got access to Stability AI's HPC cluster through a small proposal and used it for Self-RAG training. Great practical post by @_emliu!
Emmy Liu @_emliu

wrote a guide on getting compute grants as a student, something I wish I did more at the beginning of my PhD. It's honestly one of the highest ROI things you can do as a student (we've gotten 100k+ gpu hrs for roughly 2 weeks of work writing). nightingal3.github.io/blog/2026/04/1…

Replies: 5 · Retweets: 32 · Likes: 441 · Views: 82.5K
Yuandong Tian @tydsh
Our work on post-training models for parallel thinking (ThreadWeaver) is now open sourced! Our Data Gen/SFT/RL recipes are now fully open 😀. The idea is to 1️⃣ rewrite the sequential thinking traces to be parallel with LLMs, 2️⃣ design efficient kernels for training/inference and 3️⃣ smartly design the reward signal for RL. Thanks @LongTonyLian and @VictoriaLinML for the great work!
Long Lian @LongTonyLian

Our parallel reasoning project ThreadWeaver is now open-sourced 🎉! Check out our Data Gen/SFT/RL recipe at github.com/facebookresear… In case you don't know, ThreadWeaver 🧵⚡️ is the first parallel reasoning method to achieve comparable reasoning performance to widely-used sequential long-CoT LLMs, with up to 3x speedup across 6 challenging tasks.

Replies: 3 · Retweets: 24 · Likes: 244 · Views: 32.2K
Victoria X Lin retweeted
Long Lian @LongTonyLian
Our parallel reasoning project ThreadWeaver is now open-sourced 🎉! Check out our Data Gen/SFT/RL recipe at github.com/facebookresear… In case you don't know, ThreadWeaver 🧵⚡️ is the first parallel reasoning method to achieve comparable reasoning performance to widely-used sequential long-CoT LLMs, with up to 3x speedup across 6 challenging tasks.
AK @_akhaliq

ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models

Replies: 0 · Retweets: 23 · Likes: 127 · Views: 55.7K
Victoria X Lin retweeted
Mira Murati @miramurati
Grateful to Jensen and @nvidia team for their support. Together, we’re working to deploy at least 1GW of Vera Rubin systems, bringing adaptable collaborative AI to everyone. thinkingmachines.ai/nvidia-partner…
Replies: 168 · Retweets: 284 · Likes: 3.9K · Views: 558.7K
Victoria X Lin @VictoriaLinML
☕ Society will reward tremendously those who can effortlessly spot mistakes made by autonomous agents.
Replies: 1 · Retweets: 4 · Likes: 28 · Views: 3.7K
Victoria X Lin retweeted
Tri Dao @tri_dao
This was a wild bug hunt, weeks of effort from @MayankMish98 to track down. The wrong init of Mamba2 in many reimplementations causes the layer to decay its states too quickly, focusing on short context instead. Pretraining is mostly about getting these little things right.
Mayank Mishra @MayankMish98

We identified an issue with the Mamba-2 🐍 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized). This bug is related to 2 main issues:
1. The init is incorrect (torch.ones) if Mamba-2 layers are used in isolation without the Mamba2ForCausalLM model class (this has already been fixed: github.com/fla-org/flash-…).
2. Initialization is skipped due to meta device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this issue upon merging).
The difference is substantial. Mamba-2 seems to be quite sensitive to the initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma… Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏 Also thanks to @SonglinYang4 for quickly helping in merging the PR.

Replies: 2 · Retweets: 20 · Likes: 374 · Views: 32.3K
Victoria X Lin retweeted
Boris Cherny @bcherny
I'm Boris and I created Claude Code. I wanted to quickly share a few tips for using Claude Code, sourced directly from the Claude Code team. The way the team uses Claude is different than how I use it. Remember: there is no one right way to use Claude Code -- everyone's setup is different. You should experiment to see what works for you!
Replies: 924 · Retweets: 5.9K · Likes: 50.9K · Views: 9.2M
Victoria X Lin retweeted
Long Lian @LongTonyLian
Love seeing parallel thinking & subagents pushing efficiency and performance on Kimi K2.5! 🚀 Also nice to see shared takeaways with our parallel reasoning work ThreadWeaver: 1️⃣ an auxiliary parallelization reward prevents collapse, and 2️⃣ the critical path is the key🔑
Kimi.ai @Kimi_Moonshot

🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
🔹 Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
🔹 Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
🔹 Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion.
🔹 Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with single-agent setup.

🥝 K2.5 is now live on kimi.com in chat mode and agent mode.
🥝 K2.5 Agent Swarm in beta for high-tier users.
🥝 For production-grade coding, you can pair K2.5 with Kimi Code: kimi.com/code

🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blogs/kimi-k2-…
🔗 Weights & code: huggingface.co/moonshotai/Kim…

Replies: 0 · Retweets: 2 · Likes: 27 · Views: 5.3K