

Victoria X Lin

@VictoriaLinML
MTS @thinkymachines | MoMa/MoT • RA-DIT • Llama4 • Prev: @AIatMeta @SFResearch • PhD @uwcse





We identified an issue with the Mamba-2 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized). The bug stems from two issues:
1. The init is incorrect (torch.ones) when Mamba-2 layers are used in isolation, without the Mamba2ForCausalLM model class (already fixed: github.com/fla-org/flash-…).
2. Initialization is skipped due to meta-device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this upon merging).
The difference is substantial: Mamba-2 appears to be quite sensitive to its initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma…
Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu. Also thanks to @SonglinYang4 for quickly helping to merge the PR.
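For context, the reference Mamba codebase initializes dt_bias not as torch.ones but as the inverse softplus of a log-uniformly sampled timestep, so that softplus(dt_bias) lands in [dt_min, dt_max]. Below is a minimal pure-Python sketch of that scheme; the function name and defaults are illustrative, not the exact library code:

```python
import math
import random

def init_dt_bias(n_heads, dt_min=1e-3, dt_max=1e-1, dt_init_floor=1e-4, seed=0):
    """Sketch of the Mamba-style dt_bias init:
    sample dt log-uniformly in [dt_min, dt_max], floor it, then store
    the inverse softplus so that softplus(dt_bias) recovers dt.
    Hyperparameter names follow the original Mamba conventions."""
    rng = random.Random(seed)
    dt_bias = []
    for _ in range(n_heads):
        u = rng.random()
        # log-uniform sample in [dt_min, dt_max]
        dt = math.exp(u * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min))
        dt = max(dt, dt_init_floor)
        # inverse softplus: x = dt + log(1 - exp(-dt)), so log(1 + e^x) == dt
        dt_bias.append(dt + math.log(-math.expm1(-dt)))
    return dt_bias
```

A constant init (all ones) instead puts every head's timestep at softplus(1) ≈ 1.31, far outside the intended [1e-3, 1e-1] range, which is consistent with the sensitivity observed in the experiments above.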




Meet Kimi K2.5, Open-Source Visual Agentic Intelligence.
• Global SOTA on agentic benchmarks: HLE full set (50.2%), BrowseComp (74.9%)
• Open-source SOTA on vision and coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%)
• Code with taste: turn chats, images & videos into aesthetic websites with expressive motion.
• Agent Swarm (beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents and 1,500 tool calls, up to 4.5× faster than a single-agent setup.
K2.5 is now live on kimi.com in chat mode and agent mode. K2.5 Agent Swarm is in beta for high-tier users. For production-grade coding, pair K2.5 with Kimi Code: kimi.com/code
API: platform.moonshot.ai
Tech blog: kimi.com/blogs/kimi-k2-…
Weights & code: huggingface.co/moonshotai/Kim…

InfiniAI Lab @ CMU is hiring postdocs! We are looking for outstanding postdoctoral researchers in ML systems and security to join InfiniAI Lab at Carnegie Mellon University. Research directions include (but are not limited to):
• AI agents & RL
• Machine learning security
• Video models
• AI systems & architecture design
We especially encourage candidates interested in applying for the CMU–Bosch Institute (CBI) Postdoctoral Fellowship, which provides strong support for independent, high-impact research: carnegiebosch.cmu.edu/fellowships/in…
CBI application deadline: January 30, 2026.
How to apply: please fill out the form and send us an email via infini-ai-lab.cmu.edu/vacancies



LLMs are getting crazily good at reasoning, but also crazily slow. Hard problems can make them think for hours. Why? Even with tons of GPUs, they still decode one. token. at. a. time. More GPUs ≠ faster answers. Our ThreadWeaver asks: "Why not make LLMs think in parallel?" 1/N



ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
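To make the "think in parallel" idea concrete, here is a toy sketch of parallel reasoning branches with majority-vote aggregation. This is not ThreadWeaver's actual algorithm (the paper's contribution is adaptive threading, i.e. deciding where to fork), and solve_branch is a stand-in stub for a real LLM decoding call:

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def solve_branch(prompt, seed):
    """Stub for one independent reasoning thread.
    A real system would run an LLM decode here; we fake a
    deterministic answer so the sketch is self-contained."""
    return f"answer-{seed % 2}"

def parallel_reason(prompt, n_threads=5):
    # Instead of one long serial chain of thought, decode n_threads
    # reasoning branches concurrently, then aggregate by majority vote.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        answers = list(pool.map(lambda s: solve_branch(prompt, s), range(n_threads)))
    return Counter(answers).most_common(1)[0][0]
```

The wall-clock win comes from the branches being independent: with enough GPU capacity, N branches take roughly the time of one, whereas a serial chain of the same total length takes N times longer.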


Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com



⚠️ Different models. Same thoughts. ⚠️
Today's AI models converge into an Artificial Hivemind, a striking case of mode collapse that persists even across heterogeneous ensembles. Our #neurips2025 D&B Oral paper dives deep into this phenomenon, introducing Infinity-Chat, a real-world dataset of 26K open-ended user queries spanning 17 categories, plus 31K dense human annotations (with multiple independent annotators per example), to push AI's creative and discovery potential forward. Now you can build your favorite models to be truly original, diverse, and impactful in the open-ended real world.
Paper: arxiv.org/abs/2510.22954
Data: huggingface.co/collections/li…
We also systematically reveal the Artificial Hivemind across:
• Generative abilities: not only do individual LLMs repeat themselves, but different models produce strikingly similar content, even when asked fully open-ended questions.
• Discriminative abilities: LLMs, LM judges, and reward models are systematically miscalibrated when rating alternative responses to open-ended queries.
(1/N)
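One simple way to quantify a "hivemind" effect (a toy illustration, not the paper's metric) is the mean pairwise similarity of different models' responses to the same open-ended prompt; token-set Jaccard stands in here for whatever embedding- or annotation-based similarity a real evaluation would use:

```python
from itertools import combinations

def jaccard(a, b):
    """Token-set Jaccard similarity between two text responses."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def hivemind_score(responses):
    # Mean pairwise similarity across models' answers to one prompt:
    # ~1.0 means the models collapsed onto the same output,
    # ~0.0 means genuinely diverse answers.
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A high score across many open-ended prompts, for models trained by different labs, is the kind of cross-model convergence the thread describes.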




