Fuli Luo
@_LuoFuli

10 posts

Now building @XiaomiMiMo. Previously @deepseek_ai

Joined November 2023
147 Following · 40.5K Followers

Pinned Tweet
Fuli Luo @_LuoFuli
MiMo-V2-Pro & Omni & TTS are out. Our first full-stack model family built truly for the Agent era.

I call this a quiet ambush — not because we planned it, but because the shift from the Chat to the Agent paradigm happened so fast that even we barely believed it. Somewhere in between was a process that was thrilling, painful, and fascinating all at once.

The 1T base model started training months ago. The original goal was long-context reasoning efficiency. Hybrid Attention carries real innovation without overreaching — and it turns out to be exactly the right foundation for the Agent era. 1M context window. MTP inference for ultra-low latency and cost. These architectural decisions weren't trendy; they were a structural advantage we built before we needed it.

What changed everything was experiencing a complex agentic scaffold — what I'd call orchestrated Context — for the first time. I was shocked on day one. I tried to convince the team to use it. That didn't work. So I gave a hard mandate: anyone on the MiMo Team with fewer than 100 conversations by tomorrow can quit. It worked. Once the team's imagination was ignited by what agentic systems could do, that imagination converted directly into research velocity.

People ask why we move so fast. I saw it firsthand building DeepSeek R1. My honest summary:
— Backbone and Infra research has long cycles. You need strategic conviction a year before it pays off.
— Posttrain agility is a different muscle: product intuition driving evaluation, compressed iteration cycles, paradigm shifts caught early.
— And the constant: curiosity, sharp technical instinct, decisive execution, full commitment — and something that's easy to underestimate: a genuine love for the world you're building for.

We will open-source — when the models are stable enough to deserve it.

From Beijing, very late, not quite awake.
Replies 275 · Reposts 467 · Likes 5.2K · Views 1.5M
Fuli Luo @_LuoFuli
Imagination is the ceiling of productivity in the new era. Inspiring imagination is the core of management in the age of Claw.
Replies 14 · Reposts 26 · Likes 326 · Views 48.6K
Fuli Luo retweeted
LMSYS Org @lmsysorg
SGLang + Miles: Rollout Routing Replay (R3) is now live! 🎉

We're excited to announce that SGLang and Miles now support Rollout Routing Replay (R3) for stable reinforcement learning training on MoE models!

Training MoE models with RL has been notoriously unstable, often leading to catastrophic collapse. The problem? Routing inconsistency between the inference and training engines. R3 fixes this by recording expert routing decisions during inference and replaying them during training.

The impact is significant: reusing inference routing decisions dramatically reduces the training-inference discrepancy and prevents training collapse. R3 has full distributed training support with DataParallel Attention and all parallelism strategies; supported models include Qwen3-30B-A3B, deepseek_v2, etc.

Try it out and let us know your results! 🚀
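The record-and-replay idea can be sketched in a few lines. This is a toy illustration, not the SGLang/Miles implementation: `route_topk`, `moe_forward`, and the tiny random experts are all hypothetical, and the point is only that replaying recorded expert indices makes the training-side forward pass take exactly the same expert paths as inference, even when the two engines' router computations differ slightly.

```python
# Toy sketch of Rollout Routing Replay (R3); not the SGLang/Miles API.
# Inference records which experts the router picked per token; training
# reuses those indices instead of re-deriving them, so both passes walk
# identical expert paths despite numerical differences between engines.
import numpy as np

def route_topk(logits, k):
    """Pick the top-k expert indices per token from router logits."""
    return np.argsort(logits, axis=-1)[:, -k:]

def moe_forward(x, router_w, experts, k=2, replay_idx=None):
    """Minimal MoE layer. If replay_idx is given, reuse recorded routing."""
    logits = x @ router_w                    # [tokens, n_experts]
    idx = replay_idx if replay_idx is not None else route_topk(logits, k)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in idx[t]:
            out[t] += experts[e](x[t]) / k   # uniform mixing for simplicity
    return out, idx

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
router_w = rng.normal(size=(8, 4))
experts = [lambda v, W=rng.normal(size=(8, 8)): v @ W for _ in range(4)]

# Inference: route normally and record the decisions.
_, recorded = moe_forward(x, router_w, experts)
# Training: a slightly perturbed router stands in for the train-time engine;
# replaying the recorded indices still yields identical expert assignments.
_, replayed = moe_forward(x, router_w + 1e-6, experts, replay_idx=recorded)
assert (recorded == replayed).all()
```

Without the replay, the perturbed router could flip borderline top-k choices, which is exactly the train/inference mismatch the post blames for collapse.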
Replies 7 · Reposts 19 · Likes 214 · Views 104.1K
Fuli Luo retweeted
Artificial Analysis @ArtificialAnlys
Xiaomi has just launched MiMo-V2-Flash, a 309B open weights reasoning model that scores 66 on the Artificial Analysis Intelligence Index. This release elevates Xiaomi to alongside other leading AI model labs. Key benchmarking takeaways:

➤ Strengths in Agentic Tool Use and Competition Math: MiMo-V2-Flash scores 95% on τ²-Bench Telecom and 96% on AIME 2025, demonstrating strong performance on agentic tool-use workflows and competition-style mathematical reasoning. MiMo-V2-Flash currently leads the τ²-Bench Telecom category among evaluated models.

➤ Cost competitive: The full Artificial Analysis evaluation suite cost just $53 to run. This is supported by MiMo-V2-Flash's highly competitive pricing of $0.10 per million input tokens and $0.30 per million output tokens, making it particularly attractive for cost-sensitive deployments and large-scale production workloads. This is similar to DeepSeek V3.2 ($54 total cost to run) and well below GPT-5.2 ($1,294 total cost to run).

➤ High token usage: MiMo-V2-Flash demonstrates high verbosity and token usage relative to other models in the same intelligence tier, using ~150M reasoning tokens across the Artificial Analysis Intelligence suite.

➤ Open weights: MiMo-V2-Flash is open weights, with 309B parameters and 15B active at inference time. Weights are released under an MIT license, continuing the trend of Chinese AI model labs open-sourcing their frontier models.

See below for further analysis:
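A quick back-of-envelope check shows how the quoted per-token pricing lines up with the quoted suite cost. The prices and the ~150M output-token figure come from the post; the input-token count is purely an assumption for illustration, since the post does not state it.

```python
# Rough cost check using figures from the post. in_tokens is an
# ASSUMPTION chosen for illustration; it is not stated in the post.
price_in = 0.10 / 1e6    # $ per input token
price_out = 0.30 / 1e6   # $ per output token
out_tokens = 150e6       # ~150M reasoning/output tokens across the suite
in_tokens = 80e6         # assumed input volume (hypothetical)

cost = in_tokens * price_in + out_tokens * price_out
print(f"${cost:.0f}")    # output tokens alone account for ~$45 of the total
```

Whatever the exact input volume, output tokens dominate at this price ratio, which is why the high verbosity noted above still lands near $53 for the whole suite.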
Replies 21 · Reposts 69 · Likes 580 · Views 214.7K
Fuli Luo @_LuoFuli
MiMo-V2-Flash is live. It's just step 2 on our AGI roadmap, but I wanted to dump some notes on the engineering choices that actually moved the needle.

Architecture: We settled on Hybrid SWA. It's simple, elegant, and in our internal benchmarks it outperformed other Linear Attention variants on long-context reasoning. Plus, a fixed KV cache just plays way nicer with current infra. Note: window size 128 turned out to be the magic number (512 actually degraded performance). Also, sink values are non-negotiable — don't skip them.

MTP (Multi-Token Prediction): This is underrated for efficient RL. Aside from the first layer, it needs surprisingly little fine-tuning to hit high accept length. With a 3-layer MTP, we're seeing >3 accept length and ~2.5x speedup in coding tasks. It effectively solves the GPU idle time from long-tail samples in small-batch on-policy RL. We didn't get to squeeze it into the RL loop this time due to deadlines, but it's a perfect fit. We open-sourced the 3-layer MTPs so you can develop with them.

Posttrain with MOPD: We adopted On-Policy Distillation from Thinking Machines to merge multiple RL models, and the efficiency gains were wild. We matched the teacher model's performance using less than 1/50th the compute of a standard SFT+RL pipeline. There's a clear path here to a self-reinforcing loop where the student evolves into a stronger teacher.

Huge props to my team. They sculpted these ideas from scratch into production in just a few months. The full breakdown is in the tech report. If this kind of pragmatic engineering resonates with you, we should talk.
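The window-128-plus-sinks point can be made concrete with a mask sketch. This is an illustrative construction, not MiMo's code: `n_sink=4` is an assumed sink count (the post only says sinks are required, not how many), and the mask simply combines causality, a 128-token local window, and always-visible sink positions.

```python
# Illustrative causal sliding-window attention mask with sink tokens.
# window=128 matches the post; n_sink=4 is an assumption for the sketch.
import numpy as np

def swa_mask(seq_len, window=128, n_sink=4):
    """Boolean mask: True where query i may attend key j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i                  # no attending to the future
    in_window = (i - j) < window     # only the last `window` keys
    sink = j < n_sink                # sink tokens stay visible forever
    return causal & (in_window | sink)

m = swa_mask(seq_len=300)
assert m[299, :4].all()              # a late query still sees the sinks
assert m[299, 172:300].all()         # ...and its local 128-token window
assert not m[299, 100]               # mid-context keys outside the window are masked
```

The fixed-KV-cache benefit falls out directly: each layer only ever stores `window + n_sink` keys per sequence, independent of context length.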
Replies 79 · Reposts 113 · Likes 1.2K · Views 394.2K
Fuli Luo retweeted
Xiaomi MiMo @XiaomiMiMo
⚡ Faster than Fast. Designed for Agentic AI.

Introducing Xiaomi MiMo-V2-Flash — our new open-source MoE model: 309B total params, 15B active. Blazing speed meets frontier performance.

🔥 Highlights:
🏗️ Hybrid Attention: 5:1 interleaved 128-window SWA + Global | 256K context
📈 Performance: ⚔️ Matches DeepSeek-V3.2 on general benchmarks — at a fraction of the latency
🏆 SWE-Bench Verified: 73.4% | SWE-Bench Multilingual: 71.7% — new SOTA for open-source models
🚀 Speed: 150 output tokens/s with Day-0 support from @lmsysorg 🤝

🤗 Model: hf.co/XiaomiMiMo/MiM…
📝 Blog Post: mimo.xiaomi.com/blog/mimo-v2-f…
📄 Technical Report: github.com/XiaomiMiMo/MiM…
🎨 AI Studio: aistudio.xiaomimimo.com
Replies 91 · Reposts 301 · Likes 1.9K · Views 551.3K
Fuli Luo @_LuoFuli
Intelligence will inevitably evolve from language to the physical world, unlocking spatial intelligence for multi-modal perception, reasoning, generation, and action—essential for true AGI. I'm working on building this at @XiaomiMiMo, spearheading a creative and talented team!
Replies 26 · Reposts 24 · Likes 364 · Views 137K