

Runxin Xu

@pigjunebaba
AI researcher @deepseek_ai | @PKU1898 | @SJTU1896 Opinions are my own.



🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n
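For reference, a minimal sketch of calling the updated API through DeepSeek's OpenAI-compatible endpoint; using the `deepseek-chat` model alias for the V4 preview is an assumption here, so check the API docs for the exact model name.

```python
# Minimal sketch: calling the DeepSeek API via its OpenAI-compatible endpoint.
# Assumption: the V4 preview is served under the existing "deepseek-chat"
# alias and the https://api.deepseek.com base URL; confirm in the API docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # issued on the DeepSeek platform
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed alias for the latest chat model
    messages=[{"role": "user", "content": "Summarize this long report ..."}],
)
print(response.choices[0].message.content)
```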


🚀 DeepSeek-R1-0528 is here!
🔹 Improved benchmark performance
🔹 Enhanced front-end capabilities
🔹 Reduced hallucinations
🔹 Supports JSON output & function calling
✅ Try it now: chat.deepseek.com
🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni…
🔗 Open-source weights: huggingface.co/deepseek-ai/De…
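Since this release adds JSON output and function calling, here is a hedged sketch of both through the OpenAI-compatible API; the `get_weather` tool is hypothetical, and exact parameter support should be confirmed against the linked docs.

```python
# Sketch: JSON output and function calling via an OpenAI-compatible API.
# Assumption: deepseek-reasoner (R1) accepts the standard `response_format`
# and `tools` parameters, per the linked API docs.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

# Structured JSON output.
resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Return the capital of France as JSON."}],
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)

# Function calling: the model may return a tool call instead of plain text.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```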


🚀 DeepSeek-V3-0324 is out now!
🔹 Major boost in reasoning performance
🔹 Stronger front-end development skills
🔹 Smarter tool-use capabilities
✅ For non-complex reasoning tasks, we recommend using V3 — just turn off “DeepThink”
🔌 API usage remains unchanged
📜 Models are now released under the MIT License, just like DeepSeek-R1!
🔗 Open-source weights: huggingface.co/deepseek-ai/De…




[1/n] Super excited to introduce our comprehensive survey on Long Context Language Models (LCLMs), a collaborative effort between amazing researchers from @NJU1902, @PKU1898, CASIA, @AlibabaGroup, @BytedanceTalk, @TencentGlobal & Kuaishou! 🚀
Our survey covers 3 core aspects:
1️⃣ Building powerful & efficient LCLMs
2️⃣ Training & deployment infrastructure
3️⃣ Evaluation, analysis & applications
A systematic review of all LCLM research to date! 📊
📄 Paper: arxiv.org/abs/2503.17407
🤗 HF: huggingface.co/papers/2503.17…
💻 Github: github.com/LCLM-Horizon/A…
#LCLM #NLP #MachineLearning #AI

🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview
Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing
Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k input/output tokens per second per H800 node
🚀 Cost profit margin 545%
💡 We hope this week's insights offer value to the community and contribute to our shared AGI goals.
📖 Deep Dive: bit.ly/4ihZUiO
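A toy sketch of the computation-communication overlap idea: split a batch into two microbatches so one can run its expert compute while the other's all-to-all is in flight, keeping the GPU and the interconnect busy at the same time. All function names below are illustrative stand-ins, not DeepSeek's actual kernels, which overlap CUDA streams and NCCL ops.

```python
# Toy sketch of computation-communication overlap with two microbatches.
# compute_experts / all_to_all are illustrative stand-ins for real kernels.
from concurrent.futures import ThreadPoolExecutor
import time

def all_to_all(microbatch):          # stand-in for cross-node EP dispatch
    time.sleep(0.1)                  # pretend network transfer
    return microbatch

def compute_experts(microbatch):     # stand-in for expert FFN compute
    time.sleep(0.1)                  # pretend GPU work
    return [x * 2 for x in microbatch]

def step_overlapped(batch):
    a, b = batch[: len(batch) // 2], batch[len(batch) // 2 :]
    with ThreadPoolExecutor(max_workers=2) as pool:
        a = pool.submit(all_to_all, a).result()   # communicate microbatch A
        # Overlap: compute A while B is in flight on the interconnect.
        fut_b = pool.submit(all_to_all, b)
        out_a = compute_experts(a)
        out_b = compute_experts(fut_b.result())
    return out_a + out_b

print(step_overlapped(list(range(8))))
```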


🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!
Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection
💡 With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.
📖 For more details, check out our paper here: arxiv.org/abs/2502.11089
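A heavily simplified NumPy sketch of the two branches named above, coarse compression to score key blocks and fine-grained selection of the top blocks for exact attention. This illustrates the general idea only, not the paper's gated, hardware-aligned design; see arxiv.org/abs/2502.11089 for the real mechanism.

```python
# Simplified sketch of NSA-style sparse attention for ONE query, in NumPy.
# Coarse branch: compress each key block to its mean and score blocks.
# Fine branch: attend only to tokens inside the top-k scoring blocks.
# Omits the gating, GQA grouping, and kernel-level details of the paper.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def nsa_sketch(q, K, V, block=16, top_k=4):
    n, d = K.shape
    n_blocks = n // block
    # Coarse-grained compression: one mean vector per key block.
    K_blocks = K[: n_blocks * block].reshape(n_blocks, block, d)
    block_scores = K_blocks.mean(axis=1) @ q           # query-block relevance
    # Fine-grained selection: keep tokens from the top-k blocks only.
    chosen = np.argsort(block_scores)[-top_k:]
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in chosen])
    K_sel, V_sel = K[idx], V[idx]
    # Standard attention restricted to the selected tokens.
    attn = softmax(K_sel @ q / np.sqrt(d))
    return attn @ V_sel

rng = np.random.default_rng(0)
q = rng.normal(size=64)
K = rng.normal(size=(1024, 64))
V = rng.normal(size=(1024, 64))
print(nsa_sketch(q, K, V).shape)  # (64,); cost scales with top_k*block, not n
```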






For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators.
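As a rough illustration, one such environment could follow the classic Gym reset/step interface with a verifiable reward; the arithmetic task and every name below are hypothetical, and a real "gym" would host many such environments.

```python
# Illustrative sketch of a verifiable RL environment for LLMs, loosely
# following the Gym reset()/step() convention. The arithmetic task and
# reward rule are placeholders, not any existing library's API.
import random

class ArithmeticEnv:
    def reset(self, seed=None):
        rng = random.Random(seed)
        self.a, self.b = rng.randint(1, 99), rng.randint(1, 99)
        prompt = f"Compute {self.a} + {self.b}. End with 'Answer: <number>'."
        return prompt  # observation fed to the LLM

    def step(self, llm_output: str):
        # Verifiable reward: 1.0 iff the final answer parses and is correct.
        try:
            answer = int(llm_output.rsplit("Answer:", 1)[1].strip())
            reward = 1.0 if answer == self.a + self.b else 0.0
        except (IndexError, ValueError):
            reward = 0.0
        return reward, True  # (reward, episode_done)

env = ArithmeticEnv()
prompt = env.reset(seed=42)
reward, done = env.step("Let me add them. Answer: 123")
print(prompt, reward, done)
```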

🚨BREAKING: #DeepSeek just topped BOTH US & China iOS free charts - ZERO paid ads! 🎉🚀 We've known since day 1: #AI products ≠ mobile apps. When core tech makes quantum leaps, even simple interfaces become magic. Time to double down on hardcore innovation. 💪

We replicated DeepSeek-R1-Zero and DeepSeek-R1 training on a 7B model with only 8K examples, and the results are surprisingly strong. 🚀
Starting from Qwen2.5-Math-7B (base model), we perform RL on it directly. No SFT, no reward model, just 8K MATH examples for verification. The resulting model achieves 33.3% on AIME, 62.5% on AMC, and 77.2% on MATH (pass@1), outperforming Qwen2.5-Math-7B-Instruct and performing comparably to PRIME and rStar-MATH, which use >50x more data and more complicated components. 🚀
Increased CoT length and self-reflection emerge.
We share the details and our findings in the blog: hkust-nlp.notion.site/simplerl-reason
Training code and implementation details here: github.com/hkust-nlp/simp…
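A minimal sketch of the kind of rule-based verification such a setup relies on in place of a learned reward model: extract the final boxed answer and compare it to the gold label. The helpers below are illustrative, not the SimpleRL code; see the linked repo for the actual implementation.

```python
# Sketch of a rule-based verifier reward for math RL (no reward model).
# Illustrative only; nested braces inside \boxed{...} are not handled here.
import re

def extract_boxed(text: str):
    """Return the content of the last \\boxed{...} in a model response."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def reward(response: str, gold: str) -> float:
    pred = extract_boxed(response)
    return 1.0 if pred is not None and pred == gold.strip() else 0.0

print(reward(r"... so the answer is \boxed{42}.", "42"))  # 1.0
print(reward("no boxed answer here", "42"))               # 0.0
```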

🚀 DeepSeek-R1 is here!
⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!
🌐 Website & API are live now! Try DeepThink at chat.deepseek.com today! 🐋
1/n