Yonggan Fu

6 posts

@YongganFu

Research Scientist @ NVIDIA Research; PhD @ Georgia Institute of Technology

Santa Clara, California · Joined October 2024
89 Following · 73 Followers
Pinned Tweet
Yonggan Fu @YongganFu ·
👀 Your small LMs (SLMs) are… not that fast?
🚀 At NVIDIA Research, we release 𝐍𝐞𝐦𝐨𝐭𝐫𝐨𝐧-𝐅𝐥𝐚𝐬𝐡 (NeurIPS 2025), a hybrid SLM family designed around real-world latency and trained from scratch at 1B/3B sizes, achieving SOTA accuracy, latency, and throughput.
🌟 𝐍𝐞𝐦𝐨𝐭𝐫𝐨𝐧-𝐅𝐥𝐚𝐬𝐡 𝐡𝐚𝐬 𝐛𝐞𝐞𝐧 𝐢𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐞𝐝 𝐢𝐧𝐭𝐨 𝐓𝐑𝐓𝐋𝐋𝐌 𝐟𝐨𝐫 𝐩𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧-𝐠𝐫𝐚𝐝𝐞 𝐢𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞 with up to 41K tokens/second on a single H100 GPU! Try it following the instructions in our HF repo.
Will share more details at NeurIPS’25 (poster on Thursday, 11am–2pm)!
𝐏𝐚𝐩𝐞𝐫 𝐋𝐢𝐧𝐤: arxiv.org/pdf/2511.18890
🤗 𝐇𝐅 𝐦𝐨𝐝𝐞𝐥𝐬:
Nemotron-Flash-1B: huggingface.co/nvidia/Nemotro…
Nemotron-Flash-3B: huggingface.co/nvidia/Nemotro…
Nemotron-Flash-3B-Instruct: huggingface.co/nvidia/Nemotro…
2 replies · 20 reposts · 48 likes · 16.3K views
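The pinned tweet above points to the HF repo for usage instructions. As an illustrative sketch only: the snippet below assumes the Nemotron-Flash-3B-Instruct checkpoint follows the standard Hugging Face `transformers` causal-LM interface and that the repo id is `nvidia/Nemotron-Flash-3B-Instruct` (the links in the tweet are truncated); the model card is the authoritative reference.

```python
# Hypothetical usage sketch: assumes the model exposes the standard Hugging Face
# causal-LM interface with a custom (trust_remote_code) hybrid architecture.
# See the model card on HF for the authoritative instructions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Flash-3B-Instruct"  # repo id assumed from the truncated links above

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # small model, fits comfortably on one GPU
    trust_remote_code=True,       # custom hybrid (attention + SSM) layers
).to("cuda")

# Chat-style prompt via the tokenizer's chat template (if the repo provides one).
messages = [{"role": "user", "content": "Summarize what a hybrid SLM is in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For the latency and throughput numbers quoted in the tweet (up to 41K tokens/second on a single H100), the TRTLLM integration rather than plain `transformers` is the intended serving path.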
Kaiyue Wen @wen_kaiyue ·
(1/n) Introducing Hyperball — an optimizer wrapper that keeps weight & update norm constant and lets you control the effective (angular) step size directly. Result: sustained speedups across scales + strong hyperparameter transfer.
27 replies · 121 reposts · 687 likes · 197.5K views
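To make the mechanism in the Hyperball tweet concrete, here is a minimal conceptual sketch, not the released Hyperball code: it wraps an inner torch optimizer, rescales each proposed update so its size is a fixed fraction (the angular step) of the weight norm, and projects the weight back onto its original norm sphere so the weight norm stays constant. All names and details below are illustrative assumptions.

```python
# Conceptual sketch only -- NOT the Hyperball implementation. It illustrates the
# idea stated in the tweet: keep each weight's norm fixed and control the update
# size as an angular step relative to that norm.
import torch

class AngularStepWrapper:
    """Wraps a torch optimizer; class name and behavior are illustrative assumptions."""

    def __init__(self, inner_optimizer, angular_step=0.02):
        self.inner = inner_optimizer
        self.angular_step = angular_step
        # Remember each parameter's initial norm so it can be kept constant.
        self.target_norms = {
            id(p): p.detach().norm()
            for group in inner_optimizer.param_groups
            for p in group["params"]
        }

    @torch.no_grad()
    def step(self):
        # Snapshot weights, then let the inner optimizer propose its update.
        before = {id(p): p.detach().clone()
                  for g in self.inner.param_groups for p in g["params"]}
        self.inner.step()

        for g in self.inner.param_groups:
            for p in g["params"]:
                old = before[id(p)]
                update = p.detach() - old
                if update.norm() == 0:
                    continue
                # Rescale the update so its norm is angular_step * ||w||,
                # then renormalize the weight back to its original norm.
                target = self.target_norms[id(p)]
                stepped = old + update * (self.angular_step * target / update.norm())
                p.copy_(stepped * (target / stepped.norm()))

    def zero_grad(self, set_to_none=True):
        self.inner.zero_grad(set_to_none=set_to_none)
```

Because the effective step is expressed as an angle rather than an absolute learning rate, this kind of wrapper is the sort of construction that could make step-size choices transfer across model scales, which is the hyperparameter-transfer claim in the tweet.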
Yonggan Fu retweeted
Shizhe Diao @shizhediao ·
🚀 Excited to share ToolOrchestra, an end-to-end RL training framework for orchestrating tools and agentic workflows.
Everyone’s building agent workflows these days — connecting tools, APIs, and LLMs like LEGO. 🧩
But here are our findings:
👉 Just prompting the agent workflow won’t cut it. It’s not how you build the best agent.
👉 Without learning, workflows plateau fast. It’s time to bring RL fine-tuning 🔥 back into agent development.
(1/n)
29 replies · 71 reposts · 348 likes · 67.7K views
Yonggan Fu retweeted
Pavlo Molchanov @PavloMolchanov ·
🚀 Introducing Hymba-1.5B: a new hybrid architecture for efficient small language models!
✅ Outperforms Llama, Qwen, and SmolLM2 with 6-12x less training
✅ Massive reductions in KV cache size & good throughput boost
✅ Combines Mamba & Attention in a Hybrid Parallel Architecture
✅ Base and Instruct with open license on HF
🤗 HF: tinyurl.com/hymba1-5b-hf
📚 Arxiv: arxiv.org/abs/2411.13676
🐙 GitHub: github.com/NVlabs/hymba
Long post with analysis and insights
4 replies · 57 reposts · 244 likes · 52.7K views
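The "Hybrid Parallel Architecture" bullet in the Hymba tweet above refers to running attention and Mamba-style branches side by side. The sketch below is only an illustration of that general idea, not the Hymba code: the same input feeds an attention branch and a simplified gated linear recurrence (standing in for Mamba, since a real selective-SSM kernel is out of scope here), and the two outputs are fused.

```python
# Illustrative sketch of a parallel attention + SSM-style block, assuming the
# high-level idea from the tweet. The recurrent branch is a plain gated linear
# recurrence used as a stand-in for Mamba; this is not the Hymba implementation.
import torch
import torch.nn as nn

class HybridParallelBlock(nn.Module):
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Simplified recurrent branch: per-channel decay plus an input projection.
        self.decay = nn.Parameter(torch.rand(dim))
        self.in_proj = nn.Linear(dim, dim)
        self.mix = nn.Linear(2 * dim, dim)   # fuse the two parallel branches

    def forward(self, x):                    # x: (batch, seq, dim)
        # Attention branch (causal mask omitted for brevity).
        attn_out, _ = self.attn(x, x, x, need_weights=False)

        # SSM-style branch: h_t = a * h_{t-1} + (1 - a) * proj(x_t).
        a = torch.sigmoid(self.decay)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(u.shape[1]):
            h = a * h + (1 - a) * u[:, t]
            states.append(h)
        ssm_out = torch.stack(states, dim=1)

        # Parallel fusion: concatenate branch outputs, project back, add residual.
        return x + self.mix(torch.cat([attn_out, ssm_out], dim=-1))
```

The attention branch gives precise token-to-token recall while the recurrent branch carries a compressed summary of the history at constant state size, which is the trade-off the Hymba work explores.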
Yonggan Fu retweeted
Pavlo Molchanov @PavloMolchanov ·
Sharing our team’s latest work on Hymba - an efficient small language model with a hybrid architecture.
Tech report: arxiv.org/abs/2411.13676
Discover the tradeoff between Mamba and Attention, how they can be combined, how the attention-sink and forced-to-attend phenomena can be mitigated, and how the KV cache can be shared across layers.
Learn how we built the model with an end-to-end ecosystem: data selection, architecture analysis and design, training Base and Instruct models, and opening them to the community.
Did I mention that our Hymba-1.5B Base model outperforms LLaMA 3.2-3B while being trained on 7× fewer tokens and achieving 12× higher throughput?
More details and model links coming soon!
10 replies · 90 reposts · 494 likes · 97.6K views
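The tech-report tweet also mentions sharing the KV cache across layers. As a rough illustration of what cross-layer KV sharing can look like (the exact Hymba scheme is in the report, and everything below, including the grouping pattern, is an assumption): only designated "producer" layers compute fresh keys and values, and the following layers reuse that cache with their own queries, cutting KV memory roughly in proportion to the group size.

```python
# Sketch of cross-layer KV sharing, assuming the high-level idea from the tweet.
# This is an illustration, not the Hymba implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKVAttention(nn.Module):
    def __init__(self, dim, n_heads, produces_kv):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q_proj = nn.Linear(dim, dim)
        self.produces_kv = produces_kv
        if produces_kv:                      # only "producer" layers own K/V projections
            self.k_proj = nn.Linear(dim, dim)
            self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x, shared_kv=None):
        B, T, _ = x.shape
        split = lambda t: t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        q = split(self.q_proj(x))
        if self.produces_kv:
            shared_kv = (split(self.k_proj(x)), split(self.v_proj(x)))
        k, v = shared_kv                     # non-producer layers reuse the cached K/V
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out), shared_kv

# Usage: every other layer recomputes K/V; the layer after it reuses the same
# cache, so KV memory is shared between adjacent layers.
layers = [SharedKVAttention(256, 8, produces_kv=(i % 2 == 0)) for i in range(4)]
x, kv = torch.randn(2, 16, 256), None
for layer in layers:
    x, kv = layer(x, kv)
```

Combined with the hybrid attention/SSM layers, this kind of sharing is one way a 1.5B-parameter model can keep its KV cache small enough to reach the throughput numbers quoted in the tweet.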