HanRong YE @ Nvidia

@leoyerrrr
Nvidia Research Scientist | Led OmniVinci, MM-Ego, X-VILA..

🚀 Introducing Nemotron-Cascade 2: our new best-in-class 30B-A3B MoE model.
🥇 Gold medal at IMO 2025, IOI 2025, and the ICPC World Finals.
🔥 Outperforms Qwen3.5-35B-A3B across math, code reasoning, alignment, and instruction following.
🔓 Fully reproducible: model weights, SFT data, and RL data are all open!
Check out our technical report and Hugging Face page for more details and insights 👇
📰 Technical Report: t.co/dFC00m6RZU
🤗 Model & Data: t.co/4QJqfTOt6I
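For readers unfamiliar with the "30B-A3B" naming: the model has ~30B total parameters but activates only ~3B per token. A quick back-of-the-envelope sketch of what that means for per-token compute, using the standard ~2 FLOPs-per-parameter approximation for a decoder forward pass (the numbers come from the post; the dense comparison is only an illustration):

```python
# "30B-A3B" MoE: ~30B total parameters, ~3B active per token.
TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule of thumb: a decoder forward pass costs ~2 FLOPs per parameter
# per token, so the MoE pays for the active set, not the full model.
dense_flops_per_token = 2 * TOTAL_PARAMS
moe_flops_per_token = 2 * ACTIVE_PARAMS

print(active_fraction)                               # 0.1
print(dense_flops_per_token / moe_flops_per_token)   # 10.0
```

So per-token compute is closer to a 3B dense model, while total capacity stays at 30B.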

🎉 Congrats to @nvidia on the release of Nemotron 3 Super: day-0 support in vLLM v0.17.1, verified on NVIDIA GPUs. 120B hybrid MoE, only 12B active at inference. Big upgrades over the previous Nemotron Super:
- 5x higher throughput
- 2x higher accuracy on the Artificial Analysis Intelligence Index
- Multi-Token Prediction (MTP) for faster long-form generation
- Configurable thinking budget: dial accuracy vs. token cost per task
- 1M-token context window
Supports BF16, FP8, and NVFP4. Fully open: weights, datasets, recipes.
Blog: vllm.ai/blog/nemotron-…
🤝 Thanks to the @NVIDIAAIDev Nemotron team and vLLM community contributors!
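Multi-Token Prediction, mentioned in the list above, is at a high level a draft-then-verify decode loop: a cheap head proposes several next tokens, the base model verifies them, and every verified token is a decode step saved. The following is a toy, self-contained sketch of that idea; the deterministic `target_next` and `draft_step` functions are made-up stand-ins for illustration, not the Nemotron 3 or vLLM implementation:

```python
def target_next(ctx):
    # Toy "base model": deterministically predicts last token + 1.
    return ctx[-1] + 1

def draft_step(ctx, k):
    # Toy draft head: correct for its first 2 guesses, then wrong.
    return [ctx[-1] + i + 1 if i < 2 else ctx[-1] + i + 2 for i in range(k)]

def verify(ctx, proposal):
    # Accept the longest prefix that matches the base model's own choices.
    accepted, c = [], list(ctx)
    for t in proposal:
        if t != target_next(c):
            break
        accepted.append(t)
        c.append(t)
    return accepted

def mtp_decode(prompt, n_tokens, k=4):
    # Each loop iteration emits at least 1 token, often several;
    # that is where the long-form generation speedup comes from.
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        accepted = verify(out, draft_step(out, k))
        out.extend(accepted or [target_next(out)])
    return out[len(prompt):len(prompt) + n_tokens]

print(mtp_decode([0], 6))  # [1, 2, 3, 4, 5, 6]
```

In this toy setup the drafter gets 2 tokens right per step, so the loop emits 2 tokens per "base model" verification pass instead of 1.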

nanochat now trains a GPT-2-capability model in just 2 hours on a single 8XH100 node (down from ~3 hours a month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in, but the biggest difference was switching the dataset from FineWeb-Edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, and DCLM, which all led to regressions; ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok).

In other news, after trying a few approaches for how to set things up, I now have AI agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit, and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall-clock time. The agent works on a feature branch, tries out ideas, merges them when they work, and iterates. Amusingly, over the last ~2 weeks I feel like I've iterated more on the "meta-setup", optimizing and tuning the agent flows, than on the nanochat repo directly.
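For scale, the agent's 12-hour gain can be quantified directly from the two validation-loss figures reported in the post (pure arithmetic, nothing assumed beyond those numbers):

```python
loss_before = 0.862415  # d12 validation loss before the agent run
loss_after = 0.858039   # after ~110 automated changes over ~12 hours

absolute_gain = loss_before - loss_after
relative_gain = absolute_gain / loss_before

print(f"{absolute_gain:.6f}")   # 0.004376
print(f"{relative_gain:.3%}")   # 0.507%
```

About half a percent relative improvement in validation loss, at no wall-clock cost, from unattended iteration.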

The paper is now available: huggingface.co/papers/2602.06… More updates coming soon!

📈 According to the official test results from OmniVideoBench using our open-sourced model weights, OmniVinci has once again claimed #1 performance in the 7B LLM category! In addition, the model has been out for just two weeks, and downloads have already soared past 6K 🌟

👀 Your small LMs (SLMs) are… not that fast?
🚀 At NVIDIA Research, we release 𝐍𝐞𝐦𝐨𝐭𝐫𝐨𝐧-𝐅𝐥𝐚𝐬𝐡 (NeurIPS 2025), a hybrid SLM family designed around real-world latency and trained from scratch at 1B/3B sizes, achieving SOTA accuracy, latency, and throughput.
🌟 𝐍𝐞𝐦𝐨𝐭𝐫𝐨𝐧-𝐅𝐥𝐚𝐬𝐡 𝐡𝐚𝐬 𝐛𝐞𝐞𝐧 𝐢𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐞𝐝 𝐢𝐧𝐭𝐨 𝐓𝐑𝐓𝐋𝐋𝐌 𝐟𝐨𝐫 𝐩𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧-𝐠𝐫𝐚𝐝𝐞 𝐢𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞, with up to 41K tokens/second on a single H100 GPU! Try it by following the instructions in our HF repo.
Will share more details at NeurIPS'25 (poster on Thursday, 11am–2pm)!
𝐏𝐚𝐩𝐞𝐫 𝐋𝐢𝐧𝐤: arxiv.org/pdf/2511.18890
🤗 𝐇𝐅 𝐦𝐨𝐝𝐞𝐥𝐬:
Nemotron-Flash-1B: huggingface.co/nvidia/Nemotro…
Nemotron-Flash-3B: huggingface.co/nvidia/Nemotro…
Nemotron-Flash-3B-Instruct: huggingface.co/nvidia/Nemotro…

Nvidia silently dropped Orchestrator-8B 👀 “On the Humanity's Last Exam (HLE) benchmark, ToolOrchestrator-8B achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being approximately 2.5x more efficient.” huggingface.co/nvidia/Orchest…

This decision is deeply concerning and unfair to authors who invested significant effort into the rebuttal process. For example, our students did an amazing job turning a paper with negative reviews into an 8/6/6/6. This policy feels like it just resets all that hard work to zero.

Yep

Tempted to look up reviewer names? Don't. As an author, you simply cannot emotionally detach your work from the person behind the review. The result is grudges that can last decades. I assure you: if someone disliked your paper, you do *not* want to know who it was. That is what makes this leak such a major issue, as the research community relies on double-blind review to provide objective and fair evaluations.
