Yang Chen
113 posts

Yang Chen
@ychenNLP
Research Scientist @NVIDIA | PhD @GeorgiaTech | RL and LLM reasoning
Joined September 2018
477 Following · 1.2K Followers

Pinned Tweet
Yang Chen @ychenNLP
We released Nemotron Cascade 2 30B A3B. What makes this release especially meaningful to me is that it reflects a 1.5-year journey at NVIDIA around one core idea: improving AI math reasoning through self-improvement at test time. Each project tackled a different part of that problem.

With AceMath (24 Q4), we built an external verifier model to identify the right solution during test-time scaling. With AceReason (25 Q1-2), we scaled the model's reasoning capabilities through RL so it could spend more time reflecting while solving problems. Along the way, we found a general, simple, and effective RL recipe that we've kept using since. And now with Cascade 2 (26 Q1), we've pushed that effort further: the model can generate hypotheses, verify them, and refine them on its own. That self-improvement loop is what enabled IMO gold-level performance at the 30B scale.

From MATH500, to AIME, and now IMO proofs. This team is THE BEST. Technical report: huggingface.co/papers/2603.19…
Wei Ping @_weiping
🚀 Introducing Nemotron-Cascade 2 🚀
Just 3 months after Nemotron-Cascade 1, we're releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.
🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and the even larger Qwen3.5-122B-A10B (2026-03-11).
🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expands Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.
🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…

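The test-time-scaling and self-improvement ideas in the pinned thread reduce to two small loops. A minimal sketch, assuming hypothetical generate / verify_score / refine callables (none of these are the AceMath or Cascade APIs):

    # Hedged sketch of the two ideas above. All callables are hypothetical
    # stand-ins, not the AceMath/AceReason/Cascade interfaces.

    def best_of_n(problem, generate, verify_score, n=8):
        # AceMath-style test-time scaling: sample n candidate solutions and
        # keep the one the external verifier scores highest.
        candidates = [generate(problem) for _ in range(n)]
        return max(candidates, key=lambda sol: verify_score(problem, sol))

    def self_improve(problem, generate, verify_score, refine, rounds=3, threshold=0.9):
        # Cascade-2-style loop: generate a hypothesis, verify it, and refine
        # it with the verifier's feedback until the score clears a threshold.
        solution = generate(problem)
        for _ in range(rounds):
            score = verify_score(problem, solution)
            if score >= threshold:
                break
            solution = refine(problem, solution, score)
        return solution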
Yang Chen retweeted
Boxin Wang @wbx_life
🔥 The era of ultra-efficient reasoning is here: our Nemotron-Cascade-2-30B-A3B is currently trending at #1 on Hugging Face 🤗
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025 -- all from a model with only 3B active parameters. 🤯
⚡ SOTA alignment and instruction-following capabilities even compared with larger LLMs
The secret sauce? Cascade RL. 🧬
1️⃣ Cascade RL not only pushes the model's limits in each domain, but also produces elite "teachers" for every expert domain.
2️⃣ Multi-domain on-policy distillation uses these expert teachers to keep the student model sharp, mitigating domain shifts and matching expert-level performance.
💻 Read the blog: research.nvidia.com/labs/nemotron/…
📑 Check the paper: arxiv.org/abs/2603.19220
🤗 Get the weights: huggingface.co/collections/nv…
#AI #MachineLearning #NVIDIA #LLM
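Boxin's two ingredients are concrete enough to sketch. A rough illustration of the on-policy distillation step, assuming the student's own rollouts are rescored by a per-domain teacher (this shows the generic technique, not NVIDIA's implementation):

    # Hedged sketch of multi-domain on-policy distillation: the student
    # generates its own sequences, and a domain-specific expert teacher
    # supplies target logits on those same sequences.
    import torch
    import torch.nn.functional as F

    def mopd_loss(student_logits, teacher_logits, temperature=1.0):
        # student_logits, teacher_logits: [batch, seq_len, vocab], both scored
        # on the SAME student-generated tokens (that is what makes it on-policy).
        s_logp = F.log_softmax(student_logits / temperature, dim=-1)
        t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
        # KL(student || teacher), summed over the vocab, averaged over tokens
        kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)
        return kl.mean()

    # Per batch: route each prompt's student rollout to the matching expert
    # teacher (math, code, agentic, ...) and average the losses across domains.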
Yang Chen retweeted
Bryan Catanzaro @ctnzr
Thank you to everyone in the community who is testing and using Nemotron models. It's great to see Nemotron-Cascade-2, Nemotron-3-Super and Nemotron-3-Nano trending on HF. The Nemotron team is working hard to incorporate all your feedback into Nemotron 4. And yes, Nemotron 3 Ultra is still on track for release. huggingface.co/models?pipelin…
Yang Chen retweeted
Sudo su @sudoingX
the hype around this model settled fast. good. now i can test it without the noise. NVIDIA released nemotron cascade. 30B total, 3B active. fits on a single RTX 3090. hybrid mamba MoE. gold medal on the international math olympiad with only 3 billion active parameters. they say it beats qwen on math, code, and reasoning. i tested qwen 3.5 35B-A3B on a single 3090 at 112 tok/s. now same card, same tests, different architecture. mamba vs deltanet. nvidia vs alibaba. receipts incoming tonight.
Sudo su @sudoingX

testing cascade 2 on a single 3090 right now. same card i tested qwen 3.5 35B-A3B on at 112 tok/s. same active params, same VRAM tier, different hybrid architectures. mamba vs deltanet head to head. numbers coming tonight. if a spark lands on my desk next you'll get those numbers too.

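For anyone reproducing this kind of single-card tok/s comparison, a minimal harness against a local Ollama server (the model tag below is a guess; use whatever tag you actually pulled):

    # Hedged sketch: measure decode throughput (tok/s) via a local Ollama
    # server on its default port. The model tag is an assumption.
    import time
    import requests

    def benchmark(model="nemotron-cascade-2", prompt="Prove that sqrt(2) is irrational."):
        t0 = time.time()
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        data = resp.json()
        # Ollama reports eval_count (generated tokens) and eval_duration (ns)
        tokens = data.get("eval_count", 0)
        dur_ns = data.get("eval_duration", 1)
        print(f"{tokens} tokens in {time.time() - t0:.1f}s "
              f"-> {tokens / (dur_ns / 1e9):.1f} tok/s")

    if __name__ == "__main__":
        benchmark()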
Justin Thyme 🇺🇸🐿️
@ychenNLP this model is soooooo good it's been days now and I still cannot get over how well this model reasons, coherently and correctly; I know it's got limitations, but it is a JOY to use; I can TRUST it; this is the first small model that I can trust you have bottled magic; thank you
Prompt Injection @PromptInjection
@ychenNLP Btw... while we're on the subject... will there also be a Cascade 120B or 500B? I like the SFT style (the way the models speak) way more than the one from the "normal" Nemotrons.
Yang Chen @ychenNLP
@bnjmn_marie that's strange. I don't see this with the Cascade 2 model. We report avg@64, and I see it fluctuate only between 83 and 96.
Benjamin Marie @bnjmn_marie
Qwen3.5 can score very high on benchmarks like AIME and LiveCodeBench. And still be wildly unstable, like most LLMs. This is what technical reports typically don't show. For instance, run AIME (math questions) 32 times, and you may get 32 different outcomes. Get (extremely) unlucky, and the score can land below 50%. That's the difference between "I tried it" and actual evaluation. To know how good a model is, you need repeated runs on multiple tasks. This is also why quantized models sometimes look better than the originals on tiny benchmarks: variance is doing a lot of the work. Note: the screenshot is AIME24 + AIME25 run in parallel with Qwen3.5 122B. It shows the answer distribution for 32 runs. The answers are mostly correct, on average.
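Benjamin's point is easy to make concrete: score the benchmark over many independent runs and report the spread, not one lucky number. A small sketch with fabricated 0/1 outcomes (illustration only, not his data):

    # Hedged sketch of repeated-run evaluation (avg@k). The 0/1 outcome
    # matrix below is randomly fabricated for illustration.
    import random
    import statistics

    def report(results):
        # results[i][j] = 1 if run i got question j right
        scores = [100.0 * sum(run) / len(run) for run in results]
        print(f"avg@{len(scores)}: {statistics.mean(scores):.1f} "
              f"(min {min(scores):.1f}, max {max(scores):.1f}, "
              f"stdev {statistics.stdev(scores):.1f})")

    random.seed(0)
    # 32 runs x 30 questions of fake outcomes; a single unlucky run can sit
    # far below the 32-run mean even when avg@32 looks stable.
    fake = [[1 if random.random() < 0.85 else 0 for _ in range(30)] for _ in range(32)]
    report(fake)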
Yang Chen retweeted
David Hendrickson @TeksEdge
⚠️ Why are you using Qwen3.5-35B as your local coding agent (for ur 🦞 Clawdbot)?
🛑 Switch your local model right now.
⬇️ Drop: Qwen3.5-35B-A3B
⬆️ Load: NVIDIA Nemotron-Cascade-2-30B-A3B
Would 🪙 50 tps w/ a 100K context window sway you to drop local Qwen3.5? @nvidia's new 30B Cascade 2 model Benchmaxxes Qwen in the two metrics that actually matter for coding agents:
🧠 Superior Code Reasoning
⚙️ Instruction Following & Function Calling
Whether you are running a 64GB 🪟 AMD AI Max+ PC or a 64GB 🍎 Mac Studio (shoutout to the 8-bit MLX release for @Apple fans), this is the new champion for local dev work. 🏆💻
David Hendrickson @TeksEdge

🚀 Nemotron-Cascade-2 was built from the raw, underperforming Nemotron-3-Nano-30B-A3B-Base (zero tuning) & Benchmaxxed Hard to beat Qwen3.5-35B-A3B.
⁉️ Is this bad if open sourced? Here's what happened 👇
✅ They loaded it with 4.4M competition math samples + 816K proofs
✅ Heavy SFT + Cascade RL (GRPO + MOPD distillation)
✅ Teachers: DeepSeek-V3.2, GPT-OSS-120B & Qwen3-235B
✅ Forced step-by-step thinking + Python tool calling
🎯 Goal: Turn a weak base into a tiny math olympiad monster (30B total / 3B active MoE)
🏆 Results (👀 verified):
• IMO 2025: 35/42 (Gold medal performance)
• AIME 2025: 92.4% (98.6% with tools), beating Qwen's 91.9%
• HMMT Feb 2025: 94.6%
• LiveCodeBench v6: 87.2%, crushing Qwen by 12+ pts
❓ Is that so bad? Hell no. It's brilliant specialization! They engineered a 30B math genius instead of another generic chatbot. @NVIDIA just showed smart post-training > raw scale. 🔥

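The "GRPO + MOPD" recipe in that quote is mostly the group-relative advantage trick. A minimal sketch of just that step, following the published GRPO formulation (not the Nemotron training code):

    # Hedged sketch of GRPO's group-relative advantage: sample G responses per
    # prompt, then standardize each response's reward within its own group.
    import statistics

    def grpo_advantages(rewards, eps=1e-6):
        # rewards: list of G scalar rewards for one prompt's sampled group
        mu = statistics.mean(rewards)
        sigma = statistics.pstdev(rewards)
        return [(r - mu) / (sigma + eps) for r in rewards]

    # Example: 4 rollouts for one math problem, reward = 1 if the final
    # answer verifies. Correct rollouts get positive advantages.
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))

    # Each token of rollout i is then reinforced with advantage A_i under a
    # clipped PPO-style policy-gradient objective, with no learned critic.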
Yang Chen retweeted
Ivan Fioravanti ᯅ @ivanfioravanti
Nvidia Nemotron-Cascade-2-30B-A3B is on MLX in 4, 6, and 8-bit! It's a super model! Try it!
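To try those MLX builds, the stock mlx-lm flow looks roughly like this; the repo id below is my guess at the quant's name, so check the actual collection for the real one:

    # Hedged sketch: loading an MLX quant with mlx-lm (pip install mlx-lm).
    # The Hugging Face repo id is a placeholder guess, not a confirmed name.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Nemotron-Cascade-2-30B-A3B-8bit")
    prompt = "Prove that the sum of two odd integers is even."
    text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
    print(text)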
Yang Chen retweeted
DailyPapers @HuggingPapers
NVIDIA just released Nemotron-Cascade 2 on Hugging Face: a 30B MoE model with 3B activated parameters that achieves gold-medal performance at IMO and IOI 2025.
Yang Chen retweeted
ollama @ollama
Nemotron-Cascade-2 is now available to run with Ollama.

ollama run nemotron-cascade-2

To run it locally with OpenClaw:

ollama launch openclaw --model nemotron-cascade-2

This model from NVIDIA delivers strong reasoning and agentic capabilities on par with models with up to 20x more parameters.
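Beyond the CLI, the same local model is scriptable over Ollama's REST API. A minimal non-streaming chat call, assuming the model tag matches the one above:

    # Hedged sketch: calling the locally served model through Ollama's chat
    # endpoint. Assumes `ollama run nemotron-cascade-2` already pulled it.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "nemotron-cascade-2",
            "messages": [{"role": "user", "content": "What is 17 * 23?"}],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])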
Yang Chen retweeted
Zhuolin Yang @lucas110550
It's been a busy but remarkable spring season, with frontier labs releasing powerful models with impressive results at large scale. Still, we are excited to see that our 30B-A3B MoE model can match or even outperform frontier-series models on general domains with our cascade RL pipeline design. A few bullets:
* Cascade RL is still robust. We observed minimal drops across all RL stages on various domains.
* MOPD is magic. We have seen this (or its variants) applied in frontier labs' tech reports, and it is super useful for aggregating multi-domain expertise throughout your cascade RL training pipeline. I would describe it as "learn from your slice of life, in parallel worlds".
* For the competitive coding domain, yes, I'm finally outclassed, but I'm proud that this model is stronger than I am. I really feel my "Aja Huang" moment.
Hope you enjoy this spring gift.
Model weights: huggingface.co/collections/nv…
Tech report: research.nvidia.com/labs/nemotron/…
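Reading Zhuolin's bullets together, the pipeline sounds like: run RL stage by stage, snapshot each stage's best checkpoint as that domain's teacher, then distill all the teachers back into one student. A speculative control-flow sketch, with every function a hypothetical stand-in:

    # Hedged sketch of a cascade RL pipeline as described in the thread:
    # sequential per-domain RL stages whose checkpoints become the expert
    # teachers for a final multi-domain on-policy distillation (MOPD) pass.
    # All functions here are hypothetical stand-ins, not NVIDIA's code.

    DOMAINS = ["math", "code", "agentic", "instruction_following"]

    def cascade_rl(student, rl_stage, distill, evaluate):
        teachers = {}
        for domain in DOMAINS:
            student = rl_stage(student, domain)        # e.g. GRPO on that domain
            teachers[domain] = student                 # snapshot = expert teacher
            print(domain, evaluate(student, domain))   # watch for regressions
        # MOPD: pull the final student toward every expert at once,
        # recovering regressions accumulated across the cascade.
        return distill(student, teachers)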