Yang Chen
113 posts

Yang Chen
@ychenNLP
Research Scientist @NVIDIA | PhD @GeorgiaTech | RL and LLM reasoning
Joined September 2018
477 Following · 1.2K Followers

Pinned Tweet
Yang Chen @ychenNLP
We released Nemotron Cascade 2 30B A3B. What makes this release especially meaningful to me is that it reflects a 1.5-year journey at NVIDIA around one core idea: improving AI math reasoning through self-improvement at test time. Each project tackled a different part of that problem.

With AceMath (24 Q4), we built an external verifier model to identify the right solution during test-time scaling. With AceReason (25 Q1-2), we scaled the model's reasoning capabilities through RL so it could spend more time reflecting while solving problems. Along the way, we found a general, simple, and effective RL recipe that we've kept using since. And now with Cascade 2 (26 Q1), we've pushed that effort further: the model can generate hypotheses, verify them, and refine them on its own. That self-improvement loop is what enabled IMO gold-level performance at the 30B scale.

From MATH500, to AIME, and now IMO proofs. This team is THE BEST. Technical report: huggingface.co/papers/2603.19…
Wei Ping @_weiping
🚀 Introducing Nemotron-Cascade 2 🚀
Just 3 months after Nemotron-Cascade 1, we're releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.
🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and the even larger Qwen3.5-122B-A10B (2026-03-11).
🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expands Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.
🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…

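The test-time-scaling and self-improvement ideas in the pinned thread reduce to two small loops. A minimal sketch, assuming hypothetical generate / verify_score / refine callables (none of these are the AceMath or Cascade APIs):

    # Hedged sketch of the two ideas above. All callables are hypothetical
    # stand-ins, not the AceMath/AceReason/Cascade interfaces.

    def best_of_n(problem, generate, verify_score, n=8):
        # AceMath-style test-time scaling: sample n candidate solutions and
        # keep the one the external verifier scores highest.
        candidates = [generate(problem) for _ in range(n)]
        return max(candidates, key=lambda sol: verify_score(problem, sol))

    def self_improve(problem, generate, verify_score, refine, rounds=3, threshold=0.9):
        # Cascade-2-style loop: generate a hypothesis, verify it, and refine
        # it with the verifier's feedback until the score clears a threshold.
        solution = generate(problem)
        for _ in range(rounds):
            score = verify_score(problem, solution)
            if score >= threshold:
                break
            solution = refine(problem, solution, score)
        return solution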
Yang Chen retweeted
Boxin Wang @wbx_life
🔥 The era of ultra-efficient reasoning is here: our Nemotron-Cascade-2-30B-A3B is currently trending at #1 on Hugging Face 🤗
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025 -- all from a model with only 3B active parameters. 🤯
⚡ SOTA alignment and instruction-following capabilities even compared with larger LLMs
The secret sauce? Cascade RL. 🧬
1️⃣ Cascade RL not only pushes the model's limits in each domain, but also produces elite "teachers" for every expert domain.
2️⃣ Multi-domain on-policy distillation uses these expert teachers to keep the student model sharp, mitigating domain shifts and matching expert-level performance.
💻 Read the blog: research.nvidia.com/labs/nemotron/…
📑 Check the paper: arxiv.org/abs/2603.19220
🤗 Get the weights: huggingface.co/collections/nv…
#AI #MachineLearning #NVIDIA #LLM
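Boxin's two ingredients are concrete enough to sketch. A rough illustration of the on-policy distillation step, assuming the student's own rollouts are rescored by a per-domain teacher (this shows the generic technique, not NVIDIA's implementation):

    # Hedged sketch of multi-domain on-policy distillation: the student
    # generates its own sequences, and a domain-specific expert teacher
    # supplies target logits on those same sequences.
    import torch
    import torch.nn.functional as F

    def mopd_loss(student_logits, teacher_logits, temperature=1.0):
        # student_logits, teacher_logits: [batch, seq_len, vocab], both scored
        # on the SAME student-generated tokens (that is what makes it on-policy).
        s_logp = F.log_softmax(student_logits / temperature, dim=-1)
        t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
        # KL(student || teacher), summed over the vocab, averaged over tokens
        kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)
        return kl.mean()

    # Per batch: route each prompt's student rollout to the matching expert
    # teacher (math, code, agentic, ...) and average the losses across domains.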
Yang Chen retweeted
Bryan Catanzaro @ctnzr
Thank you to everyone in the community who is testing and using Nemotron models. It's great to see Nemotron-Cascade-2, Nemotron-3-Super and Nemotron-3-Nano trending on HF. The Nemotron team is working hard to incorporate all your feedback into Nemotron 4. And yes, Nemotron 3 Ultra is still on track for release. huggingface.co/models?pipelin…
Yang Chen retweeted
Sudo su @sudoingX
the hype around this model settled fast. good. now i can test it without the noise. NVIDIA released nemotron cascade. 30B total, 3B active. fits on a single RTX 3090. hybrid mamba MoE. gold medal on the international math olympiad with only 3 billion active parameters. they say it beats qwen on math, code, and reasoning. i tested qwen 3.5 35B-A3B on a single 3090 at 112 tok/s. now same card, same tests, different architecture. mamba vs deltanet. nvidia vs alibaba. receipts incoming tonight.
Sudo su @sudoingX

testing cascade 2 on a single 3090 right now. same card i tested qwen 3.5 35B-A3B on at 112 tok/s. same active params, same VRAM tier, different hybrid architectures. mamba vs deltanet head to head. numbers coming tonight. if a spark lands on my desk next you'll get those numbers too.

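For anyone reproducing this kind of single-card tok/s comparison, a minimal harness against a local Ollama server (the model tag below is a guess; use whatever tag you actually pulled):

    # Hedged sketch: measure decode throughput (tok/s) via a local Ollama
    # server on its default port. The model tag is an assumption.
    import time
    import requests

    def benchmark(model="nemotron-cascade-2", prompt="Prove that sqrt(2) is irrational."):
        t0 = time.time()
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        data = resp.json()
        # Ollama reports eval_count (generated tokens) and eval_duration (ns)
        tokens = data.get("eval_count", 0)
        dur_ns = data.get("eval_duration", 1)
        print(f"{tokens} tokens in {time.time() - t0:.1f}s "
              f"-> {tokens / (dur_ns / 1e9):.1f} tok/s")

    if __name__ == "__main__":
        benchmark()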
Justin Thyme 🇺🇸🐿️
@ychenNLP this model is soooooo good it's been days now and I still cannot get over how well this model reasons, coherently and correctly; I know it's got limitations, but it is a JOY to use; I can TRUST it; this is the first small model that I can trust you have bottled magic; thank you
Prompt Injection @PromptInjection
@ychenNLP Btw... while we're on the subject... will there also be a Cascade 120B or 500B? I like the SFT style (the way the models speak) way more than the one from the "normal" Nemotrons.
Yang Chen @ychenNLP
@bnjmn_marie that's strange. I don't see this with the Cascade 2 model. We report avg@64, and I see it fluctuate only between 83 and 96.
Benjamin Marie @bnjmn_marie
Qwen3.5 can score very high on benchmarks like AIME and LiveCodeBench. And still be wildly unstable, like most LLMs. This is what technical reports typically don't show. For instance, run AIME (math questions) 32 times, and you may get 32 different outcomes. Get (extremely) unlucky, and the score can land below 50%. That's the difference between "I tried it" and actual evaluation. To know how good a model is, you need repeated runs on multiple tasks. This is also why quantized models sometimes look better than the originals on tiny benchmarks: variance is doing a lot of the work. Note: the screenshot is AIME24 + AIME25 run in parallel with Qwen3.5 122B. It shows the answer distribution for 32 runs. The answers are mostly correct, on average.
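Benjamin's point is easy to make concrete: score the benchmark over many independent runs and report the spread, not one lucky number. A small sketch with fabricated 0/1 outcomes (illustration only, not his data):

    # Hedged sketch of repeated-run evaluation (avg@k). The 0/1 outcome
    # matrix below is randomly fabricated for illustration.
    import random
    import statistics

    def report(results):
        # results[i][j] = 1 if run i got question j right
        scores = [100.0 * sum(run) / len(run) for run in results]
        print(f"avg@{len(scores)}: {statistics.mean(scores):.1f} "
              f"(min {min(scores):.1f}, max {max(scores):.1f}, "
              f"stdev {statistics.stdev(scores):.1f})")

    random.seed(0)
    # 32 runs x 30 questions of fake outcomes; a single unlucky run can sit
    # far below the 32-run mean even when avg@32 looks stable.
    fake = [[1 if random.random() < 0.85 else 0 for _ in range(30)] for _ in range(32)]
    report(fake)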
Yang Chen retweeted
David Hendrickson @TeksEdge
⚠️ Why are you using Qwen3.5-35B as your local coding agent (for ur 🦞 Clawdbot)?
🛑 Switch your local model right now.
⬇️ Drop: Qwen3.5-35B-A3B
⬆️ Load: NVIDIA Nemotron-Cascade-2-30B-A3B
Would 🪙 50 tps w/ a 100K context window sway you to drop local Qwen3.5? @nvidia's new 30B Cascade 2 model Benchmaxxes Qwen in the two metrics that actually matter for coding agents:
🧠 Superior Code Reasoning
⚙️ Instruction Following & Function Calling
Whether you are running a 64GB 🪟 AMD AI Max+ PC or a 64GB 🍎 Mac Studio (shoutout to the 8-bit MLX release for @Apple fans), this is the new champion for local dev work. 🏆💻
David Hendrickson @TeksEdge

🚀 Nemotron-Cascade-2 was built from the raw, underperforming Nemotron-3-Nano-30B-A3B-Base (zero tuning) & Benchmaxxed Hard to beat Qwen3.5-35B-A3B.
⁉️ Is this bad if open sourced? Here's what happened 👇
✅ They loaded it with 4.4M competition math samples + 816K proofs
✅ Heavy SFT + Cascade RL (GRPO + MOPD distillation)
✅ Teachers: DeepSeek-V3.2, GPT-OSS-120B & Qwen3-235B
✅ Forced step-by-step thinking + Python tool calling
🎯 Goal: Turn a weak base into a tiny math olympiad monster (30B total / 3B active MoE)
🏆 Results (👀 verified):
• IMO 2025: 35/42 (Gold medal performance)
• AIME 2025: 92.4% (98.6% with tools), beating Qwen's 91.9%
• HMMT Feb 2025: 94.6%
• LiveCodeBench v6: 87.2%, crushing Qwen by 12+ pts
❓ Is that so bad? Hell no. It's brilliant specialization! They engineered a 30B math genius instead of another generic chatbot. @NVIDIA just showed smart post-training > raw scale. 🔥

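The "GRPO + MOPD" recipe in that quote is mostly the group-relative advantage trick. A minimal sketch of just that step, following the published GRPO formulation (not the Nemotron training code):

    # Hedged sketch of GRPO's group-relative advantage: sample G responses per
    # prompt, then standardize each response's reward within its own group.
    import statistics

    def grpo_advantages(rewards, eps=1e-6):
        # rewards: list of G scalar rewards for one prompt's sampled group
        mu = statistics.mean(rewards)
        sigma = statistics.pstdev(rewards)
        return [(r - mu) / (sigma + eps) for r in rewards]

    # Example: 4 rollouts for one math problem, reward = 1 if the final
    # answer verifies. Correct rollouts get positive advantages.
    print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))

    # Each token of rollout i is then reinforced with advantage A_i under a
    # clipped PPO-style policy-gradient objective, with no learned critic.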
Yang Chen retweeted
Ivan Fioravanti ᯅ @ivanfioravanti
Nvidia Nemotron-Cascade-2-30B-A3B is on MLX in 4, 6, and 8-bit! It's a super model! Try it!
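To try those MLX builds, the stock mlx-lm flow looks roughly like this; the repo id below is my guess at the quant's name, so check the actual collection for the real one:

    # Hedged sketch: loading an MLX quant with mlx-lm (pip install mlx-lm).
    # The Hugging Face repo id is a placeholder guess, not a confirmed name.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Nemotron-Cascade-2-30B-A3B-8bit")
    prompt = "Prove that the sum of two odd integers is even."
    text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
    print(text)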
Yang Chen retweeted
DailyPapers @HuggingPapers
NVIDIA just released Nemotron-Cascade 2 on Hugging Face: a 30B MoE model with 3B activated parameters that achieves gold-medal performance at IMO and IOI 2025.
Yang Chen retweeted
ollama @ollama
Nemotron-Cascade-2 is now available to run with Ollama.

ollama run nemotron-cascade-2

To run it locally with OpenClaw:

ollama launch openclaw --model nemotron-cascade-2

This model from NVIDIA delivers strong reasoning and agentic capabilities on par with models with up to 20x more parameters.
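Beyond the CLI, the same local model is scriptable over Ollama's REST API. A minimal non-streaming chat call, assuming the model tag matches the one above:

    # Hedged sketch: calling the locally served model through Ollama's chat
    # endpoint. Assumes `ollama run nemotron-cascade-2` already pulled it.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "nemotron-cascade-2",
            "messages": [{"role": "user", "content": "What is 17 * 23?"}],
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])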
Yang Chen retweeted
Zhuolin Yang @lucas110550
It's been a busy but remarkable spring season, with frontier labs releasing powerful models with impressive results at large scale. Still, we are excited to see that our 30B-A3B MoE model can match or even outperform frontier-series models on general domains with our cascade RL pipeline design. A few bullets:
* Cascade RL is still robust. We observed minimal drops across all RL stages on various domains.
* MOPD is magic. We have seen this (or its variants) applied in frontier labs' tech reports, and it is super useful for aggregating multi-domain expertise throughout your cascade RL training pipeline. I would describe it as "learn from your slice of life, in parallel worlds".
* For the competitive coding domain, yes, I'm finally outclassed, but I'm proud that this model is stronger than I am. I really feel my "Aja Huang" moment.
Hope you enjoy this spring gift.
Model weights: huggingface.co/collections/nv…
Tech report: research.nvidia.com/labs/nemotron/…
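Reading Zhuolin's bullets together, the pipeline sounds like: run RL stage by stage, snapshot each stage's best checkpoint as that domain's teacher, then distill all the teachers back into one student. A speculative control-flow sketch, with every function a hypothetical stand-in:

    # Hedged sketch of a cascade RL pipeline as described in the thread:
    # sequential per-domain RL stages whose checkpoints become the expert
    # teachers for a final multi-domain on-policy distillation (MOPD) pass.
    # All functions here are hypothetical stand-ins, not NVIDIA's code.

    DOMAINS = ["math", "code", "agentic", "instruction_following"]

    def cascade_rl(student, rl_stage, distill, evaluate):
        teachers = {}
        for domain in DOMAINS:
            student = rl_stage(student, domain)        # e.g. GRPO on that domain
            teachers[domain] = student                 # snapshot = expert teacher
            print(domain, evaluate(student, domain))   # watch for regressions
        # MOPD: pull the final student toward every expert at once,
        # recovering regressions accumulated across the cascade.
        return distill(student, teachers)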