Jian Zhang

107 posts

@JianZhangCS

Director & Distinguished Scientist @Nvidia Nemotron Post-training | Co-founder, CTO at @NexusflowX | Ex-Director of ML at @SambaNovaAI | PhD in ML at @Stanford

Palo Alto, CA · Joined June 2017
238 Following · 594 Followers
Jian Zhang retweeted
Bryan Catanzaro@ctnzr·
Today we're releasing Nemotron 3 Nano Omni. Audio, Video, Image, Text ➡️ Text. Ask questions about all your data. Amazing efficiency powered by the Nemotron Hybrid SSM MoE architecture. State-of-the-art multimodal intelligence.
11 replies · 53 reposts · 348 likes · 24.1K views
Jian Zhang@JianZhangCS·
Golden time for open model coalitions. We are on a mission to open all our data, infra, and research to the ecosystem! @NVIDIAAI @nvidia Come and join us for the Nemotron open model journey! Check out our recent research on PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost: arxiv.org/abs/2603.21383
NVIDIA@nvidia

Different voices. Same answer: open models. NVIDIA Founder and CEO Jensen Huang sat down with the leaders from @mistralai, @bfl_ml, @cursor_ai, @LangChain, @perplexity_ai, @reflection_ai, @thinkymachines, @allen_ai, @evidenceopen, and AMP PBC to discuss the rapid rise of open frontier models. Get the top takeaways from the frontier of AI: nvda.ws/4uR6eEV

1 reply · 3 reposts · 14 likes · 1K views
Jian Zhang retweeted
Wei Ping@_weiping·
🚀 Introducing Nemotron-Cascade 2 🚀
Just 3 months after Nemotron-Cascade 1, we’re releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.
🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and the even larger Qwen3.5-122B-A10B (2026-03-11).
🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expands Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.
🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…
41 replies · 142 reposts · 897 likes · 160.6K views
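The "20× fewer parameters" claim in the tweet above can be sanity-checked with quick arithmetic; a minimal sketch, assuming the parameter counts cited there (30B total / 3B active for Nemotron-Cascade 2, 671B total / 37B active for DeepSeek-V3.2-Speciale):

```python
# Parameter counts as cited in the announcement: totals and
# active-per-token counts for the two MoE models.
cascade2_total, cascade2_active = 30e9, 3e9    # Nemotron-Cascade 2 (30B-A3B)
deepseek_total, deepseek_active = 671e9, 37e9  # DeepSeek-V3.2-Speciale (671B-A37B)

total_ratio = deepseek_total / cascade2_total    # ratio of total parameters
active_ratio = deepseek_active / cascade2_active # ratio of active parameters

print(f"~{total_ratio:.0f}x fewer total parameters")   # ~22x
print(f"~{active_ratio:.0f}x fewer active parameters") # ~12x
```

By this arithmetic the rounded "20×" figure refers to total parameter count (671B / 30B ≈ 22×); the active-parameter gap is closer to 12×.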
Jian Zhang retweeted
Together AI@togethercompute·
🚀 NVIDIA Nemotron 3 Super is now available on Together AI. A 120B hybrid MoE model with 12B active parameters, it delivers leading efficiency and accuracy for multi-agent AI systems. Run Nemotron 3 Super on Together’s Dedicated Inference, with reliable infrastructure and a 99.9% SLA across coding, reasoning, and agentic-pipeline production workloads.
3 replies · 20 reposts · 142 likes · 9.9K views
Jian Zhang@JianZhangCS·
Nemotron 3 Super is live! The most intelligent agentic reasoning model in the Nemotron family so far, with world-leading efficiency and openness. Super also marks our first infra & research milestone in scaling up agentic reinforcement learning. Stay tuned for more infra, data, and agentic generalization research we will open to the ecosystem.
🤗 Hugging Face: lnkd.in/gWfamwwX
📜 Tech Report: lnkd.in/gRFFJxKm
🤸‍♂️ NeMo-Gym (RL env data and orchestration): github.com/NVIDIA-NeMo/Gym
🤸 NeMo-RL (RL training): github.com/NVIDIA-NeMo/RL
6 replies · 9 reposts · 39 likes · 1.5K views
Jian Zhang retweeted
Bryan Catanzaro@ctnzr·
Announcing NVIDIA Nemotron 3 Super!
💚 120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚 36 on AAIndex v4
💚 Up to 2.2X faster than GPT-OSS-120B in FP4
💚 Open data, open recipe, open weights
Models, tech report, etc. here: research.nvidia.com/labs/nemotron/…
And yes, Ultra is coming!
62 replies · 205 reposts · 1.2K likes · 206.3K views
Jian Zhang@JianZhangCS·
🔍 Super is inside Perplexity for Pro & Max users now! Let’s surf the web.
0 replies · 0 reposts · 0 likes · 94 views
Jian Zhang@JianZhangCS·
🦞 Super is the #1 open model on PinchBench for OpenClaw. pinchbench.com Watch out, it could pinch you.
0 replies · 1 repost · 1 like · 86 views
Jian Zhang retweeted
Oleksii Kuchaiev@kuchaev·
Nemotron 3 Super is here: 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell. Truly open: permissive license, open data, open training infra. See the analysis from @ArtificialAnlys. Details in the thread 🧵 below:
10 replies · 46 reposts · 279 likes · 29.7K views
Jian Zhang retweeted
Artificial Analysis@ArtificialAnlys·
NVIDIA has released Nemotron 3 Super, a 120B (12B active) open-weights reasoning model that scores 36 on the Artificial Analysis Intelligence Index, built on a hybrid Mamba-Transformer MoE architecture. We were given access to this model ahead of launch and evaluated it across intelligence, openness, and inference efficiency.
Key takeaways:
➤ Combines high openness with strong intelligence: Nemotron 3 Super performs strongly for its size and is substantially more intelligent than any other model with comparable openness
➤ Nemotron 3 Super scored 36 on the Artificial Analysis Intelligence Index, +17 points ahead of the previous Super release and +12 points from Nemotron 3 Nano. Compared to models in a similar size category, this places it ahead of gpt-oss-120b (33), but behind the recently-released Qwen3.5 122B A10B (42)
➤ Focused on efficient intelligence: we found Nemotron 3 Super to have higher intelligence than gpt-oss-120b while enabling ~10% higher throughput per GPU in a simple but realistic load test
➤ Supported today for fast serverless inference: providers including @DeepInfra and @LightningAI are serving this model at launch with speeds of up to 484 tokens per second
Model details:
📝 Nemotron 3 Super has 120.6B total and 12.7B active parameters, along with a 1 million token context window and hybrid reasoning support. It is published with open weights and a permissive license, alongside open training data and methodology disclosure
📐 The model has several design features enabling efficient inference, including hybrid Mamba-Transformer and LatentMoE architectures, multi-token prediction, and NVFP4-quantized weights
🎯 NVIDIA pre-trained Nemotron 3 Super in (mostly) NVFP4 precision, but moved to BF16 for post-training. Our evaluation scores use the BF16 weights
🧠 We benchmarked Nemotron 3 Super in its highest-effort reasoning mode ("regular"), the most capable of the model's three inference modes (reasoning-off, low-effort, and regular)
18 replies · 61 reposts · 485 likes · 93.9K views
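The efficiency framing in the analysis above comes down to the active-parameter fraction: the share of weights that participate in each forward pass. A minimal sketch, using only the counts cited in these tweets (120.6B total / 12.7B active for Nemotron 3 Super, 122B / 10B for Qwen3.5 122B A10B):

```python
# Active-parameter fraction for the two MoE models compared above,
# using the parameter counts cited in the analysis.
models = {
    "Nemotron 3 Super":  (120.6e9, 12.7e9),  # total, active per token
    "Qwen3.5 122B A10B": (122e9,   10e9),
}

for name, (total, active) in models.items():
    fraction = active / total
    print(f"{name}: {fraction:.1%} of weights active per token")
```

For Nemotron 3 Super this works out to roughly 10.5% of weights active per token; memory footprint still scales with the total count, which is why throughput-per-GPU load tests like the one described above matter alongside raw scores.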
Jian Zhang retweeted
Oleksii Kuchaiev@kuchaev·
NVIDIA GTC 2026 takes place March 16–19 in the heart of Silicon Valley: San Jose, CA. It’s my favorite AI event of the year, bringing together cutting-edge research and real-world business implementation (THE HARDWARE DEMOS on the show floor are a must-see!!!). I’ll be speaking about Nemotron post-training on March 17, and there will be many more exciting talks, sessions, and demos throughout the week. Virtual attendance is free, and you can get 25% off an in-person pass using my employee code: nvidia.com/gtc/?ncid=GTC-…
0 replies · 1 repost · 7 likes · 1.7K views
Jian Zhang@JianZhangCS·
🚨Nemotron 3 Nano on LM Arena! Come and give it a try!
Arena.ai@arena

🚨 Text Leaderboard Update: @NvidiaAI has begun rolling out the open Nemotron 3 model family, starting with Nemotron 3 Nano (30B-A3B): a new 30B hybrid reasoning model with a 1M context window. It currently ranks #120 on the Text leaderboard with a score of 1328, and #47 among open models. Among open models, Nemotron 3 performs best in the Math and Coding categories, with strong results across IT, Science, Business, and Mathematics on the Occupational leaderboard. Read more about Nemotron 3 Nano’s real-world performance in the thread 🧵

0 replies · 1 repost · 15 likes · 1.2K views
Jian Zhang retweeted
Artificial Analysis@ArtificialAnlys·
NVIDIA has just released Nemotron 3 Nano, a ~30B MoE model that scores 52 on the Artificial Analysis Intelligence Index with just ~3B active parameters.
Hybrid Mamba-Transformer architecture: Nemotron 3 Nano combines the hybrid Mamba-Transformer approach @NVIDIAAI has used on previous Nemotron models with a moderate-sparsity MoE architecture, enabling highly efficient inference, particularly at longer sequence lengths
Small-model improvements: with 31.6B total and 3.6B active parameters, Nemotron 3 Nano scores 52 on our Intelligence Index, in line with OpenAI’s gpt-oss-20b (high). This represents a +6 point lead on the similarly-sized Qwen3 30B A3B 2507 and a +15 improvement on NVIDIA’s previous Nemotron Nano 9B V2 (a dense model)
High openness: Nemotron 3 Nano follows other recent NVIDIA models in open licensing and releases of data and methodology for the community to use and replicate; it scores a 67 on the Artificial Analysis Openness Index, in line with previous Nemotron Nano models
Key model details:
➤ 1 million token context window, with text-only support
➤ Supports reasoning and non-reasoning modes
➤ Released under the NVIDIA Open Model License; the model is freely available for commercial use or training of derivative models
➤ On launch, the model is being made available with a range of serverless inference providers including @baseten, @DeepInfra, @FireworksAI_HQ, @togethercompute and @friendliai, and it is available now on Hugging Face for local inference or self-deployment
See below for our full analysis and key announcement links from NVIDIA 👇
9 replies · 51 reposts · 286 likes · 110.4K views
Jian Zhang retweeted
NVIDIA Newsroom@nvidianewsroom·
NEWS: NVIDIA announces the NVIDIA Nemotron 3 family of open models, data, and libraries, offering a transparent and efficient foundation for building specialized agentic AI across industries. Nemotron 3 features a hybrid mixture-of-experts (MoE) architecture and new open Nemotron pretraining and post-training datasets, paired with NeMo Gym, an open-source reinforcement learning library that enables scalable, verifiable agent training. Read more: nvda.ws/4oNUTBm
52 replies · 205 reposts · 1.2K likes · 203.7K views