Yoshi Suhara

185 posts

@suhara

Building Small Language Models @nvidia

Santa Clara, CA · Joined June 2007
299 Following · 357 Followers
Yoshi Suhara @suhara
The model was compressed from Nano 9B v2 using Nemotron Elastic (arxiv.org/abs/2511.16664) and post-trained on Nemotron 3 post-training data using a new recipe designed to improve accuracy in reasoning-off mode. Please see the blog and the Nemotron Elastic paper for details.
0 replies · 0 reposts · 1 like · 104 views
Yoshi Suhara retweeted
NVIDIA AI Developer @NVIDIAAIDev
Introducing NVIDIA Nemotron 3 Super 🎉
Open 120B-parameter (12B active) hybrid Mamba-Transformer MoE model
Native 1M-token context
Built for compute-efficient, high-accuracy multi-agent applications
Plus, fully open weights, datasets and recipes for easy customization and deployment. 🧵
59 replies · 106 reposts · 819 likes · 134.7K views
Yoshi Suhara retweeted
NVIDIA Japan @NVIDIAJapan
[New model release 🚀 Nemotron-Nano-9B-v2-Japanese] Today, NVIDIA released NVIDIA Nemotron-Nano-9B-v2-Japanese, which achieves state-of-the-art (SOTA) performance on the Nejumi Leaderboard 4 among models with 10B parameters or fewer. It is available for commercial use. huggingface.co/blog/nvidia/ne…
10 replies · 330 reposts · 1.3K likes · 304.9K views
Yoshi Suhara retweeted
Pavlo Molchanov @PavloMolchanov
🚀 New NVIDIA report: NVFP4 + Quantization-Aware Distillation (QAD)
FP4 inference without quality collapse. Key idea: distill a BF16 teacher into an NVFP4 student using KL loss - much more robust than PTQ/QAT, especially after SFT/RL.
🔥 Near-BF16 accuracy
⚡ ~2-3× throughput, ~1.8× memory savings vs FP8
🧠 Works for LLMs and VLMs (Nemotron Nano, Super, VL)
Technical report: huggingface.co/nvidia/NVIDIA-…
Research blog: research.nvidia.com/labs/nemotron/…
Hugging Face models: research.nvidia.com/labs/nemotron/…

We just launched an ultra-efficient NVFP4 precision version of Nemotron 3 Nano that delivers up to 4x higher throughput on Blackwell B200. Using our new Quantization Aware Distillation method, the NVFP4 version achieves up to 99.4% accuracy of BF16. Nemotron 3 Nano NVFP4: nvda.ws/4t63z9y Tech Report: nvda.ws/4bj3pp0

4 replies · 17 reposts · 114 likes · 15.4K views
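As a rough illustration of the distillation objective described in the post above (not NVIDIA's actual NVFP4/QAD implementation), a minimal PyTorch sketch might look like the following. The `fake_quant` function is a generic quantize-dequantize stand-in for low-precision weights, and the loop assumes Hugging Face-style causal LM outputs:

```python
# Hedged sketch of quantization-aware distillation (QAD): a frozen BF16 teacher
# guides a student whose weights are fake-quantized in the forward pass.
# `fake_quant` is a generic quantize-dequantize stand-in, NOT real NVFP4.
import torch
import torch.nn.functional as F

def fake_quant(w: torch.Tensor, levels: int = 16) -> torch.Tensor:
    """Quantize-dequantize to `levels` uniform levels, with a straight-through
    estimator so gradients still reach the underlying high-precision weights."""
    scale = w.abs().amax() / (levels // 2 - 1) + 1e-8
    w_q = torch.clamp(torch.round(w / scale), -(levels // 2), levels // 2 - 1) * scale
    return w + (w_q - w).detach()

def qad_step(teacher, student, batch, optimizer, temperature: float = 1.0):
    """One QAD step: KL divergence between teacher and (fake-quantized) student logits."""
    with torch.no_grad():
        t_logits = teacher(**batch).logits      # BF16 teacher, frozen
    s_logits = student(**batch).logits          # student's linear layers apply fake_quant to their weights
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point made in the post is that matching the BF16 teacher's distribution, rather than re-running PTQ or plain QAT on the original training loss, is what keeps the FP4 student close to BF16 accuracy, particularly for checkpoints that have already been through SFT/RL.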
Yoshi Suhara retweeted
Jian Zhang @JianZhangCS
🚀 @Nvidia Nemotron 3 Nano is live! Nemotron 3 Nano is the world's most efficient open MoE, with a hybrid-MoE architecture and 1M context length.
🔥 Strong in reasoning, agentic, and chat tasks, with leading accuracy on the AA index, Tau2, and SWE-Bench
🔥 Up to 3.3X higher throughput compared to other open MoEs of similar size
🔥 A fully open recipe, with data and infra released to the community
Check out the new model architecture and reinforcement learning technologies we used below: 😊
Hugging Face: huggingface.co/collections/nv…
📢 Research blog: nvda.ws/48RusVt
🛣️ NeMo RL & NeMo Gym (RL environment orchestration): github.com/NVIDIA-NeMo/RL & github.com/NVIDIA-NeMo/Gym
Kudos to the teams for months of hard work! We are excited to keep building the Nemotron 3 model family and empower the community.
5 replies · 24 reposts · 247 likes · 25.3K views
Yoshi Suhara retweeted
Georgi Gerganov @ggerganov
In collaboration with NVIDIA, the new Nemotron 3 Nano model is fully supported in llama.cpp.

Nemotron 3 Nano features an efficient hybrid Mamba MoE architecture. It's a promising model, suitable for local AI applications on mid-range hardware, and the large context window makes it a great choice for a variety of use cases and applications.

The efficiency of llama.cpp and the unique context management features of the `llama-server` tool allow us to deploy and use this model on a wide range of hardware. With recent code contributions by engineering teams at NVIDIA and open-source collaborators, we can run this model very efficiently across the entire spectrum of NVIDIA GPUs.

Learn more at @NVIDIA_AI_PC developer.nvidia.com/blog/inside-nv…
8 replies · 41 reposts · 405 likes · 27.1K views
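For local use, a minimal sketch with llama-cpp-python (the Python bindings for llama.cpp) along these lines should work once a GGUF conversion of the model is on hand; the file name below is a placeholder, and serving through the `llama-server` binary mentioned above is the other common path:

```python
# Hedged sketch: running a GGUF build of Nemotron 3 Nano locally via llama-cpp-python.
# The model file name is a placeholder; download or convert an actual GGUF first.
from llama_cpp import Llama

llm = Llama(
    model_path="nemotron-3-nano.Q4_K_M.gguf",  # placeholder path to a local GGUF file
    n_ctx=32768,       # context window to allocate (the model itself supports far more)
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three local-AI use cases for a long-context model."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```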
Yoshi Suhara retweeted
Bryan Catanzaro @ctnzr
Today, @NVIDIA is launching the open Nemotron 3 model family, starting with Nano (30B-3A), which pushes the frontier of accuracy and inference efficiency with a novel hybrid SSM Mixture of Experts architecture. Super and Ultra are coming in the next few months.
41 replies · 222 reposts · 1.2K likes · 504.7K views
Yoshi Suhara retweeted
Shizhe Diao @shizhediao
Thrilled to share that CLIMB has been accepted to the NeurIPS DB track! 🍀 Feeling so lucky to work with such an amazing team. #NeurIPS2025

Thrilled to share my first project at NVIDIA! ✨

Today's language models are pre-trained on vast and chaotic Internet texts, but these texts are unstructured and poorly understood. We propose CLIMB — Clustering-based Iterative Data Mixture Bootstrapping — a fully automated framework that reorganizes pre-training data into clusters and iteratively searches for the best mixture.

CLIMB does three things:
➤ Embeds and clusters web-scale data semantically.
➤ Searches, iteratively and efficiently, for optimal data mixtures using a lightweight proxy model + predictor loop.
➤ Learns how different domains interact, and how the right mix can unlock downstream performance we didn't know was possible.

On paper, the gains are real:
➤ Our 1B model, trained on CLIMB mixtures with 400B tokens, outperforms LLaMA 3.2-1B.
➤ In some specific domains, e.g., Social Sciences, we see up to +5% improvements.
➤ We open-sourced ClimbLab (1.2T tokens across 20 domains) and ClimbMix (400B tokens, outperforming existing baselines under the same budget).

The real win isn't just numbers, it's the idea that we can bootstrap searching 🔎. This improves data efficiency a lot. We hope CLIMB can be a small step toward more transparent, structured, and efficient pre-training. One where we curate not by filtering noise, but by discovering signal.

We'd love to hear from others exploring the frontiers of data-centric AI. Let's CLIMB together!

🔗 Read our paper: arxiv.org/abs/2504.13161
📂 Datasets available on Hugging Face: huggingface.co/collections/nv…
🌐 Project page: research.nvidia.com/labs/lpr/climb (check the cluster visualizations)
🗨️ Discussion: huggingface.co/papers/2504.13…

0 replies · 4 reposts · 57 likes · 5K views
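To make the cluster-then-search loop described above concrete, here is a hedged sketch of the idea (not the CLIMB implementation itself): `train_proxy_and_eval` is a hypothetical stand-in for training a small proxy model on a candidate mixture and scoring it downstream, and the cheap predictor pre-filters candidates so fewer expensive proxy runs are needed.

```python
# Hedged sketch of the CLIMB idea: cluster pre-training data, then iteratively
# search for a good cluster mixture with a proxy-model + predictor loop.
# `train_proxy_and_eval` is a hypothetical stand-in for an expensive proxy run.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor

def climb_search(doc_embeddings, train_proxy_and_eval,
                 n_clusters=20, n_rounds=3, samples_per_round=16):
    # 1) Semantically cluster the corpus via its embeddings.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(doc_embeddings)

    tried_w, tried_scores = [], []
    best_w, best_score = None, -np.inf
    for _ in range(n_rounds):
        # 2) Sample candidate mixture weights over clusters (points on the simplex).
        candidates = np.random.dirichlet(np.ones(n_clusters), size=samples_per_round)
        if tried_w:
            # 3) Fit a cheap predictor on (mixture -> score) pairs from earlier rounds
            #    and keep only the most promising candidates for real proxy training.
            predictor = GradientBoostingRegressor().fit(np.array(tried_w), np.array(tried_scores))
            keep = np.argsort(-predictor.predict(candidates))[: max(1, samples_per_round // 4)]
            candidates = candidates[keep]
        for w in candidates:
            score = train_proxy_and_eval(labels, w)  # train a small proxy on this mix, evaluate downstream
            tried_w.append(w)
            tried_scores.append(score)
            if score > best_score:
                best_w, best_score = w, score
    return best_w, best_score
```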
Yoshi Suhara retweeted
Artificial Analysis @ArtificialAnlys
NVIDIA has released Nemotron Nano 9B V2, a small 9B reasoning model that scores 43 on the Artificial Analysis Intelligence Index, the highest yet for <10B models.

Nemotron 9B V2 is the first Nemotron model pre-trained by @NVIDIA. Previous Nemotron models have been developed by post-training on Meta Llama models.

Architecture & Training: The model uses a hybrid Mamba-Transformer architecture. NVIDIA pre-trained a 12B parameter base model and applied post-training with a range of techniques including RLHF and GRPO. The final 9B size was pruned from this model and re-trained with the base model as a teacher.

Small-model frontier: with only 9B parameters, Nemotron Nano 9B V2 is placed ahead of Llama 4 Maverick on our leaderboard, equal to Solar Pro 2 with reasoning, and trails just behind gpt-oss-20B (high).

Along with this model, NVIDIA released a 6.6-trillion token subset of their pre-training data for public use on @huggingface.

Key model details:
➤ 128k token context window
➤ Supports reasoning and non-reasoning modes (with '/no_think' settings in the system prompt)
➤ Released under the NVIDIA Open Model License, and not additionally covered by Meta's Llama license like prior Nemotron models - this means there is no limitation on use by large companies or requirement to keep 'Nemotron' in the name of derivative models
➤ No serverless inference providers are yet serving the model, but it is available now on Hugging Face for local inference or self-deployment

See below for our full analysis and key announcement links from NVIDIA 👇
21 replies · 58 reposts · 528 likes · 69.4K views
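Based on the '/no_think' convention mentioned above, toggling the non-reasoning mode presumably looks something like the sketch below; the Hugging Face repo id and the exact system-prompt string are taken from the post and should be verified against the model card.

```python
# Hedged sketch: selecting non-reasoning mode via the system prompt, per the post above.
# The repo id and the "/no_think" convention are assumptions to confirm on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"  # assumed Hugging Face repo name
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")

messages = [
    {"role": "system", "content": "/no_think"},  # reasoning off per the post; omit the toggle to keep reasoning on
    {"role": "user", "content": "Summarize this release in one sentence."},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
print(tok.decode(model.generate(inputs, max_new_tokens=128)[0], skip_special_tokens=True))
```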
Yoshi Suhara @suhara
RT @shizhediao: ✨ Alongside NVIDIA-Nemotron-Nano-v2-9B, we’re also open-sourcing its pre-training dataset. At NVIDIA, we remain committed…
0 replies · 1 repost · 0 likes · 47 views
Yoshi Suhara retweeted
NVIDIA AI Developer @NVIDIAAIDev
We're excited to share leaderboard-topping 🏆 NVIDIA Nemotron Nano 2, a groundbreaking 9B parameter open, multilingual reasoning model that's redefining efficiency in AI and earned the leading spot on the @ArtificialAnlys Intelligence Index leaderboard among open models within the same parameter range.

It's built on a unique hybrid Transformer-Mamba architecture, a combination that delivers the same accuracy you expect, but with higher throughput. This enables it to achieve high performance per cost, making it perfect for real-world applications like customer service agents and chatbots.

🏗️ Hybrid Architecture: By combining the strengths of the Transformer and Mamba architectures, it achieves up to 6X faster throughput compared to other 8B open models, with the highest reasoning accuracy.
🏦 Thinking Budget: Reduces unnecessary token generation to cut costs by up to 60%, making it an ideal solution for balancing performance and total cost of ownership (TCO).
🔢 Open Datasets: The training datasets of this model are fully open, giving maximum transparency in using the model for enterprise applications.

🤗 Technical details on @HuggingFace ➡️ nvda.ws/3JfcKST
🏆 Leaderboard ➡️ nvda.ws/47B7iUh
7 replies · 30 reposts · 141 likes · 8.1K views
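Conceptually, the "thinking budget" above is a cap on reasoning tokens at inference time. A rough, generic illustration follows; it assumes a `<think>...</think>` reasoning convention and is not the model's actual budget-control interface, which is documented on its model card.

```python
# Generic illustration of a runtime thinking budget: cap the reasoning tokens,
# then force the reasoning section closed so the model produces its final answer.
# Assumes a <think>...</think> convention; NOT the model's exact interface.
def generate_with_thinking_budget(model, tok, prompt, budget=256, answer_tokens=256):
    # Phase 1: let the model reason, but spend at most `budget` new tokens.
    ids = tok(prompt, return_tensors="pt").input_ids.to(model.device)
    ids = model.generate(ids, max_new_tokens=budget)
    text = tok.decode(ids[0])

    # Phase 2: if the budget ran out before the reasoning closed, close it manually.
    if "</think>" not in text:
        text += "\n</think>\n"

    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    out = model.generate(ids, max_new_tokens=answer_tokens)
    return tok.decode(out[0])
```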
Yoshi Suhara retweeted
Oleksii Kuchaiev @kuchaev
We are excited to release Nvidia-Nemotron-Nano-V2 model! This is a 9B hybrid SSM model with open base model and training data. This model also supports runtime "thinking" budget control. HF collection with base and post trained models: huggingface.co/collections/nv…
9 replies · 63 reposts · 297 likes · 65.4K views
Yoshi Suhara retweeted
Pavlo Molchanov @PavloMolchanov
📢 New efficient Hybrid-SLM from NVIDIA: NVIDIA-Nemotron-Nano-v2-9B
❗️6x faster than Qwen3-8B thanks to the hybrid (Mamba2 + Attention) design.

We tried something new: pretrain & align a 12B reasoning model → compress to 9B. First real stab at reasoning-model compression.

Key takeaways from compression:
▪️ Target was 23GB GPUs + room for a 650M vision encoder → design via compression, not a bespoke architecture.
▪️ Distillation loss went down, but benchmarks didn't - unlike base-model compression.
▪️ Reasoning compression needs light post-training alignment.
▪️ Applied both Minitron + Puzzle.
▪️ Dropped 2 attention layers to hit 128k context; KV cache dominated.
▪️ Depth: 62 → 56 (fewer tanked accuracy).
▪️ FFN: 20,480 → 15,680 (−23%).
▪️ Hidden dimension: 5120 → 4480.
▪️ Mamba heads → small gains, <15%, mostly avoided.

Distilled on 136B tokens, context grown 8k → 262k. This is required to preserve the 128k long-context capability.

📰 Report: research.nvidia.com/labs/adlr/file…
🤗 HF: huggingface.co/collections/nv…
1 reply · 16 reposts · 82 likes · 6.2K views
Yoshi Suhara @suhara
A new video game benchmark for LLM agents, designed across various game titles! Happy to be part of this wonderful collaboration with @dongmin_park11 and the amazing team @Krafton_AI!
Dongmin Park @dongmin_park11

🚨New Paper Alert As a game company, @Krafton_AI is actively exploring how to apply LLM agents to video games. We present Orak—a foundational video gaming benchmark for LLM agents! Includes Pokémon, StarCraft II, Slay the Spire, Darkest Dungeon, Ace Attorney, and more in🧵

0 replies · 0 reposts · 5 likes · 744 views
Yoshi Suhara retweeted
Oleksii Kuchaiev @kuchaev
NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post-train the next generations of our models. Give it a try: github.com/NVIDIA/NeMo-RL
5 replies · 65 reposts · 394 likes · 25K views
Yoshi Suhara retweeted
Shaokun Zhang @ShaokunZhang1
Tool-using LLMs can learn to reason—without reasoning traces. 🔥

We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation.

📄 Paper: arxiv.org/pdf/2505.00024
💻 Code: github.com/NVlabs/Tool-N1 (Please consider giving us a ⭐️ to stay updated on the upcoming code release!)

🧠 Why this matters:
Existing tool-call models rely heavily on supervised reasoning traces from stronger models—costly, brittle, and often imitative. We ask: can LLMs learn to reason directly from tool success signals?

📦 What we did:
– Train Qwen2.5-7B/14B with a simple binary reward on tool-call correctness + reasoning format in R1 style
– No reasoning traces needed
– Evaluate on BFCL, API-Bank, and ACEBench
– Also study the roles of SFT, RL, and the widely adopted SFT-then-RL recipe in training tool-calling models

📈 Key findings:
– Tool-N1-7B/14B clearly outperform GPT-4o and open baselines on all benchmarks
– The widely adopted SFT+RL paradigm doesn't necessarily lead to better performance than pure RL
– Binary reward > fine-grained reward, especially for real-world queries
– Scaling works: bigger = better gains under our RL setup

🌟 Takeaway: Reasoning doesn't have to be taught. With just a binary signal, LLMs can learn to reason and act. Tool-N1 sets a new direction for scalable, supervision-light tool-calling model training.
2 replies · 94 reposts · 358 likes · 40.4K views
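The binary reward described above is simple enough to sketch. In the spirit of the post, the reward is 1 only when the response both follows the expected reasoning/tool-call format and the parsed call matches ground truth; the tag names and JSON call format below are illustrative assumptions, not the exact Tool-N1 specification.

```python
# Hedged sketch of a rule-based binary reward: 1.0 only if the output has the
# expected <think>/<tool_call> structure AND the call matches the gold answer.
# Tag names and the JSON tool-call schema are illustrative assumptions.
import json
import re

def binary_tool_reward(response: str, gold_call: dict) -> float:
    fmt = re.fullmatch(r"\s*<think>.*?</think>\s*<tool_call>(.*?)</tool_call>\s*",
                       response, flags=re.DOTALL)
    if not fmt:
        return 0.0  # wrong format -> no reward
    try:
        call = json.loads(fmt.group(1))
    except json.JSONDecodeError:
        return 0.0  # unparsable call -> no reward
    # Exact match on tool name and arguments; no partial credit (binary reward).
    ok = call.get("name") == gold_call["name"] and call.get("arguments") == gold_call["arguments"]
    return 1.0 if ok else 0.0
```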