Mostofa Patwary

13 posts

@mapatwary

Joined August 2009

16 Following · 63 Followers

Mostofa Patwary reposted

Bryan Catanzaro @ctnzr · Mar 11

Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: research.nvidia.com/labs/nemotron/… And yes, Ultra is coming!

Mostofa Patwary @mapatwary · Dec 15

Nemotron 3 Nano 30B-A3B is live! Open, efficient MoE, and topping benchmarks! Supports up to 1M token context. We open weights, data, and even the training recipes. Blog post, technical reports, links to data and models: nvda.ws/48RusVt

Mostofa Patwary @mapatwary · Aug 27

Enjoy the math pretraining dataset from our team used to train our recently released Nvidia-Nemotron-Nano-V2 model!

Rabeeh Karimi @KarimiRabeeh

We just released Nemotron-CC-Math 🚀 Equations on the web aren't just LaTeX; they're in MathML, <pre> tags, inline, even images. Code shows up in just as many ways. Most parsers drop it. Nemotron-CC-Math (133B tokens) reprocesses CommonCrawl math pages to reliably capture math equations and code.

Mostofa Patwary @mapatwary · Aug 18

Excited to share our team's reasoning model with a 6X speedup and improved accuracy, plus the release of the base models and pretraining datasets for everyone to explore!

Bryan Catanzaro @ctnzr

Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the models, datasets, and tech report are here: research.nvidia.com/labs/adlr/NVID…

Mostofa Patwary reposted

NVIDIA AI Developer @NVIDIAAIDev · May 11

Curating high-quality pretraining datasets is crucial for accurate #LLMs. 💬 With our Nemotron-CC pipeline, now in the NeMo Curator GitHub repo, you can process text, image, and video data at scale. Get an overview of the pipeline and how you can use it to generate high-quality tokens for training or fine-tuning LLMs. ➡️ nvda.ws/44r5POU

Mostofa Patwary reposted

Shrimai @shrimai_ · May 1

Announcing Nemotron-CrossThink, which pushes RL for LLMs beyond math into general-purpose reasoning. We curate diverse, verifiable QA from web crawl + open sets & apply structured templates. 📉 28% token efficiency gain per correct answer. 📂 Dataset now released!

Mostofa Patwary @mapatwary · Apr 14

Nemotron-H base models (8B/47B/56B), a family of hybrid Mamba-Transformer LLMs, are now available on HuggingFace: huggingface.co/nvidia/Nemotro… huggingface.co/nvidia/Nemotro… huggingface.co/nvidia/Nemotro… Technical Report: arxiv.org/abs/2504.03624 Blog: research.nvidia.com/labs/adlr/nemo…

Mostofa Patwary reposted

Bryan Catanzaro @ctnzr · Mar 22

Nemotron-H: A family of Hybrid Mamba-Transformer LLMs.
* Hybrid architecture means up to 3X faster at the same accuracy
* Trained in FP8
* Great for VLMs
* Weights and instruct versions to come soon.
research.nvidia.com/labs/adlr/nemo…

Mostofa Patwary @mapatwary · Dec 4

Excited to release Nemotron-CC, our large-scale, Common Crawl-based dataset with 6.3T tokens. Enjoy building stronger models with this high-quality dataset.

Markus Kliegl @MarkusKliegl

We are excited to release Nemotron-CC, our high-quality, Common Crawl-based 6.3-trillion-token dataset for LLM pretraining (4.4T globally deduplicated original tokens and 1.9T synthetically generated tokens). Compared to the leading open DCLM dataset, Nemotron-CC makes it possible either to create a 4x larger dataset of similar quality or to increase MMLU by more than 5 points using a high-quality subset of the tokens. Blog post: research.nvidia.com/labs/adlr/Nemo… Paper: arxiv.org/abs/2412.02595 Dataset: data.commoncrawl.org/contrib/Nemotr… We thank the Common Crawl Foundation for hosting the dataset. (with @SudanSudanDan*, Ying Lin*, @KezhiKong*, Joseph Jennings, @BrandonNor90881, @MostofaPatwary, @MohammadShoeybi, @ctnzr)

Mostofa Patwary reposted

Bryan Catanzaro @ctnzr · Jun 18

I'm glad to see Nemotron-4-340B doing well on the LMSYS Arena leaderboard! Lots more work to do to improve the Nemotron family. Most importantly, though, I hope this model supports the development of AI across the community.

Arena.ai @arena

Chatbot Arena update! @NVIDIAAI's Nemotron-4-340B has just edged past Llama-3-70B to become the new best open model on the Arena leaderboard! Key highlights:
- Impressive performance in longer queries
- Balanced multilingual capabilities
- Robust performance in "Hard Prompts"
Congrats @NVIDIAAI on this remarkable milestone & contribution to the open community! Check out more plots below👇

Mostofa Patwary reposted

Jupinder Parmar @jupi_parmar · Feb 28

Excited to share our recent work @nvidia building Nemotron-4 15B! Trained on 8T tokens, it achieves strong performance on a variety of downstream evaluation areas with especially impressive multilingual performance. Paper: arxiv.org/abs/2402.16819

Mostofa Patwary reposted

Shrimai @shrimai_ · Feb 27

🚀 Introducing Nemotron-4 15B by @nvidia! 🎉 With 15B parameters and trained on 8T tokens, it shows impressive multilingual ability. It outperforms all similarly sized models and dominates in multilingual tasks, even surpassing models 4x larger! #NVIDIA #Nemotron4 arxiv.org/pdf/2402.16819…

Mostofa Patwary reposted

Aran Komatsuzaki @arankomatsuzaki · Feb 27

NVIDIA presents Nemotron-4 15B
- Multilingual 15B LLM on 8T tokens
- Outperforms or achieves competitive performance on all existing similarly-sized open models
- Outperforming models over 4x larger on multilingual tasks
arxiv.org/abs/2402.16819
