

Mostofa Patwary





We just released Nemotron-CC-Math 🚀 Equations on the web aren't just LaTeX: they appear in MathML, <pre> tags, inline text, even images. Code shows up in just as many ways, and most parsers drop it. Nemotron-CC-Math (133B tokens) reprocesses Common Crawl math pages to reliably capture both math equations and code.

Today we're releasing NVIDIA Nemotron Nano v2, a 9B hybrid SSM that is 6x faster than similarly sized models while also being more accurate. Along with the model, we are releasing most of the data we used to create it, including the pretraining corpus. Links to the models, datasets, and tech report are here: research.nvidia.com/labs/adlr/NVID…






We are excited to release Nemotron-CC, our high-quality, Common Crawl-based 6.3-trillion-token dataset for LLM pretraining (4.4T globally deduplicated original tokens and 1.9T synthetically generated tokens). Compared to the leading open DCLM dataset, Nemotron-CC makes it possible either to build a 4x larger dataset of similar quality or to raise MMLU by more than 5 points using a high-quality subset of the tokens. Blog post: research.nvidia.com/labs/adlr/Nemo… Paper: arxiv.org/abs/2412.02595 Dataset: data.commoncrawl.org/contrib/Nemotr… We thank the Common Crawl Foundation for hosting the dataset. (with @SudanSudanDan*, Ying Lin*, @KezhiKong*, Joseph Jennings, @BrandonNor90881, @MostofaPatwary, @MohammadShoeybi, @ctnzr)

Chatbot Arena update! @NVIDIAAI's Nemotron-4-340B has just edged past Llama-3-70B to become the best open model on the Arena leaderboard! Key highlights:
- Impressive performance on longer queries
- Balanced multilingual capabilities
- Robust performance on "Hard Prompts"
Congrats @NVIDIAAI on this remarkable milestone and contribution to the open community! Check out more plots below👇





