Mostofa Patwary

13 posts

@mapatwary

Joined August 2009
16 Following, 63 Followers
Mostofa Patwary reposted
Bryan Catanzaro (@ctnzr)
Announcing NVIDIA Nemotron 3 Super!
💚 120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚 36 on AAIndex v4
💚 up to 2.2X faster than GPT-OSS-120B in FP4
💚 Open data, open recipe, open weights
Models, Tech report, etc. here: research.nvidia.com/labs/nemotron/…
And yes, Ultra is coming!
Mostofa Patwary (@mapatwary)
Nemotron 3 Nano 30B-A3B is live! Open, efficient MoE, and topping benchmarks! Supports up to 1M token context. We are opening the weights, the data, and even the training recipes. Blog post, technical reports, links to data and models: nvda.ws/48RusVt
Mostofa Patwary reposted
NVIDIA AI Developer (@NVIDIAAIDev)
Curating high-quality pretraining datasets is crucial for accurate #LLMs. 💬 With our Nemotron-CC pipeline, now in the NeMo Curator GitHub repo, you can process text, image, and video data at scale. Get an overview of the pipeline and how you can use it to generate high-quality tokens for training or fine-tuning LLMs. ➡️ nvda.ws/44r5POU
Mostofa Patwary reposted
Shrimai (@shrimai_)
Announcing Nemotron-CrossThink: pushes RL for LLMs beyond math into general purpose reasoning. We curate diverse, verifiable QA from web crawl + open sets & apply structured templates. 📉 28% token efficiency gain per correct answer. 📂 Dataset now released!
Mostofa Patwary reposted
Bryan Catanzaro (@ctnzr)
Nemotron-H: A family of Hybrid Mamba-Transformer LLMs.
* Hybrid architecture means up to 3X faster at the same accuracy
* Trained in FP8
* Great for VLMs
* Weights and instruct versions to come soon.
research.nvidia.com/labs/adlr/nemo…
Mostofa Patwary reposted
Bryan Catanzaro (@ctnzr)
I'm glad to see Nemotron-4-340B doing well on the LMSYS Arena leaderboard! Lots more work to do to improve the Nemotron family. Most importantly, though, I hope this model supports the development of AI across the community.
Arena.ai (@arena)

Chatbot Arena update! @NVIDIAAI's Nemotron-4-340B has just edged past Llama-3-70B to become the new best open model on Arena leaderboard!
Key highlights:
- Impressive performance in longer queries
- Balanced multilingual capabilities
- Robust performance in "Hard Prompts"
Congrats @NVIDIAAI for this remarkable milestone & contribution to the open community! Check out more plots below👇

Mostofa Patwary reposted
Jupinder Parmar (@jupi_parmar)
Excited to share our recent work @nvidia building Nemotron-4 15B! Trained on 8T tokens, it achieves strong performance on a variety of downstream evaluation areas with especially impressive multilingual performance. Paper: arxiv.org/abs/2402.16819
Mostofa Patwary reposted
Shrimai (@shrimai_)
🚀Introducing Nemotron-4 15B by @nvidia! 🎉 With 15B parameters and trained on 8T tokens, it's impressive in multilingual AI. Outperforms all similarly-sized models and dominates in multilingual tasks, even surpassing models 4x larger! #NVIDIA #Nemotron4 arxiv.org/pdf/2402.16819…
Mostofa Patwary reposted
Aran Komatsuzaki (@arankomatsuzaki)
NVIDIA presents Nemotron-4 15B
- Multilingual 15B LLM on 8T tokens
- Outperforms or achieves competitive performance on all existing similarly-sized open models
- Outperforming models over 4x larger on multilingual tasks
arxiv.org/abs/2402.16819