Mostofa Patwary

12 posts

@MostofaPatwary

Applied Deep Learning Research at NVIDIA

Santa Clara, CA · Joined May 2018
33 Following · 111 Followers
Mostofa Patwary (@MostofaPatwary)
Nemotron-H model family with 4X throughput gains without compromising accuracy! Blog: developer.nvidia.com/blog/nemotron-… Weights: huggingface.co/collections/nv…
NVIDIA AI Developer (@NVIDIAAIDev)

👀 Nemotron-H tackles large-scale reasoning while maintaining speed -- with 4x the throughput of comparable transformer models.⚡ See how #NVIDIAResearch accomplished this using a hybrid Mamba-Transformer architecture, and model fine-tuning ➡️ nvda.ws/43PMrJm

Mostofa Patwary retweeted
Bryan Catanzaro (@ctnzr)
I'm glad to see Nemotron-4-340B doing well on the LMSYS Arena leaderboard! Lots more work to do to improve the Nemotron family. Most importantly, though, I hope this model supports the development of AI across the community.
Arena.ai (@arena)

Chatbot Arena update! @NVIDIAAI's Nemotron-4-340B has just edged past Llama-3-70B to become the new best open model on Arena leaderboard!
Key highlights:
- Impressive performance in longer queries
- Balanced multilingual capabilities
- Robust performance in "Hard Prompts"
Congrats @NVIDIAAI for this remarkable milestone & contribution to the open community! Check out more plots below👇

Mostofa Patwary retweeted
Vincent Liu (@vincentjliu)
NVIDIA joins the open-source LLM race offering a 15B-parameter model that demonstrates strong capabilities in English, multi-lingual, and coding tasks arxiv.org/abs/2402.16819
Mostofa Patwary retweeted
AK (@_akhaliq)
Nvidia announces Nemotron-4 15B. We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.
Greg Diamos (@GregoryDiamos)
Congrats @MostofaPatwary and team on your new Megatron language model with 530B parameters. I once wondered what a language model trained using all of the energy from the sun would be like. You have plenty of orders of magnitude to keep going.
Mostofa Patwary retweeted
Mohammad Shoeybi (@MohammadShoeybi)
Excited to share that our work on scaling up language model training with Megatron will appear at SuperComputing 2021! We achieve 502 petaFLOP/s on 3072 GPUs (per-GPU throughput of 52% of theoretical peak) on a model with 1 trillion parameters. Paper: arxiv.org/abs/2104.04473
Mostofa Patwary retweeted
raulpuri.eth (@TheRealRPuri)
Excited to share our #acl2020 work on Large Scale Multi-Actor Generative Dialog Modeling, done with amazing coauthors Alex Boyd, @MostofaPatwary, Mohammad Shoeybi, and @ctnzr! Join our QA session at 5-6 UTC tonight and 21-22 UTC tomorrow. arxiv.org/abs/2005.06114
Mostofa Patwary retweeted
Baidu Research (@BaiduResearch)
Experimental Evaluation of Mixed Precision Training for End-to-End Language Modeling. On NVIDIA Tesla V100 GPU, we achieved a 4.1X performance improvement for the whole application with minimal changes. #MachineLearning #AI bit.ly/2Ixo9cw