Saurav Muralidharan @srv_m
Research Scientist @NVIDIA | Making LLMs More Efficient
252 posts · Joined March 2008 · 247 Following · 193 Followers

Saurav Muralidharan @srv_m:
Today we are releasing the first model in the NVIDIA Nemotron 3 family: Nemotron 3 Nano! Nemotron 3 Nano is truly open, efficient, and achieves class-leading accuracy on reasoning and agentic tasks. Check it out today! 🚀 research.nvidia.com/labs/nemotron/…

Saurav Muralidharan retweeted:
Sharing our team’s latest work on Hymba - an efficient small language model with hybrid architecture. Tech report: arxiv.org/abs/2411.13676 Discover the tradeoff between Mamba and Attention, how they can be combined, how attention sink and forced-to-attend phenomena can be …
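
A minimal sketch of the parallel-hybrid idea the tweet describes: attention heads and an SSM branch read the same input side by side, and their outputs are fused. The SimpleSSM below is a toy linear recurrence standing in for Mamba, and all names are illustrative, not Hymba's actual implementation.

```python
import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    """Toy per-channel linear recurrence standing in for a Mamba-style head."""
    def __init__(self, dim):
        super().__init__()
        self.decay = nn.Parameter(torch.rand(dim))  # per-channel state decay
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                  # x: (batch, seq, dim)
        u = self.in_proj(x)
        a = torch.sigmoid(self.decay)      # keep the recurrence stable in (0, 1)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):         # sequential scan, for clarity only
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))

class HybridBlock(nn.Module):
    """Attention and SSM branches run in parallel on the same input, then fuse."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ssm = SimpleSSM(dim)

    def forward(self, x):                  # causal masking omitted for brevity
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + 0.5 * (attn_out + self.ssm(h))  # simple mean fusion
```

The tradeoff the tweet points to lives in this fusion: the attention branch gives precise recall over the context, while the recurrent branch summarizes it at much lower cost.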

Saurav Muralidharan retweeted:
🚀 @NeurIPSConf Spotlight! 🥳 Imagine fine-tuning an LLM with just a sparsity mask! In our latest work, we freeze the LLM and use 2:4 structured sparsity to learn binary masks for each linear layer. Thanks to NVIDIA Ampere’s 2:4 sparsity, we can achieve up to 2x compute …
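
A minimal sketch of that mask-learning idea, under illustrative naming rather than the paper's code: the dense weights stay frozen, one score is learned per weight, the top two scores in every group of four define the 2:4 mask, and a straight-through estimator passes gradients to the scores.

```python
import torch
import torch.nn as nn

class Learned24MaskLinear(nn.Module):
    """Frozen linear layer whose 2:4 sparsity mask is learned via scores."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.register_buffer("weight", linear.weight.detach().clone())  # frozen
        bias = linear.bias.detach().clone() if linear.bias is not None else None
        self.register_buffer("bias", bias)
        # One learnable score per weight; assumes in_features % 4 == 0, so
        # row-major groups of 4 stay within a row, as 2:4 hardware expects.
        self.scores = nn.Parameter(torch.randn_like(self.weight))

    def mask(self):
        s = self.scores.reshape(-1, 4)
        hard = torch.zeros_like(s)
        hard.scatter_(1, s.topk(2, dim=1).indices, 1.0)  # keep 2 of every 4
        soft = torch.sigmoid(s)
        # Straight-through: forward uses the hard mask, backward the soft one.
        return (hard - soft).detach().add(soft).reshape_as(self.scores)

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask(), self.bias)
```

Only the scores would be handed to the optimizer; the speedup the tweet mentions comes from Ampere's sparse tensor cores accelerating the fixed 2:4 pattern.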
Saurav Muralidharan retweetledi
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
👀 Experience the high-efficiency NVIDIA Llama-3.1-Nemotron-51B - a NAS-optimized model that achieves 2x throughput while preserving accuracy and runs on a single H100 GPU. ✨ Try out the Llama-3.1-Nemotron-51B NIM through the API from ai.nvidia.com or download it from @huggingface.

Saurav Muralidharan retweeted:
Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2). Remarkably, NVLM 1.0 shows improved text-only …

Saurav Muralidharan retweeted:
NVIDIA AI Developer @NVIDIAAIDev:
Today we released Mistral-NeMo-Minitron 8B, a pruned and distilled version of the open @MistralAI NeMo 12B model, achieving high accuracy across nine popular benchmarks for chatbots, virtual assistants, content generation, coding, and educational tools. ➡️

Saurav Muralidharan retweeted:
🌟 The best 8B Base model via pruning and distillation! 🚀 Introducing the Mistral-NeMo-Minitron-8B-Base model, derived from the recent Mistral-NeMo-12B. Our recipe: finetune the teacher on 100B tokens, prune to 8B params, run teacher-student distillation on <400B tokens. Result: the …
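
The distillation step in that recipe is, at its core, standard logit distillation: the pruned student is trained to match the teacher's temperature-softened token distribution. A sketch under that assumption (the published recipe may combine further loss terms):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Per-token KL(teacher || student) over the vocabulary.

    Both logit tensors have shape (batch, seq_len, vocab_size).
    """
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1).flatten(0, 1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1).flatten(0, 1)
    # batchmean over flattened tokens = mean KL per token; the t**2 factor
    # keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t**2
```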

Mervin Praison @MervinPraison:
NVIDIA Llama 3.1 Minitron 4B: Created from Llama 3.1 8B. Here is how 🚀 40x Fewer Tokens 💰 1.8x Cost Savings 📈 16% Performance Boost 🧠 4 Billion Parameters ⚖️ On Par with 8B Models 🔄 Pruning & Distillation ⚡ Efficient AI Model Creation 🛠️ Less Training Data Needed nvda.ws/3WM4OeR @NVIDIAAI @nvidia @AIatMeta @PavloMolchanov @Ahmad_Al_Dahle @darrinpjohnson @NVIDIAAIDev Sub: youtube.com/@MervinPraison

Saurav Muralidharan retweeted:
NVIDIA AI Developer @NVIDIAAIDev:
See how our #NVIDIAResearch team has developed a method to efficiently create smaller, accurate language models by using structured weight pruning and knowledge distillation - offering several advantages for developers: ✅ 16% better performance on MMLU scores ✅ 40x fewer …

Saurav Muralidharan @srv_m:
@cataluna84 Hi, we do have plans to release the code, but the timeline is a bit unclear due to the legal approvals we need to obtain. In the next few weeks, hopefully!

Mayank Bhaskar @cataluna84:
@srv_m Do you plan to release the full pruning & distillation code along with evaluation & benchmarks, so that we can try this method on new models?

Saurav Muralidharan @srv_m:
🤖 Excited to announce Minitron, a new family of language models obtained through a combination of weight pruning and knowledge distillation! Our models are available on HF with a permissive license. Give them a try today!
Quoted tweet:
"🚀 40x Faster Model Training via Pruning and Distillation! Permissive Minitron-4B and Minitron-8B models! 🔗 Paper: arxiv.org/abs/2407.14679 🔗 GitHub: github.com/NVlabs/Minitron 🔗 Models on HF: bit.ly/4ffjnQj Key highlights of 4B/8B models: 📊 2.6B/6.2B active …"
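
A sketch of the pruning half of that recipe for a single axis (MLP width), assuming a simple activation-magnitude importance score; the actual Minitron method estimates importance across several axes (depth, attention heads, embedding and MLP width), so treat this as the mechanics, not the full recipe.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_mlp_width(fc1: nn.Linear, fc2: nn.Linear, calib_x, keep: int):
    """Width-prune an MLP computing fc2(relu(fc1(x))) down to `keep` channels.

    calib_x: (num_tokens, in_features) calibration activations.
    """
    h = torch.relu(fc1(calib_x))                       # (tokens, hidden)
    importance = h.abs().mean(dim=0)                   # score per hidden channel
    idx = importance.topk(keep).indices.sort().values  # keep original order

    new_fc1 = nn.Linear(fc1.in_features, keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(keep, fc2.out_features, bias=fc2.bias is not None)
    new_fc1.weight.copy_(fc1.weight[idx])              # slice rows of fc1
    if fc1.bias is not None:
        new_fc1.bias.copy_(fc1.bias[idx])
    new_fc2.weight.copy_(fc2.weight[:, idx])           # slice columns of fc2
    if fc2.bias is not None:
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2
```

After slicing, the smaller model is distilled against the original teacher, as in the distillation sketch above.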

Saurav Muralidharan retweeted:
🚀 Introducing Flextron - a Many-in-One LLM - Oral at ICML! Train one model and get many optimal models for each GPU at inference without any additional retraining. 🌟 🔗 Paper: arxiv.org/abs/2406.10260 Main benefits with only 5% post-training finetuning: ✅ Best model for …
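
The "many-in-one" mechanic can be pictured as nested sub-networks: one weight matrix trained so that any leading slice of it is itself a usable smaller layer, letting deployment pick a width per GPU without retraining. A toy sketch with illustrative names (Flextron's learned routers and input-adaptive MoE are not shown):

```python
import torch
import torch.nn as nn

class ElasticLinear(nn.Module):
    """One weight matrix; any leading slice acts as a smaller sub-layer."""
    def __init__(self, in_features, max_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x, out_features=None):
        k = out_features or self.weight.size(0)  # width chosen at call time
        return nn.functional.linear(x, self.weight[:k], self.bias[:k])

layer = ElasticLinear(512, 2048)
x = torch.randn(4, 512)
for k in (512, 1024, 2048):        # training samples widths so all slices work
    y = layer(x, out_features=k)   # (4, k): same weights, smaller sub-layer
```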

Saurav Muralidharan @srv_m:
Check out our latest work, Flextron, an elastic LLM that supports zero-shot flexible deployment at a variety of model scales and sizes. Flextron models achieve SoTA performance and are also input-adaptive (heterogeneous MoE).
Quoted tweet:
"Tired of training varying-size LLMs to fit various GPU memory and latency requirements? Check out Flextron! Our new ICML (Oral) paper shows how to train one model deployable across GPU series. Learn more: cairuisi.github.io/Flextron/ 🚀"

Saurav Muralidharan retweeted:
We are now having full conversations with Figure 01, thanks to our partnership with OpenAI. Our robot can: - describe its visual experience - plan future actions - reflect on its memory - explain its reasoning verbally Technical deep-dive 🧵: