Saurav Muralidharan @srv_m
Research Scientist @NVIDIA | Making LLMs More Efficient
252 posts · Joined March 2008 · 247 Following · 193 Followers

Saurav Muralidharan @srv_m:
Today we are releasing the first model in the NVIDIA Nemotron 3 family: Nemotron 3 Nano! Nemotron 3 Nano is truly open, efficient, and achieves class-leading accuracy on reasoning and agentic tasks. Check it out today! 🚀 research.nvidia.com/labs/nemotron/…

Saurav Muralidharan retweeted:
Sharing our team’s latest work on Hymba - an efficient small language model with hybrid architecture. Tech report: arxiv.org/abs/2411.13676 Discover the tradeoff between Mamba and Attention, how they can be combined, how attention sink and forced-to-attend phenomena can be …
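
A minimal sketch of the parallel-hybrid idea the tweet describes: attention heads and an SSM branch read the same input side by side, and their outputs are fused. The SimpleSSM below is a toy linear recurrence standing in for Mamba, and all names are illustrative, not Hymba's actual implementation.

```python
import torch
import torch.nn as nn

class SimpleSSM(nn.Module):
    """Toy per-channel linear recurrence standing in for a Mamba-style head."""
    def __init__(self, dim):
        super().__init__()
        self.decay = nn.Parameter(torch.rand(dim))  # per-channel state decay
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                  # x: (batch, seq, dim)
        u = self.in_proj(x)
        a = torch.sigmoid(self.decay)      # keep the recurrence stable in (0, 1)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):         # sequential scan, for clarity only
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))

class HybridBlock(nn.Module):
    """Attention and SSM branches run in parallel on the same input, then fuse."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ssm = SimpleSSM(dim)

    def forward(self, x):                  # causal masking omitted for brevity
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + 0.5 * (attn_out + self.ssm(h))  # simple mean fusion
```

The tradeoff the tweet points to lives in this fusion: the attention branch gives precise recall over the context, while the recurrent branch summarizes it at much lower cost.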

Saurav Muralidharan retweeted:
🚀 @NeurIPSConf Spotlight! 🥳 Imagine fine-tuning an LLM with just a sparsity mask! In our latest work, we freeze the LLM and use 2:4 structured sparsity to learn binary masks for each linear layer. Thanks to NVIDIA Ampere’s 2:4 sparsity, we can achieve up to 2x compute …
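
A minimal sketch of that mask-learning idea, under illustrative naming rather than the paper's code: the dense weights stay frozen, one score is learned per weight, the top two scores in every group of four define the 2:4 mask, and a straight-through estimator passes gradients to the scores.

```python
import torch
import torch.nn as nn

class Learned24MaskLinear(nn.Module):
    """Frozen linear layer whose 2:4 sparsity mask is learned via scores."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.register_buffer("weight", linear.weight.detach().clone())  # frozen
        bias = linear.bias.detach().clone() if linear.bias is not None else None
        self.register_buffer("bias", bias)
        # One learnable score per weight; assumes in_features % 4 == 0, so
        # row-major groups of 4 stay within a row, as 2:4 hardware expects.
        self.scores = nn.Parameter(torch.randn_like(self.weight))

    def mask(self):
        s = self.scores.reshape(-1, 4)
        hard = torch.zeros_like(s)
        hard.scatter_(1, s.topk(2, dim=1).indices, 1.0)  # keep 2 of every 4
        soft = torch.sigmoid(s)
        # Straight-through: forward uses the hard mask, backward the soft one.
        return (hard - soft).detach().add(soft).reshape_as(self.scores)

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask(), self.bias)
```

Only the scores would be handed to the optimizer; the speedup the tweet mentions comes from Ampere's sparse tensor cores accelerating the fixed 2:4 pattern.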
Saurav Muralidharan retweetledi
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
👀 Experience the high-efficiency NVIDIA Llama-3.1-Nemotron-51B - a NAS-optimized model that achieves 2x throughput while preserving accuracy and runs on a single H100 GPU. ✨ Try out the Llama-3.1-Nemotron-51B NIM through the API from ai.nvidia.com or download it from @huggingface.

Saurav Muralidharan retweeted:
Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2). Remarkably, NVLM 1.0 shows improved text-only …

Saurav Muralidharan retweeted:
NVIDIA AI Developer @NVIDIAAIDev:
Today we released Mistral-NeMo-Minitron 8B, a pruned and distilled version of the open @MistralAI NeMo 12B model, achieving high accuracy across nine popular benchmarks for chatbots, virtual assistants, content generation, coding, and educational tools. ➡️

Saurav Muralidharan retweeted:
🌟 The best 8B Base model via pruning and distillation! 🚀 Introducing the Mistral-NeMo-Minitron-8B-Base model, derived from the recent Mistral-NeMo-12B. Our recipe: finetune the teacher on 100B tokens, prune to 8B params, run teacher-student distillation on <400B tokens. Result: the …
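
The distillation step in that recipe is, at its core, standard logit distillation: the pruned student is trained to match the teacher's temperature-softened token distribution. A sketch under that assumption (the published recipe may combine further loss terms):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Per-token KL(teacher || student) over the vocabulary.

    Both logit tensors have shape (batch, seq_len, vocab_size).
    """
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1).flatten(0, 1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1).flatten(0, 1)
    # batchmean over flattened tokens = mean KL per token; the t**2 factor
    # keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t**2
```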

Mervin Praison @MervinPraison:
NVIDIA Llama 3.1 Minitron 4B: Created from Llama 3.1 8B. Here is how 🚀 40x Fewer Tokens 💰 1.8x Cost Savings 📈 16% Performance Boost 🧠 4 Billion Parameters ⚖️ On Par with 8B Models 🔄 Pruning & Distillation ⚡ Efficient AI Model Creation 🛠️ Less Training Data Needed nvda.ws/3WM4OeR @NVIDIAAI @nvidia @AIatMeta @PavloMolchanov @Ahmad_Al_Dahle @darrinpjohnson @NVIDIAAIDev Sub: youtube.com/@MervinPraison

Saurav Muralidharan retweeted:
NVIDIA AI Developer @NVIDIAAIDev:
See how our #NVIDIAResearch team has developed a method to efficiently create smaller, accurate language models by using structured weight pruning and knowledge distillation - offering several advantages for developers: ✅ 16% better performance on MMLU scores ✅ 40x fewer …

Saurav Muralidharan @srv_m:
@cataluna84 Hi, we do have plans to release the code, but the timeline is a bit unclear due to the legal approvals we need to obtain. In the next few weeks, hopefully!

Mayank Bhaskar @cataluna84:
@srv_m Do you plan to release the full pruning & distillation code along with evaluation & benchmarks, so that we can try this method on new models?

Saurav Muralidharan @srv_m:
🤖 Excited to announce Minitron, a new family of language models obtained through a combination of weight pruning and knowledge distillation! Our models are available on HF with a permissive license. Give them a try today!
Quoted tweet:
"🚀 40x Faster Model Training via Pruning and Distillation! Permissive Minitron-4B and Minitron-8B models! 🔗 Paper: arxiv.org/abs/2407.14679 🔗 GitHub: github.com/NVlabs/Minitron 🔗 Models on HF: bit.ly/4ffjnQj Key highlights of 4B/8B models: 📊 2.6B/6.2B active …"
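
A sketch of the pruning half of that recipe for a single axis (MLP width), assuming a simple activation-magnitude importance score; the actual Minitron method estimates importance across several axes (depth, attention heads, embedding and MLP width), so treat this as the mechanics, not the full recipe.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_mlp_width(fc1: nn.Linear, fc2: nn.Linear, calib_x, keep: int):
    """Width-prune an MLP computing fc2(relu(fc1(x))) down to `keep` channels.

    calib_x: (num_tokens, in_features) calibration activations.
    """
    h = torch.relu(fc1(calib_x))                       # (tokens, hidden)
    importance = h.abs().mean(dim=0)                   # score per hidden channel
    idx = importance.topk(keep).indices.sort().values  # keep original order

    new_fc1 = nn.Linear(fc1.in_features, keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(keep, fc2.out_features, bias=fc2.bias is not None)
    new_fc1.weight.copy_(fc1.weight[idx])              # slice rows of fc1
    if fc1.bias is not None:
        new_fc1.bias.copy_(fc1.bias[idx])
    new_fc2.weight.copy_(fc2.weight[:, idx])           # slice columns of fc2
    if fc2.bias is not None:
        new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2
```

After slicing, the smaller model is distilled against the original teacher, as in the distillation sketch above.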

Saurav Muralidharan retweeted:
🚀 Introducing Flextron - a Many-in-One LLM - Oral at ICML! Train one model and get many optimal models for each GPU at inference without any additional retraining. 🌟 🔗 Paper: arxiv.org/abs/2406.10260 Main benefits with only 5% post-training finetuning: ✅ Best model for …
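
The "many-in-one" mechanic can be pictured as nested sub-networks: one weight matrix trained so that any leading slice of it is itself a usable smaller layer, letting deployment pick a width per GPU without retraining. A toy sketch with illustrative names (Flextron's learned routers and input-adaptive MoE are not shown):

```python
import torch
import torch.nn as nn

class ElasticLinear(nn.Module):
    """One weight matrix; any leading slice acts as a smaller sub-layer."""
    def __init__(self, in_features, max_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x, out_features=None):
        k = out_features or self.weight.size(0)  # width chosen at call time
        return nn.functional.linear(x, self.weight[:k], self.bias[:k])

layer = ElasticLinear(512, 2048)
x = torch.randn(4, 512)
for k in (512, 1024, 2048):        # training samples widths so all slices work
    y = layer(x, out_features=k)   # (4, k): same weights, smaller sub-layer
```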

Saurav Muralidharan @srv_m:
Check out our latest work, Flextron, an elastic LLM that supports zero-shot flexible deployment at a variety of model scales and sizes. Flextron models achieve SoTA performance and are also input-adaptive (heterogeneous MoE).
Quoted tweet:
"Tired of training varying-size LLMs to fit various GPU memory and latency requirements? Check out Flextron! Our new ICML (Oral) paper shows how to train one model deployable across GPU series. Learn more: cairuisi.github.io/Flextron/ 🚀"

Saurav Muralidharan retweeted:
We are now having full conversations with Figure 01, thanks to our partnership with OpenAI. Our robot can: - describe its visual experience - plan future actions - reflect on its memory - explain its reasoning verbally Technical deep-dive 🧵: