Marc Sun
@_marcsun
702 posts

Machine Learning Engineer @huggingface Open Source team
New York · Joined February 2023
487 Following · 1.6K Followers
Marc Sun reposted
Aritra 🤗
Aritra 🤗@ariG23498·
When you run a @PyTorch model on a GPU, the actual work is executed through kernels. These are low-level, hardware-specific functions designed for GPUs (or other accelerators). If you profile a model, you'll see a sequence of kernel launches. Between these launches, the GPU can sit idle, waiting for the next operation. A key optimization goal is therefore to minimize gaps between kernel launches and keep the GPU fully utilized. One common approach is `torch.compile`, which fuses multiple operations into fewer kernels, reducing overhead and improving utilization. Another approach is to write custom kernels tailored to specific workloads (e.g., optimized attention or fused ops). However, this comes with significant challenges:
> requires deep expertise in kernel writing
> installation hell
> non-trivial integration with the model
To address this, @huggingface introduces the `kernels` library. With it you can:
> build custom kernels (with the help of a template)
> upload them to the Hub (like models or datasets)
> integrate them into models with ease
Let's take a look at how the transformers team uses the `kernels` library to integrate it into already existing models. (more in the thread)
19 replies · 88 reposts · 1.2K likes · 82K views
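The fusion idea in the thread above can be illustrated without a GPU. The sketch below is a toy, pure-Python analogy (all names are invented for this example, and real fusion happens inside compiled GPU kernels, e.g. via `torch.compile`): several separate passes over the data stand in for several kernel launches, one combined pass for a single fused kernel.

```python
# Toy, pure-Python illustration of kernel fusion. Names are invented for
# this sketch; real fusion happens in GPU kernels (e.g. via torch.compile).
# Three elementwise ops applied as three separate passes ("three kernel
# launches") versus one fused pass over the data.

data = list(range(10_000))

def unfused(xs):
    # Three separate passes, like three kernel launches, each reading
    # and writing a full intermediate buffer.
    a = [x + 1 for x in xs]
    b = [x * 2 for x in a]
    return [x - 3 for x in b]

def fused(xs):
    # One pass computing the same result, like a single fused kernel:
    # no intermediate buffers, no extra launch overhead.
    return [(x + 1) * 2 - 3 for x in xs]

assert unfused(data) == fused(data)  # same math, fewer passes
```

The payoff on a real GPU is the same shape as here: fewer passes means less launch overhead and less memory traffic for intermediates.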
Marc Sun reposted
Unsloth AI
Unsloth AI@UnslothAI·
Introducing Unsloth Studio ✨ A new open-source web UI to train and run LLMs.
• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF
GitHub: github.com/unslothai/unsl…
Blog and Guide: unsloth.ai/docs/new/studio
Available now on Hugging Face, NVIDIA, Docker and Colab.
216 replies · 837 reposts · 5K likes · 1.5M views
Marc Sun reposted
Stas Bekman
Stas Bekman@StasBekman·
Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and DeepSpeed teams has been integrated into @huggingface Trainer, Accelerate, and TRL. For extensive details, please see this write-up: huggingface.co/blog/ulysses-sp Thanks a lot to @krasul for helping make it happen, and to the others in the HF team who helped with the integration.
Stas Bekman tweet media
4 replies · 21 reposts · 116 likes · 17.4K views
Marc Sun reposted
Sayak Paul
Sayak Paul@RisingSayak·
Introducing Modular Diffusers 🔥 The `DiffusionPipeline` abstraction in Diffusers has established a standard in the community, but it has also limited flexibility. Modular Diffusers breaks those shackles and enables the next generation of creative user workflows 🧨 Details ⬇️
7 replies · 9 reposts · 86 likes · 7.3K views
Junyang Lin
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K replies · 738 reposts · 13.6K likes · 6.5M views
Georgi Gerganov
Georgi Gerganov@ggerganov·
Today ggml.ai joins Hugging Face Together we will continue to build ggml, make llama.cpp more accessible and empower the open-source community. Our joint mission is to make local AI easy and efficient to use by everyone on their own hardware.
Georgi Gerganov@ggerganov

I've started a company: ggml.ai From a fun side project just a few months ago, ggml has now become a useful library and framework for machine learning with a great open-source community

139 replies · 232 reposts · 1.6K likes · 294.8K views
Zach Mueller
Zach Mueller@TheZachMueller·
Over the last month I've been digging into model inference; what's the best out-of-the-box tokens/s on our hardware, and how do you benchmark it? Our model-inference revamp is now live, with model cards built to answer exactly this (in a community-focused way):
Zach Mueller tweet media
9 replies · 9 reposts · 60 likes · 13.3K views
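For context on the tweet above, the tokens/s figure such benchmarks report usually boils down to generated tokens divided by wall-clock time. Here is a minimal sketch of that measurement; `fake_generate` is a made-up stand-in for a real `model.generate()` call, and the real benchmarking setup (warmup, batching, prefill vs. decode split) is deliberately omitted.

```python
import time

# Hedged sketch of a tokens/s measurement: count generated tokens,
# divide by elapsed wall-clock time. `fake_generate` is an invented
# placeholder for an actual model's generate() call.

def fake_generate(n_tokens: int) -> list[int]:
    # Pretend to decode n_tokens token IDs.
    return list(range(n_tokens))

def tokens_per_second(n_tokens: int) -> float:
    start = time.perf_counter()
    out = fake_generate(n_tokens)
    elapsed = time.perf_counter() - start
    return len(out) / elapsed
```

Real benchmarks typically also separate time-to-first-token from steady-state decode throughput, since the two stress the hardware differently.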
Marc Sun reposted
Lysandre
Lysandre@LysandreJik·
The PyTorch Conference is coming to Europe on 7-8 April 2026! It'll be great to see all of you, so don't hesitate to come and talk there!
Lysandre tweet media
0 replies · 3 reposts · 13 likes · 582 views
Marc Sun reposted
Dan Alistarh
Dan Alistarh@DAlistarh·
Happy to release Quartet II, a new method that pushes the frontier of 4-bit LLM training in NVFP4. Fully-quantized pre-training in NVFP4 can now match FP8/FP16 quality much more closely, while maintaining full hardware acceleration! [1/4]
5 replies · 25 reposts · 170 likes · 19.3K views
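As a rough intuition for the 4-bit training mentioned above: values get mapped onto a tiny grid of representable levels via a shared scale. The sketch below uses a uniform symmetric int4 grid for simplicity; NVFP4 itself is an FP4 floating-point format with per-block scales, which this toy deliberately glosses over.

```python
# Toy 4-bit quantization round-trip (uniform symmetric int4 grid).
# This is an intuition-building sketch, not the NVFP4 format: NVFP4 uses
# FP4 values with fine-grained per-block scaling factors.

def quantize_4bit(xs: list[float]) -> tuple[list[int], float]:
    # One scale for the whole tensor; int4 range is [-8, 7].
    scale = max(abs(x) for x in xs) / 7 or 1.0  # avoid scale == 0
    q = [max(-8, min(7, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

vals = [0.9, -0.35, 0.05, -0.7]
q, s = quantize_4bit(vals)
approx = dequantize(q, s)
# Round-tripped values are close to, but not exactly, the originals:
# the quantization error per value is at most half a grid step.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(vals, approx))
```

The research challenge the tweet refers to is keeping training stable when gradients and weights live on such a coarse grid, while still using the hardware's native 4-bit math for speed.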
Marc Sun reposted
Pavlo Molchanov
Pavlo Molchanov@PavloMolchanov·
🚀 New NVIDIA report: NVFP4 + Quantization-Aware Distillation (QAD)
FP4 inference without quality collapse. Key idea: distill a BF16 teacher into an NVFP4 student using a KL loss. Much more robust than PTQ/QAT, especially after SFT/RL.
🔥 Near-BF16 accuracy
⚡ ~2-3× throughput, ~1.8× memory savings vs FP8
🧠 Works for LLMs and VLMs (Nemotron Nano, Super, VL)
Technical report: huggingface.co/nvidia/NVIDIA-…
Research blog: research.nvidia.com/labs/nemotron/…
Hugging Face models: research.nvidia.com/labs/nemotron/…
NVIDIA AI Developer@NVIDIAAIDev

We just launched an ultra-efficient NVFP4 precision version of Nemotron 3 Nano that delivers up to 4x higher throughput on Blackwell B200. Using our new Quantization Aware Distillation method, the NVFP4 version achieves up to 99.4% accuracy of BF16. Nemotron 3 Nano NVFP4: nvda.ws/4t63z9y Tech Report: nvda.ws/4bj3pp0

3 replies · 17 reposts · 114 likes · 15.2K views
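The KL-based distillation objective mentioned in the tweet above can be sketched in plain Python: the low-precision student is trained to match the full-precision teacher's output distribution by minimizing KL(teacher || student). This is a conceptual illustration with hand-picked probability lists, not the report's implementation.

```python
import math

# Conceptual sketch of the QAD objective: minimize the KL divergence
# between the BF16 teacher's output distribution and the NVFP4 student's.
# Probabilities here are illustrative lists, not real model outputs.

def kl_divergence(teacher: list[float], student: list[float]) -> float:
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 vanish.
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)

teacher_probs = [0.7, 0.2, 0.1]   # teacher's next-token distribution
student_probs = [0.6, 0.25, 0.15]  # quantized student's distribution
loss = kl_divergence(teacher_probs, student_probs)  # driven toward 0 in QAD
assert loss >= 0.0  # KL divergence is always non-negative
```

Matching full distributions in this way carries more signal per token than a hard-label cross-entropy, which is why distillation can recover accuracy that post-training quantization alone loses.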
Marc Sun
Marc Sun@_marcsun·
If you could fix ONE thing about `Trainer` in transformers, what would it be? Share your feedback: github.com/huggingface/tr… Thanks @UnslothAI @axolotl_ai and others for trusting and building on top of Trainer. We want to make sure you all get the best experience.
0 replies · 5 reposts · 10 likes · 1.8K views
Marc Sun reposted
NVIDIA AI Developer
NVIDIA AI Developer@NVIDIAAIDev·
We just launched an ultra-efficient NVFP4 precision version of Nemotron 3 Nano that delivers up to 4x higher throughput on Blackwell B200. Using our new Quantization Aware Distillation method, the NVFP4 version achieves up to 99.4% accuracy of BF16. Nemotron 3 Nano NVFP4: nvda.ws/4t63z9y Tech Report: nvda.ws/4bj3pp0
NVIDIA AI Developer tweet media
24 replies · 87 reposts · 700 likes · 129.2K views
Marc Sun reposted
Lysandre
Lysandre@LysandreJik·
Transformers v5's FINAL, stable release is out 🔥 Transformers' biggest release. The big Ws of this release:
- Performance, especially for MoE (6x-11x speedups)
- No more slow/fast tokenizers -> way simpler API, explicit backends, better performance
- Dynamic weight loading: way faster, and enabling MoE to work with {quants, tp, peft, ...}
We have a migration guide on the main branch; please take a look at it in case you run into issues. Come to our GH issues if you still have problems after reading it 😀
Lysandre tweet media
9 replies · 87 reposts · 435 likes · 75.3K views
Marc Sun reposted
Sayak Paul
Sayak Paul@RisingSayak·
You can run ANY pipeline from Diffusers in @sgl_project and benefit from the open tooling for optimized inference in the space 🔥 Combine SGLang's optims + Diffusers' flexible options for optims to suit your needs 🤗 Kudos to @adarshxs for leading the work here!
Sayak Paul tweet media
2 replies · 9 reposts · 33 likes · 8.5K views
Marc Sun reposted
Radical Numerics
Radical Numerics@RadicalNumerics·
Scaling scientific world models requires co-designing architectures, training objectives, and numerics. Today, we share the first posts in our series on low-precision pretraining, starting with NVIDIA's NVFP4 recipe for stable 4-bit training. Part 1: radicalnumerics.ai/blog/nvfp4-par… Part 2: radicalnumerics.ai/blog/nvfp4-par… We cover floating point fundamentals, heuristics, custom CUDA kernels, and stabilization techniques. Future entries will cover custom recipes and results on hybrid architectures.
Radical Numerics tweet media
9 replies · 93 reposts · 525 likes · 66.8K views
Qubitium
Qubitium@qubitium·
Does anyone have a VM with an Ascend NPU that they can share with me for about 8 hours? I need this hardware to complete the Huawei NPU support for GPT-QModel. Thanks! @Huawei github.com/ModelCloud/GPT…
1 reply · 0 reposts · 0 likes · 174 views