
As AI models and LLMs grow ever more powerful, efficient inference becomes harder and harder to achieve, and compression techniques like quantization become essential.
If you want to experiment with quantized models yourself, you can follow the code in the video and check out
huggingface.co/RedHatAI
to pick any model you'd like!
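As a rough sketch of what running one of these checkpoints looks like, here is how vLLM's offline API can load and query a quantized model. The model ID below is an assumption for illustration; substitute any quantized model from the RedHatAI page:

```python
# Minimal sketch: loading a quantized checkpoint with vLLM's offline API.
# The model ID is a placeholder -- pick any quantized model from
# huggingface.co/RedHatAI and substitute it here.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/gemma-3-27b-it-quantized.w4a16")  # hypothetical ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain weight quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```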
Red Hat AI (@RedHat_AI):
What compression looks like on @vllm_project. Same Gemma 3 27B. Red Hat AI's quantized version runs at nearly 2x tokens/sec, half the memory, 99%+ accuracy retained. Open source. Quantized with LLM Compressor. Links in comments. 🙏 @_soyr_ for the 2-minute demo.
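For reference, producing a checkpoint like this with LLM Compressor follows a recipe-based one-shot flow. A minimal sketch is below; the base model ID and the FP8 dynamic scheme are assumptions, not the exact recipe used in the demo:

```python
# Minimal sketch of an LLM Compressor one-shot quantization run.
# Model ID and quantization scheme are assumptions; the demo's exact
# recipe may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "google/gemma-3-27b-it"  # assumed base model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize all Linear layers to FP8 with dynamic activation scales,
# keeping the lm_head in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

# Save a checkpoint that vLLM can load directly.
model.save_pretrained("gemma-3-27b-it-FP8-dynamic")
tokenizer.save_pretrained("gemma-3-27b-it-FP8-dynamic")
```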

