
Red Hat AI

@RedHat_AI
Accelerating AI innovation with open platforms and community. The future of AI is open.

vLLM meetup is coming to Boston on March 31! Workshop + evening sessions covering:
- @vllm_project update
- Model compression and speculative decoding
- Agentic AI with vLLM
- Distributed inference at scale with @_llm_d_ and Kubernetes

Pre-event workshop at 3:30 PM: Deploy Llama 3.1 8B and benchmark llm-d's cache-aware routing live.

Shoutout to our sponsors: @RedHat, @IBM, @NVIDIAAI, The Open Accelerator, and @MITIBMLab!

Register here 👇 luma.com/4rmkrrb7

KServe v0.17 is live! 🚀

We are thrilled to announce KServe's most significant update yet. We've overhauled the architecture to move beyond traditional model serving: LLMInferenceService is now fully production-ready and built on the high-performance @_llm_d_ framework.

What's included?
- KV-cache-aware routing and disaggregated prefill-decode to maximize throughput.
- Cost-aware autoscaling designed for LLM inference workloads.
- A comprehensive parallelism specification for distributed inference.
- Envoy AI Gateway integration for sophisticated token-based rate limiting.
- A completely restructured modular Helm chart architecture.

Community Power 🤝
This version was made possible by 38 contributors, including 21 new contributors. Thank you for your hard work!

Full release notes: github.com/kserve/kserve/…
Release blog: kserve.github.io/website/blog/k…
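The KV-cache-aware routing mentioned above steers requests that share a prompt prefix to the same replica, so cached prefill work can be reused instead of recomputed. A toy sketch of the idea (hypothetical names; not llm-d's or KServe's actual routing logic, which also tracks real cache state):

```python
import hashlib

def route_by_prefix(prompt: str, replicas: list[str], prefix_len: int = 64) -> str:
    """Toy cache-aware router (hypothetical sketch): hash the prompt's
    leading characters so requests that share a prefix land on the same
    replica, letting that replica's warm KV cache be reused."""
    prefix = prompt[:prefix_len]
    digest = int(hashlib.sha256(prefix.encode()).hexdigest(), 16)
    return replicas[digest % len(replicas)]

# Two requests sharing a long system prompt route to the same replica:
replicas = ["pod-a", "pod-b", "pod-c"]
system = "You are a helpful assistant. " * 4  # shared 116-char prefix
r1 = route_by_prefix(system + "Question one?", replicas)
r2 = route_by_prefix(system + "Question two?", replicas)
# r1 == r2, so the second request hits a replica with a warm prefix cache
```

Hashing only the prefix (rather than the full prompt) is the point: it trades perfect load balance for cache locality on the expensive prefill stage.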

🔥 Meet Mistral Small 4: One model to do it all.
⚡ 128 experts, 119B total parameters, 256k context window
⚡ Configurable Reasoning
⚡ Apache 2.0
⚡ 40% faster, 3x more throughput

Our first model to unify the capabilities of our flagship models into a single, versatile model.
