Red Hat AI

2.2K posts

@RedHat_AI

Accelerating AI innovation with open platforms and community. The future of AI is open.

Joined May 2018
2.1K Following · 9.3K Followers
Pinned Tweet
Red Hat AI @RedHat_AI
vLLM meetup is coming to Boston on March 31! Workshop + evening sessions covering:
- @vllm_project update
- Model compression and speculative decoding
- Agentic AI with vLLM
- Distributed inference at scale with @_llm_d_ and Kubernetes

Pre-event workshop at 3:30 PM: Deploy Llama 3.1 8B and benchmark llm-d's cache-aware routing live.

Shoutout to our sponsors: @RedHat, @IBM, @NVIDIAAI, The Open Accelerator, and @MITIBMLab!

Register here 👇 luma.com/4rmkrrb7
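The workshop above benchmarks llm-d's cache-aware routing. The core idea can be sketched in a few lines: requests sharing a prompt prefix are sent to the same replica, so that replica's KV cache for the prefix can be reused. This is an illustrative toy (hash affinity on a whitespace prefix), not llm-d's actual scheduler, which tracks live cache state and load; all names and constants here are made up.

```python
import hashlib

REPLICAS = ["replica-0", "replica-1", "replica-2"]
PREFIX_TOKENS = 7  # route on the first N whitespace "tokens" (toy tokenizer)

def route(prompt: str) -> str:
    """Map a prompt's prefix to a replica so shared prefixes stick together."""
    prefix = " ".join(prompt.split()[:PREFIX_TOKENS])
    digest = hashlib.sha256(prefix.encode()).digest()
    return REPLICAS[int.from_bytes(digest[:8], "big") % len(REPLICAS)]

# Two requests sharing the same 7-word system prompt land on the same
# replica, so its KV cache for that prefix is reused.
system = "You are a helpful assistant. Answer concisely."
a = route(system + " What is vLLM?")
b = route(system + " What is llm-d?")
```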
Red Hat AI retweeted
vLLM @vllm_project
Boston, we’re coming. 🎉 Join the vLLM meetup on March 31 for an evening of deep technical sessions, live demos, and real conversations on LLM inference at scale. We’ll cover vLLM updates, model compression, speculative decoding, agentic AI, and distributed inference with llm-d + Kubernetes.

Thanks to @RedHat, @IBM, @NVIDIAAI, The Open Accelerator, and @MITIBMLab for the support.

Register: luma.com/4rmkrrb7
Red Hat AI @RedHat_AI (quoted tweet; see the pinned tweet above)
Danny Hernández @danNH2006
@RedHat_AI Bro, I need to quantize Qwen3.5; you guys are barely making it possible to quantize Qwen3...
Red Hat AI @RedHat_AI
LLM Compressor v0.10 is out 🚀 Compress 70B+ models without running out of memory, and do it 3.8x faster. What's new:
- Custom disk offloading for models that don't fit in GPU memory
- Distributed GPTQ: split compression across 4 GPUs (Qwen3-30B-A3B went from 3.9h → 1h)
- GPTQ FP4 microscale (NVFP4 + MXFP4) support

@MistralAI already used it to ship their NVFP4 checkpoint. developers.redhat.com/articles/2026/…
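The FP4 microscale formats mentioned above share one scale per small block of weights. Here is a simplified sketch of block ("micro") scaling, using symmetric 4-bit integers instead of FP4 elements and a plain float scale instead of NVFP4's FP8 / MXFP4's power-of-two scales. It illustrates only the storage scheme, not LLM Compressor's GPTQ algorithm, which additionally compensates quantization error using second-order information.

```python
# Simplified microscaling: one shared scale per 8-value block, symmetric
# 4-bit integer elements. Real NVFP4/MXFP4 use FP4 elements with FP8 or
# power-of-two shared scales; the weights below are illustrative.
QMAX = 7  # symmetric int4 range: -7..7

def quantize_block(block):
    scale = max(abs(v) for v in block) / QMAX or 1.0  # avoid a zero scale
    q = [max(-QMAX, min(QMAX, round(v / scale))) for v in block]
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.61, -0.88, 0.25]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
err = max(abs(w - r) for w, r in zip(weights, restored))
# Round-trip error stays within about half a quantization step (scale / 2).
```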
Red Hat AI @RedHat_AI
@danNH2006 Not yet. v5 will be supported in an upcoming release. Stay tuned!
Red Hat AI @RedHat_AI
Big step for production LLM serving. KServe v0.17 brings LLMInferenceService to GA, built on @_llm_d_. KV-cache routing, disaggregated prefill-decode, cost-aware autoscaling. 38 contributors, 21 of them new. The community keeps delivering.
Yuan (Terry) Tang @TerryTangYuan

KServe v0.17 is live! 🚀 We are thrilled to announce KServe's most significant update yet. We’ve overhauled the architecture to move beyond traditional model serving. LLMInferenceService is now fully production-ready and built on the high-performance @_llm_d_ framework.

What’s included?
- KV-cache aware routing and disaggregated prefill-decode to maximize throughput.
- Cost-aware autoscaling designed for LLM inference workloads.
- Comprehensive parallelism specification for distributed inference.
- Envoy AI Gateway integration for sophisticated token-based rate limiting.
- A completely restructured modular Helm chart architecture.

Community Power 🤝 This version was made possible by 38 contributors, including 21 new contributors. Thank you for your hard work!

Check out the full release notes here: github.com/kserve/kserve/…
Release blog: kserve.github.io/website/blog/k…

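Cost-aware autoscaling, at its simplest, means choosing the cheapest replica count that still covers demand. A toy sketch under made-up capacity and price figures; KServe/llm-d's actual autoscaler works from live inference metrics, not constants like these.

```python
import math

# Made-up capacity and price figures, for illustration only.
TOKENS_PER_SEC_PER_REPLICA = 1500
COST_PER_REPLICA_HOUR = 2.50
MAX_REPLICAS = 16

def replicas_for(demand_tokens_per_sec: float) -> int:
    """Smallest replica count that covers demand, clamped to [1, MAX_REPLICAS]."""
    needed = math.ceil(demand_tokens_per_sec / TOKENS_PER_SEC_PER_REPLICA)
    return max(1, min(needed, MAX_REPLICAS))

def hourly_cost(demand_tokens_per_sec: float) -> float:
    """GPU cost per hour for the replica count chosen above."""
    return replicas_for(demand_tokens_per_sec) * COST_PER_REPLICA_HOUR
```

For example, a sustained demand of 4000 tokens/s needs 3 replicas under these numbers, and the autoscaler would pay for exactly those 3 rather than a fixed peak-sized fleet.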
Red Hat AI @RedHat_AI
The Vienna vLLM Meetup recording is live. 2+ hours of deep technical sessions on inference, quantization, speculative decoding, MoE architecture, and a real-world case study from Canva. Here's what you missed, with timestamps 🧵 youtube.com/live/CXrECamqH…
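Speculative decoding comes up in several of these sessions. A minimal greedy sketch of the idea: a cheap draft model proposes k tokens, the target model verifies them in one pass, and the longest agreeing prefix plus one target token is committed. Production systems (including vLLM's) verify sampled tokens via rejection sampling; both "models" below are stand-in functions, not real LLMs.

```python
TRUE_CONTINUATION = ["the", "future", "of", "AI", "is", "open", "."]

def draft_model(prefix, k):
    # Stand-in draft: correct for k-1 tokens, then guesses wrong.
    return TRUE_CONTINUATION[len(prefix):len(prefix) + k - 1] + ["<guess>"]

def target_next(prefix):
    # Stand-in target: deterministically emits the "true" next token.
    if len(prefix) >= len(TRUE_CONTINUATION):
        return "<eos>"
    return TRUE_CONTINUATION[len(prefix)]

def speculative_step(prefix, k=4):
    proposal = draft_model(prefix, k)
    accepted = []
    for tok in proposal:
        if tok != target_next(prefix + accepted):
            break
        accepted.append(tok)
    # On mismatch (or exhaustion) commit one token from the target itself.
    accepted.append(target_next(prefix + accepted))
    return accepted

out = []
while "<eos>" not in out:
    out += speculative_step(out)
# Eight tokens are committed in two target passes instead of eight.
```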
Red Hat AI retweeted
Red Hat @RedHat
Launch your private #AI code assistant today. The new AI quickstart, powered by @RedHat_AI's MaaS and NVIDIA Nemotron models, gives you governed, cost-controlled AI. Learn how: red.ht/4uwrQ9r #NVIDIAGTC
Red Hat AI retweeted
Red Hat @RedHat
Build #AI that is sustainable, scalable, and sovereign. @RedHat_AI Enterprise helps you move beyond the "pilot phase" with a strategic foundation for private LLMs and autonomous agents. Read the blog: red.ht/4s6jlzZ.
Red Hat AI @RedHat_AI
🇸🇪 Sweden AI engineers 👋 vLLM Inference Meetup on March 25, hosted by Red Hat AI, @AIatAMD, and @AISweden. Sessions on vLLM project updates, LLM compression for fast inference, vLLM optimization on AMD GPUs, and how AI Sweden is using inference endpoints in production. Networking, food and drinks after. Register before March 24: luma.com/q5aytn9u
vLLM @vllm_project
🎉 Congrats to @MistralAI on releasing Mistral Small 4 — a 119B MoE model (6.5B active per token) that unifies instruct, reasoning, and coding in one checkpoint. Multimodal, 256K context. Day-0 support in vLLM — MLA attention backend, tool calling, and configurable reasoning mode, verified on @nvidia GPUs. 🔗 huggingface.co/mistralai/Mist…
Mistral AI for Developers @MistralDevs

🔥 Meet Mistral Small 4: One model to do it all.
⚡ 128 experts, 119B total parameters, 256k context window
⚡ Configurable Reasoning
⚡ Apache 2.0
⚡ 40% faster, 3x more throughput
Our first model to unify the capabilities of our flagship models into a single, versatile model.

Red Hat AI @RedHat_AI
Congrats to @MistralAI on releasing Mistral Small 4 in NVFP4. This checkpoint was generated using LLM Compressor, an open source quantization toolkit that's part of @vllm_project. Red Hat AI worked directly with the Mistral team to produce it. Great to see upstream model providers choosing open source tooling for their official quantized releases. huggingface.co/mistralai/Mist…
Mistral AI for Developers @MistralDevs (quoted tweet above)