brian stevens (@addvin) - Twitter Profili | Zamantika Mersobahis Locabet

Sabitlenmiş Tweet

Introducing llm-d on stage at Red Hat Summit was truly a privilege ...

LLM inference is too slow, too expensive, and too hard to scale. 🚨 Introducing llm-d, a Kubernetes-native distributed inference framework, to change that—using vLLM (@vllm_project), smart scheduling, and disaggregated compute. Here’s how it works—and how you can use it today:

English

2

8

33

3.8K

brian stevens retweetledi

Red Hat AI@RedHat_AI·18 May

🇬🇧 London, June 10. @vllm_project & @_llm_d_ Inference Meetup, hosted by Red Hat AI, @nvidia, and @SteliaAI. Talks on vLLM updates, speculative decoding, llm-d in production, AI safety, and more. Plus food, drinks, and the people building this stuff. luma.com/iuecyow4

English

1

5

12

5.4K

brian stevens retweetledi

Red Hat AI@RedHat_AI·25 Nis

Llama 70B as a cloud endpoint costs exponentially more than Llama 8B. For teams where a smaller model meets the quality bar, that gap is hard to ignore. And with INT4 quantization: 4x smaller, 2x faster, less than 1% accuracy loss. The right model isn't always the biggest one. redhat.com/en/blog/when-l…

English

2

3

14

1.3K

brian stevens retweetledi

Red Hat AI@RedHat_AI·24 Ara

Calling Boston area startups building with AI. 🤙 We're kicking off 2026 with the first event in a new monthly, in person hackathon series hosted by @RedHat and @IBM in Boston’s Seaport District. This one day hackathon is designed specifically for local startups that want to move faster from idea to working prototype. Instead of a fixed theme, you bring a real AI problem your team is actively facing. We help you build a proof of concept using open source, enterprise ready templates from aitemplates.io, including MCP Server, AI Agent, and UI templates. What you will get: ⚡ Rapid prototyping without boilerplate 🧠 Hands on guidance from Red Hat AI architects 🤝 Connections with other Boston based AI startups and ecosystem partners If you are a Boston startup looking to turn an AI challenge into something real, this is for you. Event details are shared after registration. Register now: luma.com/i3q8df0x

English

3

5

29

2.1K

brian stevens retweetledi

SemiAnalysis@SemiAnalysis_·24 Eki

The @RedHat_AI team contributes a lot to vLLM and does amazing work for the open-source community. Great to see vLLM performing so well compared to TRT-LLM on H200! vLLM comes pretty close to B200, with the @NVIDIAAI team working on closing the gap for GPTOSS within the next couple of updates.

Red Hat AI@RedHat_AI

InferenceMAX, vLLM TPU, compressed-tensors, MoE support via transformers, DeepSeek-OCR, and more. Here’s what’s new in the @vllm_project community over the past two weeks:

English

3

15

98

20.6K

brian stevens retweetledi

Red Hat AI@RedHat_AI·24 Eki

InferenceMAX, vLLM TPU, compressed-tensors, MoE support via transformers, DeepSeek-OCR, and more. Here’s what’s new in the @vllm_project community over the past two weeks:

English

1

8

42

24.4K

brian stevens retweetledi

Red Hat AI@RedHat_AI·22 Eyl

4 tracks. 12 sessions. 1 day of learning. Join us on Oct. 16 for Red Hat AI Day of Learning, a free virtual event for developers, engineers & practitioners. Tracks: ⚡ Fast & efficient inference 🎯 Model customization 🤖 Agentic AI 🌐 Scaling AI over hybrid cloud Sessions include: · Intro to vLLM and how to get started · Model optimization with LLM Compressor · Lossless LLM inference acceleration w/ Speculators · End-to-end model customization · Synthetic data generation and data processing · Continual learning of LLMs with Training Hub · Build open source agentic AI solutions · Intro to Model Context Protocol (MCP) · Intro to Llama Stack · Intro to distributed inference · Distributed inference with llm-d · Scaling AI Infrastructure 👉 Register free: redhat.com/en/events/webi…

English

1

17

39

7.4K

brian stevens retweetledi

Red Hat AI@RedHat_AI·13 Eyl

Qwen3-Next dropped yesterday and you can run it with Red Hat AI today. ✅ Day-zero support in vLLM ✅ Day-one deployment with Red Hat AI Step-by-step guide: developers.redhat.com/articles/2025/… The future of AI is open.

English

0

5

18

2.3K

brian stevens@addvin·6 Ağu

@charles_irl @modal_labs @mgoin_ @vllm_project @shariqmobin Well done!

English

0

2

91

Charles 🎉 Frye@charles_irl·6 Ağu

Proud to announce that @modal_labs was a Day 0 Lunch Partner for the release of gpt-oss. That is, on Day 0 I got lunch with @mgoin_ of @vllm_project and then had him look at my and @shariqmobin's code to deploy the model on Modal.

Modal@modal

OpenAI has released its first open weights language model since GPT-2 over five years ago. gpt-oss has - efficient mxfp4 MoEs - native tool-calling & reasoning - attention sinks for long context This looks like a great model for self-hosted agents. Try it on Modal!

English

7

5

139

8K

brian stevens retweetledi

Red Hat AI@RedHat_AI·29 May

Thanks to the @lmcache team for joining forces with Red Hat on llm-d! llm-d is a new open source project for scalable, efficient distributed LLM inference with @vllm_project. Learn more about our collaboration here: blog.lmcache.ai/2025-05-22-red…

English

0

8

26

2K

brian stevens@addvin·29 May

@RedHat_AI Adding a shoutout to the @IBMResearch team working jointly with AMD team on contributing Triton attention kernels in vLLM v1 that improved decode throughput by 3x on Llama and Granite models.

English

0

4

49

Red Hat AI@RedHat_AI·29 May

7/7 Want to see how AMD, vLLM, and open-source tooling work together for efficient LLM inference? Check out the full breakdown: hotaisle.xyz/blog/hot-aisle… #LLM #vLLM #AMD #opensource #AIinfrastructure

English

1

0

7

381

Red Hat AI@RedHat_AI·29 May

What if you could run high-performance LLM inference on AMD GPUs, without vendor lock-in or hyperscaler markup? Check this out 👇 🧵

English

3

6

45

3.4K

brian stevens retweetledi

Mark Collier 柯理怀@sparkycollier·22 May

Really excited to see the emergence of llm-d @addvin ! Inference is the biggest workload in human history and the open source tools need to keep evolving to serve it

NVIDIA AI Developer@NVIDIAAIDev

The llm-d project is a major step forward for the #opensource AI ecosystem, and we are proud to be one of the founding contributors, reflecting our commitment to collaboration as a catalyst for innovation in generative AI. As generative and agentic AI continue to evolve, scalable, high-performance inference will be critical to unlocking their full potential. That’s why we’re partnering with @RedHat and other contributors to grow the llm-d community and accelerate its capabilities—powered by our contributions, including innovations from NVIDIA Dynamo such as NIXL. 🔗 Explore and contribute on GitHub: nvda.ws/3FlttSL 📰 Read the launch blog: nvda.ws/3FgF0Tn 🎙️ Hear from NVIDIA’s VP of Engineering & AI Frameworks, Ujval Kapasi → nvda.ws/45pyVhU

English

0

2

11

809

brian stevens retweetledi

NVIDIA AI Developer@NVIDIAAIDev·21 May

The llm-d project is a major step forward for the #opensource AI ecosystem, and we are proud to be one of the founding contributors, reflecting our commitment to collaboration as a catalyst for innovation in generative AI. As generative and agentic AI continue to evolve, scalable, high-performance inference will be critical to unlocking their full potential. That’s why we’re partnering with @RedHat and other contributors to grow the llm-d community and accelerate its capabilities—powered by our contributions, including innovations from NVIDIA Dynamo such as NIXL. 🔗 Explore and contribute on GitHub: nvda.ws/3FlttSL 📰 Read the launch blog: nvda.ws/3FgF0Tn 🎙️ Hear from NVIDIA’s VP of Engineering & AI Frameworks, Ujval Kapasi → nvda.ws/45pyVhU

English

1

17

31

9K

brian stevens@addvin·11 Nis

And was great to see the Red Hat and Google effort announced by my friend the brilliant Amin Vahdat.

Woosuk Kwon@woosuk_k

Huge congrats to all the @googlecloud and @RedHat_AI team members who drove this effort!

English

0

6

756

brian stevens@addvin·5 Nis

@RedHat_AI @vllm_project What an amazing release!

English

0

3

354

Red Hat AI@RedHat_AI·5 Nis

Llama 4 Herd is here! It brings a lot of goodies, like MoE architecture and native multimodality, enabling developers to build personalized multimodal experiences. With Day 0 support in vLLM, you can deploy Llama 4 with @vllm_project now! Let's dig into it. (a thread)

English

5

47

184

148.5K

brian stevens retweetledi

Red Hat AI@RedHat_AI·26 Şub

DeepSeek’s Open Source Week drops A LOT of exciting goodies! We’re hosting vLLM Office Hours tomorrow—learn what they are, how they integrate with vLLM, & ask questions! Date: Thursday, Thu, Feb 27 Time: 2PM ET / 11AM PT Register: neuralmagic.com/community-offi… #DeepSeek #AI

English

0

2

9

930

brian stevens retweetledi

Matt Hicks@matthicksj·13 Oca

At @RedHat, we believe the future of AI is open. That's why I'm incredibly excited about our acquisition of @NeuralMagic. Together, we're furthering our commitment to our customers and the open source community to deliver on the future of AI—and that starts today.

Red Hat@RedHat

Today, Red Hat completed the acquisition of @NeuralMagic, a pioneer in software and algorithms that accelerate #GenAI inference workloads. Read how we are accelerating our vision for #AI’s future: red.ht/408kJ8K.

English

0

28

80

4.3K

brian stevens@addvin·14 Oca

Today it become official, Neural Magic now a part of Red Hat.

Red Hat@RedHat

Today, Red Hat completed the acquisition of @NeuralMagic, a pioneer in software and algorithms that accelerate #GenAI inference workloads. Read how we are accelerating our vision for #AI’s future: red.ht/408kJ8K.

English

2

8

38

2.7K

brian stevens retweetledi

Red Hat AI@RedHat_AI·10 Ara

If you are at #NeurIPS2024 this week, stop by the Neural Magic booth #307 and talk to us about the @vllm_project! vLLM core committer @mgoin_ will be there, ready to hear your ideas and share them with the team. The best feature requests always come from in-person chats!

English

1

7

1.1K

brian stevens retweetledi

Scale ML@scaleml·2 Ara

For our last seminar of the year we will end with Lucas Wilkinson from @neuralmagic presenting! Machete: a cutting-edge mixed-input GEMM GPU kernel targeting NVIDIA Hopper GPUs Time: Dec 4, 3pm EST Sign up via scale-ml.org to join our mailing list for the zoom link

English

0

4

17

1.6K

brian stevens

Keşfet