Pete Cheslock
@petecheslock
32.3K posts

I'm not here. Find me on BlueSky. https://t.co/4M0VoYFWQj

Boston, MA · Joined March 2011
441 Following · 5K Followers
Pete Cheslock @petecheslock ·
For all my local Boston friends: if you are interested in vLLM/llm-d and inference at scale, you should join us!
Red Hat AI @RedHat_AI

vLLM meetup is coming to Boston on March 31! Workshop + evening sessions covering:
- @vllm_project update
- Model compression and speculative decoding
- Agentic AI with vLLM
- Distributed inference at scale with @_llm_d_ and Kubernetes
Pre-event workshop at 3:30 PM: Deploy Llama 3.1 8B and benchmark llm-d's cache-aware routing live.
Shoutout to our sponsors: @RedHat, @IBM, @NVIDIAAI, The Open Accelerator, and @MITIBMLab!
Register here 👇 luma.com/4rmkrrb7
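The cache-aware routing mentioned in the workshop steers requests that share a prompt prefix to the same inference worker, so that worker's KV cache already holds the shared prefix blocks. A minimal sketch of the general idea, assuming a simple hash-based scheme (all names here are hypothetical; this is not llm-d's actual routing code):

```python
import hashlib

def route_by_prefix(prompt: str, workers: list[str], prefix_chars: int = 256) -> str:
    """Pick a worker by hashing the prompt's leading characters.

    Requests that share a long-enough prefix hash to the same worker,
    so the shared prefix's KV blocks can be reused from that worker's cache.
    """
    prefix = prompt[:prefix_chars]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(workers)
    return workers[index]

workers = ["llm-pod-0", "llm-pod-1", "llm-pod-2"]
system_prompt = "You are a helpful assistant. " * 10  # shared prefix > 256 chars

# Two requests with the same long system prompt land on the same worker,
# so the second one can reuse the cached prefix.
a = route_by_prefix(system_prompt + "Summarize this doc.", workers)
b = route_by_prefix(system_prompt + "Translate this doc.", workers)
```

Real schedulers weigh cache locality against load; this toy version shows only the locality half.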

Pete Cheslock retweeted
Yuan (Terry) Tang @TerryTangYuan ·
📢 The State of Model Serving Communities: March Edition is out!

We launched our newsletter publicly last year to share our contributions to upstream communities from our @RedHat_AI teams. We've gained over 1300 subscribers!

Our goal with this newsletter is to give a clear, community-driven view of what's happening across the model serving ecosystem, including updates from @vllm_project, KServe, @_llm_d_, @kubernetesio, and Llama Stack.

👉 Check out the March newsletter here: inferenceops.substack.com/p/state-of-the…
👉 Subscribe to get future issues in your inbox: inferenceops.substack.com

🚀 Thanks to everyone who subscribed so far! Kudos to all contributors to this edition! @franciscojarceo, Pete Cheslock, Sean Condon, Jooho Lee, Pierangelo Di Pilato, Ran Pollak, Nir Rozenbaum, @TerryTangYuan, Wentao Ye
Pete Cheslock retweeted
Red Hat AI @RedHat_AI ·
We’ll cover all of this and more during our distributed inference meetup in New York City on March 11, 2026: luma.com/0crwqwg4
Pete Cheslock retweeted
llm-d @_llm_d_ ·
What's on the agenda for next Wednesday's NYC meetup?
🛠️ Intro to llm-d 0.5
⚡️ Distributed LLM serving on AMD
🧠 Lessons scaling Wide-EP and MoE
💾 KV-cache offloading & prefix scheduling
Join us building the future of open-source inference. Details: luma.com/0crwqwg4
Pete Cheslock retweeted
llm-d @_llm_d_ ·
Join us next week in NYC with the llm-d community for a deep dive into distributed inference. We're talking llm-d 0.5, scaling MoE models, and KV-cache offloading. If you're building LLM infra, don't miss this.
📅 March 11th
📍 1 Madison Ave
Register: luma.com/0crwqwg4
Pete Cheslock retweeted
Ernesto Rivera @ernestobrivera ·
Great talk last night by @julianeagu (@QuotientAI), @thejackobrien (Subconscious), and @petecheslock (Red Hat)! LLMs as we know them today must change to meet the capacity we expect of them. Specialized agents, changing their hardware architecture, or funneling proper context!!
Pete Cheslock retweeted
llm-d @_llm_d_ ·
🚀 Announcing llm-d v0.4! This release focuses on achieving SOTA inference performance across accelerators. From ultra-low latency for MoE models to new auto-scaling capabilities, we’re pushing the boundaries of open-source inference. Blog: llm-d.ai/blog/llm-d-v0.… 🧵👇
Pete Cheslock retweeted
llm-d @_llm_d_ ·
🚀 llm-d v0.3.1 is LIVE! 🚀 This patch release is packed with key follow-ups from v0.3.0, including new hardware support, expanded cloud provider integration, and streamlined image builds. Dive into the full changelog: github.com/llm-d/llm-d/re… #llmd #OpenSource #vLLM #Release
Pete Cheslock retweeted
Red Hat AI @RedHat_AI ·
If you're running LLM inference at scale and still relying solely on "requests per second" or "GPU usage," you might be missing critical insights.

At Red Hat, we've been rethinking observability for LLM systems, from token throughput and latency metrics to cache reuse and end-to-end visibility. This post breaks down how @_llm_d_ brings cache-aware, token-level, and routing metrics to Red Hat @openshift AI 3.0, exposing TTFT, TPOT, cache hit ratios, and full traces from IGW to @vllm_project workers.

Read more → redhat.com/en/blog/tokens…
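The two latency metrics named above fall out of per-token timestamps on the serving path: TTFT is the gap from request arrival to the first generated token, and TPOT is the average gap between subsequent tokens. A minimal sketch of the arithmetic (illustrative only, not the metrics code the post describes):

```python
def ttft_and_tpot(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Compute time-to-first-token (TTFT) and time-per-output-token (TPOT).

    TTFT: delay from request arrival until the first generated token.
    TPOT: average inter-token gap over the remaining tokens.
    """
    if not token_times:
        raise ValueError("no tokens generated")
    ttft = token_times[0] - request_start
    if len(token_times) == 1:
        return ttft, 0.0
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot

# Example: request arrives at t=0.0s, first token at 0.25s,
# then one token every 20 ms.
times = [0.25 + 0.02 * i for i in range(5)]
ttft, tpot = ttft_and_tpot(0.0, times)
```

TTFT tends to track prefill cost (and cache hit ratio), while TPOT tracks decode throughput, which is why tracking them separately beats a single requests-per-second number.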
Pete Cheslock retweeted
llm-d @_llm_d_ ·
Getting started with llm-d v0.2 is now easier than ever! We've launched a full set of quick start guides to walk you through our most powerful features, including P/D disaggregation and deploying large MoE models on Kubernetes. Start here: llm-d.ai/docs/guide
Pete Cheslock retweeted
Red Hat AI @RedHat_AI ·
Distributed LLM inference is not hard. @_llm_d_ is a Kubernetes-native high-performance inference framework that makes it easy to get started. With llm-d v0.2, we’re laying down well-lit paths: reproducible, production-grade patterns for high-performance distributed inference...
Pete Cheslock retweeted
llm-d @_llm_d_ ·
So what's new in v0.2?
🔹 Validated P/D Disagg: Unlock higher GPU throughput
🔹 Native MoE Support: Run massive models like DeepSeek-R1 at scale
🔹 Extensible Scheduler: Customize routing with precise, prefix-aware logic
Dive into the release notes: red.ht/4o7OUsd
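One way to picture the prefix-aware routing logic mentioned in the release: score each worker by how many leading tokens of the incoming prompt it already has cached, then send the request to the best match. A toy sketch under that assumption (function and pod names are hypothetical, not llm-d's scheduler API):

```python
def longest_cached_prefix(prompt_tokens: list[int], cached_tokens: list[int]) -> int:
    """Length of the longest shared leading run of tokens."""
    n = 0
    for a, b in zip(prompt_tokens, cached_tokens):
        if a != b:
            break
        n += 1
    return n

def pick_worker(prompt_tokens: list[int], caches: dict[str, list[int]]) -> str:
    """Route to the worker whose cache shares the longest prefix with the prompt."""
    return max(caches, key=lambda w: longest_cached_prefix(prompt_tokens, caches[w]))

# Toy cache state: token IDs each worker has already prefilled.
caches = {
    "pod-a": [1, 2, 3, 4],
    "pod-b": [1, 2, 9, 9],
}
best = pick_worker([1, 2, 3, 4, 5, 6], caches)  # pod-a shares 4 tokens, pod-b only 2
```

A production scheduler would combine this locality score with queue depth and load; the sketch isolates the prefix-matching idea.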
Pete Cheslock retweeted
llm-d @_llm_d_ ·
The llm-d community is proud to announce the release of v0.2! Our focus has been on building well-lit paths for large-scale inference on Kubernetes. This release delivers major advancements in performance, scheduling, and support for massive models. red.ht/4l4u9uD
Pete Cheslock retweeted
llm-d @_llm_d_ ·
llm-d organizes through 7 specialized teams (SIGs):
🔀 Inference Scheduler
📊 Benchmarking
⚡ PD-Disaggregation
🗄️ KV-Disaggregation
🚀 Installation
📈 Autoscaling
👀 Observability
Weekly meetings, public docs, active Slack channels. Join today! llm-d.ai/docs/community…