Pete Cheslock

32.3K posts

@petecheslock

Also on https://t.co/4M0VoYFWQj

Boston, MA · Joined March 2011
442 Following · 5K Followers
Pete Cheslock @petecheslock
Hey Boston Friends. Join us next week during Boston Tech Week to learn more about llm-d, open source distributed inferencing on kubernetes. Special thanks to @RedHat and @Google for helping to plan and sponsoring this free event! luma.com/eqbc1gxq
Pete Cheslock reposted
Yuan (Terry) Tang @TerryTangYuan
📢 𝗧𝗵𝗲 𝗦𝘁𝗮𝘁𝗲 𝗼𝗳 𝗠𝗼𝗱𝗲𝗹 𝗦𝗲𝗿𝘃𝗶𝗻𝗴 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝗶𝗲𝘀: 𝗔𝗽𝗿𝗶𝗹 𝗘𝗱𝗶𝘁𝗶𝗼𝗻 𝗶𝘀 𝗼𝘂𝘁! Our goal with this newsletter is to give a clear, community-driven view of what’s happening across the model serving ecosystem, including updates from projects like @vllm_project, KServe, @_llm_d_, @kubernetesio, Llama Stack, and more. 👉 Check out the April newsletter here: inferenceops.substack.com/p/state-of-the… 👉 Subscribe to get future issues in your inbox: inferenceops.substack.com 🚀 Thanks to everyone who subscribed so far! Kudos to all contributors to this edition! Francisco Arceo, Pete Cheslock, Jooho Lee, Pierangelo Di Pilato, Nir Rozenbaum, Yuan Tang, Wentao Ye, Sasa Zelenovic
Pete Cheslock reposted
Red Hat AI @RedHat_AI
vLLM meetup is coming to Boston on March 31! Workshop + evening sessions covering:
- @vllm_project update
- Model compression and speculative decoding
- Agentic AI with vLLM
- Distributed inference at scale with @_llm_d_ and Kubernetes
Pre-event workshop at 3:30 PM: Deploy
Pete Cheslock @petecheslock
@MartinGree93211 @RedHat Exactly! llm-d revolutionizes apps by treating inference as a first-class cloud-native workload. Features like disaggregated serving and inference-aware routing mean you get much higher throughput for less GPU spend. It’s all about making performance portable and accessible.
Martin Green @MartinGree93211
@RedHat Great to see Red Hat's work on llm-d for optimized model serving! This could lead to faster results and reduced costs. Pete, any insights on how this could revolutionize our AI applications? 😀
Pete Cheslock reposted
Red Hat @RedHat
Red Hat is working with industry leaders to develop llm-d, an open-source project that optimizes how models are served to your users. By routing requests to the most efficient GPU and separating prefill from decode, you get faster results for less spend. Check out Pete Cheslock's quick overview of how llm-d is changing the game for Kubernetes-based AI: red.ht/3PbTkkP #KubeCon + #CloudNativeCon
Pete Cheslock reposted
llm-d @_llm_d_
It’s official: llm-d has joined the @CNCF! 🚀 Our mission to evolve Kubernetes into SOTA AI infrastructure just got a massive boost. This milestone belongs to our amazing community. Thank you for building this with us. 💜 We’re just getting started! 🔗 cncf.io/blog/2026/03/2…
Pete Cheslock reposted
Yuan (Terry) Tang @TerryTangYuan
📢 𝗧𝗵𝗲 𝗦𝘁𝗮𝘁𝗲 𝗼𝗳 𝗠𝗼𝗱𝗲𝗹 𝗦𝗲𝗿𝘃𝗶𝗻𝗴 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝗶𝗲𝘀: 𝗠𝗮𝗿𝗰𝗵 𝗘𝗱𝗶𝘁𝗶𝗼𝗻 𝗶𝘀 𝗼𝘂𝘁! We launched our newsletter publicly last year to share our contributions to upstream communities from our @RedHat_AI teams. We’ve gained over 𝟭𝟯𝟬𝟬 𝘀𝘂𝗯𝘀𝗰𝗿𝗶𝗯𝗲𝗿𝘀! Our goal with this newsletter is to give a clear, community-driven view of what’s happening across the model serving ecosystem, including updates from @vllm_project, KServe, @_llm_d_, @kubernetesio, and Llama Stack. 👉 Check out the March newsletter here: inferenceops.substack.com/p/state-of-the… 👉 Subscribe to get future issues in your inbox: inferenceops.substack.com 🚀 Thanks to everyone who subscribed so far! Kudos to all contributors to this edition! @franciscojarceo, Pete Cheslock, Sean Condon, Jooho Lee, Pierangelo Di Pilato, Ran Pollak, Nir Rozenbaum, @TerryTangYuan, Wentao Ye
Pete Cheslock reposted
Red Hat AI @RedHat_AI
We’ll cover all of this and more during our distributed inference meetup in New York City on March 11, 2026: luma.com/0crwqwg4
Pete Cheslock reposted
llm-d @_llm_d_
What’s on the agenda for next Wednesday's NYC meetup?
🛠️ Intro to llm-d 0.5
⚡️ Distributed LLM serving on AMD
🧠 Lessons scaling Wide-EP and MoE
💾 KV-cache offloading & prefix scheduling
Join us building the future of open-source inference. Details: luma.com/0crwqwg4
Pete Cheslock reposted
llm-d @_llm_d_
Join us next week in NYC with the llm-d community for a deep dive into distributed inference. We’re talking llm-d 0.5, scaling MoE models, and KV-cache offloading. If you're building LLM infra, don't miss this. 📅 March 11th 📍1 Madison Ave Register: luma.com/0crwqwg4