Kuntai Du
@this_will_echo

Chief Scientist | Committer of vLLM / LMCache / Production Stack

109 posts
Joined January 2022
59 Following · 218 Followers
Kuntai Du @this_will_echo ·
So glad to see our project --- LMCache --- is included in the discussion!
PyTorch@PyTorch

llm-d published a new post on KServe + llm-d + vLLM for production LLM inference on Kubernetes. Authors from @RedHat and Tesla describe how the stack addressed routing, customization, and day-2 operational challenges, citing 3x higher output tokens/s and 2x lower TTFT in one deployment after enabling prefix-cache aware routing. By Yuan Tang, Scott Cabrinha, Robert Shaw, and Sai Krishna @CloudNativeFdn 🔗 @_llm_d_ llm-d.ai/blog/productio… #vLLM #KServe #Kubernetes #LLMOps #OpenSource

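The prefix-cache aware routing credited above with 3x output tokens/s can be sketched as follows. This is an illustrative scheme, not llm-d's actual scheduler: the idea is simply that requests sharing a long prompt prefix land on the same replica, where that prefix's KV blocks are likely already cached. The function name, `prefix_len` default, and replica naming are all assumptions for illustration.

```python
import hashlib

def route_by_prefix(prompt: str, replicas: list[str], prefix_len: int = 512) -> str:
    """Route requests that share a prompt prefix to the same replica,
    so that replica's prefix cache likely holds the shared KV blocks.
    (Illustrative sketch, not llm-d's actual routing logic.)"""
    prefix = prompt[:prefix_len]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]
```

A real router would also weigh replica load and fall back when the chosen replica is saturated; the hash-on-prefix step is just the cache-affinity core.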
Kuntai Du @this_will_echo ·
My time being spent:
before using Claude Code --> write code
after using Claude Code --> read code, understand it, and find potential issues
My mental effort is not getting much lighter lol.
Kuntai Du @this_will_echo ·
Latest models use efficient attentions like Mamba or sliding window. This creates huge potential in the KV cache offloading layer --- LMCache needs to catch up.
Tensormesh@tensormesh

GPU memory alone won’t carry the next generation of LLM serving. At #RaySummit, our Chief Scientist @this_will_echo shared how #LMCache offloads KV Cache across CPU RAM, local disk, Redis, and S3, while enabling cache reuse beyond basic prefix caching. Watch the full talk on YouTube: 👉🏻youtube.com/watch?v=aVpkkV… #RaySummit #LMCache #Tensormesh #KVCache

Kuntai Du @this_will_echo ·
Two years ago, we had just 2 NVIDIA A40s. Two years later, our project is mentioned in Jensen Huang's GTC talk. Hope is the first-order weapon for humans to fight for the future.
Hanchen Li@lihanc02

Some former colleagues from @lmcache shared this photo from the GTC Keynote. I am honestly surprised how fast the team has been growing. (We were a research lab on 2 A40 GPUs in 2023!) btw I think they are hiring LLM hackers (or product hackers I am not sure 🤪, you should just check with @JunchenJiang @ChengYihuaA) #GTC #LLM #Inference #Nvidia #LMCache #KVCache

Kuntai Du retweeted
LMCache Lab @lmcache ·
Why not store KV cache permanently? In case you missed it, #IBM recently posted two blogs on llm-d + K8s + LMCache-based KV storage. Avoiding recomputation is the goal, but it's still rare to see KV cache treated as shared, persistent infrastructure in real production deployments. Excited to see LMCache be part of this with IBM, a long-time collaborator of the LMCache community. Thrilled to keep building together. These two posts are a great look at what that can actually look like in practice:
1. Rethinking LLM Inference Economics with llm-d, LMCache, and IBM Storage Scale community.ibm.com/community/user…
2. Deploying Distributed LLM Inference Service with IBM Storage Scale for KV Cache Offloading community.ibm.com/community/user…
Great read for anyone interested in fast yet cheap LLM inference. #LMCache #vLLM #Kubernetes #K8s #KVCache
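Treating KV cache as shared, persistent infrastructure, as the post above describes, requires cache keys that any engine replica can derive independently. One common approach is content-addressed keys chained over the token prefix, so a block's key depends on everything before it. This is an illustrative scheme with hypothetical names, not the actual keying used by LMCache or llm-d.

```python
import hashlib

def prefix_cache_keys(token_ids: list[int], chunk_size: int = 256) -> list[str]:
    """Derive one content-addressed key per full chunk of tokens, where
    each key hashes the entire prefix up to that chunk. Two requests
    sharing a prefix produce identical leading keys, so any replica can
    look up the shared KV blocks in a common store without coordination.
    (Illustrative scheme, not LMCache's actual key format.)"""
    keys = []
    running = hashlib.sha256()
    full_len = len(token_ids) - len(token_ids) % chunk_size  # drop partial chunk
    for start in range(0, full_len, chunk_size):
        chunk = token_ids[start:start + chunk_size]
        running.update(str(chunk).encode("utf-8"))   # fold chunk into prefix hash
        keys.append(running.copy().hexdigest())      # key for this prefix depth
    return keys
```

Because key i commits to all tokens before chunk i, a store lookup that hits on the first k keys safely reuses exactly the KV blocks for the shared k-chunk prefix.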
Kuntai Du @this_will_echo ·
Physical LLM is on the way lol
Tensormesh@tensormesh

"Inference context is the new bottleneck," said Kevin Deierling, SVP Networking at #NVIDIA. At his #GTC talk last week, he highlighted that CMX and CacheBlend from LMCache (@tensormesh) were part of the new KV cache memory stack for agents, and recognized @tensormesh among the CMX storage partners. As the stack evolves, @tensormesh keeps building for what's next. ▶️ Session replay: tinyurl.com/GTC-talk

Kuntai Du retweeted
Junchen Jiang @JunchenJiang ·
🚀 LMCache has officially been out for 1.5 years now! In that time, LMCache has become the default KV-cache library for open-source LLM inference (CPU offload, P2P sharing, multi-backend storage, vLLM/SGLang integration, and more). As a PyTorch Foundation Ecosystem project, LMCache is now used by enterprise leaders across the industry (GKE, AWS, Nvidia's Dynamo, llm-d…). 🤔 What's the secret to our product? 🔎 Come see for yourself: arxiv.org/pdf/2510.09665 ♥️ A huge thank you to our contributors and community: you've shaped what LMCache is today. (@lmcache) #KVCache #LMCache #LLM #vLLM
Kuntai Du @this_will_echo ·
GitHub is not acting normally... Our LMCache logo suddenly disappeared today even though we didn't make any changes. And we cannot even clone the repo over SSH. GitHub bad bad.