LMCache Lab

145 posts

@lmcache

🧪 Open-Source Team that maintains LMCache and Production Stack 🤖 Democratizing AI by providing efficient LLM serving for ALL

Github, Online · Joined September 2024
48 Following · 798 Followers
LMCache Lab@lmcache·
🧵 LMCache was spotlighted in Jensen Huang's GTC 2026 keynote — a real milestone for the community! A late post, intentionally: one more dose of GTC after the feed rush settles. ☕ For those new here: LMCache is a KV cache sharing layer that cuts LLM serving costs and latency. It works seamlessly with vLLM and SGLang, with minimal setup. But the real story isn't the tech. It's the community that built it. Whether you're a researcher 🧐, engineer 🧑‍💻, student 👩‍🎓, or just curious, there's a place for you here. 🔗 Explore LMCache 💻 Code: github.com/LMCache/LMCache 📖 Docs: docs.lmcache.ai 📝 Blog: blog.lmcache.ai/en/ ⭐ Star the repo, open an issue, submit a PR. Every contribution matters! The future of AI infrastructure is open. Come build it with us. #LMCache #KVCache #NVIDIAGTC #LLM #opensource
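For the curious, the "minimal setup" with vLLM looks roughly like this. A sketch, not an authoritative recipe: the model name is a placeholder, and exact flags and connector names can vary by vLLM/LMCache version — check docs.lmcache.ai for your release.

```shell
# Minimal LMCache + vLLM setup sketch (flags may differ by version).
pip install lmcache vllm

# Point LMCache at a config file; key names are illustrative,
# see docs.lmcache.ai for the supported options.
export LMCACHE_CONFIG_FILE=lmcache.yaml

# Serve with the LMCache KV connector so KV caches can be
# shared and offloaded across requests and instances.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
```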
0 replies · 0 reposts · 2 likes · 142 views
LMCache Lab reposted
Tensormesh@tensormesh·
"Inference context is the new bottleneck" — Kevin Deierling, SVP Networking #NVIDIA. In his #GTC talk last week, he highlighted that CMX and CacheBlend from LMCache (@tensormesh) are part of the new KV cache memory stack for agents, and recognized @tensormesh among the CMX storage partners. As the stack evolves, @tensormesh keeps building for what's next. ▶️ Session replay: tinyurl.com/GTC-talk
0 replies · 2 reposts · 9 likes · 319 views
LMCache Lab@lmcache·
We ran a tiny experiment on a one-shot SWE-bench task with Claude Code to study the context engineering and reuse pattern: • 92 LLM calls invoked • ~2M input tokens • 13 minutes runtime • 92% prefix reuse rate. With prefix caching, this single task drops from $6.00 → $1.15 in input cost (≈ 81% savings) and dramatically reduces TTFT. The trace shows Claude Code is essentially a prefix-reuse machine: warm-up calls to prime the cache, a parallel multi-agent system, and a ReAct-style execution loop — all optimized for KV cache reuse. Blog post: huggingface.co/blog/kobe0938/… Raw trace: github.com/kobe0938/blog/… Trace visualizer: v0-llm-agent-dashboard.vercel.app If you care about context engineering, agent architecture, and KV cache economics, this is a concrete, end-to-end look under the hood. Paste the raw trace into the visualizer to explore further.
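The $6.00 → $1.15 figure can be reproduced with back-of-the-envelope arithmetic. A sketch assuming Anthropic-style prompt-caching rates ($3/MTok fresh input, 1.25x for cache writes, 0.1x for cache reads) — the rates are our assumption, not taken from the trace:

```python
# Back-of-the-envelope prefix-caching economics for the trace above.
# Assumed (illustrative) per-million-token rates, Anthropic-style:
FRESH = 3.00         # uncached input tokens, $/MTok
CACHE_WRITE = 3.75   # fresh tokens written into the prompt cache (1.25x)
CACHE_READ = 0.30    # cached tokens read back (0.1x)

total_mtok = 2.0     # ~2M input tokens across 92 LLM calls
reuse = 0.92         # 92% prefix reuse rate

# Without caching, every input token is billed at the fresh rate.
no_cache = total_mtok * FRESH

# With caching: the novel 8% is written to the cache at a premium,
# the reused 92% is read back at a steep discount.
with_cache = (total_mtok * (1 - reuse) * CACHE_WRITE
              + total_mtok * reuse * CACHE_READ)

print(f"without caching: ${no_cache:.2f}")   # $6.00
print(f"with caching:    ${with_cache:.2f}") # $1.15
print(f"savings:         {1 - with_cache / no_cache:.0%}")  # 81%
```

Under these assumed rates the numbers land exactly on the tweet's figures, which suggests the reported cost is dominated by the cache read/write split rather than output tokens.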
2 replies · 1 repost · 27 likes · 284.7K views
LMCache Lab@lmcache·
Help wanted: we would like to run another round of experiments on the agents listed at github.com/kyrolabs/aweso… (#software-development section) to analyze agents' reuse patterns and see whether LMCache can help accelerate open source agent applications. More details: github.com/LMCache/LMCach… github.com/LMCache/lmcach… If that sounds interesting to you or aligns with your research interests, DM us in the Slack channel 🫡
0 replies · 0 reposts · 2 likes · 410 views
LMCache Lab@lmcache·
Yesterday we hosted our first LMCache office hours! Jiayi Yao, Research Engineer at Tensormesh and one of our top contributors, covered the LMCache architecture, key performance optimizations, and benchmark results, based on the newly published technical report at arxiv.org/pdf/2510.09665. You can watch the recording here: youtu.be/y14ruG6CNGE?si… Join us for the next LMCache office hours on December 11. Register to get it added to your calendar: lmcache-officehours.zapier.app
0 replies · 1 repost · 6 likes · 784 views
LMCache Lab@lmcache·
Check out our new blog post, a collaboration with the Google GKE team, about LMCache on Google Kubernetes Engine: Boosting LLM Inference Performance with KV Cache on Tiered Storage blog.lmcache.ai/2025-10-07-LMC…
0 replies · 3 reposts · 6 likes · 492 views
LMCache Lab@lmcache·
Gotta Cache 'Em All.
0 replies · 0 reposts · 3 likes · 269 views
LMCache Lab@lmcache·
... Implementing the LMCache Plugin Framework and lmcache_frontend gave us an important insight: when handling scenario-specific requirements in open source projects, functional abstraction and universal design are crucial. The Plugin Framework succeeds because it doesn't implement each customization requirement directly; instead, it provides a flexible extension mechanism. Through the Plugin Framework, LMCache found a balance point: satisfying diverse requirements while keeping the project maintainable and extensible. The same design pattern appears in the LMCache Remote External Connector framework and the LMCache External Backend framework, and it is worth promoting in future development. By defining clear extension interfaces and specifications, we let the community meet specific requirements without modifying core code, supporting the long-term healthy development of the project as LMCache grows increasingly powerful.
0 replies · 0 reposts · 1 like · 232 views
LMCache Lab@lmcache·
In large-scale language model inference, efficient memory management and KV cache optimization are crucial. LMCache, a KV cache management system designed for vLLM, needs flexible extension mechanisms to support monitoring, troubleshooting, and state insight in complex production environments. Instead of customizing the LMCache core directly, we introduced the LMCache Plugin Framework: a lightweight yet powerful plugin system that lets developers run custom scripts inside LMCache processes. On top of this framework we built lmcache_frontend (github.com/LMCache/lmcach…), a monitoring and proxy service that runs as a subprocess on scheduler nodes only. It provides a web interface for cluster status visualization and implements request forwarding through an HTTP proxy service. This design not only simplifies deployment and management but also gives developers a solid example plugin, demonstrating how the Plugin Framework can enhance system observability and control. Read more: blog.lmcache.ai/2025-09-23-lmc… Doc: docs.lmcache.ai/developer_guid…
1 reply · 5 reposts · 13 likes · 858 views
Saikat Sur@sursaikat·
My writing on LMCache techniques: - CacheGen: how to quickly transfer KV caches to GPU memory. - CacheBlend: how to quickly combine multiple KV caches on demand. @lmcache @JunchenJiang LMCache GitHub: github.com/LMCache/LMCache
1 reply · 1 repost · 4 likes · 77 views
LMCache Lab reposted
EyeingAI@EyeingAI·
Wow… LLMs can now get insane speed & memory boosts. This open-source trick makes any large language model faster than you thought possible... LMCache caches and reuses key-value data across instances and hardware, so your AI: – Remembers context – Handles multi-round Q&A effortlessly – Runs faster and smoother 🔥 Why this is next-level: • Prompt Caching: Instantly pull long conversations, AI actually remembers stuff now. • Fast RAG: Combine cached data for lightning-accurate results. • Scale Like a Boss: No messy GPU routing. • Cheaper AF: Compression tech keeps costs low. • Lightning Speed: Streaming + decompression = almost zero lag. • Plug & Play: Works with vLLM, TGI, and all your favorite LLM engines. • Better Quality: Offline upgrades make AI smarter than ever. If your AI deals with context-heavy conversations or RAG, this is the hack that changes everything. Serve more users, slash compute waste, and see your AI dominate every task.
26 replies · 15 reposts · 66 likes · 52.7K views
LMCache Lab reposted
Daily Dose of Data Science@DailyDoseOfDS_·
The fastest serving engine for LLMs is here (open-source)! LMCache is an LLM serving engine designed to reduce time-to-first-token and increase throughput, especially under long-context scenarios. It boosts vLLM with 7x faster access to 100x more KV caches. 100% open-source!
15 replies · 190 reposts · 1.2K likes · 66.8K views
LMCache Lab reposted
Yacine Mahdid@yacinelearning·
I've got deep respect for niche open source projects in AI. You've gotta have deep expertise and a good heart to run those. Kudos to teams like LMCache for keeping the open source dream alive ❤️
5 replies · 27 reposts · 447 likes · 17.4K views