Junchen Jiang
117 posts

Junchen Jiang @JunchenJiang
CS Prof @ UChicago https://t.co/U01oOWGnip (Fast distributed LLM inference) https://t.co/hoetjwXKIt (Best KV cache layer)
Chicago, IL · Joined September 2012
320 Following · 505 Followers
Junchen Jiang reposted
Tensormesh @tensormesh ·
"Inference context is the new bottleneck," said Kevin Deierling, SVP Networking, #NVIDIA. At his #GTC talk last week, he highlighted that CMX and CacheBlend from LMCache (@tensormesh) are part of the new KV cache memory stack for agents, and recognized @tensormesh among the CMX storage partners. As the stack evolves, @tensormesh keeps building for what's next. ▶️ Session replay: tinyurl.com/GTC-talk
0 replies · 2 reposts · 9 likes · 319 views
Junchen Jiang reposted
Hanchen Li @lihanc02 ·
Some former colleagues from @lmcache shared this photo from the GTC keynote. I am honestly surprised by how fast the team has been growing. (We were a research lab on 2 A40 GPUs in 2023!) By the way, I think they are hiring LLM hackers (or product hackers, I am not sure 🤪); you should just check with @JunchenJiang @ChengYihuaA. #GTC #LLM #Inference #Nvidia #LMCache #KVCache
0 replies · 3 reposts · 14 likes · 692 views
Junchen Jiang @JunchenJiang ·
💡 Rare moment when industry demand and academic curiosity collide, this time on KV$ (KV cache). Industry: pushing KV$ beyond CPU RAM to reuse more context and save compute. Academia: treating KV$ as a new kind of ML data: compressible, manipulable, optimizable. That's exactly where LMCache sits. NVIDIA GTC talk Wed (3/18) 10am, where we'll share our vision. @lmcache @tensormesh #kvcache #llminference #gtc #nvidia
2 replies · 0 reposts · 4 likes · 205 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
🗣️ Next week at NVIDIA GTC: our CEO @JunchenJiang joins engineers from @nvidia and @TencentGlobal to unpack KV cache design, one of the most critical performance levers in production LLM inference. Featuring: FlexKV • LMCache • Dynamo KVBM 📅 Mar 18 | 10 AM 📍 San Jose, CA. Reserve your seat 👉 nvidia.com/gtc/session-ca… #NVIDIAGTC #AIInference #KVCache #TensorMesh
0 replies · 2 reposts · 9 likes · 599 views
Junchen Jiang reposted
Vikram @msharmavikram ·
Join us at GTC’26 for a tutorial where Ziqi Fan, Ph.D. (@NVIDIAAI Dynamo), Fan YE (Tencent Cloud), @JunchenJiang (@lmcache / @tensormesh), and I break down everything you need to know about KV cache design for production LLM inference. (2/5)
1 reply · 3 reposts · 7 likes · 764 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
Most teams running MemGPT agents are wasting 56% of their prefill compute every turn, by default. The fix isn't better hardware. It's a different caching strategy. Read the full technical breakdown: tensormesh.ai/blog-posts/pre…
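A toy sketch of the effect behind that claim (the prompt layout and token counts below are illustrative assumptions, not the blog's measurements): MemGPT-style agents rewrite working memory in the *middle* of the prompt each turn, so prefix-only caching can reuse only the short system header, even though most of the prompt recurs verbatim and would be reusable under segment-level (CacheBlend-style) caching.

```python
# Toy illustration: why agent prompts defeat prefix-only caching.
# All segment sizes are hypothetical, chosen only to show the gap.

def prefix_overlap(a, b):
    """Tokens reusable by prefix-only caching: the common leading run."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def segment_overlap(prev_segments, cur_segments):
    """Tokens reusable if any repeated segment can hit (CacheBlend-style)."""
    cached = set(map(tuple, prev_segments))
    return sum(len(s) for s in cur_segments if tuple(s) in cached)

# Hypothetical layout: [system | working memory | retrieved docs | user msg]
system = ["sys"] * 50
memory_t1, memory_t2 = ["mem1"] * 200, ["mem2"] * 200  # rewritten each turn
docs = ["doc"] * 700                                   # identical, but AFTER memory
user = ["ask"] * 50

turn1 = [system, memory_t1, docs, user]
turn2 = [system, memory_t2, docs, user]
flat1 = [t for seg in turn1 for t in seg]
flat2 = [t for seg in turn2 for t in seg]

print(prefix_overlap(flat1, flat2))    # only the system header survives
print(segment_overlap(turn1, turn2))   # system + docs + user all reusable
```

Because the rewritten memory sits ahead of the (unchanged) retrieved documents, a prefix cache invalidates everything after it; segment-level reuse recovers the bulk of the prefill.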
0 replies · 1 repost · 4 likes · 187 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
Running open-source models is easy. Running them well in production isn’t. In today’s spotlight, Nick Barcet (@nijaba), Head of GTM @tensormesh, shares what Tensormesh is built to solve, and the people and culture behind the execution. If you’re running OSS models and want dramatically more efficient inference, 🎯 try @tensormesh with $100 in free GPU credits: lnkd.in/g-gXYtaV #AI #LLM #Inference #kvcache
0 replies · 2 reposts · 4 likes · 178 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
LMCache × PyTorch Foundation: why it matters. Learn from our CEO, @JunchenJiang, why open-sourcing @lmcache was a natural step for @tensormesh, and the role it plays in the @PyTorch ecosystem, helping shape the future of open AI infrastructure. 🎥 Watch the full interview: 👉 y2u.be/zHW4Zzd7pjI
0 replies · 1 repost · 9 likes · 240 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
😲 450 million tokens reprocessed monthly when you should process each document once. That's what happens when 15 attorneys query 500 contracts and your LLM has no persistent memory. We built @tensormesh to fix this: a 5-10x cost reduction through intelligent caching. 📖 Full blog: tensormesh.ai/blog-posts/fix… 🚀 Try our beta now ($100 credit): app.tensormesh.ai/login #AI #LLM #MachineLearning #DocumentAI #MLOps #ArtificialIntelligence #EnterpriseTech
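A back-of-envelope way the 450M figure can arise (illustrative only; the per-contract token count and query volume below are assumptions, not the blog's numbers):

```python
# Redundant prefill in a no-cache deployment (assumed numbers:
# 2,000 tokens per contract, 30 queries per attorney per month).
tokens_per_contract = 2_000
contracts = 500
attorneys = 15
queries_per_attorney = 30  # per month

corpus_tokens = tokens_per_contract * contracts      # 1M-token corpus
monthly_queries = attorneys * queries_per_attorney   # 450 queries/month

# Without persistent KV memory, every query re-prefills the whole corpus.
reprocessed = corpus_tokens * monthly_queries        # tokens reprocessed/month
# With caching, the corpus is prefilled once and its KV reused.
processed_once = corpus_tokens

print(reprocessed, reprocessed // processed_once)
```

Under these assumptions, 450 full-corpus passes per month re-prefill 450M tokens that caching reduces to a single 1M-token pass.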
0 replies · 1 repost · 1 like · 97 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
Tensormesh: From Academia to Production. In this clip, our CEO and co-founder, Junchen Jiang, explains what it really takes to build a company at the intersection of academia, open source, and industry. Building an open-source company isn’t just about code prowess. It requires a deep, long-term technical vision, and a team willing to commit to that vision long enough to turn research into real systems. At @tensormesh, that foundation was already in place. 🎥 Watch the full interview on YouTube: 👉 y2u.be/zHW4Zzd7pjI #AIInfrastructure #KVCache #Tensormesh
0 replies · 1 repost · 4 likes · 127 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
KV Cache: The Missing Piece ... Our CEO and co-founder, @JunchenJiang, reflects on the moment when years of systems research converged into a clear insight: KV caching wasn’t just an optimization. It was a foundational shift in how LLM inference should work. 🎥 Watch the full interview on YouTube: 👉 y2u.be/zHW4Zzd7pjI #LLMInference #KVCache #OpenSource #TensorMesh #AIMemory
0 replies · 1 repost · 5 likes · 115 views
Junchen Jiang reposted
Sumanth @Sumanth_077 ·
Fastest inference engine for LLMs! LMCache is an LLM serving engine extension that reduces Time to First Token (TTFT) and increases throughput, especially in long-context scenarios. Here is why it's a game changer: LMCache caches and reuses key-value (KV) pairs for repeated text across different storage tiers (GPU, CPU DRAM, and local disk). Unlike traditional prefix-only caching, it can reuse any repeated text in any serving engine instance. This allows LMCache to conserve GPU resources while reducing response latency. Key features: ⚡ High-performance CPU KV cache offloading 🔀 Disaggregated prefill for efficient scaling 🤝 Peer-to-peer KV cache sharing. It's 100% open source.
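The tiering idea the tweet describes can be sketched as a toy model (this is NOT LMCache's actual API; all class and variable names here are hypothetical): KV blocks are keyed by a hash of their token chunk, so any repeated chunk can hit rather than only a shared prefix; cold blocks are offloaded from GPU to CPU, and hits in a slower tier are promoted upward.

```python
# Toy multi-tier KV cache (illustrative only, not LMCache's real design).
import hashlib
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, gpu_capacity=2):
        self.gpu = OrderedDict()   # fastest tier, small; evicts to CPU
        self.cpu = {}              # larger DRAM pool
        self.disk = {}             # slowest, effectively unbounded
        self.gpu_capacity = gpu_capacity

    @staticmethod
    def key(chunk_tokens):
        # Content hash of the chunk: any repeated text maps to the same key.
        return hashlib.sha256(" ".join(chunk_tokens).encode()).hexdigest()

    def put(self, chunk_tokens, kv_block):
        self.gpu[self.key(chunk_tokens)] = kv_block
        while len(self.gpu) > self.gpu_capacity:
            k, v = self.gpu.popitem(last=False)  # offload coldest block to CPU
            self.cpu[k] = v

    def get(self, chunk_tokens):
        k = self.key(chunk_tokens)
        for tier in (self.gpu, self.cpu, self.disk):
            if k in tier:
                block = tier.pop(k)
                self.put(chunk_tokens, block)    # promote on hit
                return block
        return None                              # miss: chunk must be prefilled

cache = TieredKVCache(gpu_capacity=2)
cache.put(["The", "contract", "says"], "kv-0")
cache.put(["Section", "2", "states"], "kv-1")
cache.put(["As", "noted", "above"], "kv-2")    # GPU full: kv-0 offloads to CPU
hit = cache.get(["The", "contract", "says"])   # CPU hit, promoted back to GPU
miss = cache.get(["brand", "new", "chunk"])    # never cached: prefill needed
```

The content-hash keying is what distinguishes this from prefix-only caching: the lookup does not care where in the prompt the chunk appears.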
20 replies · 91 reposts · 620 likes · 43.3K views
Junchen Jiang reposted
Tensormesh @tensormesh ·
In this new interview, our CEO and co-founder @JunchenJiang explains why KV cache, the internal memory of LLMs, is becoming the next Big Data layer for AI, and how @tensormesh tackles large-scale inference. 🎥 Watch the full interview: youtu.be/zHW4Zzd7pjI Topics covered: • The blind spot in LLM inference infrastructure • KV cache: the missing piece • TensorMesh’s path from academia to production • Why inference costs escalate at scale • What @lmcache joining the @PyTorch ecosystem represents #LLMInference #KVCache #OpenSource #PyTorch
0 replies · 2 reposts · 5 likes · 121 views
Junchen Jiang reposted
GMI Cloud @gmi_cloud ·
Happy 2026 🥂 First post of the year: a technical benchmark. In a joint study with @tensormesh, we achieved:
- 4× TTFT improvement
- Prefix cache hit rate >50%
using SSD-augmented KVCache on realistic multi-turn LLM traffic. Full write-up on GMI Cloud: gmicloud.ai/blog/gmi-cloud…
0 replies · 3 reposts · 14 likes · 695 views
Junchen Jiang @JunchenJiang ·
🚀 LMCache has officially been out for 1.5 years now! Along the way, LMCache has become the default KV-cache library for open-source LLM inference (CPU offload, P2P sharing, multi-backend storage, vLLM/SGLang integration, and more). As a PyTorch Foundation ecosystem project, LMCache is now used by enterprise leaders across the industry (GKE, AWS, NVIDIA's Dynamo, llm-d…). 🤔 What's the secret to our product? 🔎 Come see for yourself: arxiv.org/pdf/2510.09665 ♥️ A huge thank you to our contributors and community; you've shaped what LMCache is today. (@lmcache) #KVCache #LMCache #LLM #vLLM
0 replies · 2 reposts · 16 likes · 1.6K views