Junchen Jiang
117 posts

Junchen Jiang @JunchenJiang
CS Prof @ UChicago https://t.co/U01oOWGnip (Fast distributed LLM inference) https://t.co/hoetjwXKIt (Best KV cache layer)
Chicago, IL · Joined September 2012
320 Following · 505 Followers
Junchen Jiang reposted
Tensormesh @tensormesh ·
"Inference context is the new bottleneck," said Kevin Deierling, SVP Networking, #NVIDIA. At his #GTC talk last week, he highlighted that CMX and CacheBlend from LMCache (@tensormesh) are part of the new KV cache memory stack for agents, and recognized @tensormesh among the CMX storage partners. As the stack evolves, @tensormesh keeps building for what's next. ▶️ Session replay: tinyurl.com/GTC-talk
0 replies · 2 reposts · 9 likes · 319 views
Junchen Jiang reposted
Hanchen Li @lihanc02 ·
Some former colleagues from @lmcache shared this photo from the GTC keynote. I am honestly surprised by how fast the team has been growing. (We were a research lab on 2 A40 GPUs in 2023!) By the way, I think they are hiring LLM hackers (or product hackers, I am not sure 🤪); you should just check with @JunchenJiang @ChengYihuaA. #GTC #LLM #Inference #Nvidia #LMCache #KVCache
0 replies · 3 reposts · 14 likes · 692 views
Junchen Jiang @JunchenJiang ·
💡 Rare moment when industry demand and academic curiosity collide, this time on KV$ (KV cache). Industry: pushing KV$ beyond CPU RAM to reuse more context and save compute. Academia: treating KV$ as a new kind of ML data: compressible, manipulable, optimizable. That's exactly where LMCache sits. NVIDIA GTC talk Wed (3/18) 10am, where we'll share our vision. @lmcache @tensormesh #kvcache #llminference #gtc #nvidia
2 replies · 0 reposts · 4 likes · 205 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
🗣️ Next week at NVIDIA GTC: our CEO @JunchenJiang joins engineers from @nvidia and @TencentGlobal to unpack KV cache design, one of the most critical performance levers in production LLM inference. Featuring: FlexKV • LMCache • Dynamo KVBM 📅 Mar 18 | 10 AM 📍 San Jose, CA. Reserve your seat 👉 nvidia.com/gtc/session-ca… #NVIDIAGTC #AIInference #KVCache #TensorMesh
0 replies · 2 reposts · 9 likes · 599 views
Junchen Jiang reposted
Vikram @msharmavikram ·
Join us at GTC’26 for a tutorial where Ziqi Fan, Ph.D. (@NVIDIAAI Dynamo), Fan YE (Tencent Cloud), @JunchenJiang (@lmcache / @tensormesh), and I break down everything you need to know about KV cache design for production LLM inference. (2/5)
1 reply · 3 reposts · 7 likes · 764 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
Most teams running MemGPT agents are wasting 56% of their prefill compute every turn, by default. The fix isn't better hardware. It's a different caching strategy. Read the full technical breakdown: tensormesh.ai/blog-posts/pre…
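A toy sketch of the effect behind that claim (the prompt layout and token counts below are illustrative assumptions, not the blog's measurements): MemGPT-style agents rewrite working memory in the *middle* of the prompt each turn, so prefix-only caching can reuse only the short system header, even though most of the prompt recurs verbatim and would be reusable under segment-level (CacheBlend-style) caching.

```python
# Toy illustration: why agent prompts defeat prefix-only caching.
# All segment sizes are hypothetical, chosen only to show the gap.

def prefix_overlap(a, b):
    """Tokens reusable by prefix-only caching: the common leading run."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def segment_overlap(prev_segments, cur_segments):
    """Tokens reusable if any repeated segment can hit (CacheBlend-style)."""
    cached = set(map(tuple, prev_segments))
    return sum(len(s) for s in cur_segments if tuple(s) in cached)

# Hypothetical layout: [system | working memory | retrieved docs | user msg]
system = ["sys"] * 50
memory_t1, memory_t2 = ["mem1"] * 200, ["mem2"] * 200  # rewritten each turn
docs = ["doc"] * 700                                   # identical, but AFTER memory
user = ["ask"] * 50

turn1 = [system, memory_t1, docs, user]
turn2 = [system, memory_t2, docs, user]
flat1 = [t for seg in turn1 for t in seg]
flat2 = [t for seg in turn2 for t in seg]

print(prefix_overlap(flat1, flat2))    # only the system header survives
print(segment_overlap(turn1, turn2))   # system + docs + user all reusable
```

Because the rewritten memory sits ahead of the (unchanged) retrieved documents, a prefix cache invalidates everything after it; segment-level reuse recovers the bulk of the prefill.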
0 replies · 1 repost · 4 likes · 187 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
Running open-source models is easy. Running them well in production isn’t. In today’s spotlight, Nick Barcet (@nijaba), Head of GTM @tensormesh, shares what Tensormesh is built to solve, and the people and culture behind the execution. If you’re running OSS models and want dramatically more efficient inference, 🎯 try @tensormesh with $100 in free GPU credits: lnkd.in/g-gXYtaV #AI #LLM #Inference #kvcache
0 replies · 2 reposts · 4 likes · 178 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
LMCache × PyTorch Foundation: why it matters. Learn from our CEO, @JunchenJiang, why open-sourcing @lmcache was a natural step for @tensormesh, and the role it plays in the @PyTorch ecosystem, helping shape the future of open AI infrastructure. 🎥 Watch the full interview: 👉 y2u.be/zHW4Zzd7pjI
0 replies · 1 repost · 9 likes · 240 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
😲 450 million tokens reprocessed monthly when you should process each document once. That's what happens when 15 attorneys query 500 contracts and your LLM has no persistent memory. We built @tensormesh to fix this: a 5-10x cost reduction through intelligent caching. 📖 Full blog: tensormesh.ai/blog-posts/fix… 🚀 Try our beta now ($100 credit): app.tensormesh.ai/login #AI #LLM #MachineLearning #DocumentAI #MLOps #ArtificialIntelligence #EnterpriseTech
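A back-of-envelope way the 450M figure can arise (illustrative only; the per-contract token count and query volume below are assumptions, not the blog's numbers):

```python
# Redundant prefill in a no-cache deployment (assumed numbers:
# 2,000 tokens per contract, 30 queries per attorney per month).
tokens_per_contract = 2_000
contracts = 500
attorneys = 15
queries_per_attorney = 30  # per month

corpus_tokens = tokens_per_contract * contracts      # 1M-token corpus
monthly_queries = attorneys * queries_per_attorney   # 450 queries/month

# Without persistent KV memory, every query re-prefills the whole corpus.
reprocessed = corpus_tokens * monthly_queries        # tokens reprocessed/month
# With caching, the corpus is prefilled once and its KV reused.
processed_once = corpus_tokens

print(reprocessed, reprocessed // processed_once)
```

Under these assumptions, 450 full-corpus passes per month re-prefill 450M tokens that caching reduces to a single 1M-token pass.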
0 replies · 1 repost · 1 like · 97 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
Tensormesh: From Academia to Production. In this clip, our CEO and co-founder, Junchen Jiang, explains what it really takes to build a company at the intersection of academia, open source, and industry. Building an open-source company isn’t just about code prowess. It requires a deep, long-term technical vision, and a team willing to commit to that vision long enough to turn research into real systems. At @tensormesh, that foundation was already in place. 🎥 Watch the full interview on YouTube: 👉 y2u.be/zHW4Zzd7pjI #AIInfrastructure #KVCache #Tensormesh
0 replies · 1 repost · 4 likes · 127 views
Junchen Jiang reposted
Tensormesh @tensormesh ·
KV Cache: The Missing Piece ... Our CEO and co-founder, @JunchenJiang, reflects on the moment when years of systems research converged into a clear insight: KV caching wasn’t just an optimization. It was a foundational shift in how LLM inference should work. 🎥 Watch the full interview on YouTube: 👉 y2u.be/zHW4Zzd7pjI #LLMInference #KVCache #OpenSource #TensorMesh #AIMemory
0 replies · 1 repost · 5 likes · 115 views
Junchen Jiang reposted
Sumanth @Sumanth_077 ·
Fastest inference engine for LLMs! LMCache is an LLM serving engine extension that reduces Time to First Token (TTFT) and increases throughput, especially in long-context scenarios. Here is why it's a game changer: LMCache caches and reuses key-value (KV) pairs for repeated text across different storage tiers (GPU, CPU DRAM, and local disk). Unlike traditional prefix-only caching, it can reuse any repeated text in any serving engine instance. This allows LMCache to conserve GPU resources while reducing response latency. Key features: ⚡ High-performance CPU KV cache offloading 🔀 Disaggregated prefill for efficient scaling 🤝 Peer-to-peer KV cache sharing. It's 100% open source.
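The tiering idea the tweet describes can be sketched as a toy model (this is NOT LMCache's actual API; all class and variable names here are hypothetical): KV blocks are keyed by a hash of their token chunk, so any repeated chunk can hit rather than only a shared prefix; cold blocks are offloaded from GPU to CPU, and hits in a slower tier are promoted upward.

```python
# Toy multi-tier KV cache (illustrative only, not LMCache's real design).
import hashlib
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, gpu_capacity=2):
        self.gpu = OrderedDict()   # fastest tier, small; evicts to CPU
        self.cpu = {}              # larger DRAM pool
        self.disk = {}             # slowest, effectively unbounded
        self.gpu_capacity = gpu_capacity

    @staticmethod
    def key(chunk_tokens):
        # Content hash of the chunk: any repeated text maps to the same key.
        return hashlib.sha256(" ".join(chunk_tokens).encode()).hexdigest()

    def put(self, chunk_tokens, kv_block):
        self.gpu[self.key(chunk_tokens)] = kv_block
        while len(self.gpu) > self.gpu_capacity:
            k, v = self.gpu.popitem(last=False)  # offload coldest block to CPU
            self.cpu[k] = v

    def get(self, chunk_tokens):
        k = self.key(chunk_tokens)
        for tier in (self.gpu, self.cpu, self.disk):
            if k in tier:
                block = tier.pop(k)
                self.put(chunk_tokens, block)    # promote on hit
                return block
        return None                              # miss: chunk must be prefilled

cache = TieredKVCache(gpu_capacity=2)
cache.put(["The", "contract", "says"], "kv-0")
cache.put(["Section", "2", "states"], "kv-1")
cache.put(["As", "noted", "above"], "kv-2")    # GPU full: kv-0 offloads to CPU
hit = cache.get(["The", "contract", "says"])   # CPU hit, promoted back to GPU
miss = cache.get(["brand", "new", "chunk"])    # never cached: prefill needed
```

The content-hash keying is what distinguishes this from prefix-only caching: the lookup does not care where in the prompt the chunk appears.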
20 replies · 91 reposts · 620 likes · 43.3K views
Junchen Jiang reposted
Tensormesh @tensormesh ·
In this new interview, our CEO and co-founder @JunchenJiang explains why KV cache, the internal memory of LLMs, is becoming the next Big Data layer for AI, and how @tensormesh tackles large-scale inference. 🎥 Watch the full interview: youtu.be/zHW4Zzd7pjI Topics covered: • The blind spot in LLM inference infrastructure • KV cache: the missing piece • TensorMesh’s path from academia to production • Why inference costs escalate at scale • What @lmcache joining the @PyTorch ecosystem represents #LLMInference #KVCache #OpenSource #PyTorch
0 replies · 2 reposts · 5 likes · 121 views
Junchen Jiang reposted
GMI Cloud @gmi_cloud ·
Happy 2026 🥂 First post of the year: a technical benchmark. In a joint study with @tensormesh, we achieved:
- 4× TTFT improvement
- Prefix cache hit rate >50%
using SSD-augmented KVCache on realistic multi-turn LLM traffic. Full write-up on GMI Cloud: gmicloud.ai/blog/gmi-cloud…
0 replies · 3 reposts · 14 likes · 695 views
Junchen Jiang @JunchenJiang ·
🚀 LMCache has officially been out for 1.5 years now! Along the way, LMCache has become the default KV-cache library for open-source LLM inference (CPU offload, P2P sharing, multi-backend storage, vLLM/SGLang integration, and more). As a PyTorch Foundation ecosystem project, LMCache is now used by enterprise leaders across the industry (GKE, AWS, NVIDIA's Dynamo, llm-d…). 🤔 What's the secret to our product? 🔎 Come see for yourself: arxiv.org/pdf/2510.09665 ♥️ A huge thank you to our contributors and community; you've shaped what LMCache is today. (@lmcache) #KVCache #LMCache #LLM #vLLM
0 replies · 2 reposts · 16 likes · 1.6K views