Tensormesh

48 posts


@tensormesh

Powering the next generation of AI infrastructure.

San Francisco, CA · Joined October 2025
6 Following · 61 Followers
Tensormesh @tensormesh
"𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗶𝘀 𝘁𝗵𝗲 𝗻𝗲𝘄 𝗯𝗼𝘁𝘁𝗹𝗲𝗻𝗲𝗰𝗸" — Kevin Deierling, SVP Networking #NVIDIA At his #GTC talk last week, he highlighted 𝗖𝗠𝗫 and 𝗖𝗮𝗰𝗵𝗲𝗕𝗹𝗲𝗻𝗱 from 𝗟𝗠𝗖𝗮𝗰𝗵𝗲 (@tensormesh) were part of the new KV Cache memory stack for agents, and recognized @tensormesh among the 𝗖𝗠𝗫 𝘀𝘁𝗼𝗿𝗮𝗴𝗲 𝗽𝗮𝗿𝘁𝗻𝗲𝗿𝘀. As the stack evolves, @tensormesh keeps building for what's next. ▶️ session Replay: tinyurl.com/GTC-talk
Tensormesh @tensormesh
We're going to @nvidia GTC 2026 🎉
Booth 7022, South Market Lot | March 16–19 | San Jose, CA
Stop by for:
→ Live KV cache optimization demos
→ Meet the team
→ Tensormesh swag
If GPU inference costs are killing your margins, let's talk.
#nvidiagtc2026 #nvidia #gpu #aiinference
Tensormesh @tensormesh
Most teams running MemGPT agents are wasting 56% of their prefill compute every turn, by default. The fix isn't better hardware. It's a different caching strategy. Read the full technical breakdown: tensormesh.ai/blog-posts/pre…
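As a back-of-envelope illustration of the claim above (with made-up context sizes, not figures from the linked post), the sketch below shows why re-prefilling an agent's full context on every turn dominates cost when nothing is reused between turns.

```python
def prefill_tokens_per_turn(system_prompt: int, memory_block: int, history: int) -> int:
    """Tokens the engine must prefill for one agent turn."""
    return system_prompt + memory_block + history

# Hypothetical context sizes for a MemGPT-style agent (illustrative only,
# not figures from the linked post).
system_prompt, memory_block = 2_000, 3_000
turns, new_tokens_per_turn = 20, 500

uncached = cached = history = 0
for turn in range(turns):
    total = prefill_tokens_per_turn(system_prompt, memory_block, history)
    uncached += total                                       # no reuse: re-prefill the whole context
    cached += total if turn == 0 else new_tokens_per_turn   # ideal reuse: only the new tokens
    history += new_tokens_per_turn

print(f"prefill tokens without reuse: {uncached:,}")
print(f"prefill tokens with full reuse: {cached:,}")
print(f"wasted fraction: {1 - cached / uncached:.0%}")
```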
Tensormesh reposted
Vikram @msharmavikram
Join us at GTC’26 for a tutorial where Ziqi Fan, Ph.D. (@NVIDIAAI Dynamo), Fan YE (Tencent Cloud), @JunchenJiang (@lmcache / @tensormesh ), and I break down everything you need to know about KV cache design for production LLM inference. (2/5)
Tensormesh @tensormesh
The Blind Spot in LLM Inference
In this clip, our CEO and co-founder, Junchen Jiang, explains why AI infrastructure isn't just about bigger GPUs: it's about the model's internal memory, the KV cache, which is emerging as the next Big Data layer for AI.
Watch the full interview: 👉 y2u.be/zHW4Zzd7pjI
Tensormesh @tensormesh
Most teams running LLM inference have no idea how much they're overpaying. GPU utilization? Invisible. KV cache efficiency? No clue. We just shipped Tensormesh Beta 2 with real-time metrics, plus one-click deployment and pre-loaded Qwen3/Mistral models. $100 free GPU credits to try it: app.tensormesh.ai/login
Tensormesh @tensormesh
Every time an LLM re-reads your context, you're paying for it twice. @tensormesh persists KV cache beyond GPU memory, so repeated context, long agent workflows, and multi-turn chats stop burning compute. Result?
✅ Lower latency
✅ Higher GPU efficiency
✅ Better margins
Claim $100 in free GPU credits and see it on your own workloads: tensormesh.ai
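A minimal, standard-library sketch of the idea (hypothetical names and capacities, not the Tensormesh/LMCache API): KV blocks are keyed by a content hash and spill from a small "GPU" tier into a larger "CPU" tier, so a block evicted from GPU memory can still be reused instead of recomputed.

```python
import hashlib
from collections import OrderedDict

# Conceptual sketch of tiered KV-cache persistence (hypothetical names and sizes,
# not the Tensormesh/LMCache API): blocks evicted from the small "GPU" tier fall
# back to a larger "CPU" tier instead of being discarded and recomputed later.

GPU_CAPACITY, CPU_CAPACITY = 4, 64   # blocks per tier (illustrative)
gpu_tier, cpu_tier = OrderedDict(), OrderedDict()

def block_key(tokens):
    """Content hash of a token block, independent of where it sits in the prompt."""
    return hashlib.sha256(repr(tokens).encode()).hexdigest()

def put(tokens, kv_block):
    gpu_tier[block_key(tokens)] = kv_block
    if len(gpu_tier) > GPU_CAPACITY:              # spill the least-recently-used block
        old_key, old_block = gpu_tier.popitem(last=False)
        cpu_tier[old_key] = old_block
        if len(cpu_tier) > CPU_CAPACITY:
            cpu_tier.popitem(last=False)          # only now is anything truly dropped

def get(tokens):
    key = block_key(tokens)
    if key in gpu_tier:
        gpu_tier.move_to_end(key)                 # refresh LRU order
        return gpu_tier[key]
    if key in cpu_tier:                           # hit "beyond GPU memory"
        block = cpu_tier.pop(key)
        put(tokens, block)                        # promote back to the GPU tier
        return block
    return None                                   # miss: caller must recompute prefill
```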
Tensormesh @tensormesh
Every AI agent run starts from scratch. That's not an agent problem, it's a caching problem. Standard prefix caching breaks when dynamic content shifts position. CacheBlend fixes this:
→ 63–85% cache hit rates on skill content
→ ~85% served from cache
→ Lower latency + token costs
Full breakdown: tensormesh.ai/blog-posts/age…
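To make the failure mode concrete, here is an illustrative sketch (hypothetical chunk contents, not the CacheBlend implementation): a prefix cache keys each block on everything before it, so one shifted chunk invalidates every key downstream, while chunk-level keys depend only on the chunk itself. CacheBlend additionally recomputes a small fraction of tokens to repair cross-chunk attention, which this sketch omits.

```python
import hashlib

# Illustrative sketch (not the CacheBlend implementation) of why prefix caching
# collapses when dynamic content shifts position, while chunk-level keys survive.

def sha(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:12]

def prefix_keys(chunks: list) -> list:
    """Prefix caching: each key covers everything before it, so any change
    upstream invalidates every key after it."""
    keys, prefix = [], ""
    for chunk in chunks:
        prefix += chunk
        keys.append(sha(prefix))
    return keys

def chunk_keys(chunks: list) -> list:
    """Non-prefix (chunk-level) caching: each chunk is keyed by its own content,
    independent of where it lands in the prompt."""
    return [sha(chunk) for chunk in chunks]

run_a = ["<system prompt>", "<skill: summarize>", "<repo file foo.py>", "<user msg 1>"]
run_b = ["<system prompt>", "<run id: 9f3e>", "<skill: summarize>", "<repo file foo.py>", "<user msg 2>"]

prefix_hits = len(set(prefix_keys(run_a)) & set(prefix_keys(run_b)))
chunk_hits = len(set(chunk_keys(run_a)) & set(chunk_keys(run_b)))
print(f"prefix-cache reusable keys: {prefix_hits}")   # 1: only the shared system prompt
print(f"chunk-cache reusable keys:  {chunk_hits}")    # 3: system prompt + skill + repo file
```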
Tensormesh @tensormesh
Running open-source models is easy. Running them well in production isn't. In today's spotlight, Nick Barcet (@nijaba), Head of GTM at @tensormesh, shares what Tensormesh is built to solve, and the people and culture behind the execution. If you're running OSS models and want dramatically more efficient inference, 🎯 try @tensormesh with $100 in free GPU credits: lnkd.in/g-gXYtaV
#AI #LLM #Inference #kvcache
Tensormesh @tensormesh
We analyzed RepoAgent's prompt caching and found a 25x gap between theory and reality.
Hit rate: 3.4%
Reusable content: 85.9%
The problem? Prefix caching breaks when variables shift token positions, even when content is identical. Non-prefix caching (CacheBlend) closes that gap.
Full breakdown: tensormesh.ai/blog-posts/blo…
Tensormesh @tensormesh
LMCache × PyTorch Foundation: Why It Matters
Hear from our CEO, @JunchenJiang, why open-sourcing @lmcache was a natural step for @tensormesh, and the role it plays in the @PyTorch ecosystem, helping shape the future of open AI infrastructure.
🎥 Watch the full interview: 👉 y2u.be/zHW4Zzd7pjI
Tensormesh @tensormesh
Open-source AI just became better than GPT-4o and 92% cheaper. Most companies haven't noticed yet, but the ones who have are building 5-year leads. New Massachusetts Institute of Technology research reveals a striking market inefficiency: open-source AI models deliver 90% of closed-source performance at 87% lower cost, yet they account for only 20% of usage.
📖 Read the full blog: tensormesh.ai/blog-posts/blo…
🚀 Try Tensormesh with $100 in free GPU credits: app.tensormesh.ai/login?utm_sour…
#OpenSourceAI #AIInfrastructure #MachineLearning #EnterpriseAI #AI
Tensormesh @tensormesh
At the recent @nvidia Dynamo Day, our CTO, @ChengYihuaA, shared how KV caching becomes a real bottleneck for inference at scale and what it takes to solve it, including integration with engines like #Dynamo, distributed CPU cache sharing, and the @lmcache roadmap.
Topics covered:
• LMCache use cases
• LMCache × Dynamo integration
• Distributed CPU KV cache sharing
• LMCache roadmap (K8s operator, and more)
🎥 Watch the full talk: y2u.be/ljFRtAoUYzg
Tensormesh @tensormesh
Most production AI deployments waste millions because of cache silos. Our P2P architecture (developed with @TencentGlobal) eliminates this: instances now share cache across peers using RDMA transfers. Results: 4x faster TTFT, 5x faster completion, massive reduction in redundant compute.
Read the technical breakdown:
Tensormesh blog: tensormesh.ai/blog-posts/lmc…
LMCache blog: blog.lmcache.ai/en/2026/01/21/…
Beta + $100 credits: app.tensormesh.ai/login
#AIInfrastructure #MachineLearning #LLMOps #GPUOptimization #OpenSource #ArtificialIntelligence #CloudComputing
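As a rough mental model of the lookup order (hypothetical names, not the actual Tensormesh/LMCache code path, which transfers blocks between peers over RDMA): check local cache first, then peer instances, and only recompute prefill on a full miss.

```python
from typing import Optional

# Rough mental model of peer-to-peer KV-cache lookup (hypothetical names, not the
# Tensormesh/LMCache implementation, which moves blocks between peers over RDMA).

class Instance:
    def __init__(self, name):
        self.name = name
        self.local = {}      # this instance's cached KV blocks
        self.peers = []      # other serving instances in the cluster

    def lookup(self, key) -> Optional[bytes]:
        if key in self.local:                 # local hit: cheapest path
            return self.local[key]
        for peer in self.peers:               # peer hit: a network copy is still far
            if key in peer.local:             # cheaper than recomputing prefill
                block = peer.local[key]
                self.local[key] = block       # keep a local copy for next time
                return block
        return None                           # full miss: recompute, then cache

a, b = Instance("a"), Instance("b")
a.peers, b.peers = [b], [a]
b.local["shared-context"] = b"kv-block-bytes"
print(a.lookup("shared-context") is not None)   # True: served from peer b, no recompute
```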
Tensormesh @tensormesh
Why Inference Costs Break at Scale
In this clip, our CEO and co-founder, @JunchenJiang, explains why the inference cost problem isn't compute alone; it's how systems repeatedly reprocess and forget the same long-context data. By persisting and reusing KV cache, inference systems can avoid redundant computation and fundamentally change the cost equation. That's the economic relief LMCache enables.
🎥 Watch the full interview: 👉 y2u.be/zHW4Zzd7pjI
Tensormesh reposted
Vikram @msharmavikram
2026 is the year of AI inference at scale. That’s exactly why we’re kicking off the year by talking about Dynamo and what we’ve been building at Dynamo Day. Dynamo Day is less than two days away, and I couldn’t be more excited! (1/3)