Tensormesh

48 posts


@tensormesh

Powering the next generation of AI infrastructure.

San Francisco, CA · Joined October 2025
6 Following · 61 Followers
Tensormesh @tensormesh
"𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗶𝘀 𝘁𝗵𝗲 𝗻𝗲𝘄 𝗯𝗼𝘁𝘁𝗹𝗲𝗻𝗲𝗰𝗸" — Kevin Deierling, SVP Networking #NVIDIA At his #GTC talk last week, he highlighted 𝗖𝗠𝗫 and 𝗖𝗮𝗰𝗵𝗲𝗕𝗹𝗲𝗻𝗱 from 𝗟𝗠𝗖𝗮𝗰𝗵𝗲 (@tensormesh) were part of the new KV Cache memory stack for agents, and recognized @tensormesh among the 𝗖𝗠𝗫 𝘀𝘁𝗼𝗿𝗮𝗴𝗲 𝗽𝗮𝗿𝘁𝗻𝗲𝗿𝘀. As the stack evolves, @tensormesh keeps building for what's next. ▶️ session Replay: tinyurl.com/GTC-talk
Tensormesh @tensormesh
We're going to @nvidia GTC 2026 🎉
Booth 7022, South Market Lot | March 16–19 | San Jose, CA
Stop by for:
→ Live KV cache optimization demos
→ Meet the team
→ Tensormesh swag
If GPU inference costs are killing your margins, let's talk.
#nvidiagtc2026 #nvidia #gpu #aiinference
Tensormesh @tensormesh
Most teams running MemGPT agents are wasting 56% of their prefill compute every turn, by default. The fix isn't better hardware. It's a different caching strategy. Read the full technical breakdown: tensormesh.ai/blog-posts/pre…
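As a back-of-envelope illustration of the claim above (with made-up context sizes, not figures from the linked post), the sketch below shows why re-prefilling an agent's full context on every turn dominates cost when nothing is reused between turns.

```python
def prefill_tokens_per_turn(system_prompt: int, memory_block: int, history: int) -> int:
    """Tokens the engine must prefill for one agent turn."""
    return system_prompt + memory_block + history

# Hypothetical context sizes for a MemGPT-style agent (illustrative only,
# not figures from the linked post).
system_prompt, memory_block = 2_000, 3_000
turns, new_tokens_per_turn = 20, 500

uncached = cached = history = 0
for turn in range(turns):
    total = prefill_tokens_per_turn(system_prompt, memory_block, history)
    uncached += total                                       # no reuse: re-prefill the whole context
    cached += total if turn == 0 else new_tokens_per_turn   # ideal reuse: only the new tokens
    history += new_tokens_per_turn

print(f"prefill tokens without reuse: {uncached:,}")
print(f"prefill tokens with full reuse: {cached:,}")
print(f"wasted fraction: {1 - cached / uncached:.0%}")
```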
Tensormesh reposted
Vikram @msharmavikram
Join us at GTC’26 for a tutorial where Ziqi Fan, Ph.D. (@NVIDIAAI Dynamo), Fan YE (Tencent Cloud), @JunchenJiang (@lmcache / @tensormesh ), and I break down everything you need to know about KV cache design for production LLM inference. (2/5)
Tensormesh @tensormesh
The Blind Spot in LLM Inference
In this clip, our CEO and co-founder, Junchen Jiang, explains why AI infrastructure isn't just about bigger GPUs: it's about the model's internal memory, the KV cache, which is emerging as the next Big Data layer for AI.
Watch the full interview: 👉 y2u.be/zHW4Zzd7pjI
Tensormesh @tensormesh
Most teams running LLM inference have no idea how much they're overpaying. GPU utilization? Invisible. KV cache efficiency? No clue. We just shipped Tensormesh Beta 2 with real-time metrics, plus one-click deployment and pre-loaded Qwen3/Mistral models. $100 free GPU credits to try it: app.tensormesh.ai/login
Tensormesh @tensormesh
Every time an LLM re-reads your context, you're paying for it twice. @tensormesh persists KV cache beyond GPU memory, so repeated context, long agent workflows, and multi-turn chats stop burning compute. Result?
✅ Lower latency
✅ Higher GPU efficiency
✅ Better margins
Claim $100 in free GPU credits and see it on your own workloads: tensormesh.ai
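A minimal, standard-library sketch of the idea (hypothetical names and capacities, not the Tensormesh/LMCache API): KV blocks are keyed by a content hash and spill from a small "GPU" tier into a larger "CPU" tier, so a block evicted from GPU memory can still be reused instead of recomputed.

```python
import hashlib
from collections import OrderedDict

# Conceptual sketch of tiered KV-cache persistence (hypothetical names and sizes,
# not the Tensormesh/LMCache API): blocks evicted from the small "GPU" tier fall
# back to a larger "CPU" tier instead of being discarded and recomputed later.

GPU_CAPACITY, CPU_CAPACITY = 4, 64   # blocks per tier (illustrative)
gpu_tier, cpu_tier = OrderedDict(), OrderedDict()

def block_key(tokens):
    """Content hash of a token block, independent of where it sits in the prompt."""
    return hashlib.sha256(repr(tokens).encode()).hexdigest()

def put(tokens, kv_block):
    gpu_tier[block_key(tokens)] = kv_block
    if len(gpu_tier) > GPU_CAPACITY:              # spill the least-recently-used block
        old_key, old_block = gpu_tier.popitem(last=False)
        cpu_tier[old_key] = old_block
        if len(cpu_tier) > CPU_CAPACITY:
            cpu_tier.popitem(last=False)          # only now is anything truly dropped

def get(tokens):
    key = block_key(tokens)
    if key in gpu_tier:
        gpu_tier.move_to_end(key)                 # refresh LRU order
        return gpu_tier[key]
    if key in cpu_tier:                           # hit "beyond GPU memory"
        block = cpu_tier.pop(key)
        put(tokens, block)                        # promote back to the GPU tier
        return block
    return None                                   # miss: caller must recompute prefill
```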
Tensormesh @tensormesh
Every AI agent run starts from scratch. That's not an agent problem, it's a caching problem. Standard prefix caching breaks when dynamic content shifts position. CacheBlend fixes this:
→ 63–85% cache hit rates on skill content
→ ~85% served from cache
→ Lower latency + token costs
Full breakdown: tensormesh.ai/blog-posts/age…
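To make the failure mode concrete, here is an illustrative sketch (hypothetical chunk contents, not the CacheBlend implementation): a prefix cache keys each block on everything before it, so one shifted chunk invalidates every key downstream, while chunk-level keys depend only on the chunk itself. CacheBlend additionally recomputes a small fraction of tokens to repair cross-chunk attention, which this sketch omits.

```python
import hashlib

# Illustrative sketch (not the CacheBlend implementation) of why prefix caching
# collapses when dynamic content shifts position, while chunk-level keys survive.

def sha(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:12]

def prefix_keys(chunks: list) -> list:
    """Prefix caching: each key covers everything before it, so any change
    upstream invalidates every key after it."""
    keys, prefix = [], ""
    for chunk in chunks:
        prefix += chunk
        keys.append(sha(prefix))
    return keys

def chunk_keys(chunks: list) -> list:
    """Non-prefix (chunk-level) caching: each chunk is keyed by its own content,
    independent of where it lands in the prompt."""
    return [sha(chunk) for chunk in chunks]

run_a = ["<system prompt>", "<skill: summarize>", "<repo file foo.py>", "<user msg 1>"]
run_b = ["<system prompt>", "<run id: 9f3e>", "<skill: summarize>", "<repo file foo.py>", "<user msg 2>"]

prefix_hits = len(set(prefix_keys(run_a)) & set(prefix_keys(run_b)))
chunk_hits = len(set(chunk_keys(run_a)) & set(chunk_keys(run_b)))
print(f"prefix-cache reusable keys: {prefix_hits}")   # 1: only the shared system prompt
print(f"chunk-cache reusable keys:  {chunk_hits}")    # 3: system prompt + skill + repo file
```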
Tensormesh @tensormesh
Running open-source models is easy. Running them well in production isn't. In today's spotlight, Nick Barcet (@nijaba), Head of GTM at @tensormesh, shares what Tensormesh is built to solve, and the people and culture behind the execution. If you're running OSS models and want dramatically more efficient inference, 🎯 try @tensormesh with $100 in free GPU credits: lnkd.in/g-gXYtaV
#AI #LLM #Inference #kvcache
Tensormesh @tensormesh
We analyzed RepoAgent's prompt caching and found a 25x gap between theory and reality.
Hit rate: 3.4%
Reusable content: 85.9%
The problem? Prefix caching breaks when variables shift token positions, even when content is identical. Non-prefix caching (CacheBlend) closes that gap.
Full breakdown: tensormesh.ai/blog-posts/blo…
Tensormesh @tensormesh
LMCache × PyTorch Foundation: Why It Matters
Hear from our CEO, @JunchenJiang, why open-sourcing @lmcache was a natural step for @tensormesh, and the role it plays in the @PyTorch ecosystem, helping shape the future of open AI infrastructure.
🎥 Watch the full interview: 👉 y2u.be/zHW4Zzd7pjI
Tensormesh @tensormesh
Open-source AI just became better than GPT-4o and 92% cheaper. Most companies haven't noticed yet, but the ones who have are building 5-year leads. New Massachusetts Institute of Technology research reveals a striking market inefficiency: open-source AI models deliver 90% of closed-source performance at 87% lower cost, yet they account for only 20% of usage.
📖 Read the full blog: tensormesh.ai/blog-posts/blo…
🚀 Try Tensormesh with $100 in free GPU credits: app.tensormesh.ai/login?utm_sour…
#OpenSourceAI #AIInfrastructure #MachineLearning #EnterpriseAI #AI
Tensormesh @tensormesh
At the recent @nvidia Dynamo Day, our CTO, @ChengYihuaA, shared how KV caching becomes a real bottleneck for inference at scale and what it takes to solve it, including integration with engines like #Dynamo, distributed CPU cache sharing, and the @lmcache roadmap.
Topics covered:
• LMCache use cases
• LMCache × Dynamo integration
• Distributed CPU KV cache sharing
• LMCache roadmap (K8s operator, and more)
🎥 Watch the full talk: y2u.be/ljFRtAoUYzg
Tensormesh @tensormesh
Most production AI deployments waste millions because of cache silos. Our P2P architecture (developed with @TencentGlobal) eliminates this: instances now share cache across peers using RDMA transfers. Results: 4x faster TTFT, 5x faster completion, massive reduction in redundant compute.
Read the technical breakdown:
Tensormesh blog: tensormesh.ai/blog-posts/lmc…
LMCache blog: blog.lmcache.ai/en/2026/01/21/…
Beta + $100 credits: app.tensormesh.ai/login
#AIInfrastructure #MachineLearning #LLMOps #GPUOptimization #OpenSource #ArtificialIntelligence #CloudComputing
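As a rough mental model of the lookup order (hypothetical names, not the actual Tensormesh/LMCache code path, which transfers blocks between peers over RDMA): check local cache first, then peer instances, and only recompute prefill on a full miss.

```python
from typing import Optional

# Rough mental model of peer-to-peer KV-cache lookup (hypothetical names, not the
# Tensormesh/LMCache implementation, which moves blocks between peers over RDMA).

class Instance:
    def __init__(self, name):
        self.name = name
        self.local = {}      # this instance's cached KV blocks
        self.peers = []      # other serving instances in the cluster

    def lookup(self, key) -> Optional[bytes]:
        if key in self.local:                 # local hit: cheapest path
            return self.local[key]
        for peer in self.peers:               # peer hit: a network copy is still far
            if key in peer.local:             # cheaper than recomputing prefill
                block = peer.local[key]
                self.local[key] = block       # keep a local copy for next time
                return block
        return None                           # full miss: recompute, then cache

a, b = Instance("a"), Instance("b")
a.peers, b.peers = [b], [a]
b.local["shared-context"] = b"kv-block-bytes"
print(a.lookup("shared-context") is not None)   # True: served from peer b, no recompute
```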
Tensormesh @tensormesh
Why Inference Costs Break at Scale
In this clip, our CEO and co-founder, @JunchenJiang, explains why the inference cost problem isn't compute alone; it's how systems repeatedly reprocess and forget the same long-context data. By persisting and reusing KV cache, inference systems can avoid redundant computation and fundamentally change the cost equation. That's the economic relief LMCache enables.
🎥 Watch the full interview: 👉 y2u.be/zHW4Zzd7pjI
Tensormesh reposted
Vikram @msharmavikram
2026 is the year of AI inference at scale. That’s exactly why we’re kicking off the year by talking about Dynamo and what we’ve been building at Dynamo Day. Dynamo Day is less than two days away, and I couldn’t be more excited! (1/3)