
Kuntai Du
@this_will_echo
Chief Scientist | Committer of vLLM / LMCache / Production Stack

GOBLIN MODE: ON
[ASCII goblin art]
[!] Goblin breach detected // LMCache docs > Click to make them go away ▓▒░ docs.lmcache.ai

Cutting a sweet mango with a machine

llm-d published a new post on KServe + llm-d + vLLM for production LLM inference on Kubernetes. Authors from @RedHat and Tesla describe how the stack addresses routing, customization, and day-2 operational challenges, citing 3x higher output tokens/s and 2x lower TTFT in one deployment after enabling prefix-cache-aware routing. By Yuan Tang, Scott Cabrinha, Robert Shaw, and Sai Krishna @CloudNativeFdn 🔗 @_llm_d_ llm-d.ai/blog/productio… #vLLM #KServe #Kubernetes #LLMOps #OpenSource
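For readers curious what prefix-cache-aware routing means mechanically, here is a minimal sketch. This is an illustration of the general idea only, not llm-d's actual scheduler; `BLOCK_SIZE`, `block_hashes`, and `pick_replica` are hypothetical names. The gist: hash the prompt's token blocks and send the request to the replica whose KV cache already covers the longest prefix.

```python
import hashlib

BLOCK_SIZE = 16  # tokens per KV-cache block (hypothetical value)

def block_hashes(token_ids: list[int]) -> list[str]:
    """Chained hashes of each full prefix block, so equal prefixes hash equally."""
    hashes, prev = [], ""
    for i in range(0, len(token_ids) - len(token_ids) % BLOCK_SIZE, BLOCK_SIZE):
        block = token_ids[i:i + BLOCK_SIZE]
        prev = hashlib.sha256((prev + ",".join(map(str, block))).encode()).hexdigest()
        hashes.append(prev)
    return hashes

def pick_replica(token_ids: list[int], replica_caches: dict[str, set[str]]) -> str:
    """Route to the replica whose cache covers the longest prompt prefix."""
    prompt_hashes = block_hashes(token_ids)
    def prefix_len(cache: set[str]) -> int:
        n = 0
        for h in prompt_hashes:
            if h not in cache:
                break
            n += 1
        return n
    return max(replica_caches, key=lambda r: prefix_len(replica_caches[r]))

replicas = {"pod-a": set(), "pod-b": set()}
prompt = list(range(64))
replicas["pod-b"].update(block_hashes(prompt)[:2])  # pod-b already holds 2 blocks
print(pick_replica(prompt, replicas))               # -> 'pod-b'
```

Hitting a warm replica means the prefill for the cached prefix is skipped, which is where the reported TTFT and throughput gains come from.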


ReasoningBank, a novel agent memory framework, enables LLM agents to continuously learn from both successful & failed experiences. Our evaluation shows that it enhances agent effectiveness, boosting success rates and efficiency. Learn more: goo.gle/4dWrPGb
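A rough sketch of the idea as I read it. Hedged: this is not the ReasoningBank code, and `MemoryItem`, `MemoryBank`, and the word-overlap retrieval are stand-ins for illustration. The core loop: distill each trajectory, successful or failed, into a reusable lesson, then retrieve relevant lessons when a similar task arrives.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    task: str
    strategy: str   # distilled lesson, e.g. "confirm dates before paying"
    success: bool   # lessons are kept from failures too

@dataclass
class MemoryBank:
    items: list[MemoryItem] = field(default_factory=list)

    def add(self, item: MemoryItem) -> None:
        self.items.append(item)

    def retrieve(self, task: str, k: int = 3) -> list[MemoryItem]:
        # Toy relevance: word overlap; a real system would use embeddings.
        words = set(task.lower().split())
        scored = sorted(
            self.items,
            key=lambda m: len(words & set(m.task.lower().split())),
            reverse=True,
        )
        return scored[:k]

bank = MemoryBank()
bank.add(MemoryItem("book a flight", "confirm dates before paying", True))
bank.add(MemoryItem("book a hotel", "don't assume login persists", False))
print([m.strategy for m in bank.retrieve("book a flight to NYC")])
```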

A company with 60+ accounts just had its entire AI infrastructure taken offline by its provider. No reason was given; the only recourse offered was an appeal path via a Google Form. This is not a one-off: we have mapped the pattern across every major closed-weight provider, along with what enterprise teams can do about it. 📖 Read the full blog: tensormesh.ai/blog-posts/ent… 🚀 Try Tensormesh with $100 in free GPU Credits: app.tensormesh.ai/login?logged_o…

Someone built a transparent Mario game that runs OVER your IDE, so you can play while waiting for Copilot to write code.

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now 👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai
HuggingFace: huggingface.co/Qwen/Qwen3.6-3…
ModelScope: modelscope.cn/models/Qwen/Qw…
API ('Qwen3.6-Flash' on Model Studio): Coming soon~ Stay tuned
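The "35B total, 3B active" split is what sparse MoE routing buys you: a gate selects a few experts per token, so only a fraction of the parameters participate in each forward pass. A generic top-k gating sketch in NumPy (illustrative only; the shapes, ReLU experts, and softmax-over-selected-logits are my assumptions, not Qwen's architecture):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Sparse MoE layer: route each token to its top-k experts.

    x:       (tokens, d_model) activations
    gate_w:  (d_model, n_experts) router weights
    experts: list of (w_in, w_out) per-expert MLP weights
    Only k of n_experts run per token, which is how a model with 35B
    total parameters can use only ~3B 'active' ones per forward pass.
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        weights = np.exp(sel - sel.max())        # softmax over selected experts
        weights /= weights.sum()
        for w, e in zip(weights, topk[t]):
            w_in, w_out = experts[e]
            out[t] += w * (np.maximum(x[t] @ w_in, 0) @ w_out)  # ReLU MLP expert
    return out

rng = np.random.default_rng(0)
d, n, h = 8, 4, 16
x = rng.normal(size=(3, d))
gate_w = rng.normal(size=(d, n))
experts = [(rng.normal(size=(d, h)), rng.normal(size=(h, d))) for _ in range(n)]
print(moe_forward(x, gate_w, experts).shape)  # (3, 8)
```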

Turns out we can get SOTA on agentic benchmarks with a simple test-time method! Excited to introduce LLM-as-a-Verifier. Test-time scaling is effective, but picking the "winner" among many candidates is the bottleneck. We introduce a way to extract a cleaner signal from the model:
1️⃣ Ask the LLM to rank results on a scale of 1-k
2️⃣ Use the log-probs of those rank tokens to calculate an expected score
You can get a verification score in a single sampling pass per candidate pair.
Blog: llm-as-a-verifier.notion.site
Code: llm-as-a-verifier.github.io
Led by @jackyk02 and in collaboration with a great team: @shululi256, @pranav_atreya, @liu_yuejiang, @drmapavone, @istoica05
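The expected-score step is easy to reproduce if your serving stack exposes per-token log-probs. A minimal sketch (my reconstruction from the thread, not the authors' code; the example log-probs are made up): restrict to the k rating tokens, renormalize, and take the expectation.

```python
import math

def expected_score(rank_logprobs: dict[str, float]) -> float:
    """Turn a verifier's log-probs over rating tokens '1'..'k' into an expected score.

    rank_logprobs maps each rating token to its log-prob at the position
    where the verifier emits its rating. Renormalizing over just those k
    tokens and taking the expectation gives a smoother signal than the
    single sampled rating token.
    """
    probs = {tok: math.exp(lp) for tok, lp in rank_logprobs.items()}
    z = sum(probs.values())  # renormalize over the rating tokens only
    return sum(int(tok) * p / z for tok, p in probs.items())

# e.g. verifier puts most mass on '4' but some on '5' -> score ~4.06
print(expected_score({"1": -6.0, "2": -4.5, "3": -2.0, "4": -0.4, "5": -1.5}))
```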

GPU memory alone won’t carry the next generation of LLM serving. At #RaySummit, our Chief Scientist @this_will_echo shared how #LMCache offloads KV Cache across CPU RAM, local disk, Redis, and S3, while enabling cache reuse beyond basic prefix caching. Watch the full talk on YouTube: 👉🏻 youtube.com/watch?v=aVpkkV… #RaySummit #LMCache #Tensormesh #KVCache
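To make the tiering concrete, here is a toy two-tier cache (a sketch of the general pattern, not LMCache's API; the class and method names are invented): keep hot entries in CPU RAM with LRU eviction, spill cold ones to disk, and promote on hit. A remote tier like Redis or S3 would slot in the same way below disk.

```python
import os
import tempfile
from collections import OrderedDict

class TieredKVCache:
    """Toy tiered KV cache: hot entries in CPU RAM, overflow on local disk.
    A real system adds remote tiers (Redis, S3) and stores actual attention
    KV tensors; here values are opaque bytes for simplicity."""

    def __init__(self, ram_capacity: int = 2, disk_dir: str | None = None):
        self.ram: OrderedDict[str, bytes] = OrderedDict()  # LRU order
        self.ram_capacity = ram_capacity
        self.disk_dir = disk_dir or tempfile.mkdtemp()

    def _disk_path(self, key: str) -> str:
        return os.path.join(self.disk_dir, f"{key}.kv")

    def put(self, key: str, value: bytes) -> None:
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_capacity:   # evict LRU entry to disk
            old_key, old_val = self.ram.popitem(last=False)
            with open(self._disk_path(old_key), "wb") as f:
                f.write(old_val)

    def get(self, key: str) -> bytes | None:
        if key in self.ram:                     # tier 1: CPU RAM
            self.ram.move_to_end(key)
            return self.ram[key]
        path = self._disk_path(key)
        if os.path.exists(path):                # tier 2: local disk
            with open(path, "rb") as f:
                value = f.read()
            self.put(key, value)                # promote back to RAM
            return value
        return None                             # tier 3 (remote) omitted

cache = TieredKVCache(ram_capacity=2)
for i in range(3):
    cache.put(f"chunk{i}", f"kv-{i}".encode())
print(cache.get("chunk0"))  # b'kv-0', served from disk after eviction
```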

Some former colleagues from @lmcache shared this photo from the GTC Keynote. I am honestly surprised by how fast the team has been growing. (We were a research lab on 2 A40 GPUs in 2023!) btw I think they are hiring LLM hackers (or product hackers, I am not sure 🤪; you should just check with @JunchenJiang @ChengYihuaA) #GTC #LLM #Inference #Nvidia #LMCache #KVCache


"𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗶𝘀 𝘁𝗵𝗲 𝗻𝗲𝘄 𝗯𝗼𝘁𝘁𝗹𝗲𝗻𝗲𝗰𝗸" — Kevin Deierling, SVP Networking #NVIDIA At his #GTC talk last week, he highlighted 𝗖𝗠𝗫 and 𝗖𝗮𝗰𝗵𝗲𝗕𝗹𝗲𝗻𝗱 from 𝗟𝗠𝗖𝗮𝗰𝗵𝗲 (@tensormesh) were part of the new KV Cache memory stack for agents, and recognized @tensormesh among the 𝗖𝗠𝗫 𝘀𝘁𝗼𝗿𝗮𝗴𝗲 𝗽𝗮𝗿𝘁𝗻𝗲𝗿𝘀. As the stack evolves, @tensormesh keeps building for what's next. ▶️ session Replay: tinyurl.com/GTC-talk


Happy 2026 🥂 First post of the year: a technical benchmark. In a joint study with @tensormesh, we achieved:
- 4× TTFT improvement
- Prefix cache hit rate >50%
using SSD-augmented KV cache on realistic multi-turn LLM traffic. Full write-up on GMI Cloud: gmicloud.ai/blog/gmi-cloud…
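For anyone wanting to reproduce the TTFT side of such a benchmark: TTFT is just wall-clock time from sending the request to receiving the first streamed token. A hedged sketch against an OpenAI-compatible endpoint (the base URL and model name are placeholders, not GMI Cloud's setup):

```python
import time
from openai import OpenAI  # pip install openai; any OpenAI-compatible server works

# Placeholder endpoint; point this at your own serving stack.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def measure_ttft(prompt: str, model: str = "my-model") -> float:
    """Time from request send to the first streamed content token."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start  # first token arrived
    return float("nan")

# To see the effect of a prefix cache, send the same long shared prefix
# twice and compare the cold vs. warm measurements.
```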


