KVCache.AI
@KT_Project_AI

Hi, this is the official https://t.co/EO7MXLjRIs account. We build systems for efficient LLM serving, including KTransformers and Mooncake.

Beijing · Joined August 2018
97 Following · 120 Followers
KVCache.AI @KT_Project_AI:
Great work! Scalable speculative decoding training is an important step forward as models continue to grow in size and context length. Excited to see Mooncake play a key role here by providing efficient and reliable streaming of hidden states, making fully disaggregated inference and training pipelines practical.
Quoted post from PyTorch @PyTorch:

We’re excited to introduce TorchSpec, a torch-native framework for scalable speculative decoding training developed by the TorchSpec and Mooncake teams. By streaming hidden states from inference engines to training workers via Mooncake, TorchSpec enables fully disaggregated…

KVCache.AI @KT_Project_AI:
Huge congratulations to the @lmsysorg SGLang team and @nvidia on these impressive GB300 results! 🚀 Powerful hardware + excellent software optimization is exactly how you unlock the full potential of long-context inference. Glad that Mooncake, as the KV cache transfer component, could contribute to this milestone. Excited to see what’s next!
Quoted post:

🚀 Our new blog: 1.53X over GB200 - Deploying DeepSeek on GB300 NVL72, with 226 TPS/GPU on long-context inference! Together with @nvidia, we have achieved new milestones on GB300 NVL72 for 128K/8K long-context serving: ⚡ 226 TPS/GPU peak throughput (1.53X vs GB200) 🧠 1.87X…

KVCache.AI @KT_Project_AI:
⚡ Day-0 support for Qwen3.5-397B-A17B just landed in KTransformers! This beast features Gated Delta Networks + sparse MoE (397B total, 17B active), unified vision-language, and 262K native context. Ready to run on your local machine.
Quoted post:

🚀 Qwen3.5-397B-A17B is here: The first open-weight model in the Qwen3.5 series. 🖼️ Native multimodal. Trained for real-world agents. ✨ Powered by hybrid linear attention + sparse MoE and large-scale RL environment scaling. ⚡ 8.6x–19.0x decoding throughput vs Qwen3-Max 🌍 201…

KVCache.AI @KT_Project_AI:
Huge congrats to MiniMax: this awesome new model is now open source! KTransformers is happy to provide day-0 support for M2.5. You can use KTransformers to enjoy the cutting-edge capabilities of M2.5 with just one 5090 + 300 GB of DRAM!
Quoted post:

MiniMax-M2.5 is now open source. Trained with reinforcement learning across hundreds of thousands of complex real-world environments, it delivers SOTA performance in coding, agentic tool use, search, and office workflows. Hugging Face: huggingface.co/MiniMaxAI/Mini… GitHub: …

PyTorch @PyTorch:
We’re excited to welcome Mooncake to the PyTorch Ecosystem! Mooncake is designed to solve the “memory wall” in LLM serving. By integrating Mooncake’s high performance KVCache transfer and storage capabilities with PyTorch native inference engines like SGLang, vLLM, and TensorRT-LLM, it unlocks new levels of throughput and scalability for large language model deployments. Mooncake enables prefill decode disaggregation, global KVCache reuse, elastic expert parallelism, and serves as a fault tolerant PyTorch distributed backend. 🔗 hubs.la/Q042Zf9N0 #PyTorch #OpenSourceAI #LLM #AIInfrastructure
KVCache.AI @KT_Project_AI:
🚀 Exciting news! Mooncake is now officially part of the PyTorch Ecosystem! Mooncake brings high-performance KVCache transfer and storage to PyTorch-native LLM serving, enabling better prefill–decode disaggregation, global KVCache reuse, elastic MoE support, and fault-tolerant PyTorch distributed backends. Already integrated with engines like SGLang, vLLM & TensorRT-LLM, we are thrilled to build the future of scalable LLM serving together. 👉 Read more: pytorch.org/blog/mooncake-… #Mooncake #PyTorch #LLM #OpenSourceAI
Quoted post from PyTorch @PyTorch (the ecosystem announcement above).
KVCache.AI @KT_Project_AI:
Also, you can use KTransformers with LLaMA-Factory to fine-tune K2.5 locally on low-HBM hardware (96 GB) plus plenty of DDR5 DRAM!
KVCache.AI @KT_Project_AI:
We are excited to provide day-0 support for Kimi-K2.5. KTransformers is a growing open-source project that enables local deployment of large models in low-HBM scenarios (even around 64 GB).
KVCache.AI @KT_Project_AI:
RT @Kimi_Moonshot: 🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence. 🔹 Global SOTA on Agentic Benchmarks: HLE full set (50.2%), B…
KVCache.AI @KT_Project_AI:
Huge congrats to Kimi K2.5, the newest SOTA VL model with comprehensive agent support! 🎉🎉🎉 KTransformers is happy to provide day-0 support for K2.5 in local deployment scenarios; see our GitHub.
KVCache.AI @KT_Project_AI:
KTransformers v0.5.0 is released, and the new kt CLI is here! It is now very easy to manage your local AI deployment with `kt` commands.
KVCache.AI @KT_Project_AI:
🚀 KTransformers now supports MiniMax-M2.1 with native FP8 inference! On a single RTX 5090: ✅ Prefill: 2500+ tokens/s ✅ Decode: 33+ tokens/s. Compared to llama.cpp: 🚀 4.5x faster prefill 📈 30% faster decode. Run locally with just: `kt run m2` 🔗 github.com/kvcache-ai/ktr…
KVCache.AI reposted:
MiniMax M2.1 is OPEN SOURCE: SOTA for real-world dev & agents • SOTA on coding benchmarks (SWE / VIBE / Multi-SWE) • Beats Gemini 3 Pro & Claude Sonnet 4.5 • 10B active / 230B total (MoE) Not just SOTA, faster to infer, easier to deploy, and yes, you can even run it locally
KVCache.AI @KT_Project_AI:
🎉 As a KTransformers maintainer, I’m genuinely happy to say this: RL-DPO is now basically “plug-and-play” 😄 With LLaMA-Factory + LoRA + DPO: ✅ one line in YAML: `use_kt: true` ✅ one command: `USE_KT=1 llamafactory-cli train ...` …and you can start preference alignment on DeepSeek-V2-Lite-Chat (aka: models that speak more “human”) Full tutorial: blog.llamafactory.net/en/posts/ktran… #KTransformers #LLaMAFactory #DPO #RLHF
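A minimal sketch of what that LLaMA-Factory config could look like. Aside from the `use_kt: true` line and the `USE_KT=1 llamafactory-cli train` invocation quoted from the post, every field name, value, and the config filename below is an assumed/typical LLaMA-Factory setting, not taken from the tutorial:

```yaml
### model
model_name_or_path: deepseek-ai/DeepSeek-V2-Lite-Chat

### method
stage: dpo              # preference alignment via DPO
finetuning_type: lora   # LoRA keeps the trainable parameter count small
use_kt: true            # the one-line switch: route compute through KTransformers

### dataset (hypothetical example dataset name)
dataset: dpo_en_demo
template: deepseek

### output
output_dir: saves/deepseek-v2-lite-chat-dpo
```

Then launch with something like `USE_KT=1 llamafactory-cli train deepseek_dpo.yaml` (config filename hypothetical); see the full tutorial linked above for the exact settings.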