KVCache.AI
@KT_Project_AI

24 posts
Hi, this is the official account of https://t.co/EO7MXLjRIs. We build systems for efficient LLM serving, including KTransformers and Mooncake.

Beijing · Joined August 2018
97 Following · 120 Followers

KVCache.AI@KT_Project_AI·
Great work! Scalable speculative decoding training is an important step forward as models continue to grow in size and context length. Excited to see Mooncake play a key role here by providing efficient and reliable streaming of hidden states, making fully disaggregated inference and training pipelines practical.
PyTorch@PyTorch

We’re excited to introduce TorchSpec, a torch-native framework for scalable speculative decoding training developed by the TorchSpec and Mooncake teams. By streaming hidden states from inference engines to training workers via Mooncake, TorchSpec enables fully disaggregated pipelines where inference and training scale independently. 🔗 Read our latest blog from TorchSpec & Mooncake teams: pytorch.org/blog/torchspec… @lightseekorg @KT_Project_AI #PyTorch #TorchSpec #Mooncake #OpenSourceAI

0 replies · 2 reposts · 3 likes · 356 views

KVCache.AI@KT_Project_AI·
Huge congratulations to the @lmsysorg SGLang team and @nvidia on these impressive GB300 results! 🚀 Powerful hardware + excellent software optimization is exactly how you unlock the full potential of long-context inference. Glad that Mooncake, as the KV cache transfer component, could contribute to this milestone. Excited to see what’s next!
LMSYS Org@lmsysorg

🚀 Our new blog: 1.53X over GB200 - Deploying DeepSeek on GB300 NVL72, with 226 TPS/GPU on long-context inference! Together with @nvidia, we have achieved new milestones on GB300 NVL72 for 128K/8K long-context serving: ⚡ 226 TPS/GPU peak throughput (1.53X vs GB200) 🧠 1.87X TPS/User gain with MTP under matched throughput 💾 1.6X higher decode batch size via GB300's 288GB HBM3e ⏱ 8.6s TTFT for 128K prefill with dynamic chunked PP 🔧 1.35X faster FMHA kernel via 2x SFU softmax throughput on Blackwell Ultra Powered by: PD disaggregation + Wide-EP + chunked PP + MTP overlap scheduling + FP8 attention, and orchestrated with NVIDIA Dynamo @NVIDIAAIDev

1 reply · 1 repost · 7 likes · 371 views

KVCache.AI@KT_Project_AI·
Huge congrats to MiniMax, this awesome new model is now open source! KTransformers is happy to provide day-0 support for M2.5. You can use KTransformers to enjoy the cutting-edge capabilities of M2.5 with only one RTX 5090 + 300 GB of DRAM!
MiniMax (official)@MiniMax_AI

MiniMax-M2.5 is now open source. Trained with reinforcement learning across hundreds of thousands of complex real-world environments, it delivers SOTA performance in coding, agentic tool use, search, and office workflows. Hugging Face: huggingface.co/MiniMaxAI/Mini… GitHub: github.com/MiniMax-AI/Min… Coding Plan: platform.minimax.io/subscribe/codi… Intelligence with Everyone

1 reply · 1 repost · 5 likes · 236 views

PyTorch@PyTorch·
We’re excited to welcome Mooncake to the PyTorch Ecosystem! Mooncake is designed to solve the “memory wall” in LLM serving. By integrating Mooncake’s high performance KVCache transfer and storage capabilities with PyTorch native inference engines like SGLang, vLLM, and TensorRT-LLM, it unlocks new levels of throughput and scalability for large language model deployments. Mooncake enables prefill decode disaggregation, global KVCache reuse, elastic expert parallelism, and serves as a fault tolerant PyTorch distributed backend. 🔗 hubs.la/Q042Zf9N0 #PyTorch #OpenSourceAI #LLM #AIInfrastructure
PyTorch tweet media
7 replies · 51 reposts · 401 likes · 103.6K views

KVCache.AI@KT_Project_AI·
🚀 Exciting news! Mooncake is now officially part of the PyTorch Ecosystem! Mooncake brings high-performance KVCache transfer and storage to PyTorch-native LLM serving, enabling better prefill–decode disaggregation, global KVCache reuse, elastic MoE support, and fault-tolerant PyTorch distributed backends. Already integrated with engines like SGLang, vLLM & TensorRT LLM, we are thrilled to build the future of scalable LLM serving together. 👉 Read more: pytorch.org/blog/mooncake-… #Mooncake #PyTorch #LLM #OpenSourceAI
PyTorch@PyTorch

We’re excited to welcome Mooncake to the PyTorch Ecosystem! Mooncake is designed to solve the “memory wall” in LLM serving. By integrating Mooncake’s high performance KVCache transfer and storage capabilities with PyTorch native inference engines like SGLang, vLLM, and TensorRT-LLM, it unlocks new levels of throughput and scalability for large language model deployments. Mooncake enables prefill decode disaggregation, global KVCache reuse, elastic expert parallelism, and serves as a fault tolerant PyTorch distributed backend. 🔗 hubs.la/Q042Zf9N0 #PyTorch #OpenSourceAI #LLM #AIInfrastructure

0 replies · 1 repost · 3 likes · 153 views

KVCache.AI@KT_Project_AI·
Also, you can use KTransformers with LLaMA-Factory to fine-tune K2.5 on local low-HBM hardware (96 GB) plus plenty of DDR5 DRAM!
0 replies · 0 reposts · 3 likes · 98 views

KVCache.AI@KT_Project_AI·
We are excited to provide day-0 support for Kimi K2.5. KTransformers is a growing open-source project that enables local deployment of large models in low-HBM scenarios (even ~64 GB).
KVCache.AI tweet media
0 replies · 2 reposts · 2 likes · 173 views

KVCache.AI@KT_Project_AI·
RT @Kimi_Moonshot: 🥝 Meet Kimi K2.5, Open-Source Visual Agentic Intelligence. 🔹 Global SOTA on Agentic Benchmarks: HLE full set (50.2%), B…
0 replies · 2 reposts · 0 likes · 31 views

KVCache.AI@KT_Project_AI·
Huge congrats to Kimi K2.5! The newest SOTA VL model, with comprehensive agent support 🎉🎉🎉 KTransformers is happy to provide day-0 support for K2.5 in local deployment scenarios; please see our GitHub.
0 replies · 0 reposts · 1 like · 82 views

KVCache.AI@KT_Project_AI·
KTransformers v0.5.0 is released, and the new `kt` CLI is here! It is now very easy to manage your local AI deployment with `kt` commands.
KVCache.AI tweet media
0 replies · 0 reposts · 2 likes · 70 views

KVCache.AI@KT_Project_AI·
🚀 KTransformers now supports MiniMax-M2.1 with native FP8 inference! On a single RTX 5090:
✅ Prefill: 2500+ tokens/s
✅ Decode: 33+ tokens/s
Compared to llama.cpp:
🚀 4.5x faster prefill
📈 30% faster decode
Run locally with just: "kt run m2"
🔗 github.com/kvcache-ai/ktr…
KVCache.AI tweet media
1 reply · 4 reposts · 29 likes · 9K views

KVCache.AI reposted
MiniMax (official)@MiniMax_AI·
MiniMax M2.1 is OPEN SOURCE: SOTA for real-world dev & agents • SOTA on coding benchmarks (SWE / VIBE / Multi-SWE) • Beats Gemini 3 Pro & Claude Sonnet 4.5 • 10B active / 230B total (MoE) Not just SOTA, faster to infer, easier to deploy, and yes, you can even run it locally Weights: huggingface.co/MiniMaxAI/Mini…
MiniMax (official) tweet media
57 replies · 174 reposts · 1.5K likes · 1.1M views

KVCache.AI@KT_Project_AI·
🎉 As a KTransformers maintainer, I’m genuinely happy to say this: RL-DPO is now basically “plug-and-play” 😄 With LLaMA-Factory + LoRA + DPO: ✅ one line in YAML: `use_kt: true` ✅ one command: `USE_KT=1 llamafactory-cli train ...` …and you can start preference alignment on DeepSeek-V2-Lite-Chat (aka: models that speak more “human”) Full tutorial: blog.llamafactory.net/en/posts/ktran… #KTransformers #LLaMAFactory #DPO #RLHF
0 replies · 1 repost · 3 likes · 293 views
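To make the "plug-and-play" claim in the DPO post concrete, here is a minimal sketch of what such a LLaMA-Factory training YAML might look like. Only `use_kt: true`, the model (DeepSeek-V2-Lite-Chat), and LoRA + DPO are confirmed by the post; the remaining keys follow common LLaMA-Factory config examples and may differ by version, and the dataset name and output path are placeholders.

```yaml
### model
model_name_or_path: deepseek-ai/DeepSeek-V2-Lite-Chat
use_kt: true              # route compute through KTransformers (from the post)

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all

### dataset (placeholder — substitute your preference dataset)
dataset: dpo_en_demo

### output (placeholder path)
output_dir: saves/deepseek-v2-lite-dpo
```

With a config like this saved as, say, `dpo_kt.yaml`, the post's one-command invocation would be `USE_KT=1 llamafactory-cli train dpo_kt.yaml`.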