PyTorch

3.2K posts

PyTorch
@PyTorch

Tensors and neural networks in Python with strong hardware acceleration. PyTorch is an open source project at the Linux Foundation. #PyTorchFoundation

Joined September 2016
85 Following · 489.9K Followers
PyTorch @PyTorch
IBM Research uses vLLM at the heart of its Research Inference & Tuning Service (RITS) Platform, providing shared access to model inference and tuning across its research community. Our recent case study outlines how RITS supports more than 1,300 active users and hosts over 100 models at any given time, with vLLM as the core model-serving runtime. pytorch.org/blog/ibm-resea…

The Futurum Group covered the case study and examines what this approach could mean for enterprise AI infrastructure, including centralized access, governance, and cost control. 🔗 Read here: futurumgroup.com/insights/can-i… #PyTorch #vLLM #AIInfrastructure #OpenSourceAI
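A service like RITS fronts vLLM behind shared endpoints, and vLLM itself ships an OpenAI-compatible HTTP server (started with vllm serve <model>). A minimal client sketch against such an endpoint; the host URL and model name here are illustrative assumptions, not RITS internals:

import requests

# /v1/completions is part of vLLM's OpenAI-compatible API surface.
resp = requests.post(
    "http://inference.example.internal:8000/v1/completions",  # hypothetical endpoint
    json={
        "model": "ibm-granite/granite-3.1-8b-instruct",  # illustrative model name
        "prompt": "Summarize the RITS case study in one sentence.",
        "max_tokens": 64,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])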
PyTorch reposted
vLLM @vllm_project
🎉 Day-0 support for @deepseek_ai V4 Pro and Flash on vLLM — a new generation of DeepSeek model, purpose-built for tasks up to 1M tokens. Alongside the release, we're publishing a first-principles walkthrough of the new long-context attention and how we implemented it in vLLM.

The new attention mechanism, in four moves:
• Shared K/V + inverse RoPE → 2× memory savings
• c4a / c128a KV compression → 4×–128× savings
• DeepSeek Sparse Attention over compressed tokens
• Short sliding window for locality across compression boundaries

At 1M context, per-layer KV state is ~8.7× smaller than a DeepSeek V3.2-style 61-layer stack (9.62 GiB vs 83.9 GiB, bf16). fp8 attention cache + fp4 indexer cache shrink it further.

vLLM side:
• Unified hybrid KV cache — single logical block size (256 native positions) across all compression rates; compressor state folded into the SWA KV cache spec so prefix caching, disagg prefill, CUDA graphs and MTP reuse the same abstraction
• Three page-size buckets for the full 5-way cache stack → no cross-kind fragmentation
• Fused kernels: compressor + RMSNorm + RoPE + cache insert (1.4–3×), inverse RoPE + fp8 quant (2–3×), Q-norm + KV RoPE + K insert (10–20×)
• Multi-stream overlap of indexer vs main-KV compression vs SWA insertion

Disaggregated serving is supported out of the box and strongly recommended for best performance. Follow our recipes site for verified commands for @nvidia Blackwell (B200, B300, GB200, GB300) and Hopper (H100/H200/H20) systems.

Thanks to the @deepseek_ai team for open-sourcing DeepSeek V4, and to @inferact for landing day-0 support 🤝

📝 Blog: vllm.ai/blog/deepseek-…
📖 Recipes: recipes.vllm.ai/deepseek-ai/De…
🤗 huggingface.co/deepseek-ai/De…
DeepSeek @deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…

1/n
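The headline ratio in the vLLM post above is easy to sanity-check from its own numbers. A minimal sketch using only the quoted figures (83.9 GiB vs 9.62 GiB at 1M context, bf16); the sizing helper is the standard dense KV-cache formula, not vLLM's internal accounting, and the example model shape is hypothetical:

# Source numbers from the vLLM post.
baseline_gib = 83.9     # DeepSeek V3.2-style 61-layer KV state at 1M context
compressed_gib = 9.62   # reported KV state with the new attention stack

print(f"{baseline_gib / compressed_gib:.2f}x smaller")  # ~8.72x, matching the quoted ~8.7x

def kv_cache_gib(layers, tokens, kv_heads, head_dim, bytes_per_elt=2, kv_factor=2):
    """Standard dense KV-cache size: K and V for every layer, token, and head."""
    return layers * tokens * kv_heads * head_dim * bytes_per_elt * kv_factor / 2**30

# Hypothetical 61-layer GQA model: 8 KV heads of dim 128 at 1M tokens, bf16.
# Latent/compressed caches like those in the post come in far below this dense figure.
print(f"{kv_cache_gib(61, 1_000_000, 8, 128):.1f} GiB")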

PyTorch @PyTorch
LightSeek Foundation recently released Shepherd Model Gateway (SMG). It came out of a production bottleneck in LLM serving: CPU-bound work sitting on the critical path. LightSeek moved all non-GPU work into a Rust-based gateway, with a minimal gRPC boundary around inference. The project is developed and maintained by @lightseekorg. The result: up to 3.5× throughput in long-context scenarios. 🔗 More details in our latest blog: pytorch.org/blog/lightseek… #PyTorch #LightSeek #OpenSourceAI #ShepherdModelGateway
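The architectural idea, leaving only GPU work on the critical path, can be sketched independently of SMG. A toy illustration with hypothetical tokenize/generate stand-ins; SMG draws this boundary in Rust over gRPC, while the sketch uses an in-process queue:

import queue
import threading

def tokenize(text):             # CPU-bound work: lives in the gateway
    return text.split()         # toy tokenizer, illustration only

def gpu_generate(token_batch):  # the only step left on the critical path
    return f"<generated for {len(token_batch)} tokens>"  # placeholder forward pass

ready_batches = queue.Queue(maxsize=64)
STOP = object()

def gateway(texts):
    # Gateway side: tokenize and batch on CPU, then hand fully prepared
    # batches across a narrow boundary (gRPC in SMG, a queue here), so the
    # inference loop never stalls on CPU-bound preprocessing.
    for text in texts:
        ready_batches.put(tokenize(text))
    ready_batches.put(STOP)

def inference_loop():
    # Inference side: this loop touches the GPU and nothing else.
    while (batch := ready_batches.get()) is not STOP:
        print(gpu_generate(batch))

threading.Thread(target=gateway, args=(["hello world", "long context input"],)).start()
inference_loop()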
PyTorch reposted
vLLM @vllm_project
🏆 vLLM powers the fastest inference on NVIDIA Blackwell Ultra on Artificial Analysis.

On @digitalocean's Serverless Inference, powered by vLLM on NVIDIA HGX B300:
🥇 AA #1 output speed for DeepSeek V3.2 (230 tok/s, 0.96s TTFT) and Qwen 3.5 397B
🔧 MiniMax-M2.5: 23% TPOT gain via an EAGLE3 draft model trained on TorchSpec

Co-design highlights:
- NVFP4 quantization on Blackwell Ultra
- EAGLE3 + MTP speculative decoding
- Per-model kernel fusion

Thanks to @digitalocean, @nvidia, and @inferact for the collaboration. Optimizations land back in open-source vLLM.
🔗 digitalocean.com/blog/how-we-bu…
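EAGLE-style speculative decoding is configurable directly in open-source vLLM. A minimal sketch; the exact config keys have shifted across vLLM releases, and both model names below are illustrative placeholders rather than the deployment described above:

from vllm import LLM, SamplingParams

# Draft-model speculative decoding: a small EAGLE3 draft proposes several
# tokens per step and the target model verifies them in one forward pass,
# cutting time-per-output-token when acceptance rates are high.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative target model
    speculative_config={
        "method": "eagle3",
        "model": "example-org/eagle3-draft",   # hypothetical draft checkpoint
        "num_speculative_tokens": 4,
    },
)

outputs = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)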
PyTorch reposted
LightSeek Foundation @lightseekorg
Congratulations to @vllm_project and @digitalocean on the launch. We also appreciate them adopting TorchSpec for EAGLE 3 training—great to see the open-source ecosystem thriving.
LightSeek Foundation tweet media
vLLM @vllm_project

🏆 vLLM powers the fastest inference on NVIDIA Blackwell Ultra on Artificial Analysis.
PyTorch @PyTorch
Need to cut LLM training checkpoint costs? Training LLMs requires periodic checkpoints: full snapshots of model weights, optimizer states, and gradients saved to storage so training can resume after interruptions. At scale, these checkpoints become massive.

NVIDIA nvCOMP is a GPU-accelerated lossless compression library that compresses the checkpoint before it leaves GPU memory, so there is no CPU round trip and no extra data movement. Developers can integrate high-throughput compression directly into Python workflows such as PyTorch or TensorFlow.

🔗 Read the full post: developer.nvidia.com/blog/cut-check… #PyTorch #OpenSourceAI #AI #Inference #Innovation
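A minimal sketch of the pattern: serialize the full training state, compress losslessly, then write. zlib here is a CPU stand-in chosen only to keep the sketch self-contained; the point of nvCOMP is that this compression step runs on the GPU before the data ever leaves device memory:

import io
import zlib

import torch

def save_compressed_checkpoint(model, optimizer, path):
    # Serialize the full training state to an in-memory buffer.
    buffer = io.BytesIO()
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
        buffer,
    )
    raw = buffer.getvalue()
    # Lossless compression before hitting storage. nvCOMP would perform this
    # step on the GPU; zlib is a CPU stand-in for illustration.
    compressed = zlib.compress(raw, 3)
    with open(path, "wb") as f:
        f.write(compressed)
    return len(raw), len(compressed)

def load_compressed_checkpoint(path):
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # weights_only=False because the payload includes optimizer state.
    return torch.load(io.BytesIO(raw), weights_only=False)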
PyTorch @PyTorch
Want to train LLMs on longer contexts without re-engineering your entire systems stack? Introducing AutoSP — the first compiler-based solution that automatically optimizes LLM training for long contexts.

Under the hood, AutoSP applies a series of compiler passes that trigger sequence parallelism, paired with a curated activation-checkpointing scheme tailored for long-context training. It's integrated directly into DeepSpeed, so enabling long-context training is just a config change away. No more rewiring your stack to push context lengths.

Read the blog to learn more 🖇️ pytorch.org/blog/introduci… @AhanGupta13, Zhihao W., Neel Dani, @toh_tana, Tunji Ruwase, @_Minjia_Zhang_ #PyTorch #DeepSpeed #AutoSP #OpenSourceAI
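The post doesn't reproduce the config itself, so the key below is an assumption: a minimal sketch of what "just a config change away" looks like in a standard deepspeed.initialize setup, with the "autosp" block as a hypothetical name rather than the documented option:

import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer

# Ordinary DeepSpeed config dict; only the "autosp" entry is new, and its
# name here is hypothetical, not confirmed against the AutoSP blog.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "autosp": {"enabled": True},  # hypothetical: compiler-driven sequence parallelism
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,                  # existing nn.Module, unchanged
    model_parameters=model.parameters(),
    config=ds_config,
)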
PyTorch @PyTorch
The 2026 PyTorch Docathon runs May 5 to May 19, bringing contributors together to improve PyTorch tutorials, guides, examples, and documentation. Beginner-friendly issues, skill-based tasks, and community support are available for new and experienced contributors. pytorch.org/blog/rsvp-for-… #PyTorch #PyTorchDocathon
PyTorch @PyTorch
IBM Research introduced the RITS Platform to provide its research community with shared access to model inference endpoints. @vllm_project is at the heart of the RITS Platform and has been critical to democratizing access to the latest LLMs as they are released.

“The vLLM community is vibrant and responsive, and with collaborative expertise, we are able to do great things both upstream and internally by leveraging and contributing to this groundbreaking project. vLLM has been critical to democratizing access to our research community to the latest and greatest LLMs as they release.” – Priya Nagpurkar, Vice President, AI Platform, @IBM Research (@IBMResearch)

Read our latest case study: 🔗 pytorch.org/blog/ibm-resea… #vLLM #PyTorch #OpenSourceAI
PyTorch reposted
Red Hat AI @RedHat_AI
Red Hat and Tesla engineers tackled a real production problem together. 3x output tokens/sec, 2x faster TTFT on Llama 3.1 70B with KServe + @_llm_d_ + @vllm_project. Fixes pushed upstream to KServe along the way. This is what open source looks like. 🤝 🚀
Yuan (Terry) Tang @TerryTangYuan

Excited to share our latest blog post on how we're solving real-world LLM inference challenges at production scale, a collaboration between @RedHat_AI and Tesla engineering teams.

We hit the usual pain points: massive model weights choking storage, GPU cycles wasted on naive load balancing, infrastructure that fights you when nodes go down. Our answer: KServe + @_llm_d_ + @vllm_project with prefix-cache aware routing. The results: 3x more output tokens/sec and 2x faster time to first token.

Thanks to everyone who contributed to this successful adoption: Scott Cabrinha, Sai Krishna, Sergey Bekkerman, Nati Fridman, Killian Golds, Andres Llausas, Bartosz Majsak, Greg Pereira, Pierangelo Di Pilato, Ran Pollak, Vivek Karunai Kiri Ragavan, Robert Shaw
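Prefix-cache aware routing is the interesting scheduling piece here. A conceptual sketch, not llm-d's actual implementation: requests sharing a prompt prefix are pinned to the same replica, so that replica's vLLM prefix cache takes the hit instead of recomputing the shared KV blocks.

import hashlib

REPLICAS = ["replica-0", "replica-1", "replica-2"]  # illustrative pool

def route(prompt: str, prefix_chars: int = 256) -> str:
    # Hash a fixed-size prompt prefix (e.g. a shared system prompt) and map
    # it to a replica, so identical prefixes land where their KV blocks are
    # already cached. Real routers also weigh load and live cache state;
    # this shows the core idea only.
    prefix = prompt[:prefix_chars]
    digest = hashlib.sha256(prefix.encode()).digest()
    return REPLICAS[int.from_bytes(digest[:8], "big") % len(REPLICAS)]

print(route("SYSTEM: You are a helpful assistant.\nUSER: hi"))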

PyTorch reposted
Mark Collier 柯理怀 @sparkycollier
If you missed PyTorchCon EU, you can now check out all of the talk recordings. And make plans to come to PyTorchCon in Shanghai in September and PyTorchCon NA in San Jose in October.
PyTorch @PyTorch

PyTorch Conference Europe 2026 was one for the books. 2 days in Paris building the future of open source AI together. 100+ session recordings are now live. View the full #PyTorchCon EU playlist: youtube.com/playlist?list=…

PyTorch @PyTorch
Mark Collier (@sparkycollier), Executive Director of PyTorch Foundation and General Manager of AI & Infrastructure at @linuxfoundation, opened #PyTorchCon Europe 2026 with a keynote on how the open source intelligence stack advances through coordination across hardware, software, and open source communities. He shared updates on our growth, our expanding set of hosted projects, and why open collaboration remains essential to building AI infrastructure at global scale. 🎥 Watch the keynote: youtu.be/_BEmlMAKNSI?si… #PyTorchCon #PyTorch #vLLM #DeepSpeed #Ray #Helion #Safetensors #ExecuTorch #OpenSourceAI