PyTorch

3.2K posts

PyTorch
@PyTorch

Tensors and neural networks in Python with strong hardware acceleration. PyTorch is an open source project at the Linux Foundation. #PyTorchFoundation

Joined September 2016
85 Following · 489.9K Followers
PyTorch @PyTorch
IBM Research uses vLLM at the heart of its Research Inference & Tuning Service (RITS) Platform, providing shared access to model inference and tuning across its research community. Our recent case study outlines how RITS supports more than 1,300 active users and hosts over 100 models at any given time, with vLLM as the core model-serving runtime. pytorch.org/blog/ibm-resea…

The Futurum Group covered the case study and examines what this approach could mean for enterprise AI infrastructure, including centralized access, governance, and cost control. 🔗 Read here: futurumgroup.com/insights/can-i… #PyTorch #vLLM #AIInfrastructure #OpenSourceAI
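A service like RITS fronts vLLM behind shared endpoints, and vLLM itself ships an OpenAI-compatible HTTP server (started with vllm serve <model>). A minimal client sketch against such an endpoint; the host URL and model name here are illustrative assumptions, not RITS internals:

import requests

# /v1/completions is part of vLLM's OpenAI-compatible API surface.
resp = requests.post(
    "http://inference.example.internal:8000/v1/completions",  # hypothetical endpoint
    json={
        "model": "ibm-granite/granite-3.1-8b-instruct",  # illustrative model name
        "prompt": "Summarize the RITS case study in one sentence.",
        "max_tokens": 64,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])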
PyTorch reposted
vLLM @vllm_project
🎉 Day-0 support for @deepseek_ai V4 Pro and Flash on vLLM — a new generation of DeepSeek model, purpose-built for tasks up to 1M tokens. Alongside the release, we're publishing a first-principles walkthrough of the new long-context attention and how we implemented it in vLLM.

The new attention mechanism, in four moves:
• Shared K/V + inverse RoPE → 2× memory savings
• c4a / c128a KV compression → 4×–128× savings
• DeepSeek Sparse Attention over compressed tokens
• Short sliding window for locality across compression boundaries

At 1M context, per-layer KV state is ~8.7× smaller than a DeepSeek V3.2-style 61-layer stack (9.62 GiB vs 83.9 GiB, bf16). fp8 attention cache + fp4 indexer cache shrink it further.

vLLM side:
• Unified hybrid KV cache — single logical block size (256 native positions) across all compression rates; compressor state folded into the SWA KV cache spec so prefix caching, disagg prefill, CUDA graphs and MTP reuse the same abstraction
• Three page-size buckets for the full 5-way cache stack → no cross-kind fragmentation
• Fused kernels: compressor + RMSNorm + RoPE + cache insert (1.4–3×), inverse RoPE + fp8 quant (2–3×), Q-norm + KV RoPE + K insert (10–20×)
• Multi-stream overlap of indexer vs main-KV compression vs SWA insertion

Disaggregated serving is supported out of the box and strongly recommended for best performance. Follow our recipes site for verified commands for @nvidia Blackwell (B200, B300, GB200, GB300) and Hopper (H100/H200/H20) systems.

Thanks to the @deepseek_ai team for open-sourcing DeepSeek V4, and to @inferact for landing day-0 support 🤝

📝 Blog: vllm.ai/blog/deepseek-…
📖 Recipes: recipes.vllm.ai/deepseek-ai/De…
🤗 huggingface.co/deepseek-ai/De…
DeepSeek @deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…

1/n
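The headline ratio in the vLLM post above is easy to sanity-check from its own numbers. A minimal sketch using only the quoted figures (83.9 GiB vs 9.62 GiB at 1M context, bf16); the sizing helper is the standard dense KV-cache formula, not vLLM's internal accounting, and the example model shape is hypothetical:

# Source numbers from the vLLM post.
baseline_gib = 83.9     # DeepSeek V3.2-style 61-layer KV state at 1M context
compressed_gib = 9.62   # reported KV state with the new attention stack

print(f"{baseline_gib / compressed_gib:.2f}x smaller")  # ~8.72x, matching the quoted ~8.7x

def kv_cache_gib(layers, tokens, kv_heads, head_dim, bytes_per_elt=2, kv_factor=2):
    """Standard dense KV-cache size: K and V for every layer, token, and head."""
    return layers * tokens * kv_heads * head_dim * bytes_per_elt * kv_factor / 2**30

# Hypothetical 61-layer GQA model: 8 KV heads of dim 128 at 1M tokens, bf16.
# Latent/compressed caches like those in the post come in far below this dense figure.
print(f"{kv_cache_gib(61, 1_000_000, 8, 128):.1f} GiB")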

PyTorch @PyTorch
LightSeek Foundation recently released Shepherd Model Gateway (SMG). It came out of a production bottleneck in LLM serving: CPU-bound work sitting on the critical path. LightSeek moved all non-GPU work into a Rust-based gateway, with a minimal gRPC boundary around inference. The project is developed and maintained by @lightseekorg. The result: up to 3.5× throughput in long-context scenarios. 🔗 More details in our latest blog: pytorch.org/blog/lightseek… #PyTorch #LightSeek #OpenSourceAI #ShepherdModelGateway
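The architectural idea, leaving only GPU work on the critical path, can be sketched independently of SMG. A toy illustration with hypothetical tokenize/generate stand-ins; SMG draws this boundary in Rust over gRPC, while the sketch uses an in-process queue:

import queue
import threading

def tokenize(text):             # CPU-bound work: lives in the gateway
    return text.split()         # toy tokenizer, illustration only

def gpu_generate(token_batch):  # the only step left on the critical path
    return f"<generated for {len(token_batch)} tokens>"  # placeholder forward pass

ready_batches = queue.Queue(maxsize=64)
STOP = object()

def gateway(texts):
    # Gateway side: tokenize and batch on CPU, then hand fully prepared
    # batches across a narrow boundary (gRPC in SMG, a queue here), so the
    # inference loop never stalls on CPU-bound preprocessing.
    for text in texts:
        ready_batches.put(tokenize(text))
    ready_batches.put(STOP)

def inference_loop():
    # Inference side: this loop touches the GPU and nothing else.
    while (batch := ready_batches.get()) is not STOP:
        print(gpu_generate(batch))

threading.Thread(target=gateway, args=(["hello world", "long context input"],)).start()
inference_loop()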
PyTorch reposted
vLLM @vllm_project
🏆 vLLM powers the fastest inference on NVIDIA Blackwell Ultra on Artificial Analysis.

On @digitalocean's Serverless Inference, powered by vLLM on NVIDIA HGX B300:
🥇 AA #1 output speed for DeepSeek V3.2 (230 tok/s, 0.96s TTFT) and Qwen 3.5 397B
🔧 MiniMax-M2.5: 23% TPOT gain via an EAGLE3 draft model trained on TorchSpec

Co-design highlights:
- NVFP4 quantization on Blackwell Ultra
- EAGLE3 + MTP speculative decoding
- Per-model kernel fusion

Thanks to @digitalocean, @nvidia, and @inferact for the collaboration. Optimizations land back in open-source vLLM.
🔗 digitalocean.com/blog/how-we-bu…
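EAGLE-style speculative decoding is configurable directly in open-source vLLM. A minimal sketch; the exact config keys have shifted across vLLM releases, and both model names below are illustrative placeholders rather than the deployment described above:

from vllm import LLM, SamplingParams

# Draft-model speculative decoding: a small EAGLE3 draft proposes several
# tokens per step and the target model verifies them in one forward pass,
# cutting time-per-output-token when acceptance rates are high.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative target model
    speculative_config={
        "method": "eagle3",
        "model": "example-org/eagle3-draft",   # hypothetical draft checkpoint
        "num_speculative_tokens": 4,
    },
)

outputs = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)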
PyTorch reposted
LightSeek Foundation @lightseekorg
Congratulations to @vllm_project and @digitalocean on the launch. We also appreciate them adopting TorchSpec for EAGLE 3 training—great to see the open-source ecosystem thriving.
LightSeek Foundation tweet media
vLLM @vllm_project

🏆 vLLM powers the fastest inference on NVIDIA Blackwell Ultra on Artificial Analysis.
PyTorch @PyTorch
Need to cut LLM training checkpoint costs? Training LLMs requires periodic checkpoints: full snapshots of model weights, optimizer states, and gradients saved to storage so training can resume after interruptions. At scale, these checkpoints become massive.

NVIDIA nvCOMP is a GPU-accelerated lossless compression library that compresses the checkpoint before it leaves GPU memory, so there is no CPU round trip and no extra data movement. Developers can integrate high-throughput compression directly into Python workflows such as PyTorch or TensorFlow.

🔗 Read the full post: developer.nvidia.com/blog/cut-check… #PyTorch #OpenSourceAI #AI #Inference #Innovation
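A minimal sketch of the pattern: serialize the full training state, compress losslessly, then write. zlib here is a CPU stand-in chosen only to keep the sketch self-contained; the point of nvCOMP is that this compression step runs on the GPU before the data ever leaves device memory:

import io
import zlib

import torch

def save_compressed_checkpoint(model, optimizer, path):
    # Serialize the full training state to an in-memory buffer.
    buffer = io.BytesIO()
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
        buffer,
    )
    raw = buffer.getvalue()
    # Lossless compression before hitting storage. nvCOMP would perform this
    # step on the GPU; zlib is a CPU stand-in for illustration.
    compressed = zlib.compress(raw, 3)
    with open(path, "wb") as f:
        f.write(compressed)
    return len(raw), len(compressed)

def load_compressed_checkpoint(path):
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # weights_only=False because the payload includes optimizer state.
    return torch.load(io.BytesIO(raw), weights_only=False)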
PyTorch @PyTorch
Want to train LLMs on longer contexts without re-engineering your entire systems stack? Introducing AutoSP — the first compiler-based solution that automatically optimizes LLM training for long contexts.

Under the hood, AutoSP applies a series of compiler passes that trigger sequence parallelism, paired with a curated activation-checkpointing scheme tailored for long-context training. It's integrated directly into DeepSpeed, so enabling long-context training is just a config change away. No more rewiring your stack to push context lengths.

Read the blog to learn more 🖇️ pytorch.org/blog/introduci… @AhanGupta13, Zhihao W., Neel Dani, @toh_tana, Tunji Ruwase, @_Minjia_Zhang_ #PyTorch #DeepSpeed #AutoSP #OpenSourceAI
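The post doesn't reproduce the config itself, so the key below is an assumption: a minimal sketch of what "just a config change away" looks like in a standard deepspeed.initialize setup, with the "autosp" block as a hypothetical name rather than the documented option:

import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer

# Ordinary DeepSpeed config dict; only the "autosp" entry is new, and its
# name here is hypothetical, not confirmed against the AutoSP blog.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "autosp": {"enabled": True},  # hypothetical: compiler-driven sequence parallelism
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,                  # existing nn.Module, unchanged
    model_parameters=model.parameters(),
    config=ds_config,
)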
PyTorch @PyTorch
The 2026 PyTorch Docathon runs May 5 to May 19, bringing contributors together to improve PyTorch tutorials, guides, examples, and documentation. Beginner-friendly issues, skill-based tasks, and community support are available for new and experienced contributors. pytorch.org/blog/rsvp-for-… #PyTorch #PyTorchDocathon
PyTorch @PyTorch
IBM Research introduced the RITS Platform to provide its research community with shared access to model inference endpoints. @vllm_project is at the heart of the RITS Platform and has been critical to democratizing access to the latest LLMs as they are released.

“The vLLM community is vibrant and responsive, and with collaborative expertise, we are able to do great things both upstream and internally by leveraging and contributing to this groundbreaking project. vLLM has been critical to democratizing access to our research community to the latest and greatest LLMs as they release.” – Priya Nagpurkar, Vice President, AI Platform, @IBM Research (@IBMResearch)

Read our latest case study: 🔗 pytorch.org/blog/ibm-resea… #vLLM #PyTorch #OpenSourceAI
PyTorch reposted
Red Hat AI @RedHat_AI
Red Hat and Tesla engineers tackled a real production problem together. 3x output tokens/sec, 2x faster TTFT on Llama 3.1 70B with KServe + @_llm_d_ + @vllm_project. Fixes pushed upstream to KServe along the way. This is what open source looks like. 🤝 🚀
Yuan (Terry) Tang @TerryTangYuan

Excited to share our latest blog post on how we're solving real-world LLM inference challenges at production scale, a collaboration between @RedHat_AI and Tesla engineering teams.

We hit the usual pain points: massive model weights choking storage, GPU cycles wasted on naive load balancing, infrastructure that fights you when nodes go down. Our answer: KServe + @_llm_d_ + @vllm_project with prefix-cache aware routing. The results: 3x more output tokens/sec and 2x faster time to first token.

Thanks to everyone who contributed to this successful adoption: Scott Cabrinha, Sai Krishna, Sergey Bekkerman, Nati Fridman, Killian Golds, Andres Llausas, Bartosz Majsak, Greg Pereira, Pierangelo Di Pilato, Ran Pollak, Vivek Karunai Kiri Ragavan, Robert Shaw
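Prefix-cache aware routing is the interesting scheduling piece here. A conceptual sketch, not llm-d's actual implementation: requests sharing a prompt prefix are pinned to the same replica, so that replica's vLLM prefix cache takes the hit instead of recomputing the shared KV blocks.

import hashlib

REPLICAS = ["replica-0", "replica-1", "replica-2"]  # illustrative pool

def route(prompt: str, prefix_chars: int = 256) -> str:
    # Hash a fixed-size prompt prefix (e.g. a shared system prompt) and map
    # it to a replica, so identical prefixes land where their KV blocks are
    # already cached. Real routers also weigh load and live cache state;
    # this shows the core idea only.
    prefix = prompt[:prefix_chars]
    digest = hashlib.sha256(prefix.encode()).digest()
    return REPLICAS[int.from_bytes(digest[:8], "big") % len(REPLICAS)]

print(route("SYSTEM: You are a helpful assistant.\nUSER: hi"))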

PyTorch reposted
Mark Collier 柯理怀 @sparkycollier
If you missed PyTorchCon EU, you can now check out all of the talk recordings. And make plans to come to PyTorchCon in Shanghai in September and PyTorchCon NA in San Jose in October.
PyTorch @PyTorch

PyTorch Conference Europe 2026 was one for the books. 2 days in Paris building the future of open source AI together. 100+ session recordings are now live. View the full #PyTorchCon EU playlist: youtube.com/playlist?list=…

PyTorch @PyTorch
Mark Collier (@sparkycollier), Executive Director of PyTorch Foundation and General Manager of AI & Infrastructure at @linuxfoundation, opened #PyTorchCon Europe 2026 with a keynote on how the open source intelligence stack advances through coordination across hardware, software, and open source communities. He shared updates on our growth, our expanding set of hosted projects, and why open collaboration remains essential to building AI infrastructure at global scale. 🎥 Watch the keynote: youtu.be/_BEmlMAKNSI?si… #PyTorchCon #PyTorch #vLLM #DeepSpeed #Ray #Helion #Safetensors #ExecuTorch #OpenSourceAI