PyTorch
@PyTorch

Tensors and neural networks in Python with strong hardware acceleration. PyTorch is an open source project at the Linux Foundation. #PyTorchFoundation

3.1K posts · Joined September 2016 · 83 Following · 480.9K Followers
PyTorch @PyTorch
We’re excited to introduce TorchSpec, a torch-native framework for scalable speculative decoding training developed by the TorchSpec and Mooncake teams. By streaming hidden states from inference engines to training workers via Mooncake, TorchSpec enables fully disaggregated pipelines where inference and training scale independently. 🔗 Read our latest blog from TorchSpec & Mooncake teams: pytorch.org/blog/torchspec… @lightseekorg @KT_Project_AI #PyTorch #TorchSpec #Mooncake #OpenSourceAI
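TorchSpec trains draft models for speculative decoding, where a cheap draft proposes several tokens and the expensive target model verifies them in one pass. The accept/reject loop at the heart of the technique can be sketched in plain Python; `draft_model` and `target_model` below are made-up deterministic stand-ins, not TorchSpec or vLLM APIs.

```python
# Toy illustration of speculative decoding. draft_model / target_model are
# hypothetical deterministic stand-ins for real language models.

def draft_model(prefix):
    # Cheap draft: always guesses "previous token + 1".
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Expensive target: agrees with the draft except after token 5.
    return 0 if prefix[-1] == 5 else (prefix[-1] + 1) % 10

def speculative_step(prefix, k=3):
    # 1) Draft proposes k tokens autoregressively.
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        tok = draft_model(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) Target verifies all k positions (one batched pass in a real system).
    ctx = list(prefix)
    accepted = []
    for tok in proposal:
        expected = target_model(ctx)
        if tok == expected:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(expected)  # take the target's token and stop
            break
    else:
        accepted.append(target_model(ctx))  # all matched: free bonus token

    return prefix + accepted

print(speculative_step([1]))              # → [1, 2, 3, 4, 5] (all accepted + bonus)
print(speculative_step([1, 2, 3, 4, 5]))  # → [1, 2, 3, 4, 5, 0] (first draft rejected)
```

When every drafted token is accepted, one target pass yields k+1 tokens instead of one, which is where the speedup comes from; the streaming of hidden states in TorchSpec exists to train drafts that get accepted often.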
PyTorch @PyTorch
Want to build differentiable computational physics code leveraging AI? NVIDIA Warp is a framework for accelerated simulation, data generation, and spatial computing that bridges CUDA and Python. Warp enables developers to write high-performance kernels as regular Python functions that are JIT-compiled into efficient code for execution on the GPU. Warp kernels can be made differentiable through native support for automatic differentiation, making them straightforward to integrate with optimization or training workflows while remaining interoperable with frameworks like PyTorch, JAX, and NumPy. 🖇️ Read the full post: developer.nvidia.com/blog/build-acc… #PyTorch #OpenSourceAI #AI #Inference #Innovation
PyTorch @PyTorch
Heading to #NVIDIAGTC next week? Let’s talk @PyTorch. 🚀 We’re bringing the community to San Jose. Drop by Booth #338 to meet expert developers and core maintainers in person. Scaling, inference, foundation models, and OSS contributions. Full schedule below 👇 #PyTorch
PyTorch retweeted
vLLM @vllm_project
Great to see @AMD select vLLM as one of the designated inference frameworks for the GPU MODE Hackathon. 🎉 The challenge: push Kimi K2.5 1T FP4 end-to-end inference performance on 8× AMD Instinct MI355X — using vLLM or AMD ATOM. Grand prize: $650,000. What makes this different: winning optimizations must be mergeable into AMD ATOM or vLLM upstream. Improvements that land in vLLM benefit the whole community. Phase 1 (kernel optimization) runs through April 6. More details ⬇️
AMD @AMD

Join the GPU MODE Hackathon, sponsored by AMD, and push the boundaries of LLM inference performance on leading open models, optimized for AMD Instinct MI355X GPUs. Finalists will compete for the $1.1M total cash prize pool across two independent tracks, each focused on a specific model and inference stack. Learn more and get registered here: luma.com/cqq4mojz

PyTorch @PyTorch
⏰ Last call! Registration for #PyTorchCon Europe, 7-8 April in Paris, goes up €100 tomorrow at 23:59 CET. 🪧 Don't miss a thing - check out the Poster Presentations: bit.ly/4bPqWNY 🎟 Register now: bit.ly/4bUWj91
PyTorch @PyTorch
We’re excited to share Generalized Dot-Product Attention (GDPA) — a production-driven attention kernel designed specifically for large-scale recommendation systems (RecSys). Proposed in our recent paper, GDPA replaces softmax with a flexible activation tailored for real-world RecSys traffic patterns and has been deployed in Meta’s largest recommendation model, GEM. 🔗 Read our latest blog: pytorch.org/blog/generaliz… By redesigning attention around production characteristics rather than benchmark assumptions, GDPA achieves 2× forward speedup (1,145 BF16 TFLOPs, ~97% tensor core utilization), 1.6× backward speedup, and up to 3.5× forward speedup vs. FA4 under short K/V settings on NVIDIA B200. This work demonstrates how real production traffic can fundamentally reshape kernel design. ✍ Jiaqi Xu, Han Xu, Junqing Zhou, Devashish Shankar, Xiaoyi (Leo) Liu, Shuqi Yang #PyTorch #OpenSourceAI #GDPA #GEM
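GDPA's core move, keeping dot-product scores but swapping softmax for a different normalization, can be sketched in a few lines of plain Python. The `relu_norm` activation below is an illustrative placeholder chosen for this sketch, not the activation proposed in the paper:

```python
import math

def attention(q, keys, values, activation):
    # Scaled dot-product scores, one per key.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = activation(scores)
    # Output is the weighted sum of value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def relu_norm(scores):
    # Placeholder "generalized" activation: ReLU, then L1-normalize.
    r = [max(s, 0.0) for s in scores]
    z = sum(r) or 1.0  # avoid division by zero when all scores are negative
    return [v / z for v in r]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.5]]
values = [[1.0], [2.0], [3.0]]
print(attention(q, keys, values, softmax))
print(attention(q, keys, values, relu_norm))  # negative-score key fully dropped
```

Because `attention` takes the activation as a parameter, the kernel shape is unchanged when softmax is replaced, which is what lets a production kernel specialize the activation to observed traffic rather than to benchmark assumptions.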
PyTorch @PyTorch
3️⃣ days left before rates rise! Registration for #PyTorchCon Europe increases €100 after 20 March. 🐦 Join us in Paris, 7-8 April, where you'll find friends with shared interests at BoFs: hubs.la/Q046LJWJ0 🎟 Register: hubs.la/Q046LL3H0
PyTorch @PyTorch
🚀 Put your brand in front of the #PyTorch community. Sponsor #PyTorchCon Europe, 7–8 April in Paris, and connect with the researchers, engineers & #ML leaders building the next generation of #AI. Showcase your tech. Meet top talent. Build real partnerships. 🤝 Explore opportunities: hubs.la/Q046LpSr0
PyTorch @PyTorch
Before we head to Paris for PyTorch Conference EU 2026, we’re looking back on PyTorch Conference 2025 keynotes from visionary AI leaders in our community. We’re starting with Eli Uriegas (@_seemethere) from Meta on testing PyTorch at scale: 11,000 commits and 794 million minutes of CI/CD compute in 2025 alone. This session recognizes the partners working across competitive lines to keep the ecosystem stable for everyone. Watch the full keynote: youtu.be/xWjXsP1E5mQ?si… See you in Paris: pytorch.org/event/pytorch-… #PyTorch #OpenSourceAI #PyTorchConf
PyTorch retweeted
vLLM @vllm_project
🎉 Congrats to @nvidia on the release of Nemotron 3 Super — day-0 support in vLLM v0.17.1! Verified on NVIDIA GPUs. 120B hybrid MoE, only 12B active at inference. Big upgrades over the previous Nemotron Super:
- 5x higher throughput
- 2x higher accuracy on Artificial Analysis Intelligence Index
- Multi-Token Prediction (MTP) for faster long-form generation
- Configurable thinking budget — dial accuracy vs token cost per task
- 1M token context window
Supports BF16, FP8, and NVFP4. Fully open: weights, datasets, recipes. Blog: vllm.ai/blog/nemotron-… 🤝 Thanks @NVIDIAAIDev Nemotron team and vLLM community contributors!
NVIDIA AI Developer @NVIDIAAIDev

Introducing NVIDIA Nemotron 3 Super 🎉
Open 120B-parameter (12B active) hybrid Mamba-Transformer MoE model
Native 1M-token context
Built for compute-efficient, high-accuracy multi-agent applications
Plus, fully open weights, datasets and recipes for easy customization and deployment. 🧵
