PyTorch
@PyTorch

Tensors and neural networks in Python with strong hardware acceleration. PyTorch is an open source project at the Linux Foundation. #PyTorchFoundation

3.1K posts · Joined September 2016 · 83 Following · 480.9K Followers
PyTorch @PyTorch
We’re excited to introduce TorchSpec, a torch-native framework for scalable speculative decoding training developed by the TorchSpec and Mooncake teams. By streaming hidden states from inference engines to training workers via Mooncake, TorchSpec enables fully disaggregated pipelines where inference and training scale independently. 🔗 Read our latest blog from TorchSpec & Mooncake teams: pytorch.org/blog/torchspec… @lightseekorg @KT_Project_AI #PyTorch #TorchSpec #Mooncake #OpenSourceAI
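TorchSpec trains draft models for speculative decoding, where a cheap draft proposes several tokens and the expensive target model verifies them in one pass. The accept/reject loop at the heart of the technique can be sketched in plain Python; `draft_model` and `target_model` below are made-up deterministic stand-ins, not TorchSpec or vLLM APIs.

```python
# Toy illustration of speculative decoding. draft_model / target_model are
# hypothetical deterministic stand-ins for real language models.

def draft_model(prefix):
    # Cheap draft: always guesses "previous token + 1".
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Expensive target: agrees with the draft except after token 5.
    return 0 if prefix[-1] == 5 else (prefix[-1] + 1) % 10

def speculative_step(prefix, k=3):
    # 1) Draft proposes k tokens autoregressively.
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        tok = draft_model(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) Target verifies all k positions (one batched pass in a real system).
    ctx = list(prefix)
    accepted = []
    for tok in proposal:
        expected = target_model(ctx)
        if tok == expected:
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(expected)  # take the target's token and stop
            break
    else:
        accepted.append(target_model(ctx))  # all matched: free bonus token

    return prefix + accepted

print(speculative_step([1]))              # → [1, 2, 3, 4, 5] (all accepted + bonus)
print(speculative_step([1, 2, 3, 4, 5]))  # → [1, 2, 3, 4, 5, 0] (first draft rejected)
```

When every drafted token is accepted, one target pass yields k+1 tokens instead of one, which is where the speedup comes from; the streaming of hidden states in TorchSpec exists to train drafts that get accepted often.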
PyTorch @PyTorch
Want to build differentiable computational physics code leveraging AI? NVIDIA Warp is a framework for accelerated simulation, data generation, and spatial computing that bridges CUDA and Python. Warp enables developers to write high-performance kernels as regular Python functions that are JIT-compiled into efficient code for execution on the GPU. Warp kernels can be made differentiable through native support for automatic differentiation, making them straightforward to integrate with optimization or training workflows while remaining interoperable with frameworks like PyTorch, JAX, and NumPy. 🖇️ Read the full post: developer.nvidia.com/blog/build-acc… #PyTorch #OpenSourceAI #AI #Inference #Innovation
PyTorch @PyTorch
Heading to #NVIDIAGTC next week? Let’s talk @PyTorch. 🚀 We’re bringing the community to San Jose. Drop by Booth #338 to meet expert developers and core maintainers in person. Scaling, inference, foundation models, and OSS contributions. Full schedule below 👇 #PyTorch
PyTorch retweeted
vLLM @vllm_project
Great to see @AMD select vLLM as one of the designated inference frameworks for the GPU MODE Hackathon. 🎉 The challenge: push Kimi K2.5 1T FP4 end-to-end inference performance on 8× AMD Instinct MI355X — using vLLM or AMD ATOM. Grand prize: $650,000. What makes this different: winning optimizations must be mergeable into AMD ATOM or vLLM upstream. Improvements that land in vLLM benefit the whole community. Phase 1 (kernel optimization) runs through April 6. More details ⬇️
AMD @AMD

Join the GPU MODE Hackathon, sponsored by AMD, and push the boundaries of LLM inference performance on leading open models, optimized for AMD Instinct MI355X GPUs. Finalists will compete for the $1.1M total cash prize pool across two independent tracks, each focused on a specific model and inference stack. Learn more and get registered here: luma.com/cqq4mojz

PyTorch @PyTorch
⏰ Last call! Registration for #PyTorchCon Europe, 7-8 April in Paris, goes up €100 tomorrow at 23:59 CET. 🪧 Don't miss a thing - check out the Poster Presentations: bit.ly/4bPqWNY 🎟 Register now: bit.ly/4bUWj91
PyTorch @PyTorch
We’re excited to share Generalized Dot-Product Attention (GDPA) — a production-driven attention kernel designed specifically for large-scale recommendation systems (RecSys). Proposed in our recent paper, GDPA replaces softmax with a flexible activation tailored for real-world RecSys traffic patterns and has been deployed in Meta’s largest recommendation model, GEM. 🔗 Read our latest blog: pytorch.org/blog/generaliz… By redesigning attention around production characteristics rather than benchmark assumptions, GDPA achieves 2× forward speedup (1,145 BF16 TFLOPs, ~97% tensor core utilization), 1.6× backward speedup, and up to 3.5× forward speedup vs. FA4 under short K/V settings on NVIDIA B200. This work demonstrates how real production traffic can fundamentally reshape kernel design. ✍ Jiaqi Xu, Han Xu, Junqing Zhou, Devashish Shankar, Xiaoyi (Leo) Liu, Shuqi Yang #PyTorch #OpenSourceAI #GDPA #GEM
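GDPA's core move, keeping dot-product scores but swapping softmax for a different normalization, can be sketched in a few lines of plain Python. The `relu_norm` activation below is an illustrative placeholder chosen for this sketch, not the activation proposed in the paper:

```python
import math

def attention(q, keys, values, activation):
    # Scaled dot-product scores, one per key.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = activation(scores)
    # Output is the weighted sum of value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

def softmax(scores):
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def relu_norm(scores):
    # Placeholder "generalized" activation: ReLU, then L1-normalize.
    r = [max(s, 0.0) for s in scores]
    z = sum(r) or 1.0  # avoid division by zero when all scores are negative
    return [v / z for v in r]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.5]]
values = [[1.0], [2.0], [3.0]]
print(attention(q, keys, values, softmax))
print(attention(q, keys, values, relu_norm))  # negative-score key fully dropped
```

Because `attention` takes the activation as a parameter, the kernel shape is unchanged when softmax is replaced, which is what lets a production kernel specialize the activation to observed traffic rather than to benchmark assumptions.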
PyTorch @PyTorch
3️⃣ days left before rates rise! Registration for #PyTorchCon Europe increases €100 after 20 March. 🐦 Join us in Paris, 7-8 April, where you'll find friends with shared interests at BoFs: hubs.la/Q046LJWJ0 🎟 Register: hubs.la/Q046LL3H0
PyTorch @PyTorch
🚀 Put your brand in front of the #PyTorch community. Sponsor #PyTorchCon Europe, 7–8 April in Paris, and connect with the researchers, engineers & #ML leaders building the next generation of #AI. Showcase your tech. Meet top talent. Build real partnerships. 🤝 Explore opportunities: hubs.la/Q046LpSr0
PyTorch @PyTorch
Before we head to Paris for PyTorch Conference EU 2026, we’re looking back on PyTorch Conference 2025 keynotes from visionary AI leaders in our community. We’re starting with Eli Uriegas (@_seemethere) from Meta on testing PyTorch at scale: 11,000 commits and 794 million minutes of CI/CD compute in 2025 alone. This session recognizes the partners working across competitive lines to keep the ecosystem stable for everyone. Watch the full keynote: youtu.be/xWjXsP1E5mQ?si… See you in Paris: pytorch.org/event/pytorch-… #PyTorch #OpenSourceAI #PyTorchConf
PyTorch retweeted
vLLM @vllm_project
🎉 Congrats to @nvidia on the release of Nemotron 3 Super — day-0 support in vLLM v0.17.1! Verified on NVIDIA GPUs. 120B hybrid MoE, only 12B active at inference. Big upgrades over the previous Nemotron Super:
- 5x higher throughput
- 2x higher accuracy on Artificial Analysis Intelligence Index
- Multi-Token Prediction (MTP) for faster long-form generation
- Configurable thinking budget — dial accuracy vs token cost per task
- 1M token context window
Supports BF16, FP8, and NVFP4. Fully open: weights, datasets, recipes. Blog: vllm.ai/blog/nemotron-… 🤝 Thanks @NVIDIAAIDev Nemotron team and vLLM community contributors!
NVIDIA AI Developer @NVIDIAAIDev

Introducing NVIDIA Nemotron 3 Super 🎉
Open 120B-parameter (12B active) hybrid Mamba-Transformer MoE model
Native 1M-token context
Built for compute-efficient, high-accuracy multi-agent applications
Plus, fully open weights, datasets and recipes for easy customization and deployment. 🧵
