Christian Puhrsch
@cpuhrsch

17 posts
Joined March 2014
74 Following · 242 Followers
Christian Puhrsch retweeted
AI at Meta @AIatMeta
We’re releasing SAM 3.1: a drop-in update to SAM 3 that introduces object multiplexing to significantly improve video processing efficiency without sacrificing accuracy. We’re sharing this update with the community to help make high-performance applications feasible on smaller, more accessible hardware. 🔗 Model Checkpoint: go.meta.me/8dd321 🔗 Codebase: go.meta.me/b0a9fb
106 replies · 274 reposts · 2.2K likes · 330.2K views
Christian Puhrsch retweeted
PyTorch @PyTorch
Check out our latest series on Accelerating Generative AI using PyTorch: Segment Anything 2. 13x speedup over eager, all using native PyTorch and auto-scaling cloud infrastructure from Modal. 👉 Read it here: pytorch.org/blog/accelerat… #pytorch #genai #ai #opensource
5 replies · 16 reposts · 90 likes · 15.7K views
Christian Puhrsch retweeted
PyTorch @PyTorch
Make Flux go brrr on H100s without bells and whistles ⚡️ We're excited to provide a simple recipe, dubbed `flux-fast`, providing a 2.5x speedup on H100 GPUs. 🔗 Blog: hubs.la/Q03tBKP70 ➡️ Code: hubs.la/Q03tBF8-0 By Joel Schlosser & @RisingSayak
2 replies · 16 reposts · 128 likes · 15.7K views
Christian Puhrsch retweeted
Aryan V S @aryanvs_
We shipped @PyTorch's TorchAO as one of the officially supported quantization backends in Diffusers 🧨🔥 With TorchAO, we have all kinds of exotic quant types with weight-only and dynamic quantization. This is topped by off-the-shelf `torch.compile()` support! Enjoy 🤗
9 replies · 31 reposts · 145 likes · 28.2K views
Christian Puhrsch retweeted
PyTorch @PyTorch
We’re happy to officially launch torchao, a PyTorch native library that makes models faster and smaller by leveraging low bit dtypes, quantization & sparsity. Our techniques are written in easy-to-read PyTorch code spanning both inference & training: hubs.la/Q02RnjwJ0
6 replies · 118 reposts · 642 likes · 80.3K views
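The low-bit weight idea torchao packages can be illustrated in a few lines of plain PyTorch. This is a minimal sketch of weight-only int8 quantization with per-row scales, not torchao's actual API; the helper names are hypothetical.

```python
import torch

# Hypothetical helpers sketching weight-only int8 quantization, the kind of
# low-bit technique torchao provides (plain PyTorch, not torchao's API).
def quantize_int8(w: torch.Tensor):
    # Per-row symmetric scale so each output channel keeps its dynamic range.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(8, 16)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than fp32, and values round-trip to within
# one quantization step.
print(q.dtype, w_hat.shape)
```

Weight-only schemes like this shrink memory traffic (often the bottleneck for inference) while keeping activations in full precision.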
Christian Puhrsch retweeted
Sayak Paul @RisingSayak
Life is too short to wait for good images & videos from diffusion models. Excited to present e2e optimization recipes for image (Flux.1-Dev) & video (Cog-5B) gen w/ diffusers & torchao. Excellent latency-memory tradeoffs at ease 🏎️ For example, you can now run Cog in 3.1 GB with quantization (+ some other offloading shenanigans). Code: github.com/sayakpaul/diff… Hop in 🧵
5 replies · 20 reposts · 156 likes · 11.8K views
Christian Puhrsch retweeted
Sayak Paul @RisingSayak
Running Flux under 17 GB WITHOUT bells and whistles 🔔🪈
❌ No `enable_model_cpu_offload()`
❌ No NF4
✅ All components on the GPU
✅ `torchao`
✅ Negligible compromise in latency (`torch.compile()` ❤️)
Code 🔽 gist.github.com/sayakpaul/e1f2…
How? Follow 🧵 1/4
5 replies · 17 reposts · 127 likes · 13.5K views
Christian Puhrsch retweeted
PyTorch @PyTorch
Introducing FlexAttention: a new API that lets you implement diverse attention variants in just a few lines of idiomatic PyTorch code. 🔥 Check out the blog post for more details: hubs.la/Q02KsKNR0
3 replies · 87 reposts · 475 likes · 70.2K views
Christian Puhrsch retweeted
PyTorch @PyTorch
Accelerating ViTs 1.46x in PyTorch using our new Block Sparse Kernels 🔥 Check it out: hubs.la/Q02x749T0
1 reply · 33 reposts · 127 likes · 20K views
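The layout behind block-sparse kernels can be shown with PyTorch's built-in BSR format; this is a toy sketch of the storage idea, not the accelerated ViT kernels from the linked post.

```python
import torch

# A weight whose zeros fall in aligned blocks can be stored in BSR layout,
# letting kernels skip whole blocks instead of individual zeros.
w = torch.randn(8, 8)
w[0:4, 0:4] = 0  # an aligned 4x4 block of zeros
w_bsr = w.to_sparse_bsr(blocksize=(4, 4))

# BSR stores only 4x4 value blocks plus block indices, and round-trips back
# to the dense weight exactly.
print(w_bsr.layout, torch.equal(w_bsr.to_dense(), w))
```

Larger block sizes trade masking granularity for better hardware utilization, which is where the 1.46x on ViTs comes from.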
Christian Puhrsch retweeted
PyTorch @PyTorch
Accelerating Generative AI Part 4️⃣✨ In this blog, we’ll focus on speeding up FAIR’s Seamless M4T-v2 model 🚀 Check out the results: hubs.la/Q02hsH2s0
1 reply · 47 reposts · 223 likes · 28.4K views
Christian Puhrsch retweeted
PyTorch @PyTorch
3x faster text-to-image diffusion models, all in pure PyTorch. No C++ needed. Check out our third blog post in the series on Accelerating Generative AI using Native PyTorch. 🔥 hubs.la/Q02f9TKh0
7 replies · 115 reposts · 659 likes · 117.1K views
Christian Puhrsch retweeted
PyTorch @PyTorch
10x performance on LLaMa 7B, all in native PyTorch. No C++ needed. Check out our second blog post in a series on Accelerating Generative AI using Native PyTorch. 🔥 hubs.la/Q02bx5Bn0
14 replies · 256 reposts · 1.5K likes · 198.5K views
Soumith Chintala @soumithchintala
make "Segment Anything" 8x faster by just using PyTorch's more advanced features. No C++, no custom extensions. great post (and work) by @cpuhrsch and the team!
Mark Saroufim @marksaroufim

Very pumped that this blog is finally out pytorch.org/blog/accelerat… 8x perf improvements over SAM in native PyTorch, no C++ needed. The blog is fantastic as a case study of different optimizations that matter:
1. torch.compile and how to rewrite your graph breaks away
2. Write your quant and dequant APIs in pure torch and torch.compile them
3. SDPA as an easy wrapper to dispatch to either flash attention or memory-efficient attention kernels
4. 2:4 sparsity
5. Nested tensors to batch different sized images
6. Some custom, trivial-to-integrate Triton kernels
Big thanks to @cpuhrsch for spearheading our v-team
3 replies · 44 reposts · 326 likes · 75.5K views
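The SDPA wrapper from the optimization list above is a single call; a minimal sketch of how it is used:

```python
import torch
import torch.nn.functional as F

# scaled_dot_product_attention dispatches to flash-attention or
# memory-efficient kernels when available, with a math fallback
# everywhere else (including CPU), so the call site stays identical.
q = torch.randn(1, 4, 128, 32)
k = torch.randn(1, 4, 128, 32)
v = torch.randn(1, 4, 128, 32)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)
```

Because backend selection happens inside the call, the same model code picks up faster kernels as hardware and PyTorch versions improve.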
Christian Puhrsch retweeted
PyTorch @PyTorch
New blog series: Accelerating Generative AI using native PyTorch. 🔥 In this post we talk through new PyTorch performance features from the conference and how they can be used to produce an 8x faster, entirely PyTorch implementation of Segment Anything. hubs.la/Q0298GN00
7 replies · 54 reposts · 301 likes · 53.6K views
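One of the Segment Anything post's techniques, nested tensors for batching different-sized images, can be sketched in a few lines:

```python
import torch

# Nested tensors batch images of different sizes without padding them
# to a common shape; each element keeps its own resolution.
imgs = [torch.randn(3, 24, 24), torch.randn(3, 32, 48)]
nt = torch.nested.nested_tensor(imgs)

parts = nt.unbind()
print(len(parts), parts[0].shape, parts[1].shape)
```

Compared to zero-padding, this avoids wasted compute on padding pixels, which matters when image sizes in a batch vary widely.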