Christian Puhrsch

17 posts

@cpuhrsch

Joined March 2014

74 Following · 242 Followers

Christian Puhrsch retweeted

AI at Meta@AIatMeta·27 Mar

We’re releasing SAM 3.1: a drop-in update to SAM 3 that introduces object multiplexing to significantly improve video processing efficiency without sacrificing accuracy. We’re sharing this update with the community to help make high-performance applications feasible on smaller, more accessible hardware. 🔗 Model Checkpoint: go.meta.me/8dd321 🔗 Codebase: go.meta.me/b0a9fb

106

274

2.2K

330.2K

Christian Puhrsch retweeted

PyTorch@PyTorch·26 Feb

Check out our latest series on Accelerating Generative AI using PyTorch: Segment Anything 2. 13x speedup over eager, all using native PyTorch and auto-scaling cloud infrastructure from Modal. 👉 Read it here: pytorch.org/blog/accelerat… #pytorch #genai #ai #opensource

15.7K

Christian Puhrsch retweeted

PyTorch@PyTorch·25 Jun

Make Flux go brrr on H100s without bells and whistles ⚡️ We're excited to provide a simple recipe, dubbed `flux-fast`, providing a 2.5x speedup on H100 GPUs. 🔗 Blog: hubs.la/Q03tBKP70 ➡️ Code: hubs.la/Q03tBF8-0 By Joel Schlosser & @RisingSayak

128

15.7K

Christian Puhrsch retweeted

Aryan V S@aryanvs_·18 Dec

We shipped @PyTorch's TorchAO as one of the officially supported quantization backends in Diffusers 🧨🔥 With TorchAO, we have all kinds of exotic quant types with weight-only and dynamic quantization. This is topped by off-the-shelf `torch.compile()` support! Enjoy 🤗

145

28.2K

Christian Puhrsch retweeted

PyTorch@PyTorch·27 Sep

We’re happy to officially launch torchao, a PyTorch native library that makes models faster and smaller by leveraging low bit dtypes, quantization & sparsity. Our techniques are written in easy-to-read PyTorch code spanning both inference & training: hubs.la/Q02RnjwJ0

118

642

80.3K

Christian Puhrsch retweeted

Sayak Paul@RisingSayak·6 Sep

Life is too short to wait for good images & videos from diffusion models. Excited to present e2e optimization recipes for image (Flux.1-Dev) & video (Cog-5B) gen w/ diffusers & torchao. Excellent latency-memory tradeoffs at ease 🏎️ For example, you can now run Cog in 3.1 GB with quantization (+ some other offloading shenanigans). Code: github.com/sayakpaul/diff… Hop in 🧵

156

11.8K

Christian Puhrsch retweeted

Sayak Paul@RisingSayak·19 Aug

Running Flux under 17GBs WITHOUT bells and whistles🔔🪈 ❌ No `enable_model_cpu_offload()` ❌ No NF4 ✅ All components on the GPU ✅ `torchao` ✅ Negligible compromise in latency (`torch.compile()` ❤️) Code 🔽 gist.github.com/sayakpaul/e1f2… How? Follow 🧵 1/4

127

13.5K

Christian Puhrsch retweeted

PyTorch@PyTorch·7 Aug

Introducing FlexAttention: a new API that lets you implement diverse attention variants in just a few lines of idiomatic PyTorch code. 🔥 Check out the blog post for more details: hubs.la/Q02KsKNR0

475

70.2K

Christian Puhrsch retweeted

Hamel Husain@HamelHusain·11 Jun

Great presentation from @marksaroufim and Jane Xu on managing/debugging GPU vRAM, w/an example of optimizing torch-tune using @answerdotai training scripts as a benchmark! From maven.com/parlance-labs/…

219

42.2K

Christian Puhrsch retweeted

PyTorch@PyTorch·14 May

Accelerating ViTs 1.46x in PyTorch using our new Block Sparse Kernels 🔥 Check it out: hubs.la/Q02x749T0

127

20K

Christian Puhrsch retweeted

PyTorch@PyTorch·23 Jan

Accelerating Generative AI Part 4️⃣✨ In this blog, we’ll focus on speeding up FAIR’s Seamless M4T-v2 model 🚀 Check out the results: hubs.la/Q02hsH2s0

223

28.4K

Christian Puhrsch retweeted

Hugging Face@huggingface·4 Jan

Check out our latest work on the PyTorch blog where we show how to optimize a default SDXL pipeline by 3x using pure PyTorch. No C++ code. Check out the blog post here 👉 pytorch.org/blog/accelerat…

PyTorch@PyTorch

3x faster text-to-image diffusion models, all in pure PyTorch. No C++ needed. Check out our third blog post in the series on Accelerating Generative AI using Native PyTorch. 🔥 hubs.la/Q02f9TKh0

38K

Christian Puhrsch retweeted

PyTorch@PyTorch·3 Jan

3x faster text-to-image diffusion models, all in pure PyTorch. No C++ needed. Check out our third blog post in the series on Accelerating Generative AI using Native PyTorch. 🔥 hubs.la/Q02f9TKh0

115

659

117.1K

Christian Puhrsch retweeted

Joe Isaacson@Jisaacso·14 Dec

I'm excited for our NeurIPS LLM Efficiency Competition workshop tomorrow: 1LLM + 1GPU + 1Day! Stop by 1:30 CT to see Weiwei Yang (MSR), @marksaroufim, @jeremyphoward, @rasbt, Ao Liu, @Tim_Dettmers, @sourab_m, @KemingLu612, @mojan_jp, @LChoshen, Vicki Boykis, @cpuhrsch

12.2K

Christian Puhrsch retweeted

PyTorch@PyTorch·30 Nov

10x performance on LLaMa 7B, all in native PyTorch. No C++ needed. Check out our second blog post in a series on Accelerating Generative AI using Native PyTorch. 🔥 hubs.la/Q02bx5Bn0

256

1.5K

198.5K

Christian Puhrsch@cpuhrsch·19 Nov

@BranislavHesko @soumithchintala Yes, significantly. You can find the results under github.com/pytorch-labs/s… for vit_b and github.com/pytorch-labs/s… for vit_h

Branislav Hesko@BranislavHesko·18 Nov

@soumithchintala @cpuhrsch I wonder what was the final time for batch size of 1. Did it improve?

Soumith Chintala@soumithchintala·17 Nov

make "Segment Anything" 8x faster by just using PyTorch's more advanced features. No C++, no custom extensions. great post (and work) by @cpuhrsch and the team!

Mark Saroufim@marksaroufim

Very pumped that this blog is finally out pytorch.org/blog/accelerat… 8x perf improvements over SAM in native PyTorch, no C++ needed. The blog is fantastic as a case study of different optimizations that matter
1. torch.compile and how to rewrite your graph breaks away
2. Write your quant and dequant APIs in pure torch and torch.compile them
3. SDPA as an easy wrapper to dispatch to either flash attention or memory attention kernels
4. 2:4 sparsity
5. Nested tensors to batch different sized images
6. Some custom, trivial to integrate Triton kernels
Big thanks to @cpuhrsch for spearheading our v-team

326

75.5K

Christian Puhrsch retweeted

PyTorch@PyTorch·16 Nov

New blog series: Accelerating Generative AI using native PyTorch. 🔥 In this post we talk through new PyTorch performance features from the conference and how they can be used to produce an 8x faster, entirely PyTorch implementation of Segment Anything. hubs.la/Q0298GN00

301

53.6K
