Alvaro Somoza

121 posts

@OzzyGT

ML Engineer @HuggingFace

Chile · Joined March 2009

188 Following · 283 Followers

Alvaro Somoza retweeted

Jiaxiang Liu @lclbrew · Apr 14

1/ we are excited to release ERNIE-Image, after 3 months of building from scratch. an 8b text-to-image model from baidu's ernie image team. honestly, we didn't expect an 8b dit to get this far, this fast. strong instruction following. best-in-class text rendering. runs on a 24gb gpu. huge thanks to the ERNIE-Image team, this wouldn't exist without an incredibly talented group of people who shipped fast and cared deeply. thread below. 👇 👇

Alvaro Somoza @OzzyGT · Apr 15

Ernie-Image is pretty impressive, it can do a lot of things that other open source models couldn't. You can now use it with diffusers.

Alvaro Somoza @OzzyGT · Mar 17

you don't need an AI agent when you have a cat.

Alvaro Somoza retweeted

Sayak Paul @RisingSayak · Mar 6

Introducing Modular Diffusers 🔥 The `DiffusionPipeline` abstraction in Diffusers has established a standard in the community. But it has also limited flexibility. Modular Diffusers breaks those shackles & enables the next gen. of creative user workflows 🧨 Details ⬇️

Alvaro Somoza retweeted

Ostris @ostrisai · Mar 4

YouTube Video youtu.be/CXJ95qI_9Xg

Alvaro Somoza @OzzyGT · Feb 19

@NoonienStar for that we need to wait for the condition pipeline to be merged. But I2V and control will lower the number of frames by a lot. This is just text-to-video; with image-to-video or video-to-video under those constraints it will probably be 8-10s at that resolution.

Noonien Star @NoonienStar · Feb 19

@OzzyGT Is there a working DiTFlow implementation that takes the motion of a reference video (20s) to a newly synthesized one? At similar vram constraints?

Alvaro Somoza @OzzyGT · Feb 18

Finally got some time to play with LTX2. With diffusers, you can generate 20-second videos with 24 GB of VRAM and 10-second videos with 16 GB GPUs, both with less than 32 GB of RAM. Here are some recipes to suit your needs: github.com/asomoza/diffus…

Alvaro Somoza @OzzyGT · Dec 16

@KristjanRetter thanks, I'll add those too

Kristjan Retter @KristjanRetter · Dec 16

@OzzyGT Nice work ;). LoRA and compile examples would also be nice

Alvaro Somoza @OzzyGT · Dec 16

I've created a new repository with diffusers recipes, starting with Z-Image. It has easy copy & paste code with benchmarks (RAM, VRAM, inference time), so you can choose the best optimization for your environment: github.com/asomoza/diffus…

Alvaro Somoza retweeted

Sayak Paul @RisingSayak · Dec 8

Christmas came early in the Diffusers bandwagon 🎄 It's out folks! Go, check it 🔥

Alvaro Somoza @OzzyGT · Nov 27

While also testing Z-Image-Turbo, I accidentally tried a prompt with a hexadecimal color and it worked, so I tested it more and it also understands colors and gradients!
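For anyone who wants to try hex colors in prompts, here is a minimal sketch of a helper that turns RGB values into hex codes and drops them into a prompt. The function names and the prompt wording are made up for illustration; they are not from Z-Image or diffusers, and how closely a model follows the codes will vary.

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Format an RGB triple (0-255 per channel) as an uppercase #RRGGBB code."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"#{r:02X}{g:02X}{b:02X}"


def gradient_prompt(subject: str, start: str, end: str) -> str:
    """Build a prompt asking for a gradient between two hex colors (hypothetical wording)."""
    return f"{subject} with a background gradient from {start} to {end}"


prompt = gradient_prompt(
    "a minimalist poster of a cat",
    rgb_to_hex(255, 0, 128),
    rgb_to_hex(0, 64, 255),
)
print(prompt)
```

The resulting string can be passed as the prompt to whichever text-to-image pipeline you are testing.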

Alvaro Somoza @OzzyGT · Nov 27

I was reading the Z Image Turbo report and saw that it can understand multiple languages, so I tried Spanish, which is my native language, and it delivered everything I asked for.

Alvaro Somoza @OzzyGT · Nov 26

If you want to use the new FLUX.2-dev with 8–12 GB GPUs, you can do it with Diffusers. You'll need to use the remote text encoder and this script: gist.github.com/asomoza/301be3… In case you're wondering, the remote text encoder is free to use, you just need a hf token.

Alvaro Somoza @OzzyGT · Sep 10

@xhinker @RisingSayak that's true and you're not the first one to ask for this. I'm thinking of opening a repo with examples and best practices for the popular models so people can just browse it. There are also some other efforts underway to bring a better experience to users.

Andrew Zhu @xhinker · Sep 9

@OzzyGT @RisingSayak 2) But it really takes a lot of time and searching to build a correct pipeline that performs best. It would be great if there were best-practice pipeline samples that perform as well as or better than ComfyUI. I am happy to help test them if there are any, thank you @OzzyGT

Andrew Zhu @xhinker · Sep 9

Started using #ComfyUI and tested it a bit. Out of the box, the ComfyUI backend does really, really well at VRAM management and inference speed optimization compared with Hugging Face #diffusers @RisingSayak @OzzyGT

Alvaro Somoza @OzzyGT · Sep 10

@xhinker @RisingSayak I'll do some tests, but my experience wasn't the same. I tested WAN2.1 and saw a quality drop, and for Flux I tested with a 24GB GPU and it was slower than using group offloading. It has been a while since then, so I'll run some new benchmarks and see if something changed.

Andrew Zhu @xhinker · Sep 9

@OzzyGT @RisingSayak 1) I know, I compared both Flux and Wan2.2. Out of the box, ComfyUI uses way less VRAM and is faster. I know there should be ways to boost Diffusers performance: build a pipe from components and unload them individually, apply optimized attention, compile, etc.

Alvaro Somoza @OzzyGT · Aug 18

@KristjanRetter I thought that was implied in my answer, sorry. We don't have a limit on how many LoRAs you can load, so yes, you can use any other LoRAs you want, and if one fails you can open an issue for it.

Kristjan Retter @KristjanRetter · Aug 18

@OzzyGT I meant, can I add an additional LoRA on top of lightx2v? 🙂

Alvaro Somoza @OzzyGT · Aug 10

Qwen-Image-Lightning 8-Steps runs in 22s using less than 16GB on a 3090. You can find the models and the code to test it here: huggingface.co/OzzyGT/qwen-im…

Alvaro Somoza @OzzyGT · Aug 18

@KristjanRetter yes, in fact, you can now just load the Lightning LoRA without needing these models: huggingface.co/docs/diffusers… (#lora-for-faster-inference)

Kristjan Retter @KristjanRetter · Aug 18

@OzzyGT Could I add a LoRA to it? 👀

Alvaro Somoza @OzzyGT · Aug 5

@pelolisu Qwen2_5_VLForConditionalGeneration

Eliseu Silva @pelolisu · Aug 5

@OzzyGT What is the text encoder model?

Alvaro Somoza @OzzyGT · Aug 5

so it turns out we don't need to do anything special for the text encoder, this is with both the transformer and text encoder using bitsandbytes with 4-bit quantization, using under 16GB of VRAM and in ~1m40s with a 3090
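As a rough sanity check on the under-16GB figure, the arithmetic below estimates weight memory at different precisions. The parameter counts (about 20B for the transformer and 7B for the text encoder) are assumptions for illustration, not numbers from this thread, and the estimate ignores activations, the VAE, and quantization overhead.

```python
def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a given precision."""
    return num_params * bits_per_param / 8 / 1e9


# Assumed sizes, for illustration only (not stated in the thread).
TRANSFORMER_PARAMS = 20e9
TEXT_ENCODER_PARAMS = 7e9

for label, bits in (("bf16", 16), ("4-bit", 4)):
    total = weight_gb(TRANSFORMER_PARAMS, bits) + weight_gb(TEXT_ENCODER_PARAMS, bits)
    print(f"{label}: ~{total:.1f} GB of weights")
```

With those assumed sizes, 4-bit weights come to roughly 13.5 GB, which is consistent with fitting under 16GB, while bf16 weights alone would need about 54 GB.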
