Alvaro Somoza

121 posts

@OzzyGT

ML Engineer @HuggingFace

Chile · Joined March 2009
188 Following · 283 Followers
Alvaro Somoza retweeted
Jiaxiang Liu (@lclbrew):
1/ we are excited to release ERNIE-Image, after 3 months of building from scratch. an 8b text-to-image model from baidu's ernie image team. honestly, we didn't expect an 8b dit to get this far, this fast. strong instruction following. best-in-class text rendering. runs on a 24gb gpu. huge thanks to the ERNIE-Image team, this wouldn't exist without an incredibly talented group of people who shipped fast and cared deeply. thread below. 👇 👇
Alvaro Somoza (@OzzyGT):
ERNIE-Image is pretty impressive; it can do a lot of things that other open-source models couldn't. You can now use it with diffusers.
Alvaro Somoza (@OzzyGT):
you don't need an AI agent when you have a cat.
Alvaro Somoza retweeted
Sayak Paul (@RisingSayak):
Introducing Modular Diffusers 🔥 The `DiffusionPipeline` abstraction in Diffusers has established a standard in the community. But it has also limited flexibility. Modular Diffusers breaks those shackles & enables the next gen. of creative user workflows 🧨 Details ⬇️
Alvaro Somoza (@OzzyGT):
@NoonienStar For that, we need to wait for the condition pipeline to be merged. This is just text-to-video; I2V and control will lower the number of frames by a lot, so with image-to-video or video-to-video under those constraints it will probably be 8-10 s at that resolution.
Noonien Star (@NoonienStar):
@OzzyGT Is there a working DiTFlow implementation that transfers the motion of a 20 s reference video to a newly synthesized one, under similar VRAM constraints?
Alvaro Somoza (@OzzyGT):
Finally got some time to play with LTX2. With diffusers, you can generate 20-second videos with 24 GB of VRAM and 10-second videos with 16 GB GPUs, both with less than 32 GB of RAM. Here are some recipes to suit your needs: github.com/asomoza/diffus…
Kristjan Retter (@KristjanRetter):
@OzzyGT Nice work ;). LoRA and compile examples would also be nice.
Alvaro Somoza (@OzzyGT):
I've created a new repository with diffusers recipes, starting with Z-Image. It has easy copy & paste code with benchmarks (RAM, VRAM, inference time), so you can choose the best optimization for your environment: github.com/asomoza/diffus…
Alvaro Somoza retweeted
Sayak Paul (@RisingSayak):
Christmas came early in the Diffusers bandwagon 🎄 It's out folks! Go, check it 🔥
Alvaro Somoza (@OzzyGT):
While testing Z-Image-Turbo, I accidentally tried a prompt with a hexadecimal color and it worked, so I tested it more and found that it also understands colors and gradients!
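A minimal sketch of how a prompt with hexadecimal colors could be assembled and sanity-checked before sending it to the model. The helper names (`hex_to_rgb`, `gradient_prompt`) and the prompt wording are assumptions for illustration, not the prompts used in the tweet, and nothing here guarantees how a given model will interpret the codes.

```python
import re

# Matches a six-digit hex color such as "#FF8800" (hypothetical validation rule).
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert "#RRGGBB" to an (R, G, B) tuple, raising on malformed input."""
    if not HEX_COLOR.match(code):
        raise ValueError(f"not a #RRGGBB color: {code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (1, 3, 5))


def gradient_prompt(subject: str, start: str, end: str) -> str:
    """Build a text prompt that names a gradient by its hex endpoints."""
    for code in (start, end):
        hex_to_rgb(code)  # validation only; the prompt keeps the hex form
    return f"{subject}, background gradient from {start} to {end}"


if __name__ == "__main__":
    print(gradient_prompt("a ceramic mug", "#FF8800", "#0044CC"))
```

Validating the codes up front makes it easier to tell a typo in the prompt apart from the model misreading a color.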
Alvaro Somoza (@OzzyGT):
I was reading the Z-Image-Turbo report and saw that it can understand multiple languages, so I tried Spanish, which is my native language, and it delivered everything I asked for.
Alvaro Somoza (@OzzyGT):
If you want to use the new FLUX.2-dev with 8–12 GB GPUs, you can do it with Diffusers. You'll need to use the remote text encoder and this script: gist.github.com/asomoza/301be3… In case you're wondering, the remote text encoder is free to use; you just need a Hugging Face token.
Alvaro Somoza (@OzzyGT):
@xhinker @RisingSayak That's true, and you're not the first one to ask for this. I'm thinking of opening a repo with examples and best practices for the popular models so people can just browse it. There are also some other efforts underway to bring a better experience to users.
Andrew Zhu (@xhinker):
@OzzyGT @RisingSayak 2) But it really takes a lot of time and searching to build a correct pipeline that performs best. It would be great to have best-practice pipeline samples that perform as well as or better than ComfyUI. I'm happy to help test any that exist. Thank you, @OzzyGT.
Andrew Zhu (@xhinker):
I started using #ComfyUI and tested it a bit. Out of the box, the ComfyUI backend does really well at VRAM management and inference speed optimization compared with Hugging Face #diffusers. @RisingSayak @OzzyGT
Alvaro Somoza (@OzzyGT):
@xhinker @RisingSayak I'll do some tests, but my experience wasn't the same. I tested WAN2.1 and saw a quality drop, and for Flux I tested with a 24 GB GPU and it was slower than using group offloading. It has been a while since then, so I'll run some new benchmarks and see if something changed.
Andrew Zhu (@xhinker):
@OzzyGT @RisingSayak 1) I know, I compared both Flux and Wan2.2. Out of the box, ComfyUI uses way less VRAM and is faster. I know there should be ways to boost Diffusers performance: build a pipe from components and unload them individually, apply optimized attention, compile, etc.
Alvaro Somoza (@OzzyGT):
@KristjanRetter I thought that was implied in my answer, sorry. We don't have a limit on how many LoRAs you can load, so yes, you can use any other LoRAs you want, but if one fails you can open an issue for it.
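As a rough sketch of what stacking several LoRAs can look like, the helper below calls `load_lora_weights(..., adapter_name=...)` once per adapter and then activates them together with `set_adapters(names, adapter_weights=...)`, which are the calls Diffusers pipelines with LoRA support expose. The helper name `stack_loras`, the repository ids, and the weights are hypothetical placeholders, not values from this thread.

```python
def stack_loras(pipe, loras):
    """Load several LoRAs into one pipeline and activate them together.

    `pipe` is expected to be a Diffusers pipeline with LoRA support;
    `loras` is a list of (repo_or_path, adapter_name, weight) tuples.
    """
    names, weights = [], []
    for source, name, weight in loras:
        # Each adapter gets its own name so it can be weighted separately.
        pipe.load_lora_weights(source, adapter_name=name)
        names.append(name)
        weights.append(weight)
    # Activate all loaded adapters at once with per-adapter strengths.
    pipe.set_adapters(names, adapter_weights=weights)
    return names


# Hypothetical usage (repository ids and weights are placeholders):
# stack_loras(pipe, [
#     ("your-org/speed-lora", "speed", 1.0),
#     ("your-org/style-lora", "style", 0.8),
# ])
```

Keeping the adapter names explicit makes it easy to drop one adapter and retest if a particular LoRA fails to load.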
Kristjan Retter (@KristjanRetter):
@OzzyGT I meant, can I add an additional LoRA on top of lightx2v? 🙂
Alvaro Somoza (@OzzyGT):
@KristjanRetter Yes, in fact, you can now just load the lightning LoRA without needing these models: huggingface.co/docs/diffusers… (anchor: #lora-for-faster-inference)
Alvaro Somoza (@OzzyGT):
So it turns out we don't need to do anything special for the text encoder. This is with both the transformer and the text encoder quantized to 4-bit with bitsandbytes, using under 16 GB of VRAM and taking ~1m40s on a 3090.
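To see why 4-bit quantization helps so much, here is a back-of-the-envelope estimate of weight memory as parameters × bits / 8. It is only a sketch: the parameter counts in the usage example are hypothetical (the tweet does not state the model sizes), and the estimate ignores activations, quantization constants, and framework overhead, so real VRAM use will be higher.

```python
def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights in GiB (weights only)."""
    return num_params * bits_per_param / 8 / 2**30


def total_weight_gib(components: dict[str, float], bits_per_param: float) -> float:
    """Sum the weight memory of several components at the same precision."""
    return sum(weight_gib(n, bits_per_param) for n in components.values())


if __name__ == "__main__":
    # Hypothetical parameter counts, chosen only to illustrate the arithmetic.
    components = {"transformer": 20e9, "text_encoder": 7e9}
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {total_weight_gib(components, bits):.1f} GiB")
```

Going from 16-bit to 4-bit cuts the weight footprint to a quarter, which is what can bring a pipeline of this scale within reach of a 16 GB card.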