
Bram Wallace
68 posts



Introducing ChatGPT Images, powered by our flagship new image generation model.
- Stronger instruction following
- Precise editing
- Detail preservation
- 4x faster than before
Rolling out today in ChatGPT for all users, and in the API as GPT Image 1.5.





sora.com signups are fully open


Today’s deployment is only possible with our new model, Sora Turbo. Huge shoutout to @bram_wallace and @JureZbontar who spearheaded Sora model acceleration research!




Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
paper page: huggingface.co/papers/2402.10…

Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.
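The self-play idea in the abstract can be sketched as a pairwise loss: the current model is trained to prefer real data over samples drawn from a frozen earlier checkpoint of itself, so no human "winner"/"loser" labels are needed. This is a toy sketch, not the paper's code; the function name, `beta`, and the log-likelihood inputs are all illustrative stand-ins:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def spin_loss(logp_real, logp_gen, logp_real_old, logp_gen_old, beta=0.1):
    """Toy SPIN-style objective (illustrative, not the paper's code).

    The 'winner' is real data and the 'loser' is a sample from the
    model's own previous iterate, so the margin rewards the current
    model for moving likelihood toward real data and away from its
    earlier checkpoint's generations.
    """
    margin = beta * ((logp_real - logp_real_old) - (logp_gen - logp_gen_old))
    return -math.log(sigmoid(margin))
```

At initialization the current and old models agree, the margin is zero, and the loss sits at log 2; as the current model separates real data from its predecessor's samples, the loss falls, which is what drives the iterative self-improvement described above.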



The output is pretty stunning - some highlights:
1. fewer denoising steps
2. cascade improves text generation
3. native image variation generation
4. works with controlnet!
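The cascade design behind points 1 and 2 can be sketched as two chained samplers: a first stage works in a heavily compressed latent space (cheap, so it can afford fewer total denoising steps), and a second stage refines that latent into pixels. A minimal sketch with stand-in denoisers; `stage_a`/`stage_b` and the step counts are illustrative, not the real model's interfaces:

```python
def cascade_sample(prompt, stage_a, stage_b, steps_a=20, steps_b=10):
    """Toy two-stage cascade (illustrative stand-ins, not real models).

    stage_a: denoises a coarse, low-dimensional latent from the prompt.
    stage_b: conditions on that latent and produces the final image.
    """
    latent = stage_a(prompt, steps_a)          # coarse semantic latent
    image = stage_b(prompt, latent, steps_b)   # refine/decode to pixels
    return image
```

The point of the split is that the expensive prompt-following work happens in the tiny latent, while the second stage only has to upsample, which is where the speed and text-rendering gains come from.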



Turboing into 2024! Why let SDXL-Base have all the fun? DPO-tuning 4-step SDXL-Turbo turns out to work pretty well. Original SDXL-Turbo on left, Turb(DP)o on right. The video shown is sped up 2x so that typing isn't the bottleneck.
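DPO for diffusion models rewrites the preference loss in terms of denoising errors: on the "winner" image the fine-tuned model should beat a frozen reference model, and on the "loser" it should not. A toy sketch of that objective; the function name, `beta`, and the error magnitudes below are illustrative, not values from the SDXL-Turbo run in the post:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def diffusion_dpo_loss(err_w, err_l, err_w_ref, err_l_ref, beta=1.0):
    """Toy Diffusion-DPO-style objective (illustrative sketch).

    err_w / err_l:        current model's denoising MSE on the preferred
                          ('winner') and dispreferred ('loser') images.
    err_w_ref / err_l_ref: the same errors under a frozen reference model.
    Lowering err_w relative to the reference (or raising err_l) increases
    the inner margin and drives the loss toward zero.
    """
    margin = -beta * ((err_w - err_w_ref) - (err_l - err_l_ref))
    return -math.log(sigmoid(margin))
```

With ties everywhere the loss is log 2; improving the winner's denoising error relative to the reference pushes it down, which is the signal the DPO-tuned Turbo model is trained on.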


