Stable Diffusion Tutorials

2.1K posts


@SD_Tutorial

👉 Local installation of AI models 👉 Comfy workflows 👉 Tutorials (image gen, video gen) FOLLOW WEBSITE 👇👇

Joined May 2018
88 Following · 1.6K Followers
Pinned Tweet
Stable Diffusion Tutorials @SD_Tutorial
Generate PROMPTS like a PRO 😹 👇 #stablediffusion #aipromptgenerator 🔖 Bookmark (save) for future reference. 🩷 Like if you loved it. 🖊️ Comment with suggestions if you run into any errors.
1 reply · 1 repost · 13 likes · 3.5K views
Stable Diffusion Tutorials @SD_Tutorial
WorldMesh 😌 Generating Navigable Multi-Room 3D Scenes via Mesh-Conditioned Image Diffusion. To generate a complex, multi-room 3D scene from a text prompt, it splits the problem into first constructing the global scene structure as a mesh scaffold, then conditioning image diffusion on that scaffold. GitHub page: mschneider456.github.io/world-mesh/
0 replies · 0 reposts · 1 like · 56 views
Stable Diffusion Tutorials @SD_Tutorial
Foveated Diffusion: 😃😌 Efficient Spatially Aware Image and Video Generation. It iteratively denoises a foveated token sequence of reduced length instead of the full high-resolution sequence (see the sketch below). GitHub page: 👇 bchao1.github.io/foveated-diffu…
0 replies · 0 reposts · 2 likes · 61 views
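For anyone unsure what a "foveated token sequence of reduced length" looks like in practice, here is a minimal illustrative sketch of the general idea, not the paper's code: keep full-resolution tokens in a small window around a fovea center and average-pool the rest of the grid, so the denoiser iterates on far fewer tokens. The window size, pooling factor, and all names below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def foveate_tokens(tokens, fovea_center, fovea_size=16, pool=4):
    """Build a reduced-length token sequence: full resolution inside a
    square fovea window, average-pooled everywhere else. For simplicity
    the coarse pass covers the whole grid (including the fovea)."""
    H, W, C = tokens.shape
    r0 = max(0, min(H - fovea_size, fovea_center[0] - fovea_size // 2))
    c0 = max(0, min(W - fovea_size, fovea_center[1] - fovea_size // 2))

    # Full-resolution tokens inside the fovea window.
    fovea = tokens[r0:r0 + fovea_size, c0:c0 + fovea_size].reshape(-1, C)

    # Coarse peripheral tokens: average-pool the (H, W) grid by `pool`.
    coarse = F.avg_pool2d(
        tokens.permute(2, 0, 1).unsqueeze(0), kernel_size=pool
    ).squeeze(0).permute(1, 2, 0).reshape(-1, C)

    # The much shorter sequence a denoiser would iterate on.
    return torch.cat([fovea, coarse], dim=0)

# A 64x64 grid (4096 tokens) shrinks to 256 + 256 = 512 tokens.
seq = foveate_tokens(torch.randn(64, 64, 16), fovea_center=(32, 32))
print(seq.shape)  # torch.Size([512, 16])
```

Denoising that 512-token sequence instead of the full 4096-token grid is where the efficiency claim comes from; the actual method's token selection is more sophisticated than this uniform pooling.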
Stable Diffusion Tutorials @SD_Tutorial
ComfyUI-Wan-VACE-Video-Joiner 👇😃 - Point this workflow at a directory of clips and it will automatically stitch them together, fixing awkward motion and artifacts. - Wan VACE generates new frames guided by context on both sides of each cut (see the sketch below). GitHub: 👇 github.com/stuttlepress/C…
0 replies · 0 reposts · 2 likes · 73 views
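Conceptually, the joiner can be pictured as in the sketch below. This is not the node's actual code; `load_frames` and `generate_transition` are hypothetical placeholders standing in for the real frame loading and the Wan VACE generation step that is conditioned on frames from both sides of each cut.

```python
from pathlib import Path

def join_clips(clip_dir, load_frames, generate_transition, context_frames=16):
    """Stitch clips in filename order, bridging every cut with generated
    frames that are guided by context on both sides of the join."""
    clips = sorted(Path(clip_dir).glob("*.mp4"))
    timeline = []
    for i, clip in enumerate(clips):
        frames = load_frames(clip)                 # placeholder loader
        if i > 0:
            prev_ctx = timeline[-context_frames:]  # tail of previous clip
            next_ctx = frames[:context_frames]     # head of current clip
            # VACE-style step: synthesize bridge frames from both contexts.
            timeline.extend(generate_transition(prev_ctx, next_ctx))
        timeline.extend(frames)
    return timeline
```

The workflow presumably wires this logic out of ComfyUI nodes rather than Python, but the flow (sort, take context from both sides, generate, concatenate) is the same.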
Stable Diffusion Tutorials @SD_Tutorial
ComfyUI-DaVinci-MagiHuman 😀 - Block-level CPU/GPU swapping - Async CUDA prefetching - Distill mode - 1080p super-resolution - TurboVAE decoder - Audio + video Access the GitHub repo: 👇 github.com/mjansrud/Comfy…
Adina Yakup @AdinaYakup

daVinci-MagiHuman 🎬 Human-Centric Audio-Video Generative Model by GAIR Model: huggingface.co/GAIR/daVinci-M… Paper: huggingface.co/GAIR/daVinci-M… ✨ 15B, fully open source! ✨ 5-sec 1080p video in 38s on one H100 ✨ Supports 6 languages ✨ Unified model with text + video + audio

0 replies · 0 reposts · 1 like · 129 views
Stable Diffusion Tutorials reposted
Adina Yakup @AdinaYakup
Matrix-Game 3.0 🔥 real-time interactive world models from @Skywork_ai huggingface.co/Skywork/Matrix… ✨ MIT license ✨ 720p @ 40FPS with a 5B model ✨ Minute-long memory consistency ✨ Unreal + AAA + real-world data ✨ Scales up to 28B MoE
10 replies · 103 reposts · 621 likes · 41.4K views
Stable Diffusion Tutorials reposted
Black Forest Labs @bfl_ml
Black Forest is focused on visual intelligence as the next AI frontier. On a panel with Jensen at @nvidia GTC, our CEO @robrombach defined visual intelligence as understanding the visual world and simulating it to power content creation, real-time applications, and robotics. Link to the full panel in the comments 🎥
2 replies · 10 reposts · 80 likes · 5.5K views
Stable Diffusion Tutorials reposted
Adina Yakup @AdinaYakup
daVinci-MagiHuman 🎬 Human-Centric Audio-Video Generative Model by GAIR Model: huggingface.co/GAIR/daVinci-M… Paper: huggingface.co/GAIR/daVinci-M… ✨ 15B, fully open source! ✨ 5-sec 1080p video in 38s on one H100 ✨ Supports 6 languages ✨ Unified model with text + video + audio
8 replies · 47 reposts · 339 likes · 41.6K views
Stable Diffusion Tutorials reposted
ModelScope @ModelScope2022
PrismAudio is open source 👏👏👏 A 518M V2A model accepted at ICLR 2026, achieving SOTA across all four perceptual dimensions on both VGGSound and the new AudioCanvas benchmark. 👀 Demo video below ⬇️
Model: modelscope.ai/models/iic/Pri…
Demo: modelscope.cn/studios/iic/Pr…
Paper: modelscope.ai/papers/2511.18…
GitHub: github.com/FunAudioLLM/Th…
🧠 Decomposes V2A reasoning into four specialized CoT modules (Semantic, Temporal, Aesthetic, and Spatial), each with targeted reward functions
🎯 First framework to integrate RL into V2A generation via decomposed CoT planning
⚡ Fast-GRPO: hybrid ODE-SDE sampling that dramatically reduces RL training overhead
🏆 VGGSound: tops all baselines on CLAP, DeSync, PQ, and subjective MOS scores, at 0.63s inference, faster than MMAudio (1.30s) and ThinkSound (1.07s)
🌍 AudioCanvas (out-of-domain): CLAP 0.52, MOS-Q 4.12, beats HunyuanVideo-Foley, MMAudio, ThinkSound
📊 AudioCanvas benchmark released: 300 single-event classes + 501 multi-event samples
4 replies · 24 reposts · 147 likes · 22.9K views
Stable Diffusion Tutorials @SD_Tutorial
Wan 2.7 is planned to launch within March, with improvements in: visual quality, audio, motion dynamics, stylization, consistency, first-frame & last-frame video generation, 9-grid image-to-video, subject + voice reference, instruction-based video editing, and video recreation/replication.
0 replies · 0 reposts · 1 like · 316 views
Stable Diffusion Tutorials @SD_Tutorial
ComfyUI custom nodes for ID-LoRA-2.3 inference: audio + video generation with speaker identity transfer, built on top of LTX-2.3. Supports both one-stage (single-resolution) and two-stage (2x spatial upsampling) pipelines. 👇😃 github.com/ID-LoRA/ID-LoR…
Stable Diffusion Tutorials @SD_Tutorial

ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA 😃😄 Generate video and audio of a specific person from a single text prompt, a reference image, and a short audio clip, all in one model. Now supporting LTX 2.3. Paper: 👇 id-lora.github.io

0 replies · 0 reposts · 2 likes · 274 views
Stable Diffusion Tutorials @SD_Tutorial
ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA 😃😄 Generate video and audio of a specific person from a single text prompt, a reference image, and a short audio clip, all in one model. Now supporting LTX 2.3. Paper: 👇 id-lora.github.io
0 replies · 0 reposts · 3 likes · 496 views
Stable Diffusion Tutorials @SD_Tutorial
Officially reported by Alibaba: 😃 new Qwen and Wan models will be open source!!!
0 replies · 0 reposts · 5 likes · 141 views
Stable Diffusion Tutorials reposted
lodestone-rock @LodestoneRock
z-image to pixel-space conversion progress is looking good. It still needs more training time for sure, super crunchy. Anyone can try this underbaked model; it's supported in Comfy. It should be 3-4x faster than z-image base. Weights are updated every hour here: huggingface.co/lodestones/Zet…
5 replies · 27 reposts · 195 likes · 25.7K views
Stable Diffusion Tutorials reposted
Tongyi Lab @Ali_TongyiLab
We are thrilled to see what our community can build! Developer @dx8152 has just dropped a specialized Style Transfer LoRA based on Qwen-Image-Edit-2511, which significantly simplifies the way we remix and transform visual aesthetics. Built using ModelScope's code-free training pipeline, this project is a perfect example of how the right infrastructure empowers creators to turn complex ideas into reality.
9 replies · 26 reposts · 232 likes · 12.1K views
Stable Diffusion Tutorials @SD_Tutorial
Z-Image-Turbo-SDA 🥰☺️ A highly efficient LoKr (Low-Rank Kronecker Product) adapter designed to rescue few-step distilled flow-matching / diffusion models from the "diversity collapse" problem (see the sketch below). 👇👇 huggingface.co/F16/z-image-tu…
0 replies · 0 reposts · 3 likes · 273 views
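As background on what a LoKr adapter is (in general, not this repo's exact configuration): the weight update is a Kronecker product of a tiny matrix with a low-rank pair, so a handful of parameters can modulate an entire projection matrix. A minimal sketch, with dimensions, rank, and scaling chosen purely for illustration:

```python
import torch

def lokr_delta(out_dim, in_dim, factor=8, rank=4, alpha=1.0):
    """LoKr-style weight update: dW = alpha * kron(A, B1 @ B2), where A is a
    small (factor x factor) matrix and B1 @ B2 is a low-rank factorization of
    the remaining (out_dim/factor x in_dim/factor) block."""
    assert out_dim % factor == 0 and in_dim % factor == 0
    A  = torch.randn(factor, factor) * 0.01           # small Kronecker factor
    B1 = torch.randn(out_dim // factor, rank) * 0.01  # low-rank pair for the
    B2 = torch.randn(rank, in_dim // factor) * 0.01   # large Kronecker factor
    return alpha * torch.kron(A, B1 @ B2)              # (out_dim, in_dim)

# Adapting a 3072x3072 projection takes ~3.1K adapter parameters
# (64 + 1536 + 1536) versus ~9.4M for the full weight matrix.
delta = lokr_delta(3072, 3072)
print(delta.shape)  # torch.Size([3072, 3072])
```

At inference the delta is simply added to (or merged into) the frozen base weight, which is why adapters of this family stay cheap to distribute and stack.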