YUCHAO GU (@YuchaoGu) - Twitter Profile

Pinned Tweet

YUCHAO GU@YuchaoGu·14 May

🚀 We are excited to announce the release of AnyFlow, the first any-step video diffusion on-policy distillation (OPD) framework. By leveraging Flow Map distillation, AnyFlow significantly improves inference efficiency by reducing the number of sampling steps. Code, models, and demos are now open source!

Key Highlights:

⚡ Any-Step Generation: Unlike traditional distilled models tied to fixed step budgets, AnyFlow enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while providing stable improvements as more sampling steps are added.

🔀 Multiple Architectures: AnyFlow supports any-step distillation for both causal and bidirectional video diffusion models.

🎬 Multiple Tasks: AnyFlow supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model.

📈 Scalable Performance: AnyFlow is validated from 1.3B up to 14B parameters.

📄 Paper: huggingface.co/papers/2605.13…
💻 Code: github.com/NVlabs/AnyFlow
🎨 Pre-trained Models: huggingface.co/collections/nv…
🎬 Demo: nvlabs.github.io/AnyFlow/demo
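To make the "any-step" idea concrete, here is a minimal toy sketch in Python. It is not AnyFlow's code or API: `flow_map` and `sample` are hypothetical names, and the learned network is replaced by the exact flow map of the simple ODE dx/dt = rate * x. A flow map jumps directly from time t to time s, so the same function can cover the interval in 1, 4, or 50 jumps. Because this toy map is exact, every step count lands on the same value; a learned flow map is only approximate, which is why extra steps can still improve quality in practice.

```python
import math


def flow_map(x, t, s, rate=1.0):
    """Exact flow map of the toy ODE dx/dt = rate * x (stand-in for a learned model)."""
    return x * math.exp(rate * (s - t))


def sample(x_start, num_steps, t_start=1.0, t_end=0.0):
    """Travel from t_start to t_end using num_steps equal jumps of the flow map."""
    x = x_start
    for i in range(num_steps):
        t = t_start + (t_end - t_start) * i / num_steps
        s = t_start + (t_end - t_start) * (i + 1) / num_steps
        x = flow_map(x, t, s)
    return x


if __name__ == "__main__":
    # The same "model" is queried under three different step budgets.
    for n in (1, 4, 50):
        print(n, sample(2.0, n))
```

The step budget is only an argument to `sample`; nothing about `flow_map` is tied to a particular number of steps.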

YUCHAO GU@YuchaoGu·15 May

@eb1aexperts Thanks.

EB1A Experts@eb1aexperts·14 May

Very interesting direction. The any-step flexibility is a meaningful advance over fixed-step distilled models: being able to adapt the inference budget at runtime without retraining opens up a lot of practical deployment scenarios. Impressive to see this validated up to 14B parameters.

YUCHAO GU retweeted

Kevin Lin@KevinQHLin·14 May

🌟 Introducing 🎻 Violin: an open-source video translation skill.

📹 Video is the dominant medium on the internet, yet most high-quality content (lectures, talks, podcasts) is locked behind a single language, leaving global audiences behind. So we built Violin: a video skill that combines speech recognition, LLM translation, and speech synthesis into one seamless pipeline.

🌐 Demo: violin-ai.com
📝 Blog: together.ai/blog/violin-op…
🔗 GitHub: github.com/shang-zhu/viol…

✨ Key Features:

🎙️ High-quality multilingual ASR, translation, and TTS.
🗣️ Personalized translation and voice (turn an academic talk into something children can follow).
💬 Chat with the video: ask any question grounded in the video.
🧩 Supports web app, CLI, and agent skill.
🍃 Fully open source under MIT.

❤️ Built with the wonderful @ShangZhu18 and advised by @james_y_zou! All features powered by @togethercompute. Try it and let us know what you think! 🎻

YUCHAO GU retweeted

AK@_akhaliq·14 May

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

YUCHAO GU@YuchaoGu·14 May

@vanBeethovenLu1 Thanks, Estel.

Estel Mars@vanBeethovenLu1·14 May

@YuchaoGu Wow, amazing work! Congrats!

YUCHAO GU@YuchaoGu·14 May

@Selen7005717917 Thanks, Yue.

Yue Su@Selen7005717917·14 May

@YuchaoGu Really excited to see such an OPD paradigm on DMs.

YUCHAO GU@YuchaoGu·14 May

@LiangJeff95 Thanks Jeff.

Jeff Liang@LiangJeff95·14 May

@YuchaoGu great job

YUCHAO GU retweeted

DailyPapers@HuggingPapers·13 May

NVIDIA just released AnyFlow on Hugging Face.

The first any-step video diffusion model that generates high-quality text-to-video with any inference budget: 4 steps or 50, quality scales smoothly without degradation.

YUCHAO GU retweeted

Kevin Lin@KevinQHLin·9 Oct

😫 Struggling to prepare a presentation video before a deadline (such as NeurIPS)?

🔥🔥 Thrilled to share our latest work, Paper2Video, which automatically generates presentation videos from papers!

🚀🚀 Just provide your paper ➕ a portrait photo ➕ a short audio sample, and Paper2Video will create a full presentation video for you. Try Paper2Video and let us know your thoughts!

💻 GitHub: github.com/showlab/Paper2…
🌐 Website: showlab.github.io/Paper2Video/
📜 arXiv: arxiv.org/abs/2510.05096
🤗 HF Dataset: huggingface.co/datasets/ZaynZ…
🤗 Daily Paper: huggingface.co/papers/2510.05…

🎉 Paper2Video has been accepted by the SEA Workshop @ NeurIPS 2025 (sea-workshop.github.io) and will be presented this December!

🙏 Kudos to the amazing @zayn42682 and @MikeShou1. Our work is built on top of the multi-agent framework @CamelAIOrg by @guohao_li. Huge thanks to @_akhaliq for sharing our work!

AK@_akhaliq

Paper2Video: Automatic Video Generation from Scientific Papers

YUCHAO GU retweeted

Han Cai@hancai_hm·30 Sep

Changing the autoencoder in latent diffusion models is easier than you think.

🚀 Introducing DC-Gen: a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with lightweight post-training.

Highlights:

- High-resolution efficiency: DC-Gen-FLUX.1-Krea-12B matches FLUX.1-Krea-12B quality while achieving 53× faster inference on H100 at 4K. Paired with NVFP4, it generates a 4K image in just 3.5s on a single NVIDIA 5090 GPU (20 sampling steps).
- Low training cost: Adapting FLUX.1-Krea-12B to a deeply compressed autoencoder takes only 40 H100 GPU days.

📄 Paper: arxiv.org/abs/2509.25180
💻 Code (under legal review): github.com/dc-ai-projects…
🎨 Pre-trained Models (under legal review): huggingface.co/collections/dc…

Contributors: Wenkun He†, Yuchao Gu†, Junyu Chen†, Dongyun Zou, Yujun Lin, Zhekai Zhang, Haocheng Xi, Muyang Li, Ligeng Zhu, Jincheng Yu, Junsong Chen, Enze Xie, Song Han, Han Cai

YUCHAO GU retweeted

Han Cai@hancai_hm·30 Sep

We release DC-VideoGen, a new post-training framework for accelerating video diffusion models.

Key features:

🎬 Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU
⚡ Delivers 14.8× faster inference than the base model while achieving comparable or better visual quality
💰 230× lower training cost compared to training from scratch (only 10 H100 GPU days for Wan-2.1-14B)

DC-VideoGen is built on two core innovations:

- Deep Compression Video Autoencoder (DC-AE-V): a new family of deep compression autoencoders for video data, providing 32×/64× spatial and 4× temporal compression.
- AE-Adapt-V: a robust adaptation strategy that enables rapid and stable transfer of pre-trained video diffusion models to DC-AE-V.

📄 Paper: arxiv.org/abs/2509.25182
🎬 Videos: hanlab.mit.edu/projects/dc-vi…
💻 Code (under legal review): github.com/dc-ai-projects…
🎨 Pre-trained Models (under legal review): huggingface.co/collections/dc…

Contributors: Junyu Chen†, Wenkun He†, Yuchao Gu†, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai

YUCHAO GU retweeted

Han Cai@hancai_hm·29 Aug

🚀 Excited to announce DC-AE 1.5! With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality.

Key innovation: channel-wise latent structure for faster convergence with many latent channels.

📍 Catch us at ICCV 2025, Hawai'i!

🌐 Website: hanlab.mit.edu/projects/dc-ae…
📄 Paper: arxiv.org/abs/2508.00413
💻 Code (coming soon): github.com/dc-ai-projects…
🧩 Pre-trained models (coming soon): huggingface.co/collections/dc…
🎨 Demo: dc-gen.hanlab.ai/dc_gen_sana_f6…

Contributors: Junyu Chen, Dongyun Zou, Wenkun He, Junsong Chen, Enze Xie, Song Han, Han Cai

YUCHAO GU@YuchaoGu·4 Jun

I will present our recent work on long-context video world modeling this Friday. Welcome to join and discuss.

TwelveLabs (twelvelabs.io)@twelve_labs

✅ @YuchaoGu will present FAR (i.e., Frame AutoRegressive Models) - a new baseline for autoregressive video generation that achieves state-of-the-art performance on both short- and long-context video modeling. x.com/YuchaoGu/statu…

YUCHAO GU@YuchaoGu·21 Apr

For short-video generation: FAR achieved faster convergence than Sora-like VideoDiT on the same latent space and reached state-of-the-art performance in both video generation and video prediction. (7/7)

YUCHAO GU@YuchaoGu·21 Apr

For long-video generation: FAR demonstrated near-perfect long-term memory for the first time in action-controlled long video prediction. (6/7)

YUCHAO GU@YuchaoGu·21 Apr

Our previous work, FAR, proposes a next-frame prediction paradigm based on long short-term context modeling with asymmetric patchification.

Paper: arxiv.org/abs/2503.19325
Code: github.com/showlab/FAR

Glad to see this idea adopted and extended in FramePack, demonstrating superior performance. Here, we share some insights from FAR: (1/7)

AK@_akhaliq

FramePack is out.

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
