YUCHAO GU

71 posts

@YuchaoGu

Research Scientist@Nvidia | PhD from NUS

Singapore · Joined August 2022
328 Following · 310 Followers
Pinned Tweet
YUCHAO GU
YUCHAO GU@YuchaoGu·
🚀 We are excited to announce the release of AnyFlow, the first any-step video diffusion on-policy distillation (OPD) framework. By leveraging Flow Map distillation, AnyFlow significantly improves inference efficiency by reducing the number of sampling steps. (Code, models, and demos are now open-source!)

Key Highlights:
⚡ Any-Step Generation: Unlike traditional distilled models tied to fixed step budgets, AnyFlow enables a single model to adapt to arbitrary inference budgets. It achieves high-quality few-step generation while providing stable improvements as more sampling steps are added.
🔀 Multiple Architectures: AnyFlow supports any-step distillation for both causal and bidirectional video diffusion models.
🎬 Multiple Tasks: AnyFlow supports Text-to-Video, Image-to-Video, and Video-to-Video generation within one causal video diffusion model.
📈 Scalable Performance: AnyFlow is validated from 1.3B up to 14B parameters.

📄 Paper: huggingface.co/papers/2605.13…
💻 Code: github.com/NVlabs/AnyFlow
🎨 Pre-trained Models: huggingface.co/collections/nv…
🎬 Demo: nvlabs.github.io/AnyFlow/demo
Replies 4 · Reposts 33 · Likes 176 · Views 22.7K
EB1A Experts
EB1A Experts@eb1aexperts·
Very interesting direction. The any-step flexibility is a meaningful advance over fixed-step distilled models: being able to adapt the inference budget at runtime without retraining opens up a lot of practical deployment scenarios. Impressive to see this validated up to 14B parameters.
Replies 1 · Reposts 0 · Likes 0 · Views 162
YUCHAO GU retweeted
Kevin Lin
Kevin Lin@KevinQHLin·
🌟 Introducing 🎻 Violin: an open-source video translation skill.

📹 Video is the dominant medium on the internet, yet most high-quality content (lectures, talks, podcasts) is locked behind a single language, leaving global audiences behind. So we built Violin: a video skill that combines speech recognition, LLM translation, and speech synthesis into one seamless pipeline.

🌐 Demo: violin-ai.com
📝 Blog: together.ai/blog/violin-op…
🔗 GitHub: github.com/shang-zhu/viol…

✨ Key Features:
🎙️ High-quality multilingual ASR, translation, and TTS.
🗣️ Personalized translation and voice (turn an academic talk into something children can follow).
💬 Chat with the video: ask any question grounded in the video.
🧩 Supports web app, CLI, and agent skill.
🍃 Fully open-source under MIT.

❤️ Built with the wonderful @ShangZhu18 and advised by @james_y_zou! All features powered by @togethercompute. Try it and let us know what you think! 🎻
Replies 24 · Reposts 140 · Likes 654 · Views 134.9K
YUCHAO GU retweeted
AK
AK@_akhaliq·
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
Replies 1 · Reposts 8 · Likes 49 · Views 13.9K
Yue Su
Yue Su@Selen7005717917·
@YuchaoGu Really excited to see such an OPD paradigm on DMs.
Replies 1 · Reposts 0 · Likes 0 · Views 172
YUCHAO GU retweeted
DailyPapers
DailyPapers@HuggingPapers·
NVIDIA just released AnyFlow on Hugging Face.

The first any-step video diffusion model that generates high-quality text-to-video with any inference budget - 4 steps or 50, quality scales smoothly without degradation.
Replies 4 · Reposts 58 · Likes 415 · Views 41.3K
YUCHAO GU retweeted
Kevin Lin
Kevin Lin@KevinQHLin·
😫 Struggling to prepare a presentation video before a deadline (such as NeurIPS)?

🔥🔥 Thrilled to share our latest work, Paper2Video, which automatically generates presentation videos from papers!!

🚀🚀 Just provide your paper ➕ a portrait photo ➕ a short audio sample, and Paper2Video will create a full presentation video for you. Try Paper2Video and let us know your thoughts!!

💻 GitHub: github.com/showlab/Paper2…
🌐 Website: showlab.github.io/Paper2Video/
📜 arXiv: arxiv.org/abs/2510.05096
🤗 HF Dataset: huggingface.co/datasets/ZaynZ…
🤗 Daily Paper: huggingface.co/papers/2510.05…

🎉 Paper2Video is accepted by the SEA Workshop @ NeurIPS 2025 (sea-workshop.github.io) and will be presented this December!

🙏 Kudos to the amazing @zayn42682 and @MikeShou1. Our work is built on top of the multi-agent framework @CamelAIOrg by @guohao_li. Huge thanks to @_akhaliq for sharing our work!
AK@_akhaliq

Paper2Video: Automatic Video Generation from Scientific Papers

Replies 7 · Reposts 23 · Likes 104 · Views 27.3K
YUCHAO GU retweeted
Han Cai
Han Cai@hancai_hm·
Changing the autoencoder in latent diffusion models is easier than you think.

🚀 Introducing DC-Gen, a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with lightweight post-training.

Highlights:
- High-resolution efficiency: DC-Gen-FLUX.1-Krea-12B matches FLUX.1-Krea-12B quality while achieving 53× faster inference on H100 at 4K. Paired with NVFP4, it generates a 4K image in just 3.5s on a single NVIDIA 5090 GPU (20 sampling steps).
- Low training cost: Adapting FLUX.1-Krea-12B to a deeply compressed autoencoder takes only 40 H100 GPU days.

📄 Paper: arxiv.org/abs/2509.25180
💻 Code (under legal review): github.com/dc-ai-projects…
🎨 Pre-trained Models (under legal review): huggingface.co/collections/dc…

Contributors: Wenkun He†, Yuchao Gu†, Junyu Chen†, Dongyun Zou, Yujun Lin, Zhekai Zhang, Haocheng Xi, Muyang Li, Ligeng Zhu, Jincheng Yu, Junsong Chen, Enze Xie, Song Han, Han Cai
Replies 5 · Reposts 37 · Likes 217 · Views 17.7K
YUCHAO GU retweeted
Han Cai
Han Cai@hancai_hm·
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models.

Key features:
🎬 Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU
⚡ Delivers 14.8× faster inference than the base model while achieving comparable or better visual quality
💰 230× lower training cost compared to training from scratch (only 10 H100 GPU days for Wan-2.1-14B)

DC-VideoGen is built on two core innovations:
- Deep Compression Video Autoencoder (DC-AE-V): a new family of deep compression autoencoders for video data, providing 32×/64× spatial and 4× temporal compression.
- AE-Adapt-V: a robust adaptation strategy that enables rapid and stable transfer of pre-trained video diffusion models to DC-AE-V.

📄 Paper: arxiv.org/abs/2509.25182
🎬 Videos: hanlab.mit.edu/projects/dc-vi…
💻 Code (under legal review): github.com/dc-ai-projects…
🎨 Pre-trained Models (under legal review): huggingface.co/collections/dc…

Contributors: Junyu Chen†, Wenkun He†, Yuchao Gu†, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai
Replies 2 · Reposts 28 · Likes 147 · Views 10.8K
YUCHAO GU retweeted
Han Cai
Han Cai@hancai_hm·
🚀 Excited to announce DC-AE 1.5! With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: channel-wise latent structure for faster convergence with many latent channels.

📍 Catch us at ICCV 2025, Hawai'i!

🌐 Website: hanlab.mit.edu/projects/dc-ae…
📄 Paper: arxiv.org/abs/2508.00413
💻 Code (coming soon): github.com/dc-ai-projects…
🧩 Pre-trained models (coming soon): huggingface.co/collections/dc…
🎨 Demo: dc-gen.hanlab.ai/dc_gen_sana_f6…

Contributors: Junyu Chen, Dongyun Zou, Wenkun He, Junsong Chen, Enze Xie, Song Han, Han Cai
Replies 1 · Reposts 11 · Likes 51 · Views 5.7K
YUCHAO GU
YUCHAO GU@YuchaoGu·
I will present our recent work on long-context video world modeling this Friday. Everyone is welcome to join the discussion.
TwelveLabs (twelvelabs.io)@twelve_labs

@YuchaoGu will present FAR (i.e., Frame AutoRegressive Models) - a new baseline for autoregressive video generation that achieves state-of-the-art performance on both short- and long-context video modeling. x.com/YuchaoGu/statu…

Replies 1 · Reposts 1 · Likes 6 · Views 804
YUCHAO GU
YUCHAO GU@YuchaoGu·
For short-video generation: FAR achieved faster convergence than Sora-like VideoDiT on the same latent space and reached state-of-the-art performance in both video generation and video prediction. (7/7)
Replies 0 · Reposts 0 · Likes 0 · Views 154
YUCHAO GU
YUCHAO GU@YuchaoGu·
For long-video generation: FAR demonstrated near-perfect long-term memory for the first time in action-controlled long video prediction. (6/7)
Replies 1 · Reposts 0 · Likes 0 · Views 172