Junsong_Chen
@lawrence_cjs

54 posts

HKU Ph.D., NVIDIA Research Internship

Hong Kong · Joined February 2022
32 Following · 242 Followers
Junsong_Chen
Junsong_Chen@lawrence_cjs·
nvlabs.github.io/Sana/docs/
1. New online docs for better understanding.
2. 4-step + TinyVAE fast video generation demo to try: sana-video.hanlab.ai
3. LongSANA: DMD + LongLive training code is out for everyone to reference.
Enze Xie@xieenze_jr

Exciting updates for SANA Video! 🚀
1️⃣ 4-Step Video Generation: our new demo is live! By combining DMD distillation + TVAE, we've achieved incredibly fast inference in just 4 steps. ⚡️ Try it here: 🔗 sana-video.hanlab.ai
2️⃣ LongSANA is Open Source: we've officially released the training and inference code for LongSANA. Dive into the tech and build with us! 📦 Code & Docs: 🔗 nvlabs.github.io/Sana/docs/long…

Enze Xie
Enze Xie@xieenze_jr·
We (@lawrence_cjs, @yuyangzhao_ , @shanasaimoe) from the SANA team just posted a blog on the core of Linear Attention: how it achieves infinite context lengths with global awareness but constant memory usage! We explore state accumulation mechanics, the evolution from Softmax to Linear KV caches, applications in long video generation (like our SANA-Video), and advanced strategies for LLMs to overcome retrieval limits. Perfect for scalable AI! Read more: hanlab.mit.edu/blog/infinite-…
Enze Xie tweet media
Enze Xie@xieenze_jr

The training/inference code and checkpoints are released. Welcome to try them! github.com/NVlabs/Sana

Junsong_Chen
Junsong_Chen@lawrence_cjs·
How do Linear Attention and Softmax Attention differ in compute and KV-cache, for LLMs and long-video generation alike? Start with this blog. hanlab.mit.edu/blog/infinite-…
Junsong_Chen tweet media
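The blog's core idea — replacing a KV cache that grows with sequence length by a fixed-size cumulative state — can be sketched in a few lines. This is my own illustrative code (not from the blog or the SANA repo), using the common ELU+1 feature map from the linear-attention literature:

```python
# Minimal sketch of causal linear attention with a cumulative state.
# Instead of storing every past (k, v) pair, we keep only
#   S = sum_j phi(k_j) v_j^T   and   z = sum_j phi(k_j),
# so memory stays constant no matter how long the sequence gets.
import numpy as np

def phi(x):
    # ELU + 1 feature map: strictly positive, a common linear-attention choice
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_stream(qs, ks, vs):
    d_k, d_v = ks.shape[1], vs.shape[1]
    S = np.zeros((d_k, d_v))   # cumulative KV state (fixed size)
    z = np.zeros(d_k)          # cumulative normalizer (fixed size)
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        S += np.outer(fk, v)   # accumulate; nothing grows with sequence length
        z += fk
        fq = phi(q)
        outs.append(fq @ S / (fq @ z + 1e-6))
    return np.array(outs)

rng = np.random.default_rng(0)
T, d = 8, 4
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
out = linear_attention_stream(q, k, v)
print(out.shape)  # (8, 4): one output per token, state memory independent of T
```

The streaming loop produces exactly the same result as materializing the full causal attention over all past tokens, which is the equivalence the blog builds on.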
GMI Cloud
GMI Cloud@gmi_cloud·
At @ICCVConference , we sat down with Enze Xie (@xieenze_jr), NVIDIA researcher, after his talk “Efficient Image & Video Generation with Diffusion Models and Acceleration.” He breaks down SANA-Sprint — a one-step diffusion model pushing the limits of image-gen speed ⚡ Full convo + insights 👇 #ICCV2025 #AI #DiffusionModels #SANA
Junsong_Chen reposted
Han Cai
Han Cai@hancai_hm·
Changing the autoencoder in latent diffusion models is easier than you think. 🚀 Introducing DC-Gen – a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with lightweight post-training.
Highlights:
- High-resolution efficiency: DC-Gen-FLUX.1-Krea-12B matches FLUX.1-Krea-12B quality while achieving 53× faster inference on H100 at 4K. Paired with NVFP4, it generates a 4K image in just 3.5 s on a single NVIDIA 5090 GPU (20 sampling steps).
- Low training cost: adapting FLUX.1-Krea-12B to the deeply compressed autoencoder takes only 40 H100 GPU-days.
📄 Paper: arxiv.org/abs/2509.25180
💻 Code (under legal review): github.com/dc-ai-projects…
🎨 Pre-trained models (under legal review): huggingface.co/collections/dc…
Contributors: Wenkun He†, Yuchao Gu†, Junyu Chen†, Dongyun Zou, Yujun Lin, Zhekai Zhang, Haocheng Xi, Muyang Li, Ligeng Zhu, Jincheng Yu, Junsong Chen, Enze Xie, Song Han, Han Cai
Han Cai tweet media
Junsong_Chen reposted
Han Cai
Han Cai@hancai_hm·
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models.
Key features:
🎬 Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU
⚡ Delivers 14.8× faster inference than the base model while achieving comparable or better visual quality
💰 230× lower training cost compared to training from scratch (only 10 H100 GPU-days for Wan-2.1-14B)
DC-VideoGen is built on two core innovations:
- Deep Compression Video Autoencoder (DC-AE-V): a new family of deep-compression autoencoders for video data, providing 32×/64× spatial and 4× temporal compression.
- AE-Adapt-V: a robust adaptation strategy that enables rapid and stable transfer of pre-trained video diffusion models to DC-AE-V.
📄 Paper: arxiv.org/abs/2509.25182
🎬 Videos: hanlab.mit.edu/projects/dc-vi…
💻 Code (under legal review): github.com/dc-ai-projects…
🎨 Pre-trained models (under legal review): huggingface.co/collections/dc…
Contributors: Junyu Chen†, Wenkun He†, Yuchao Gu†, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai
Han Cai tweet media
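To get a feel for what 32×/64× spatial plus 4× temporal compression buys, here is some back-of-envelope token counting. The arithmetic and the clip resolution (2048×4096, 80 frames — chosen so everything divides evenly) are my own illustration, not figures from the paper:

```python
# Latent token count for a video clip under spatial factor f_s (per side)
# and temporal factor f_t, as quoted for DC-AE-V (32x/64x spatial, 4x temporal).
def video_latent_tokens(frames, h, w, f_s, f_t):
    assert frames % f_t == 0 and h % f_s == 0 and w % f_s == 0
    return (frames // f_t) * (h // f_s) * (w // f_s)

# Illustrative clip: 80 frames at 2048x4096 (sized to divide evenly).
t32 = video_latent_tokens(80, 2048, 4096, 32, 4)  # 20 * 64 * 128
t64 = video_latent_tokens(80, 2048, 4096, 64, 4)  # 20 * 32 * 64
print(t32, t64)  # 163840 40960
```

Doubling the spatial factor from 32 to 64 quarters the token count, and since attention cost scales superlinearly in tokens, that is where much of the inference speedup comes from.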
Junsong_Chen reposted
Enze Xie
Enze Xie@xieenze_jr·
🚀 SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos 💥
Key Features 🌟
🧠 Linear DiT everywhere → O(N) complexity on video-scale tokens
🧰 Constant-memory block KV cache → store cumulative states only (no growing KV) 🔄
🎯 Temporal Mix-FFN + 3D RoPE → local fidelity + temporal coherence
🧱 AR block training + Self-Forcing-style long rollout → minute-length generation ⏱️
Numbers 📊
⚡️ 36 s for a 5 s 720p clip on H100 → 4× faster vs. vanilla attention at 720p
🖥️ RTX 5090 + NVFP4: 2.4× latency speedup
📦 Fixed VRAM regardless of sequence length; 🤝 strong text–video alignment
Links 🌐
🎬 Project page: nvlabs.github.io/Sana/Video
📄 Paper: arxiv.org/abs/2509.24695
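The "constant-memory block KV cache" bullet can be sketched as a running state that absorbs each generated block. This is my own toy illustration of the idea (the class and method names are hypothetical, not SANA-Video's API), with the simplification that every query in a block attends to the whole block rather than being token-causal within it:

```python
# Toy block-causal linear attention: each new video block attends over all
# committed blocks via a cumulative state, so VRAM stays fixed however long
# the rollout runs. Illustrative names; not the SANA-Video implementation.
import numpy as np

def phi(x):
    # ELU + 1 feature map (strictly positive)
    return np.where(x > 0, x + 1.0, np.exp(x))

class VideoState:
    """Fixed-size stand-in for a growing KV cache."""
    def __init__(self, d_k, d_v):
        self.S = np.zeros((d_k, d_v))  # cumulative sum of phi(k) v^T
        self.z = np.zeros(d_k)         # cumulative sum of phi(k)

    def attend_block(self, q_blk, k_blk, v_blk):
        fk = phi(k_blk)                          # (B, d_k)
        # queries see all past blocks plus the current one (block-causal)
        S_cur = self.S + fk.T @ v_blk
        z_cur = self.z + fk.sum(0)
        fq = phi(q_blk)
        out = (fq @ S_cur) / (fq @ z_cur + 1e-6)[:, None]
        # commit this block to the state; memory does not grow
        self.S, self.z = S_cur, z_cur
        return out

rng = np.random.default_rng(1)
d_k, d_v, B = 8, 8, 4
state = VideoState(d_k, d_v)
state_shapes = set()
for _ in range(16):  # 16 autoregressive "video blocks"
    q, k, v = (rng.normal(size=(B, d_k)) for _ in range(3))
    out = state.attend_block(q, k, v)
    state_shapes.add(state.S.shape)  # stays (8, 8) for every block
print(state_shapes)
```

After 16 blocks the state is still a single (d_k, d_v) matrix plus a d_k vector, which is the "store cumulative states only" property the tweet highlights.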
Junsong_Chen
Junsong_Chen@lawrence_cjs·
Finally: 36 s for a 5 s 720p clip on H100, a 4× speedup vs vanilla attention at 720p.
29 s on RTX 5090 with NVFP4 (2.4× faster).
Fixed VRAM regardless of sequence length; strong text–video alignment.
Junsong_Chen
Junsong_Chen@lawrence_cjs·
3. Temporal Mix-FFN + 3D RoPE → local fidelity + temporal coherence 🎯
4. AR block training with Self-Forcing-style rollout → minute-length generation 📊
Junsong_Chen tweet media
Junsong_Chen reposted
Song Han
Song Han@songhan_mit·
Explore Deep Compression Autoencoder (DC-AE) 1.5 with higher token compression ratio (64x) for faster visual generation:
Han Cai@hancai_hm

🚀 Excited to announce DC-AE 1.5! With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: channel-wise latent structure for faster convergence with many latent channels.
📍 Catch us at ICCV 2025, Hawai'i!
🌐 Website: hanlab.mit.edu/projects/dc-ae…
📄 Paper: arxiv.org/abs/2508.00413
💻 Code (coming soon): github.com/dc-ai-projects…
🧩 Pre-trained models (coming soon): huggingface.co/collections/dc…
🎨 Demo: dc-gen.hanlab.ai/dc_gen_sana_f6…
Contributors: Junyu Chen, Dongyun Zou, Wenkun He, Junsong Chen, Enze Xie, Song Han, Han Cai

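What "spatial compression ratio boosted to f64" means for token counts is easy to check by hand. The arithmetic below is my own illustration (comparing against a hypothetical f32 baseline at a 1024×1024 resolution), not a figure from the DC-AE 1.5 paper:

```python
# Number of latent tokens for an HxW image with spatial downsampling factor f
# per side: each side shrinks by f, so the token count shrinks by f^2.
def latent_tokens(h, w, f):
    assert h % f == 0 and w % f == 0
    return (h // f) * (w // f)

f32_tokens = latent_tokens(1024, 1024, 32)  # 32 x 32 grid
f64_tokens = latent_tokens(1024, 1024, 64)  # 16 x 16 grid
print(f32_tokens, f64_tokens)  # 1024 256
```

Going from f32 to f64 quarters the number of latent tokens the diffusion model must process, which is why the tweet pairs f64 with a channel-wise latent trick: the lost spatial capacity is traded for more latent channels, and the channel-wise structure keeps convergence fast.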
Sayak Paul
Sayak Paul@RisingSayak·
The best few-step sampling model across the speed-memory frontier? 😱 Introducing SANA-Sprint in collaboration with the great SANA team! Beyond the results, perhaps more importantly, the work is about the recipe of SANA-Sprint. Code & model will be open ❤️ Let's go ⬇️
Sayak Paul tweet media