Junsong_Chen
@lawrence_cjs

54 posts

HKU Ph.D., NVIDIA Research Internship

Hong Kong · Joined February 2022
32 Following · 242 Followers
Junsong_Chen
Junsong_Chen@lawrence_cjs·
nvlabs.github.io/Sana/docs/
1. New online docs for better understanding.
2. 4-step + TinyVAE fast video generation demo to try: sana-video.hanlab.ai
3. LongSANA: DMD + LongLive training code is out for everyone to reference.
Enze Xie@xieenze_jr

Exciting updates for SANA Video! 🚀
1️⃣ 4-Step Video Generation: our new demo is live! By combining DMD distillation + TVAE, we've achieved incredibly fast inference in just 4 steps. ⚡️ Try it here: 🔗 sana-video.hanlab.ai
2️⃣ LongSANA is Open Source: we've officially released the training and inference code for LongSANA. Dive into the tech and build with us! 📦 Code & Docs: 🔗 nvlabs.github.io/Sana/docs/long…

Enze Xie
Enze Xie@xieenze_jr·
We (@lawrence_cjs, @yuyangzhao_ , @shanasaimoe) from the SANA team just posted a blog on the core of Linear Attention: how it achieves infinite context lengths with global awareness but constant memory usage! We explore state accumulation mechanics, the evolution from Softmax to Linear KV caches, applications in long video generation (like our SANA-Video), and advanced strategies for LLMs to overcome retrieval limits. Perfect for scalable AI! Read more: hanlab.mit.edu/blog/infinite-…
Enze Xie tweet media
Enze Xie@xieenze_jr

The training/inference code and checkpoints are released. Welcome to try them! github.com/NVlabs/Sana

Junsong_Chen
Junsong_Chen@lawrence_cjs·
How do Linear Attention and Softmax Attention differ in compute and KV-cache, for LLMs and long-video generation alike? Start with this blog. hanlab.mit.edu/blog/infinite-…
Junsong_Chen tweet media
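The blog's core idea — replacing a KV cache that grows with sequence length by a fixed-size cumulative state — can be sketched in a few lines. This is my own illustrative code (not from the blog or the SANA repo), using the common ELU+1 feature map from the linear-attention literature:

```python
# Minimal sketch of causal linear attention with a cumulative state.
# Instead of storing every past (k, v) pair, we keep only
#   S = sum_j phi(k_j) v_j^T   and   z = sum_j phi(k_j),
# so memory stays constant no matter how long the sequence gets.
import numpy as np

def phi(x):
    # ELU + 1 feature map: strictly positive, a common linear-attention choice
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_stream(qs, ks, vs):
    d_k, d_v = ks.shape[1], vs.shape[1]
    S = np.zeros((d_k, d_v))   # cumulative KV state (fixed size)
    z = np.zeros(d_k)          # cumulative normalizer (fixed size)
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        S += np.outer(fk, v)   # accumulate; nothing grows with sequence length
        z += fk
        fq = phi(q)
        outs.append(fq @ S / (fq @ z + 1e-6))
    return np.array(outs)

rng = np.random.default_rng(0)
T, d = 8, 4
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
out = linear_attention_stream(q, k, v)
print(out.shape)  # (8, 4): one output per token, state memory independent of T
```

The streaming loop produces exactly the same result as materializing the full causal attention over all past tokens, which is the equivalence the blog builds on.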
GMI Cloud
GMI Cloud@gmi_cloud·
At @ICCVConference , we sat down with Enze Xie (@xieenze_jr), NVIDIA researcher, after his talk “Efficient Image & Video Generation with Diffusion Models and Acceleration.” He breaks down SANA-Sprint — a one-step diffusion model pushing the limits of image-gen speed ⚡ Full convo + insights 👇 #ICCV2025 #AI #DiffusionModels #SANA
Junsong_Chen reposted
Han Cai
Han Cai@hancai_hm·
Changing the autoencoder in latent diffusion models is easier than you think. 🚀 Introducing DC-Gen – a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with lightweight post-training.
Highlights:
- High-resolution efficiency: DC-Gen-FLUX.1-Krea-12B matches FLUX.1-Krea-12B quality while achieving 53× faster inference on H100 at 4K. Paired with NVFP4, it generates a 4K image in just 3.5 s on a single NVIDIA 5090 GPU (20 sampling steps).
- Low training cost: adapting FLUX.1-Krea-12B to the deeply compressed autoencoder takes only 40 H100 GPU-days.
📄 Paper: arxiv.org/abs/2509.25180
💻 Code (under legal review): github.com/dc-ai-projects…
🎨 Pre-trained models (under legal review): huggingface.co/collections/dc…
Contributors: Wenkun He†, Yuchao Gu†, Junyu Chen†, Dongyun Zou, Yujun Lin, Zhekai Zhang, Haocheng Xi, Muyang Li, Ligeng Zhu, Jincheng Yu, Junsong Chen, Enze Xie, Song Han, Han Cai
Han Cai tweet media
Junsong_Chen reposted
Han Cai
Han Cai@hancai_hm·
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models.
Key features:
🎬 Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU
⚡ Delivers 14.8× faster inference than the base model while achieving comparable or better visual quality
💰 230× lower training cost compared to training from scratch (only 10 H100 GPU-days for Wan-2.1-14B)
DC-VideoGen is built on two core innovations:
- Deep Compression Video Autoencoder (DC-AE-V): a new family of deep-compression autoencoders for video data, providing 32×/64× spatial and 4× temporal compression.
- AE-Adapt-V: a robust adaptation strategy that enables rapid and stable transfer of pre-trained video diffusion models to DC-AE-V.
📄 Paper: arxiv.org/abs/2509.25182
🎬 Videos: hanlab.mit.edu/projects/dc-vi…
💻 Code (under legal review): github.com/dc-ai-projects…
🎨 Pre-trained models (under legal review): huggingface.co/collections/dc…
Contributors: Junyu Chen†, Wenkun He†, Yuchao Gu†, Yuyang Zhao, Jincheng Yu, Junsong Chen, Dongyun Zou, Yujun Lin, Zhekai Zhang, Muyang Li, Haocheng Xi, Ligeng Zhu, Enze Xie, Song Han, Han Cai
Han Cai tweet media
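To get a feel for what 32×/64× spatial plus 4× temporal compression buys, here is some back-of-envelope token counting. The arithmetic and the clip resolution (2048×4096, 80 frames — chosen so everything divides evenly) are my own illustration, not figures from the paper:

```python
# Latent token count for a video clip under spatial factor f_s (per side)
# and temporal factor f_t, as quoted for DC-AE-V (32x/64x spatial, 4x temporal).
def video_latent_tokens(frames, h, w, f_s, f_t):
    assert frames % f_t == 0 and h % f_s == 0 and w % f_s == 0
    return (frames // f_t) * (h // f_s) * (w // f_s)

# Illustrative clip: 80 frames at 2048x4096 (sized to divide evenly).
t32 = video_latent_tokens(80, 2048, 4096, 32, 4)  # 20 * 64 * 128
t64 = video_latent_tokens(80, 2048, 4096, 64, 4)  # 20 * 32 * 64
print(t32, t64)  # 163840 40960
```

Doubling the spatial factor from 32 to 64 quarters the token count, and since attention cost scales superlinearly in tokens, that is where much of the inference speedup comes from.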
Junsong_Chen reposted
Enze Xie
Enze Xie@xieenze_jr·
🚀 SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos 💥
Key Features 🌟
🧠 Linear DiT everywhere → O(N) complexity on video-scale tokens
🧰 Constant-memory block KV cache → store cumulative states only (no growing KV) 🔄
🎯 Temporal Mix-FFN + 3D RoPE → local fidelity + temporal coherence
🧱 AR block training + Self-Forcing-style long rollout → minute-length generation ⏱️
Numbers 📊
⚡️ 36 s for a 5 s 720p clip on H100 → 4× faster vs. vanilla attention at 720p
🖥️ RTX 5090 + NVFP4: 2.4× latency speedup
📦 Fixed VRAM regardless of sequence length; 🤝 strong text–video alignment
Links 🌐
🎬 Project page: nvlabs.github.io/Sana/Video
📄 Paper: arxiv.org/abs/2509.24695
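The "constant-memory block KV cache" bullet can be sketched as a running state that absorbs each generated block. This is my own toy illustration of the idea (the class and method names are hypothetical, not SANA-Video's API), with the simplification that every query in a block attends to the whole block rather than being token-causal within it:

```python
# Toy block-causal linear attention: each new video block attends over all
# committed blocks via a cumulative state, so VRAM stays fixed however long
# the rollout runs. Illustrative names; not the SANA-Video implementation.
import numpy as np

def phi(x):
    # ELU + 1 feature map (strictly positive)
    return np.where(x > 0, x + 1.0, np.exp(x))

class VideoState:
    """Fixed-size stand-in for a growing KV cache."""
    def __init__(self, d_k, d_v):
        self.S = np.zeros((d_k, d_v))  # cumulative sum of phi(k) v^T
        self.z = np.zeros(d_k)         # cumulative sum of phi(k)

    def attend_block(self, q_blk, k_blk, v_blk):
        fk = phi(k_blk)                          # (B, d_k)
        # queries see all past blocks plus the current one (block-causal)
        S_cur = self.S + fk.T @ v_blk
        z_cur = self.z + fk.sum(0)
        fq = phi(q_blk)
        out = (fq @ S_cur) / (fq @ z_cur + 1e-6)[:, None]
        # commit this block to the state; memory does not grow
        self.S, self.z = S_cur, z_cur
        return out

rng = np.random.default_rng(1)
d_k, d_v, B = 8, 8, 4
state = VideoState(d_k, d_v)
state_shapes = set()
for _ in range(16):  # 16 autoregressive "video blocks"
    q, k, v = (rng.normal(size=(B, d_k)) for _ in range(3))
    out = state.attend_block(q, k, v)
    state_shapes.add(state.S.shape)  # stays (8, 8) for every block
print(state_shapes)
```

After 16 blocks the state is still a single (d_k, d_v) matrix plus a d_k vector, which is the "store cumulative states only" property the tweet highlights.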
Junsong_Chen
Junsong_Chen@lawrence_cjs·
Finally: 36 s for a 5 s 720p clip on H100, a 4× speedup vs vanilla attention at 720p.
29 s on RTX 5090 with NVFP4 (2.4× faster).
Fixed VRAM regardless of sequence length; strong text–video alignment.
Junsong_Chen
Junsong_Chen@lawrence_cjs·
3. Temporal Mix-FFN + 3D RoPE → local fidelity + temporal coherence 🎯
4. AR block training with Self-Forcing-style rollout → minute-length generation 📊
Junsong_Chen tweet media
Junsong_Chen reposted
Song Han
Song Han@songhan_mit·
Explore Deep Compression Autoencoder (DC-AE) 1.5 with higher token compression ratio (64x) for faster visual generation:
Han Cai@hancai_hm

🚀 Excited to announce DC-AE 1.5! With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: channel-wise latent structure for faster convergence with many latent channels.
📍 Catch us at ICCV 2025, Hawai'i!
🌐 Website: hanlab.mit.edu/projects/dc-ae…
📄 Paper: arxiv.org/abs/2508.00413
💻 Code (coming soon): github.com/dc-ai-projects…
🧩 Pre-trained models (coming soon): huggingface.co/collections/dc…
🎨 Demo: dc-gen.hanlab.ai/dc_gen_sana_f6…
Contributors: Junyu Chen, Dongyun Zou, Wenkun He, Junsong Chen, Enze Xie, Song Han, Han Cai

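What "spatial compression ratio boosted to f64" means for token counts is easy to check by hand. The arithmetic below is my own illustration (comparing against a hypothetical f32 baseline at a 1024×1024 resolution), not a figure from the DC-AE 1.5 paper:

```python
# Number of latent tokens for an HxW image with spatial downsampling factor f
# per side: each side shrinks by f, so the token count shrinks by f^2.
def latent_tokens(h, w, f):
    assert h % f == 0 and w % f == 0
    return (h // f) * (w // f)

f32_tokens = latent_tokens(1024, 1024, 32)  # 32 x 32 grid
f64_tokens = latent_tokens(1024, 1024, 64)  # 16 x 16 grid
print(f32_tokens, f64_tokens)  # 1024 256
```

Going from f32 to f64 quarters the number of latent tokens the diffusion model must process, which is why the tweet pairs f64 with a channel-wise latent trick: the lost spatial capacity is traded for more latent channels, and the channel-wise structure keeps convergence fast.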
Sayak Paul
Sayak Paul@RisingSayak·
The best few-step sampling model across the speed-memory frontier? 😱 Introducing SANA-Sprint in collaboration with the great SANA team! Beyond the results, perhaps more importantly, the work is about the recipe of SANA-Sprint. Code & model will be open ❤️ Let's go ⬇️
Sayak Paul tweet media