Lu Jiang

45 posts


@roadjiang

Research Scientist @GoogleAI #GoogleResearch. Adjunct Faculty @CarnegieMellon.

Mountain View, CA · Joined October 2010
132 Following · 582 Followers
Lu Jiang retweeted
Gordon Wetzstein @GordonWetzstein
How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4
OverPowered @OverPowere13959
@CeyuanY With so many video models from ByteDance, are any of them going to be open-sourced?
Ceyuan Yang @CeyuanY
Humans interact with this world in real time. Our latest APT2 makes this happen in a video foundation model. Now you can explore the generative world by controlling 6DoF camera poses with negligible latency. Check out more cool stuff at seaweed-apt.com/2
Peter Lin @peter9863

Introducing Seaweed APT2, a real-time, interactive, streaming video generation model. seaweed-apt.com/2 Adversarial training for autoregressive modeling! Streaming one-minute videos, one diffusion step, 24 fps in real time on 1×H100, with interactive controls!

Lu Jiang @roadjiang
@katedeyneka Thank you for attending. I'm glad you liked it.
Kate Deyneka @katedeyneka
Just attended a great talk by @roadjiang at #CVPR2025 on Cost-Effective Training of Video Generation Foundation Model. I liked it because it summarized really well the core techniques for optimal and high quality video generation. Here’s a quick breakdown of key insights from the presentation👇🏻 📄 Paper: seaweed.video
Kate Deyneka tweet media
Lu Jiang retweeted
Ceyuan Yang @CeyuanY
Glad to share Seaweed-7B, a cost-effective foundation model for video generation. Our tech report highlights the key designs that significantly improve compute efficiency and performance given limited resources, achieving quality comparable to other industry-level models.

To unleash the power of the foundation model, Seaweed-7B further enables a wide range of downstream applications, including image-to-video generation, human video generation, subject-consistent video generation, video-audio joint generation, long video generation and storytelling, real-time generation, super-resolution generation, and camera-controlled generation.

Check out our webpage and report for more details:
Webpage: seaweed.video
Paper: seaweed.video/seaweed.pdf

It's been a wonderful journey over the last year. Sincere thanks to all teammates for their contributions.
Lu Jiang @roadjiang
@rtk254 Ronen, interesting discussion! We recently published work showing that training on synthetically generated CGI videos can indeed help models learn to generate videos that better respect physical constraints: kevinz8866.github.io/simulation/ @ronen
Ronen Tamari @rtk254
Video models != world models
"We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism"
Ronen Tamari tweet media
Lu Jiang retweeted
AK @_akhaliq
Synthetic Video Enhances Physical Fidelity in Video Synthesis
"A turtle swimming in a green background." + video matting illustration
Lu Jiang @roadjiang
@dreamingtulpa Thanks for covering our work and for the discussion. As mentioned in the paper's abstract: while the model still lacks a deep understanding of physics, it offers one of the first empirical demonstrations that synthetic video enhances physical fidelity in video synthesis.
Dreaming Tulpa 🥓👑 @dreamingtulpa
better real-world physics are coming to video models thanks to synthetic video data
Lu Jiang retweeted
AK @_akhaliq
Seaweed APT: Diffusion Adversarial Post-Training for One-Step Video Generation

Existing diffusion and autoregressive generative models require repeated neural network evaluations. This is extremely slow for high-resolution video generation: a few-second video can take many minutes to generate. Our work is the first to demonstrate the generation of an entire video using a single neural function evaluation (1NFE), via our proposed adversarial post-training technique. Our model generates 2 seconds of 1280×720 24 fps video in real time. We showcase some of the results below:
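The speedup claimed above comes entirely from the number of network evaluations (NFE): a conventional diffusion sampler calls the network once per denoising step, while a one-step generator calls it exactly once. A minimal toy sketch of that cost difference, with a dummy stand-in network (not the actual Seaweed APT model or its adversarial post-training procedure):

```python
# Toy illustration of sampling cost: multi-step diffusion sampling vs.
# one-step generation (1NFE). The "network" below is a dummy function
# that only counts how often it is evaluated.

call_count = 0

def toy_network(x, t):
    """Stand-in for one neural network evaluation; counts calls."""
    global call_count
    call_count += 1
    return x * (1.0 - 0.01 * t)  # dummy update, not a real denoiser

def multi_step_sample(x0, steps=50):
    """Conventional sampler: one network call per denoising step."""
    x = x0
    for t in range(steps):
        x = toy_network(x, t)
    return x

def one_step_sample(x0):
    """One-step generator: a single network call produces the output."""
    return toy_network(x0, 0)

call_count = 0
multi_step_sample(1.0, steps=50)
multi_step_nfe = call_count  # 50 network evaluations

call_count = 0
one_step_sample(1.0)
one_step_nfe = call_count    # 1 network evaluation

print(multi_step_nfe, one_step_nfe)
```

Since each evaluation of a large video model is expensive, cutting NFE from dozens of steps to one is what makes real-time generation feasible in the tweet's setting.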
Lu Jiang @roadjiang
@anuaakash VideoPoet co-author here. Thanks a ton! Due to policy constraints, we weren't able to perform such comparisons. Your analysis is incredibly helpful and reinforces my belief that VideoPoet excels at creating larger motions; its per-frame quality can be further improved.
Anu Aakash @anuaakash
Google VideoPoet, Runway, Pika & Genmo

Google recently announced VideoPoet, a large language model (LLM) capable of a wide variety of video generation tasks, including:
- text-to-video
- image-to-video
- video stylization
- video inpainting and outpainting
- video-to-audio

I tried some of their text-to-video prompts (from their demo) in Pika, Runway and Genmo. Here are the results (10 examples). 1/10: Two teddy bears holding hands, walking down rainy 5th avenue.
Lu Jiang retweeted
Agrim Gupta @agrimgupta92
We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇
Lu Jiang @roadjiang
😲 While preparing a meta-review for #aaai24, I stumbled upon a new form of parallelism: not in the paper's concepts, but in the review comments, where two reviewers listed identical comments, word for word, over 200 matching words. #PeerReview #AIResearch
Lu Jiang @roadjiang
📢 Call for Papers! International Journal of Computer Vision (IJCV) invites submissions for its special issue on "Generative Models for Content Creation and Manipulation." 🗓️ Manuscript Submission Deadline: February 28, 2024 🔗 Check it out here: springer.com/journal/11263/…
Lu Jiang @roadjiang
@k_saifullaah It seems relevant, and reducing human supervision is a common problem worth tackling. Thanks for sharing!
khalid @k_saifullaah
@roadjiang Our paper "Seeing in Words" might be of interest: we use an LLM (language bottleneck) as a universal interface for image classification. It's truly exciting to see that this kind of approach also demonstrates effectiveness in image reconstruction. arxiv.org/abs/2307.00028
Lu Jiang @roadjiang
Fascinating research by Google reveals the power of large language models (LLMs) like PaLM or GPT in tackling visual tasks using in-context learning. This novel method enables LLMs to perform image generation tasks without requiring any parameter updates. #palm #GPT4 #LLMs
Lu Jiang tweet media