Willi Menapace @WilliMenapace

33 posts

PhD Student - University of Trento, Italy
Trento, Trentino-South Tyrol · Joined June 2021
86 Following · 192 Followers
Willi Menapace retweeted
Alexander Pondaven @alexpondaven
Introducing ActionParty: the first video world model that controls up to 7 players simultaneously on the same screen across 46 game environments. We tackle the action binding problem in video diffusion, ensuring each player's action is applied to the right subject. 🧵
[media attachment]
6 replies · 10 reposts · 51 likes · 9.1K views
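The "action binding" problem the tweet names can be pictured with a toy sketch. ActionParty's actual mechanism is not described in the tweet, so the per-player masks and action embeddings below are purely hypothetical, a minimal illustration of routing each player's action to the right subject on screen.

```python
import numpy as np

def bind_actions(masks, action_embs):
    """Route each player's action embedding to the pixels that player occupies.

    masks:       (P, H, W) binary per-player masks (hypothetical inputs).
    action_embs: (P, D) one action embedding per player.
    Returns an (H, W, D) conditioning map; background pixels stay zero.
    """
    P, H, W = masks.shape
    cond = np.zeros((H, W, action_embs.shape[1]))
    for p in range(P):
        cond[masks[p] > 0] = action_embs[p]
    return cond
```

With non-overlapping masks, each pixel carries exactly one player's action, which is the binding guarantee the tweet describes.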
Willi Menapace @WilliMenapace
Bringing real-time egocentric video editing to #CVPR2026! 🚀 It was a pleasure to supervise this fantastic collaboration at @Snapchat. We've just open-sourced our video editing dataset; check out the amazing work below! 👇👏
Runjia Li@RunjiaLi

🎉EgoEdit @Snapchat has been accepted to CVPR 2026! 🏆👻 We are bringing high-quality, real-time editing to egocentric videos. Our massive 100k video dataset and benchmark are ALREADY PUBLIC! 🔓🚀 🏠 Project Page: snap-research.github.io/EgoEdit/ 🤗 Dataset: huggingface.co/datasets/ligua…

0 replies · 1 repost · 15 likes · 939 views
Willi Menapace @WilliMenapace
Why allocate compute uniformly when not all pixels are equally hard? 🤔 Our new work, ELIT, solves DiT compute waste by focusing on hard regions. It also acts as a runtime knob, letting you easily dial your inference budget up or down. See you at CVPR 2026! 👇
Moayed Haji Ali@moayedhajiali

Not all pixels are equally hard, but DiTs still allocate compute uniformly across pixels, wasting efforts on easy regions. ELIT adds two lightweight cross-attention layers to focus compute where it matters, cutting FID by 53%. ELIT: snap-research.github.io/elit

0 replies · 0 reposts · 5 likes · 469 views
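The "runtime knob" ELIT's announcement describes can be sketched as top-k routing by a difficulty score. This is only an illustration under assumed interfaces (ELIT's real mechanism is two cross-attention layers inside a DiT, not shown here): the `budget` fraction decides how many tokens take the heavy path, which is the dial-up/dial-down inference control.

```python
import numpy as np

def route_compute(tokens, difficulty, budget, heavy_fn, light_fn):
    """Spend heavy compute only on the hardest tokens.

    tokens:     (N, D) token features.
    difficulty: (N,) per-token hardness scores (assumed given by a predictor).
    budget:     fraction of tokens sent through the expensive path.
    """
    n_heavy = max(1, int(round(len(tokens) * budget)))
    hard = np.argsort(difficulty)[-n_heavy:]      # indices of hardest tokens
    out = light_fn(tokens.copy())                 # cheap path for everyone
    out[hard] = heavy_fn(tokens[hard])            # heavy path for the hard subset
    return out, set(hard.tolist())
```

Raising `budget` trades compute for quality; lowering it does the reverse, with no retraining in this sketch.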
Willi Menapace @WilliMenapace
Why is progressive generation so complex? 🤔 It doesn't have to be. Our Decomposable Flow Matching (DFM) simplifies the process into a single, straightforward flow model, 🚀 beating prior work in image and video synthesis. #AI #Research #MachineLearning
Moayed Haji Ali@moayedhajiali

Where are the good old progressive diffusion models? 🤔 Breaking generation into multiple resolution scales is a great idea, but complexity (multiple models, a custom diffusion process, etc.) stalled scaling. Our Decomposable Flow Matching packs the multi-scale perks into one scalable model.

0 replies · 1 repost · 6 likes · 728 views
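A minimal sketch of the two ingredients named above, under heavy simplification: a fixed two-scale decomposition (standing in for DFM's multi-scale one) and the standard flow-matching path with its velocity target, applied per component. How DFM actually couples the scales inside one model is not shown here.

```python
import numpy as np

def decompose(x):
    """Split a 1-D signal into a coarse scale plus residual detail
    (a stand-in for a multi-scale decomposition)."""
    coarse = x.reshape(-1, 2).mean(axis=1).repeat(2)  # 2x down, then nearest up
    return coarse, x - coarse

def fm_path(x0, x1, t):
    """Linear flow-matching path x_t = (1-t)*x0 + t*x1 and its
    velocity target x1 - x0, applied to each decomposed component."""
    return (1 - t) * x0 + t * x1, x1 - x0
```

Because the decomposition is exact (coarse + detail reconstructs the input), a single model predicting velocities for all components sees the full signal.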
Willi Menapace retweeted
Ashkan Mirzaei @ashmrz10
[1/9] 🚀 We introduce 4Real-Video-V2, a method that can generate 4D scenes from a simple text prompt, viewable from any angle at any moment in time. It’s fast, photorealistic, and works on full scenes. Here's how it works and why it matters. 👇 snap-research.github.io/4Real-Video-V2/
2 replies · 25 reposts · 90 likes · 10K views
Willi Menapace retweeted
Snap Inc. @Snap
Heading to @CVPR 2025 in Nashville this week? So are we! We’re proud to have 12 papers accepted — including SnapGen and 4Real-Video, both highlighted among the top 3% of submissions. Come find us to learn more about the cutting edge work we’re doing in AI and computer vision. 📍 See you in Nashville! Learn more: newsroom.snap.com/snap-research-…
2 replies · 2 reposts · 17 likes · 4.4K views
Willi Menapace retweeted
Ziyi Wu @Dazitu_616
📢 Introducing DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Compared to vanilla DPO, we improve paired data construction and preference label granularity, leading to better visual quality and motion strength with only 1/3 of the data. 🧵
2 replies · 35 reposts · 179 likes · 35.2K views
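The finer preference granularity DenseDPO advertises can be sketched as the standard DPO objective averaged over temporal segments instead of a single whole-clip label. The segment log-prob interface below is an assumption for illustration, not DenseDPO's actual formulation.

```python
import numpy as np

def dense_dpo_loss(lp_w, lp_l, ref_w, ref_l, beta=0.1):
    """DPO loss with one preference label per temporal segment.

    lp_w, lp_l:   (S,) policy log-probs of the preferred/rejected segments.
    ref_w, ref_l: (S,) reference-model log-probs of the same segments.
    Vanilla DPO corresponds to S = 1 (a single clip-level label).
    """
    margins = beta * ((lp_w - ref_w) - (lp_l - ref_l))
    # -log(sigmoid(m)) == log(1 + exp(-m)), averaged over segments
    return float(np.mean(np.log1p(np.exp(-margins))))
```

Denser labels mean more gradient signal per clip, which is one plausible reading of the tweet's "1/3 of the data" claim.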
Willi Menapace retweeted
Ivan Skorokhodov @isskoro
In the past 1.5 weeks, two papers from two different research groups have appeared that develop exactly the same (and embarrassingly simple) trick to improve convergence of image/video diffusion models by 20-100+% (sic!) arxiv.org/abs/2502.14831 arxiv.org/abs/2502.09509
[media attachments]
9 replies · 58 reposts · 402 likes · 39.2K views
Willi Menapace retweeted
Kfir Aberman @AbermanKfir
We discovered that imposing a spatio-temporal weight space via LoRAs on DiT-based video models unlocks powerful customization! It captures dynamic concepts with precision and even enables composition of multiple videos together! 🎥✨
15 replies · 85 reposts · 605 likes · 59.5K views
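The weight-space idea in the tweet rests on the basic LoRA identity W' = W + BA; composing concepts then amounts, in this simplified sketch, to summing the low-rank deltas. The actual spatio-temporal structure the authors impose on that weight space is not shown.

```python
import numpy as np

def lora_delta(B, A, alpha=1.0):
    """Low-rank weight update: delta_W = alpha * B @ A, rank = B.shape[1]."""
    return alpha * (B @ A)

def compose_loras(W, loras):
    """Merge several LoRAs into one weight matrix by summing their deltas
    (a naive stand-in for composing customized video concepts)."""
    return W + sum(lora_delta(B, A, a) for B, A, a in loras)
```

Each `(B, A, alpha)` tuple is one customized concept; `alpha` scales its contribution when several are merged.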
Willi Menapace retweeted
Rameen Abdal @AbdalRameen
What if you could compose videos, merging multiple clips and even capturing complex athletic moves where video models struggle, all while preserving motion and context? And yes, you can still edit them with text afterwards! Stay tuned for more results. #AI #VideoGeneration #SnapResearch
7 replies · 24 reposts · 147 likes · 17.9K views
Willi Menapace @WilliMenapace
Check out Video Alchemist! Our latest work enables multi-subject open-set personalization with no need for inference-time tuning 👇👇👇
Tsai-Shien Chen@tsaishien_chen

Introducing ⚗️ Video Alchemist Our new video model supporting 👪 Multi-subject open-set personalization 🏞️ Foreground & background personalization 🚀 Without the need of inference-time tuning snap-research.github.io/open-set-video… [Results] 1. Sora girl rides a dinosaur on a savanna 🧵👇

0 replies · 0 reposts · 7 likes · 345 views
Willi Menapace @WilliMenapace
Video-to-Audio and Audio-to-Video models struggle with temporal alignment. AV-Link solves the problem by conditioning on diffusion model features. Great collaboration with @moayedhajiali, @siarohin9013, @isskoro, @alpercanbe, Kwot Sin Lee, Vicente Ordonez and @SergeyTulyakov
Moayed Haji Ali@moayedhajiali

Can pretrained diffusion models connect for cross-modal generation? 📢 Introducing AV-Link ♾ Bridging unimodal diffusion models in one framework to enable: 📽️ ➡️ 🔊 Video-to-Audio 🔊 ➡️ 📽️ Audio-to-Video 🌐: snap-research.github.io/AVLink/ 📄: hf.co/papers/2412.15… ⤵️ Results

0 replies · 3 reposts · 10 likes · 843 views
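Conditioning one modality's generator on the other's diffusion features, as described above, can be sketched with a single cross-attention head: the generating modality's tokens query intermediate features of the other model. The shapes and single-head setup are illustrative assumptions, not AV-Link's architecture.

```python
import numpy as np

def cross_attend(q_tokens, kv_feats, Wq, Wk, Wv):
    """Single-head cross-attention from generator tokens to the other
    modality's intermediate diffusion features."""
    Q, K, V = q_tokens @ Wq, kv_feats @ Wk, kv_feats @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over kv positions
    return attn @ V
```

Because the keys and values come from time-stamped diffusion features rather than a separate encoder, alignment information is available to the generator directly.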
Willi Menapace retweeted
Ziyi Wu @Dazitu_616
MinT beats Sora in multi-event generation! One week after the release of MinT, Sora also released a *storyboard* tool that targets the same task (sequential events + time control). Below are a few comparisons, where MinT shows better event transition and timing: (1/N)
Ziyi Wu@Dazitu_616

📢MinT: Temporally-Controlled Multi-Event Video Generation📢 mint-video.github.io TL;DR: We identify a fundamental failure mode of existing video generators: they cannot produce videos with sequential events. MinT unlocks this capability with temporal grounding of events. 🧵

1 reply · 12 reposts · 48 likes · 7.7K views
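The "temporal grounding of events" in MinT's announcement can be pictured as a frame-to-event attention mask: each frame may only attend to the text of the event scheduled at that time. The (start, end) span interface below is an assumption for illustration.

```python
import numpy as np

def event_mask(num_frames, spans):
    """Binary mask binding frames to temporally grounded events.

    spans: one (start, end) frame range per event, end exclusive.
    mask[f, e] == 1 means frame f may attend to event e's text tokens.
    """
    mask = np.zeros((num_frames, len(spans)), dtype=int)
    for e, (s, t) in enumerate(spans):
        mask[s:t, e] = 1
    return mask
```

With contiguous, non-overlapping spans, every frame is bound to exactly one event, which is what enforces the sequential ordering existing generators miss.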
Willi Menapace retweeted
Ziyi Wu @Dazitu_616
📢MinT: Temporally-Controlled Multi-Event Video Generation📢 mint-video.github.io TL;DR: We identify a fundamental failure mode of existing video generators: they cannot produce videos with sequential events. MinT unlocks this capability with temporal grounding of events. 🧵
12 replies · 52 reposts · 189 likes · 33K views
Willi Menapace retweeted
Andrea Tagliasacchi 🇨🇦
📢📢📢 𝐀𝐂𝟑𝐃: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers snap-research.github.io/ac3d TL;DR: for 3D camera control in generative video, it really helps knowing *which* part of your model you should mess with Internship by @sherwinbahmani at Snap
3 replies · 26 reposts · 128 likes · 23K views
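AC3D's takeaway, that *which* part of the model you condition matters, can be illustrated with a toy block stack where camera-pose information is injected only in a chosen subset of blocks. Everything here (tanh blocks, additive injection) is a stand-in, not AC3D's architecture.

```python
import numpy as np

def run_blocks(x, pose_emb, num_blocks=8, cond_blocks=(0, 1, 2, 3)):
    """Toy transformer stack with camera-pose conditioning injected
    only in `cond_blocks`."""
    for i in range(num_blocks):
        if i in cond_blocks:
            x = x + pose_emb            # conditioned block
        x = np.tanh(x)                  # stand-in for a transformer block
    return x
```

Injecting the same pose signal early versus late yields different outputs, the toy version of why choosing the injection site is worth analyzing.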
Vlad Golyanik @VGolyanik
Congratulations, Dr. @WilliMenapace, on today's successful thesis defence in the beautiful Trento! 🎾👏👏🙌 Co-supervising you was a great and enriching experience. Best of luck with your continuing scientific journey! ...in the photo with @eliricci_ and @lambertoballan
[media attachment]
2 replies · 1 repost · 24 likes · 1.7K views