Michal Stary
10 posts


Introducing Generative View Stitching (GVS), a non-autoregressive sampling method for length extrapolation of video diffusion models. GVS enables collision-free camera-guided video generation for predefined trajectories, including Oscar Reutersvärd's Impossible Staircase (1/9).

Understanding Multi-View Transformers. Michal Stary, @jgaubil, @_atewari, @vincesitzmann. tl;dr: DUSt3R self-attention is secretly a diffusion model, and cross-attention is matching. arxiv.org/abs/2510.24907






GVS also stably scales to longer videos (1080 frames) given more test-time compute, establishing itself as a promising alternative to autoregression for long video generation. Note that this video is generated without any keyframe interpolation! (8/9)

DUSt3R et al. are impressive, but how do they actually work? We explored this, and share insights on iterative reconstruction, the roles of cross- and self-attention, and emerging correspondences across the network [1/8] ⬇️


