
YO, I adapted VACE to work with real-time autoregressive video generation. Here's what it can do right now in real time:

- Depth, pose, optical flow, scribble, and edge maps (all the v2v control stuff)
- First frame animation / last frame lead-in / keyframe interpolation
- Inpainting with static or dynamic masks
- Stacking features together (e.g. depth + LoRA, inpainting + reference images)
- Reference-to-video is in there too, but quality isn't great yet compared to batch

I'm getting ~20 fps for most control modes on a 5090 at 368x640 with the 1.3B models. Image-to-video hits ~28 fps. The 14B models work as well, but they don't fit on a 5090 with VACE.

This is all part of Daydream Scope, an open source tool for running real-time interactive video generation pipelines. The demos were created in Scope and combine Longlive, VACE, and a custom LoRA. There's also a very early WIP ComfyUI node pack wrapping Scope.

But how is a real-time, autoregressive model relevant to @ComfyUI? Ultra long video generation. You can use these models distilled from Wan to do V2V tasks on thousands of frames at once, technically infinite length. I haven't experimented much beyond validating the concept on a couple-thousand-frame gen. It works!

Full technical details on real-time VACE + more examples here (link in comments). Curious what people think. Happy to answer questions. Video + custom LoRA links are also in comments.

Love,
Ryan

p.s. I will be back with a sick update on the ACEStep implementation tomorrow (links in first comment)










