Ligong Han retweeted

World models have made impressive progress in video generation, yet they still struggle with a fundamental challenge: memory. In long rollouts, the camera trajectory gradually drifts from the user-specified motion and revisited scenes no longer align with earlier observations. These errors accumulate over time, causing the generated world to steadily lose coherence.
🚀 Excited to share our solution, MosaicMem 🌍🧠: a new hybrid spatial memory for video world models.
Project Page: mosaicmem.github.io/mosaicmem/
Paper: huggingface.co/papers/2603.17…