Zhiqiu Lin@ZhiqiuLin
📷 Can AI understand camera motion like a cinematographer?
Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video – films, games, drone shots, vlogs, etc. Links below!
We contribute a taxonomy of motion primitives, co-designed over months with professional cinematographers, and apply rigorous quality control to label and caption all aspects of camera motion.
CameraBench shows that even the best SfM and VLM methods struggle with real-world, dynamic videos. Yet a generative VLM post-trained on our high-quality data matches the SOTA SfM method (MegaSAM) in geometric understanding and outperforms SOTA VLMs (Gemini-2.5 / GPT-4o) in semantic understanding, e.g., describing how the camera moves.
📄 Paper: huggingface.co/papers/2504.15…
🌐 Website: linzhiqiu.github.io/papers/camerab…
Work led by CMU, MIT-IBM, UMass, Adobe, Harvard, Emerson with @censiyuan1, @chancharikm, @JayKarhade, @du_yilun, @gan_chuang, and @RamananDeva.