

Alan Melling
7.4K posts

@alanmelling
junction of Computer Vision and Graphics, Principal R&D Engineer at Carvana, Co-Creator at Nature Time










📷 Can AI understand camera motion like a cinematographer?

Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video – films, games, drone shots, vlogs, etc. Links below!

We contribute a taxonomy of motion primitives, co-designed over months with professional cinematographers, and apply rigorous quality control to label and caption all aspects of camera motion.

CameraBench shows that even the best SfMs and VLMs struggle with real-world, dynamic videos. Yet, a generative VLM post-trained on our high-quality data matches SOTA SfM (MegaSAM) in geometric understanding and outperforms SOTA VLMs (Gemini-2.5 / GPT-4o) in semantic understanding, e.g., describing how the camera moves.

📄 Paper: huggingface.co/papers/2504.15…
🌐 Website: linzhiqiu.github.io/papers/camerab…

Work led by CMU, MIT-IBM, UMass, Adobe, Harvard, Emerson with @censiyuan1, @chancharikm, @JayKarhade, @du_yilun, @gan_chuang, and @RamananDeva.




@teortaxesTex @JingyuanLiu123 I bet OpenAI/xAI are laughing so hard. This result is obvious tbh: they took a permanent architectural debuff in order to save on compute costs.






Hyperscape: The future of VR and the Metaverse

Excited that Zuckerberg @finkd announced what I have been working on at Connect. Hyperscape enables people to create high fidelity replicas of physical spaces, and embody them in VR.

Check out the demo app: meta.com/experiences/79…





