Ang Cao
@AngCao3
Ph.D. at University of Michigan, CSE

Spatial reconstruction is a long-context problem: real scenes come with hundreds of images, but O(N²) transformer-based models don't scale efficiently. Introducing ZipMap (CVPR '26): Linear-Time, Stateful 3D Reconstruction via Test-Time Training (TTT). ZipMap "zips" a large image collection into an implicit TTT scene state in a single linear-time pass. The state is then decoded into spatial outputs and can be queried efficiently for novel-view geometry and appearance (~100 FPS). ZipMap is not only much faster (>20× faster than VGGT) but also matches or surpasses the accuracy of all SOTA models.
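(Not the authors' code: a minimal PyTorch sketch of the general test-time-training idea the post describes. The class name, the self-supervised update rule, and all dimensions are hypothetical; the point is that each image triggers one state update, so total cost is O(N) rather than O(N²) pairwise attention, and querying the finished state is cheap.)

```python
import torch
import torch.nn as nn

class TTTSceneState(nn.Module):
    """Hypothetical sketch of a TTT-style scene state (not ZipMap's code).

    Each incoming image triggers one gradient-style update of a fast-weight
    state, so cost grows linearly in the number of images instead of O(N^2).
    """
    def __init__(self, dim: int = 256):
        super().__init__()
        self.state = nn.Parameter(torch.zeros(dim, dim))  # implicit scene state

    def update(self, feat: torch.Tensor, lr: float = 0.1) -> None:
        # Self-supervised objective on the current image only (assumed form).
        pred = feat @ self.state
        loss = ((pred - feat) ** 2).mean()
        (grad,) = torch.autograd.grad(loss, self.state)
        with torch.no_grad():
            self.state -= lr * grad  # one O(1) update per image -> O(N) total

    def query(self, ray_feat: torch.Tensor) -> torch.Tensor:
        # Decoding is a cheap read of the state, which is what makes
        # fast novel-view queries (~100 FPS in the post) plausible.
        return ray_feat @ self.state

# "Zip" N images into the state in a single linear pass, then query it.
encoder = nn.Linear(3 * 32 * 32, 256)      # stand-in image encoder
scene = TTTSceneState(dim=256)
for img in torch.randn(100, 3 * 32 * 32):  # 100 flattened images, one pass
    scene.update(encoder(img).unsqueeze(0))
novel_view = scene.query(torch.randn(1, 256))
```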

Supervised learning has held 3D vision back for too long. Meet RayZer: a self-supervised 3D model trained with zero 3D labels.
- No supervision of camera & geometry
- Just RGB images
And the wild part? RayZer outperforms supervised methods (since 3D labels from COLMAP are noisy).
Project: hwjiang1510.github.io/RayZer/ (1/4)
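(A rough sketch, not RayZer's actual architecture: every module name and shape below is a placeholder. It illustrates the training recipe the thread claims: cameras and geometry are predicted internally as latents, and the only loss is photometric RGB reconstruction of a held-out view, so no COLMAP poses or depth labels are ever used.)

```python
import torch
import torch.nn as nn

class SelfSupervised3D(nn.Module):
    """Placeholder modules; the only supervision signal is pixel error."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 32, 3, padding=1)      # stand-in encoder
        self.pose_head = nn.Linear(32, 6)                   # latent camera prediction
        self.render_head = nn.Conv2d(32, 3, 3, padding=1)   # stand-in renderer

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(context)                       # (B, 32, H, W)
        # Camera pose is an unsupervised latent: no pose labels anywhere.
        pose = self.pose_head(feats.mean(dim=(2, 3)))        # (B, 6)
        # Condition rendering on the predicted pose (toy scaling stand-in).
        feats = feats * (1 + pose.mean(dim=1))[:, None, None, None]
        return self.render_head(feats)

model = SelfSupervised3D()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

context_views = torch.rand(4, 3, 64, 64)  # input RGB images
target_view = torch.rand(4, 3, 64, 64)    # held-out RGB view of the same scene

pred = model(context_views)
loss = nn.functional.mse_loss(pred, target_view)  # photometric loss: RGB in, RGB out
loss.backward()
opt.step()
```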

We are thrilled to share the appointment of @QianqianWang5 as a #KempnerInstitute Investigator! She will bring her expertise in computer vision to @Harvard. Read the announcement: bit.ly/4mIghHy @hseas #AI #ComputerVision

ai people keep asking where the aliens are. shame they don't know that dark matter is actually alien femtomachine computronium; invisible supercomputing fabric made of subatomic particles that don't even interact w/ light. 85% of the galaxy's mass is already thinking without us!

Announcing our $13M funding round to build the next generation of AI: Spatial Foundation Models that can generate entire 3D environments anchored in space & time. Interested? Join our world-class team: spaitial.ai #GenAI #3DAI


Will VLMs adhere strictly to their learned priors, unable to perform visual reasoning on content that never existed on the Internet? We propose ViLP, a benchmark designed to probe the visual-language priors of VLMs by constructing Question-Image-Answer triplets that deliberately deviate from existing data. Check our gallery at vilp-team.github.io & huggingface.co/datasets/ViLP/… To further enhance VLMs' reliance on visual information, we propose Image-DPO, as elaborated in this thread. w/ @AngCao3 @GunheeLee @jcjohnss @honglaklee
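(The post names Image-DPO without spelling out the objective here. Below is a generic DPO-style preference loss, written under the assumption that the "chosen"/"rejected" pair is the same question and answer scored against a clean vs. degraded image, which is one plausible reading; see the linked thread for the actual construction. The function name and all numbers are illustrative.)

```python
import torch
import torch.nn.functional as F

def image_dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO objective; here the chosen/rejected log-probs are assumed
    to come from the same (question, answer) scored against a clean vs. a
    degraded image -- a hypothetical reading of Image-DPO.

    Each argument: summed token log-probs of the answer, shape (batch,).
    """
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with made-up summed log-probs for a batch of two examples.
loss = image_dpo_loss(
    logp_win=torch.tensor([-12.0, -9.5]),       # policy, clean image
    logp_lose=torch.tensor([-15.0, -11.0]),     # policy, degraded image
    ref_logp_win=torch.tensor([-13.0, -10.0]),  # frozen reference model
    ref_logp_lose=torch.tensor([-14.0, -10.5]),
)
print(float(loss))
```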

