

Marc Pollefeys

@mapo1
Director of Science at @Microsoft @HoloLens, Professor of Computer Science at @ETH Zurich, working on #ComputerVision





After two fantastic years at @UCBerkeley I'm thrilled to share that I've joined @Microsoft in Zurich🇨🇭to pioneer the next generation of multimodal foundation models to drive agents 🤖 that can seamlessly interact across the digital and physical worlds 🌍 We are hiring! 🧵

















👏Big congratulations to @arkrause for being named @TheOfficialACM Fellow. The distinction recognises Krause's extensive research contributions to learning-based decision making under uncertainty. @ETH_en @ETH_AI_Center bit.ly/429CxBg











HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks: msft.it/60139Uv4d


OpenMask3D: Open-Vocabulary 3D Instance Segmentation paper page: huggingface.co/papers/2306.13… We introduce the task of open-vocabulary 3D instance segmentation. Traditional approaches to 3D instance segmentation largely rely on existing 3D annotated datasets, which are restricted to a closed set of object categories. This is an important limitation for real-life applications, where one might need to perform tasks guided by novel, open-vocabulary queries related to a wide variety of objects. Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods are limited in their ability to identify object instances. In this work, we address this limitation and propose OpenMask3D, a zero-shot approach for open-vocabulary 3D instance segmentation. Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings. We conduct experiments and ablation studies on the ScanNet200 dataset to evaluate the performance of OpenMask3D and provide insights into the open-vocabulary 3D instance segmentation task. We show that our approach outperforms other open-vocabulary counterparts, particularly on the long-tail distribution. Furthermore, OpenMask3D goes beyond the limitations of closed-vocabulary approaches and enables the segmentation of object instances based on free-form queries describing object properties such as semantics, geometry, affordances, and material properties.
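The core idea described above — fuse CLIP image embeddings of each class-agnostic mask across views, then rank masks against a text query — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the simple mean-pooling fusion, and the pre-computed embeddings passed in as arrays are all assumptions for the sake of the example.

```python
import numpy as np

def aggregate_mask_features(mask_view_embeddings):
    """Fuse one instance mask's CLIP image embeddings across views.

    mask_view_embeddings: list of (d,) arrays, one per view where the
    mask is visible. Here we simply mean-pool and L2-normalize; the
    actual multi-view fusion in OpenMask3D may differ.
    """
    feats = np.stack(mask_view_embeddings)        # (n_views, d)
    fused = feats.mean(axis=0)                    # mean-pool over views
    return fused / np.linalg.norm(fused)          # unit-normalize

def query_instances(mask_features, text_embedding):
    """Rank class-agnostic 3D instance masks by similarity to a text query.

    mask_features: (n_masks, d) array of unit-normalized fused features.
    text_embedding: (d,) CLIP text embedding of a free-form query.
    Returns mask indices sorted from most to least similar.
    """
    text = text_embedding / np.linalg.norm(text_embedding)
    sims = mask_features @ text                   # cosine similarity
    return np.argsort(-sims)
```

Because both mask and query features live in the shared CLIP embedding space, the same ranking works for queries about semantics, geometry, affordances, or materials without retraining.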