

Paul-Edouard Sarlin

@pesarlin
Researcher at @Google, 3D computer vision & machine learning. Previously PhD at ETH Zurich, intern at @Google, @Meta, @Microsoft, @magicleap.



We are publishing our first deep dive on what we believe is one of the most challenging layers in egocentric data: SLAM and VIO in the context of long-horizon state tracking. We break down how SLAM and VIO fail in egocentric settings: visual features vanish at close range, depth sensors saturate, and fast head motion blurs frames. These failures don't always occur in isolation; they often hit at the exact same moment, leading to compounding errors and making the downstream data unusable. We believe the foundation for high-quality egocentric data demands sub-centimeter precision over long episodes ranging from a few minutes to an hour.


Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: facebookresearch.github.io/boxer/ Here we show Boxer in action on an egocentric sequence captured from smart glasses:






github.com/zju3dv/Efficie… This is the depth of conversation between @pesarlin and Yifan Wang that one would dream of seeing in peer review. I'd dare say that is exactly the peer review we want to have.













Industry SLAM systems are far ahead of academic open source systems.



@liu_shikun Or we could try to combine learning and geometry.