
Daniel DeTone
513 posts

@ddetone
Deep Nets and Geometry — what could go wrong?






How do you decompose a 2D image into accurate 3D object detections? You use 🥊 Boxer. This new model from Reality Labs Research enables robust 3D object detection by "lifting" 2D proposals from off-the-shelf detectors like OWL-ViT and SAM into metric 3D space. No more "flat" AI: this is about spatial intelligence for the next generation of wearables. Blog 🔗 projectaria.com/news/introduci… Website with links to download: facebookresearch.github.io/boxer/ 👉 @ddetone

Glad to see follow-ups to neural-os.com, but disappointed that neither the blog (with 34 refs) nor the code repo acknowledged NeuralOS, even though the released data code appears to build directly on top of ours. That omission is hard to understand given our shared vision.






Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: facebookresearch.github.io/boxer/ Here we show Boxer in action on an egocentric sequence captured from smart glasses:

I implemented it, and the ~8-degree gravity correction from GeoCalib made a real difference. Look at the monitor: on the left (pose heuristic) the box is tilted and doesn't match the screen edges; on the right (GeoCalib) it wraps the monitor much more tightly. The shelf boxes at the top are also cleaner, with less overshoot. Yeah, the improvement is clear.
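To make the gravity correction mentioned above concrete, here is a minimal sketch of how an estimated gravity direction that is about 8 degrees off can be rotated onto an assumed "down" axis. This is not code from Boxer or GeoCalib: the function names, the choice of (0, 0, -1) as the down axis, and the tilt about the x-axis are all assumptions made for illustration. Only the ~8 degree figure comes from the post. The rotation uses the standard Rodrigues construction for aligning one unit vector with another.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(R, v):
    return [dot(row, v) for row in R]

def rotation_aligning(src, dst):
    """Return the 3x3 rotation that maps direction src onto direction dst."""
    a, b = normalize(src), normalize(dst)
    v = cross(a, b)   # rotation axis scaled by sin(angle)
    c = dot(a, b)     # cos(angle)
    if c < -1.0 + 1e-12:
        # Antiparallel case: rotate 180 degrees about any axis perpendicular to a.
        helper = [1.0, 0.0, 0.0] if abs(a[0]) < 0.9 else [0.0, 1.0, 0.0]
        u = normalize(cross(a, helper))
        return [[2.0 * u[i] * u[j] - (1.0 if i == j else 0.0) for j in range(3)]
                for i in range(3)]
    # Rodrigues form: R = I + K + K^2 / (1 + c), with K the skew matrix of v.
    K = [[0.0, -v[2], v[1]],
         [v[2], 0.0, -v[0]],
         [-v[1], v[0], 0.0]]
    K2 = matmul(K, K)
    s = 1.0 / (1.0 + c)
    return [[(1.0 if i == j else 0.0) + K[i][j] + s * K2[i][j] for j in range(3)]
            for i in range(3)]

# Hypothetical setup: the assumed down axis, and a gravity estimate tilted
# 8 degrees away from it about the x-axis.
down = [0.0, 0.0, -1.0]
tilt = math.radians(8.0)
g_est = [0.0, math.sin(tilt), -math.cos(tilt)]

R = rotation_aligning(g_est, down)
aligned = apply(R, g_est)
angle_deg = math.degrees(math.acos(dot(normalize(g_est), down)))
print("tilt before correction (deg):", angle_deg)
print("gravity after correction:", aligned)
```

Applying the same rotation to the corners of a 3D box would level a box that was fitted in the tilted frame, which is one plausible reading of why the monitor box in the comparison sits flush with the screen edges after correction.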













