Daniel DeTone @ddetone
500 posts
Deep Nets and Geometry — what could go wrong?
Long Beach, CA · Joined June 2009
662 Following · 2.1K Followers
Daniel DeTone @ddetone ·
@yesitsarmin yes, the main limitation is the 2D detector here, but there are tons of better models (SAM3, VLMs) if you have the compute. for very cluttered scenes it doesn't work as well
𓅋 𐎫𐎤𐎶𐏀 ‎ﷺ
@ddetone this is very cool, can it work for any arbitrary objects, like stuff on a table, and can it work with stereo camera?
Daniel DeTone @ddetone ·
Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: facebookresearch.github.io/boxer/ Here we show Boxer in action on an egocentric sequence captured from smart glasses:
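The underlying geometry of lifting a 2D detection to metric 3D can be sketched with textbook pinhole back-projection — this is only an illustration of the general idea, not necessarily how Boxer works internally, and all numbers below (intrinsics, pixel, depth) are made up:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel (u, v) with metric depth Z to a 3D point in camera
    coordinates via the pinhole model:
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Center of a detected 2D box at pixel (420, 240), 2 m away, with
# illustrative intrinsics fx = fy = 500 and principal point (320, 240)
p = backproject(420.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)
print(p)  # [0.4 0.  2. ]
```

Given a full 2D box rather than a single pixel, the same operation on its corners (plus an estimated object depth) yields a metric 3D extent.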
Daniel DeTone @ddetone ·
@ElioenaiSiqCst Yes, I didn't show any examples of that but we trained on a massive internal-only Quest3 dataset
Daniel DeTone @ddetone ·
@BlueAquilae great question! I would not expect it to work well here, we would need to re-train it with a full 9 DoF representation. but feel free to try it out anyway, I'd be curious
Robert Felker 💎 🇫🇷
@ddetone Would this work on a space station, since you specified it relies on 'gravity hints', or will the model get lost in orientation?
CleverBet @CleverBetTips ·
@ddetone Your name is competing with one of the great albums of the last 30 years 😀
Daniel DeTone @ddetone ·
@haodongli00 One limitation I found using both of those models is the runtime. For detecting 1000+ text prompts with SAM3 it takes 20+ sec per image. SAM3D also takes ~15 sec per object, so running on large datasets can be expensive. OWLv2 runs at ~30ms and Boxer takes ~20ms
Haodong Li @haodongli00 ·
@ddetone Great work! I'm also doing something very similar, using SAM3, SAM3D and many other powerful tools! 😎
Daniel DeTone @ddetone ·
@nickkarpov Feel free to file a GitHub issue if you have any problems! Will do my best to answer them quickly
Daniel DeTone @ddetone ·
BoxerNet runs FAST 🔥🔥, taking roughly 20ms on a 4090 with bfloat16 for ALL prompts in an image (e.g. 30 boxes in parallel)
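Processing "ALL prompts in an image" in one forward pass is standard batched inference; a minimal PyTorch sketch of running 30 box queries in parallel under bfloat16 (the `BoxQueryNet` module, its shapes, and the 7-DoF output are illustrative assumptions, not the actual BoxerNet architecture):

```python
import torch
import torch.nn as nn

class BoxQueryNet(nn.Module):
    """Toy stand-in for a lightweight box-lifting head (not the real BoxerNet)."""
    def __init__(self, feat_dim=256, out_dim=7):  # e.g. 7-DoF box: xyz, size, yaw
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, out_dim)
        )

    def forward(self, query_feats):
        # query_feats: (num_boxes, feat_dim) -- every prompt in one batch
        return self.head(query_feats)

model = BoxQueryNet().eval()
queries = torch.randn(30, 256)  # 30 box prompts processed in parallel

# autocast keeps weights in fp32 but runs the matmuls in bfloat16
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    boxes = model(queries)

print(boxes.shape)  # torch.Size([30, 7])
```

On a GPU you would use `torch.autocast("cuda", dtype=torch.bfloat16)`; batching all queries amortizes the backbone cost, which is how per-image latency can stay in the tens of milliseconds regardless of prompt count.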
Daniel DeTone @ddetone ·
Giving Claude new skills is the closest thing I’ve felt to this Matrix moment
[GIF]
Daniel DeTone @ddetone ·
For those interested in 3D perception, check out the Sonata pre-trained backbone. I don’t think I’ll ever co-author another paper that triples performance (22% -> 72%) on a commonly used benchmark (linear probe scannet 3D sem seg). Released with a permissive license too
Xiaoyang Wu @XiaoyangWu_ ·
📢 Sonata: Self-Supervised Learning of Reliable Point Representations 📢 Meet Sonata, our "3D-DINO" pre-trained with Point Transformer V3, accepted at #CVPR2025!
🌍: xywu.me/sonata 📦: github.com/facebookresear… 🚀: github.com/Pointcept/Poin…
🔹 Semantic-aware and spatial-reasoning representations learned with no labels;
🔹 3x linear probing accuracy (from 21.8% to 72.5%) on ScanNet;
🔹 2x data-efficiency performance with only 1% of the data compared to previous approaches;
🔹 As always, establishes new SOTA results across indoor and outdoor 3D perception tasks.
Our author team: @HengshuangZhao, @jstraub6, @rapideRobot, @ddetone, @NinjaDuncan, @TianweiS, @Christopher_Xie, @NanYang719.
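For context on the benchmark: "linear probing" freezes the pre-trained backbone and trains only a single linear classifier on its features, so the score measures representation quality rather than fine-tuning capacity. A minimal sketch with scikit-learn on stand-in features (the random "features" below are placeholders, not Sonata outputs; 20 classes echoes ScanNet's label set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
num_points, feat_dim, num_classes = 1000, 64, 20

# Pretend these are frozen per-point features from a pre-trained 3D backbone:
# points of the same class cluster around a shared center in feature space.
class_centers = rng.normal(size=(num_classes, feat_dim))
labels = rng.integers(0, num_classes, size=num_points)
feats = class_centers[labels] + 0.1 * rng.normal(size=(num_points, feat_dim))

# Linear probe: the backbone stays frozen; only this linear classifier trains
probe = LogisticRegression(max_iter=200).fit(feats, labels)
acc = probe.score(feats, labels)
print(f"linear-probe accuracy: {acc:.2f}")
```

A jump from ~22% to ~72% under this protocol means the frozen features themselves became far more linearly separable by semantic class.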
Daniel DeTone reposted
Xiaoyang Wu @XiaoyangWu_ ·
(Sonata announcement, quoted in full above)
[image]
Daniel DeTone reposted
Boz @boztank ·
The first generation of Aria glasses have made a big impact in the research community, can't wait to see all the new possibilities these will unlock meta.com/blog/project-a…
Daniel DeTone @ddetone ·
What is the right benchmark for a 3D Egocentric Foundation model? We recently open-sourced a small, high quality egocentric benchmark consisting of 1) 3D surfaces 2) 3D objects. We released a simple 3D CNN baseline model called EVL: projectaria.com/research/efm3d/ Try to beat our model!
[GIF]