Daniel DeTone @ddetone
500 posts
Deep Nets and Geometry — what could go wrong?
Long Beach, CA · Joined June 2009
662 Following · 2.1K Followers
Daniel DeTone @ddetone ·
@yesitsarmin yes, the main limitation is the 2D detector here, but there are tons of better models (SAM3, VLMs) if you have the compute. for very cluttered scenes it doesn't work as well
𓅋 𐎫𐎤𐎶𐏀 ‎ﷺ
@ddetone this is very cool, can it work for any arbitrary objects, like stuff on a table, and can it work with stereo camera?
Daniel DeTone @ddetone ·
Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: facebookresearch.github.io/boxer/ Here we show Boxer in action on an egocentric sequence captured from smart glasses:
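The underlying geometry of lifting a 2D detection to metric 3D can be sketched with textbook pinhole back-projection — this is only an illustration of the general idea, not necessarily how Boxer works internally, and all numbers below (intrinsics, pixel, depth) are made up:

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel (u, v) with metric depth Z to a 3D point in camera
    coordinates via the pinhole model:
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Center of a detected 2D box at pixel (420, 240), 2 m away, with
# illustrative intrinsics fx = fy = 500 and principal point (320, 240)
p = backproject(420.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)
print(p)  # [0.4 0.  2. ]
```

Given a full 2D box rather than a single pixel, the same operation on its corners (plus an estimated object depth) yields a metric 3D extent.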
Daniel DeTone @ddetone ·
@ElioenaiSiqCst Yes, I didn't show any examples of that but we trained on a massive internal-only Quest3 dataset
Daniel DeTone @ddetone ·
@BlueAquilae great question! I would not expect it to work well here, we would need to re-train it with a full 9 DoF representation. but feel free to try it out anyway, I'd be curious
Robert Felker 💎 🇫🇷
@ddetone Would this work on a space station, since you specified it relies on 'gravity hints', or will the model get lost in orientation?
CleverBet @CleverBetTips ·
@ddetone Your name is competing with one of the great albums of the last 30 years 😀
Daniel DeTone @ddetone ·
@haodongli00 One limitation I found using both of those models is the runtime. For detecting 1000+ text prompts with SAM3 it takes 20+ sec per image. SAM3D also takes ~15 sec per object, so running on large datasets can be expensive. OWLv2 runs at ~30ms and Boxer takes ~20ms
Haodong Li @haodongli00 ·
@ddetone Great work! I'm also doing something very similar, using SAM3, SAM3D and many other powerful tools! 😎
Daniel DeTone @ddetone ·
@nickkarpov Feel free to file a GitHub issue if you have any problems! Will do my best to answer them quickly
Daniel DeTone @ddetone ·
BoxerNet runs FAST 🔥🔥, taking roughly 20ms on a 4090 with bfloat16 for ALL prompts in an image (e.g. 30 boxes in parallel)
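Processing "ALL prompts in an image" in one forward pass is standard batched inference; a minimal PyTorch sketch of running 30 box queries in parallel under bfloat16 (the `BoxQueryNet` module, its shapes, and the 7-DoF output are illustrative assumptions, not the actual BoxerNet architecture):

```python
import torch
import torch.nn as nn

class BoxQueryNet(nn.Module):
    """Toy stand-in for a lightweight box-lifting head (not the real BoxerNet)."""
    def __init__(self, feat_dim=256, out_dim=7):  # e.g. 7-DoF box: xyz, size, yaw
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, out_dim)
        )

    def forward(self, query_feats):
        # query_feats: (num_boxes, feat_dim) -- every prompt in one batch
        return self.head(query_feats)

model = BoxQueryNet().eval()
queries = torch.randn(30, 256)  # 30 box prompts processed in parallel

# autocast keeps weights in fp32 but runs the matmuls in bfloat16
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    boxes = model(queries)

print(boxes.shape)  # torch.Size([30, 7])
```

On a GPU you would use `torch.autocast("cuda", dtype=torch.bfloat16)`; batching all queries amortizes the backbone cost, which is how per-image latency can stay in the tens of milliseconds regardless of prompt count.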
Daniel DeTone @ddetone ·
Giving Claude new skills is the closest thing I’ve felt to this Matrix moment
[GIF]
Daniel DeTone @ddetone ·
For those interested in 3D perception, check out the Sonata pre-trained backbone. I don’t think I’ll ever co-author another paper that triples performance (22% -> 72%) on a commonly used benchmark (linear probe scannet 3D sem seg). Released with a permissive license too
Xiaoyang Wu @XiaoyangWu_ ·
📢 Sonata: Self-Supervised Learning of Reliable Point Representations 📢 Meet Sonata, our "3D-DINO" pre-trained with Point Transformer V3, accepted at #CVPR2025!
🌍: xywu.me/sonata 📦: github.com/facebookresear… 🚀: github.com/Pointcept/Poin…
🔹 Semantic-aware and spatial-reasoning representations learned with no labels;
🔹 3x linear probing accuracy (from 21.8% to 72.5%) on ScanNet;
🔹 2x data-efficiency performance with only 1% of the data compared to previous approaches;
🔹 As always, establishes new SOTA results across indoor and outdoor 3D perception tasks.
Our author team: @HengshuangZhao, @jstraub6, @rapideRobot, @ddetone, @NinjaDuncan, @TianweiS, @Christopher_Xie, @NanYang719.
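For context on the benchmark: "linear probing" freezes the pre-trained backbone and trains only a single linear classifier on its features, so the score measures representation quality rather than fine-tuning capacity. A minimal sketch with scikit-learn on stand-in features (the random "features" below are placeholders, not Sonata outputs; 20 classes echoes ScanNet's label set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
num_points, feat_dim, num_classes = 1000, 64, 20

# Pretend these are frozen per-point features from a pre-trained 3D backbone:
# points of the same class cluster around a shared center in feature space.
class_centers = rng.normal(size=(num_classes, feat_dim))
labels = rng.integers(0, num_classes, size=num_points)
feats = class_centers[labels] + 0.1 * rng.normal(size=(num_points, feat_dim))

# Linear probe: the backbone stays frozen; only this linear classifier trains
probe = LogisticRegression(max_iter=200).fit(feats, labels)
acc = probe.score(feats, labels)
print(f"linear-probe accuracy: {acc:.2f}")
```

A jump from ~22% to ~72% under this protocol means the frozen features themselves became far more linearly separable by semantic class.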
Daniel DeTone reposted
Xiaoyang Wu @XiaoyangWu_ ·
(Sonata announcement, quoted in full above)
[image]
Daniel DeTone reposted
Boz @boztank ·
The first generation of Aria glasses have made a big impact in the research community, can't wait to see all the new possibilities these will unlock meta.com/blog/project-a…
Daniel DeTone @ddetone ·
What is the right benchmark for a 3D Egocentric Foundation model? We recently open-sourced a small, high quality egocentric benchmark consisting of 1) 3D surfaces 2) 3D objects. We released a simple 3D CNN baseline model called EVL: projectaria.com/research/efm3d/ Try to beat our model!
[GIF]