Haotong Lin @HaotongLin
25 posts

A PhD student at the State Key Laboratory of CAD & CG, Zhejiang University.

Joined July 2021
222 Following · 369 Followers
Haotong Lin retweeted
Haian Jin@Haian_Jin·
Spatial reconstruction is a long-context problem: real scenes come with hundreds of images. But O(N²) transformer-based models don’t scale efficiently. Introducing: 🤐ZipMap (CVPR ’26): Linear-Time, Stateful 3D Reconstruction via Test-Time Training (TTT). ZipMap “zips” a large image collection into an implicit TTT scene state in a single linear-time operation. The state is then decoded into spatial outputs, and can be queried efficiently for novel-view geometry and appearance (~100 FPS). ZipMap is not only much faster (>20× faster than VGGT), but also matches or surpasses the accuracy of all SOTA models.
19 replies · 99 reposts · 744 likes · 66.7K views
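For readers unfamiliar with the test-time-training framing, here is a minimal Python/NumPy sketch of the linear-time, stateful pattern the announcement describes: each image updates a fixed-size scene state exactly once, and novel views are decoded from that state alone. Every name, shape, and operation below is a made-up placeholder, not the actual ZipMap architecture.

```python
# Hypothetical sketch of a linear-time, stateful reconstruction loop in the
# spirit of the thread above: each image updates a fixed-size scene state once
# (O(N) total), and novel views are decoded from that state alone.
# All names and shapes are made up for illustration.
import numpy as np

STATE_DIM = 256

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in for a per-image feature encoder (e.g., a ViT); here a fixed random projection."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((image.size, STATE_DIM)) / np.sqrt(image.size)
    return image.reshape(-1) @ proj

def update_state(state: np.ndarray, feat: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One test-time-training-style step: nudge the state toward the new observation."""
    return state + lr * (feat - state)

def query_state(state: np.ndarray, camera: np.ndarray) -> np.ndarray:
    """Stand-in decoder: map (state, camera) to some spatial output (e.g., a depth map)."""
    return np.outer(camera, state[: camera.size]).mean(axis=1)

# "Zip" a stream of images into the state in a single linear pass.
state = np.zeros(STATE_DIM)
images = [np.random.rand(16, 16) for _ in range(100)]   # dummy image collection
for img in images:
    state = update_state(state, encode_image(img))

# Query the compressed state for a novel view; cost is independent of the number of images.
novel_camera = np.random.rand(8)
print(query_state(state, novel_camera).shape)
```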
Haotong Lin retweeted
Saining Xie@sainingxie·
papers are kind of like movies: the first one is usually the best, and the sequels tend to get more complicated but not really more exciting. But that totally doesn’t apply to the DepthAnything series. @bingyikang's team somehow keeps making things simpler and more scalable each time. in this new version, they basically show that a strong representation encoder plus a depth-ray prediction objective is enough (you see the RAE vibes too, right?) to get solid, general spatial perception across a bunch of tasks. people often say they hate computer vision because it’s messy--too many tasks, too many data types, too many moving parts. but that’s exactly why I love it. I think the biggest AI breakthroughs are going to come quietly from vision and then suddenly leapfrog everything else, changing how AI interacts with the real world and with us. pretty soon we’ll realize vision is not a big list of tasks--it’s a perspective. a perspective about modeling continuous sensory data, building layered representations of the world, and inching toward human-like intelligence. and tbh we’re watching this happen every day, behind all the hype, as all these different "tasks" slowly start to merge.
Bingyi Kang@bingyikang

After a year of teamwork, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3 reveals two key insights:
💎 A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture.
✨ A single depth-ray representation is enough. No complex 3D tasks.
Three series of models have been released: the main DA3 series, a monocular metric estimation series, and a monocular depth estimation series. The core team members, aside from me: @HaotongLin, Sili Chen, Jun Hao Liew, @donydchen. 👇(1/n) #DepthAnything3

5 replies · 40 reposts · 517 likes · 75.7K views
Haotong Lin retweeted
Bingyi Kang@bingyikang·
After a year of teamwork, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3 reveals two key insights:
💎 A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture.
✨ A single depth-ray representation is enough. No complex 3D tasks.
Three series of models have been released: the main DA3 series, a monocular metric estimation series, and a monocular depth estimation series. The core team members, aside from me: @HaotongLin, Sili Chen, Jun Hao Liew, @donydchen. 👇(1/n) #DepthAnything3
80 replies · 496 reposts · 3.6K likes · 510.6K views
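A hedged sketch of what a "depth-ray representation" can buy you: if a network predicts, for every pixel, a ray (origin and direction) plus a depth along that ray, a 3D point map follows by simple composition. This is an illustration under that assumption only; the exact DA3 parameterization may differ.

```python
# Sketch of the depth-ray idea: per-pixel depth + per-pixel ray -> 3D point map.
# Shapes and values below are dummies for illustration.
import numpy as np

def points_from_depth_and_rays(depth, ray_origin, ray_dir):
    """depth: (H, W); ray_origin, ray_dir: (H, W, 3). Returns an (H, W, 3) point map."""
    ray_dir = ray_dir / np.linalg.norm(ray_dir, axis=-1, keepdims=True)  # keep directions unit-length
    return ray_origin + depth[..., None] * ray_dir

H, W = 4, 6
depth = np.full((H, W), 2.0)                        # dummy predicted depth
ray_origin = np.zeros((H, W, 3))                    # dummy camera center at the origin
ray_dir = np.dstack([*np.meshgrid(np.linspace(-0.2, 0.2, W),
                                  np.linspace(-0.1, 0.1, H)),
                     np.ones((H, W))])              # dummy pinhole-like ray directions
points = points_from_depth_and_rays(depth, ray_origin, ray_dir)
print(points.shape)  # (4, 6, 3)
```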
Haotong Lin@HaotongLin·
@LordMassicotte @_akhaliq Not at the moment ⏳. Currently, Depth-Anything-3 does not support 360° camera inputs, but it's an interesting area for future exploration! 🔭
1 reply · 0 reposts · 3 likes · 69 views
AK@_akhaliq·
Depth Anything 3: Recovering the Visual Space from Any Views
7 replies · 111 reposts · 753 likes · 47.6K views
Jianyuan@jianyuan_wang·
Beautiful work Depth Anything 3 by @HaotongLin, @bingyikang, and the team! btw I thought it would be named Depth Anything v3 😃
4 replies · 14 reposts · 194 likes · 11.2K views
Haotong Lin@HaotongLin·
@Almorgand Not yet! Video depth support is coming soon 👀✨ Stay tuned — just a few more months!
1 reply · 0 reposts · 1 like · 52 views
Haotong Lin@HaotongLin·
Thank you for sharing our work! Marigold is really cool! However, it’s somewhat limited by the image VAE: many flying points appear even when simply encoding and decoding a perfect ground-truth depth map. Pixel-space diffusion to the rescue 🚀
Anton Obukhov@AntonObukhov1

Pixel-Perfect-Depth: the paper aims to fix Marigold's loss of sharpness induced by the VAE, using VFMs (VGGT/DAv2) and a DiT-based pixel decoder to refine the predictions and achieve clean depth discontinuities. Video by authors.

2 replies · 3 reposts · 56 likes · 5.7K views
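A toy illustration of the flying-point issue mentioned above: any lossy round-trip that smooths a sharp depth discontinuity leaves pixels at depths between the two real surfaces, which unproject to points floating in mid-air. The box blur below is only a stand-in for a VAE encode/decode, not Marigold's actual VAE.

```python
# Toy example: a lossy round-trip smears a sharp depth edge, producing depths
# that lie between the two real surfaces ("flying points" once unprojected).
import numpy as np

def box_blur_1d(x, k=5):
    """Cheap stand-in for a lossy reconstruction; NOT an actual VAE."""
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

gt = np.where(np.arange(200) < 100, 1.0, 5.0)   # perfect depth: surface at 1 m, background at 5 m
recon = box_blur_1d(gt)

# Interior pixels whose reconstructed depth falls clearly between the two
# surfaces would unproject to points hanging in empty space at the edge.
interior = recon[5:-5]
flying = np.sum((interior > 1.2) & (interior < 4.8))
print(f"pixels floating between surfaces: {flying}")
```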
Haotong Lin@HaotongLin·
@sourav_bz @AntonObukhov1 Good question! MoGe-v2 is a deterministic model: as discussed in our work, the MSE loss makes it converge to a middle depth near edges, causing flying points. You can see similar artifacts in Depth Anything V2, MoGe-v2, and Depth Pro, even though they produce very sharp 2D results.
0 replies · 0 reposts · 0 likes · 70 views
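A tiny worked example of the "converge to middle depth" point, with made-up numbers: if an edge pixel is equally likely to belong to a 1 m foreground or a 5 m background, the MSE-optimal prediction is their mean, 3 m, a depth at which no surface exists.

```python
# Under MSE, the best single prediction for an ambiguous edge pixel is the mean
# of the plausible depths, i.e. a point in empty space between the two surfaces.
import numpy as np

plausible_depths = np.array([1.0, 5.0])       # foreground vs. background, equally likely
candidates = np.linspace(0.5, 6.0, 1101)
mse = [np.mean((plausible_depths - d) ** 2) for d in candidates]
best = candidates[int(np.argmin(mse))]
print(best)   # ~3.0 m: between the two surfaces, hence a flying point
```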
Haotong Lin retweeted
Anton Obukhov@AntonObukhov1·
Pixel-Perfect-Depth: the paper aims to fix Marigold's loss of sharpness induced by the VAE, using VFMs (VGGT/DAv2) and a DiT-based pixel decoder to refine the predictions and achieve clean depth discontinuities. Video by authors.
3 replies · 54 reposts · 424 likes · 29.4K views
Haotong Lin retweeted
Yuxi Xiao@YuxiXiaohenry·
🚀 We release SpatialTrackerV2: the first feedforward model for dynamic 3D reconstruction and 3D point tracking — all at once! Reconstruct dynamic scenes and predict pixel-wise 3D motion in seconds. 🔗 Webpage: spatialtracker.github.io 🔍 Online Demo: huggingface.co/spaces/Yuxihen…
5 replies · 86 reposts · 463 likes · 41.1K views
Haotong Lin@HaotongLin·
Wow, thank you for crediting our work! Thrilled to see our project PromptDepthAnything being used in your latest release. This is awesome! Best of luck with the new version!
Chris make some 3D scans@ChrisAtKIRI

Are you tired of the low quality of iPhone lidar scans? I am! And that is why we are bringing this cutting-edge iPhone lidar scan enhancement function into production! With the guidance of normal and depth, the geometry can now reach the next level! Showcases: kiri-innovation.github.io/LidarScanEnhan…. Please try our KIRI Engine 3.14 iOS version. Thanks to Xuqian and her AGSMesh paper (github.com/XuqianRen/AGS_…), which inspired us a lot, and also thanks to Haotong, Sida, and Jiaming for their stunning paper PromptDA (github.com/DepthAnything/…), which makes the depths way better. Of course, thanks to CJ and our intern team, Quanxiang and Ziteng, for helping with the development. #CVPR2025 #3DV2025 #GaussianSplatting #LiDAR

2 replies · 1 repost · 33 likes · 2K views
Haotong Lin retweeted
Pablo Vela@pablovelagomez1·
Recently, I've been playing with my iPhone ToF sensor, but the problem has always been the abysmal resolution (256x192). The team behind DepthAnything released PromptDepthAnything that fixes this. Using @Polycam3D to collect the raw data, @Gradio to generate a UI, and @rerundotio to visualize. Links at the end of the thread
29 replies · 211 reposts · 2.2K likes · 244.5K views
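A rough sketch of the visualization step in the workflow above: log the raw low-resolution ToF depth and an enhanced depth side by side with the rerun SDK. The arrays here are random placeholders standing in for the Polycam capture and the PromptDepthAnything output, and the exact pipeline in the thread may differ.

```python
# Hedged sketch: compare raw iPhone ToF depth (256x192) against an enhanced
# depth map in the rerun viewer. Placeholder data only.
import numpy as np
import rerun as rr

rr.init("tof_vs_enhanced", spawn=True)

raw_tof = np.random.rand(192, 256).astype(np.float32) * 5.0      # 256x192 sensor resolution
enhanced = np.random.rand(768, 1024).astype(np.float32) * 5.0    # placeholder high-res output

rr.log("depth/raw_tof", rr.DepthImage(raw_tof, meter=1.0))
rr.log("depth/enhanced", rr.DepthImage(enhanced, meter=1.0))
```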
Haotong Lin@HaotongLin·
@Vickitidrum @songyoupeng Thanks for your interest! Any form of prompting can be applied in this setting, and we mainly verify the paradigm with iPhone LiDAR (typically a 24×24-point dToF sensor) in our paper. 😀
1 reply · 0 reposts · 0 likes · 183 views
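To make the "24×24-point dToF" concrete, here is one way such a sparse prompt could be simulated from a dense depth map, by sampling it on a coarse regular grid. This is an illustration only, not necessarily the paper's exact simulation protocol.

```python
# Simulate a sparse iPhone-LiDAR-style prompt by sampling a dense depth map on
# a coarse 24x24 grid. Illustration only.
import numpy as np

def simulate_sparse_prompt(dense_depth: np.ndarray, grid: int = 24) -> np.ndarray:
    """Pick depth values at a regular grid of roughly grid x grid locations."""
    h, w = dense_depth.shape
    ys = np.linspace(0, h - 1, grid).round().astype(int)
    xs = np.linspace(0, w - 1, grid).round().astype(int)
    return dense_depth[np.ix_(ys, xs)]            # (24, 24) metric depth samples

dense = np.random.rand(480, 640).astype(np.float32) * 4.0   # placeholder dense depth in meters
prompt = simulate_sparse_prompt(dense)
print(prompt.shape)   # (24, 24)
```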
Vickram Rajendran@Vickitidrum·
@songyoupeng Very cool work! Any thoughts on using focal lengths/metric depth/flat plane, or metric depth from another model, instead of lidar for the prompting here? How accurate does the prompt need to be? Is lidar (even cheap) the only way?
3 replies · 0 reposts · 2 likes · 315 views
Songyou Peng@songyoupeng·
Dreaming of very accurate metric depth in stunning 4K resolution at speed? Check out our Prompt Depth Anything! We "prompt" Depth Anything with sparse lidar cues, enabling a wide range of applications! 🔗 Project page with code and cool visualizations: promptda.github.io
Bingyi Kang@bingyikang

Want to use Depth Anything, but need metric depth rather than relative depth? Thrilled to introduce Prompt Depth Anything, a new paradigm for accurate metric depth estimation with up to 4K resolution.
👉Key Message: Depth foundation models like DA have already internalized rich geometric knowledge of the 3D world but lack a proper way to elicit it. Inspired by the success of prompting in LLMs, we propose prompting Depth Anything with metric cues to produce metric depth. This method proves to be very effective when using a low-cost LiDAR (e.g., the iPhone's LiDAR), which is widely available, as the prompt. We believe the prompt can generalize to other forms as long as scale information is provided.
Prompt Depth Anything offers:
1⃣ A series of models for iPhone LiDAR.
2⃣ 4D reconstruction from monocular videos (captured with an iPhone).
3⃣ Improved generalization for robot manipulation, e.g., training on cans but generalizing to glasses.
4⃣ More detailed depth annotations for the ScanNet++ dataset.
The first author is our excellent intern @HaotongLin.
Paper: huggingface.co/papers/2412.14…
Huggingface: huggingface.co/papers/2412.14…
Project Page: promptda.github.io
Code: github.com/DepthAnything/…

1 reply · 13 reposts · 136 likes · 12.4K views
Haotong Lin@HaotongLin·
@pzoltowski @bingyikang TrueDepth is great, but as a front camera its range and the scenes it can cover are limited. That’s why we opted for the rear camera and its LiDAR: better coverage and more applicable scenes. 😀
0 replies · 0 reposts · 5 likes · 97 views
Patryk Zoltowski@pzoltowski·
@bingyikang Any reason why you can't use depth from the front TrueDepth camera? It's also available in ARKit (with the head-tracking config) and, in my experience, higher quality and higher resolution (640x480) than the LiDAR depth. However, it's only provided at 30 fps.
1 reply · 0 reposts · 3 likes · 558 views
Haotong Lin retweeted
Bingyi Kang@bingyikang·
Want to use Depth Anything, but need metric depth rather than relative depth? Thrilled to introduce Prompt Depth Anything, a new paradigm for accurate metric depth estimation with up to 4K resolution.
👉Key Message: Depth foundation models like DA have already internalized rich geometric knowledge of the 3D world but lack a proper way to elicit it. Inspired by the success of prompting in LLMs, we propose prompting Depth Anything with metric cues to produce metric depth. This method proves to be very effective when using a low-cost LiDAR (e.g., the iPhone's LiDAR), which is widely available, as the prompt. We believe the prompt can generalize to other forms as long as scale information is provided.
Prompt Depth Anything offers:
1⃣ A series of models for iPhone LiDAR.
2⃣ 4D reconstruction from monocular videos (captured with an iPhone).
3⃣ Improved generalization for robot manipulation, e.g., training on cans but generalizing to glasses.
4⃣ More detailed depth annotations for the ScanNet++ dataset.
The first author is our excellent intern @HaotongLin.
Paper: huggingface.co/papers/2412.14…
Huggingface: huggingface.co/papers/2412.14…
Project Page: promptda.github.io
Code: github.com/DepthAnything/…
9 replies · 78 reposts · 454 likes · 67.3K views
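A hedged sketch of the prompting interface described in the thread: a depth model consumes an RGB image plus a low-resolution metric depth prompt (e.g., iPhone LiDAR) and returns dense metric depth at image resolution. The class and method names below are hypothetical, and the body is a trivial nearest-neighbor upsampling placeholder rather than the real PromptDA network.

```python
# Hypothetical prompting interface: (image, metric prompt) -> dense metric depth.
# The placeholder body just upsamples the prompt; the real model fuses image
# features with the metric cues inside the network.
import numpy as np

class PromptedDepthModel:
    """Placeholder for a prompt-conditioned depth network (names are hypothetical)."""

    def predict(self, image: np.ndarray, prompt_depth: np.ndarray) -> np.ndarray:
        h, w = image.shape[:2]
        ph, pw = prompt_depth.shape
        ys = (np.arange(h) * ph // h).clip(0, ph - 1)
        xs = (np.arange(w) * pw // w).clip(0, pw - 1)
        return prompt_depth[np.ix_(ys, xs)]        # nearest-neighbor stand-in

model = PromptedDepthModel()
rgb = np.random.rand(756, 1008, 3).astype(np.float32)        # camera frame
lidar = np.random.rand(192, 256).astype(np.float32) * 4.0    # low-res metric prompt (meters)
metric_depth = model.predict(rgb, lidar)
print(metric_depth.shape)   # (756, 1008), metric because the prompt carried scale
```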
Haotong Lin@HaotongLin·
(2/2) Something interesting I found is that recent monocular depth methods like Depth Pro can reconstruct highly detailed depth, but these depths are inconsistent in 3D, leading to poor reconstruction. Instead, our approach with low-cost LiDAR guidance yields 3D-consistent depth.
0 replies · 1 repost · 6 likes · 740 views
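One simple way to probe the cross-view consistency being discussed: unproject a pixel from view A using its predicted depth, transform the point into view B, reproject it, and compare against B's predicted depth. The pinhole math below is a simplified sketch, not necessarily the metric used in the paper.

```python
# Cross-view depth consistency check via reprojection (simplified pinhole model).
import numpy as np

def reprojection_depth_error(depth_a, K, R_ab, t_ab, depth_b, u, v):
    """Depth disagreement (meters) for pixel (u, v) of view A when seen from view B."""
    z = depth_a[v, u]
    p_a = z * np.linalg.inv(K) @ np.array([u, v, 1.0])   # 3D point in A's camera frame
    p_b = R_ab @ p_a + t_ab                               # same point in B's camera frame
    uvb = K @ p_b
    ub, vb = int(round(uvb[0] / uvb[2])), int(round(uvb[1] / uvb[2]))
    if not (0 <= ub < depth_b.shape[1] and 0 <= vb < depth_b.shape[0]):
        return None                                       # point falls outside view B
    return abs(p_b[2] - depth_b[vb, ub])

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R_ab, t_ab = np.eye(3), np.array([0.1, 0.0, 0.0])         # small sideways baseline
depth_a = np.full((480, 640), 2.0)                        # dummy predicted depths
depth_b = np.full((480, 640), 2.0)
print(reprojection_depth_error(depth_a, K, R_ab, t_ab, depth_b, 320, 240))  # ~0 if consistent
```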
Haotong Lin retweeted
Zhen Xu@realzhenxu·
(1/8) Ever wanted to create an avatar of yourself that interacts realistically with different lighting? In our CVPR 2024 Highlight🌟paper, we present a method for creating relightable and animatable avatars from only sparse/monocular video. Project Page: zju3dv.github.io/relightable_av…
3 replies · 22 reposts · 114 likes · 17.3K views