Phuc Nguyen Duc Anh

112 posts

@phucnda

Ph.D. @UMDCS | Ex-resident at @VinAI_Research

Washington, DC · Joined June 2023

210 Following · 93 Followers

Phuc Nguyen Duc Anh retweeted

Zhenjun Zhao@zhenjun_zhao·2d

VGGT-SLAM++ Avilasha Mandal, Rajesh Kumar, @sudarshan_s_h, Chetan Arora tl;dr: DEM->submap; DINOv2 embeddings->retrieval; covisibility graph synthesis arxiv.org/abs/2604.06830

Phuc Nguyen Duc Anh retweeted

Zhenjun Zhao@zhenjun_zhao·2d

Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction Tao Xie, Peishan Yang, @krahets, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, @pengsida, @gingertata, @XiaoweiZhou5 tl;dr: neural global context with lightweight sub-networks and context aggregation arxiv.org/abs/2604.08542

Phuc Nguyen Duc Anh retweeted

Ai2@allen_ai·5d

Today we're releasing WildDet3D—an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵

Phuc Nguyen Duc Anh retweeted

Daniel DeTone@ddetone·4d

Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: facebookresearch.github.io/boxer/ Here we show Boxer in action on an egocentric sequence captured from smart glasses:

Phuc Nguyen Duc Anh retweeted

Anh-Quan Cao@AnhQuanCAO·5d

🧵 1/7 One model. Any sensor rig. Any domain. No poses. No intrinsics. **OccAny**: #CVPR2026 paper on generalized unconstrained urban 3D occupancy. Code release: training, data, eval & viz - OccAny+ (Depth Anything 3, SAM3), OccAny (MUSt3R, SAM2) Link: valeoai.github.io/OccAny

Phuc Nguyen Duc Anh retweeted

Kwang Moo Yi@kwangmoo_yi·Apr 4

Huang et al., "UniRecGen: Unifying Multi-View 3D Reconstruction and Generation" Estimate 3D point clouds (maps) in both camera and object coordinates, which leads to better multi-view feed-forward reconstruction.

Phuc Nguyen Duc Anh retweeted

Xianzheng Ma@xianzhengoxford·Apr 1

🚨 Do 3D-LLMs really understand 3D? Short answer: no. Excited to share Real-3DQA at #ICLR2026! - Project: real-3dqa.github.io - Arxiv: arxiv.org/abs/2603.23523 - Data: huggingface.co/datasets/Olive… 🎥 1-min video explainer ↓

Phuc Nguyen Duc Anh retweeted

Zhenjun Zhao@zhenjun_zhao·Feb 24

OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness @phucnda, @anh_n_nhu, @MingCLinCS tl;dr: temporal dynamics+scene geometry arxiv.org/abs/2602.19035

Phuc Nguyen Duc Anh retweeted

Zhenjun Zhao@zhenjun_zhao·16 Mar

VGGT-World: Transforming VGGT into an Autoregressive Geometry World Model Xiangyu Sun, Shijie Wang, Fengyi Zhang, Lin Liu, Caiyan Jia, Ziying Song, Zi Huang, Yadan Luo tl;dr: frozen VGGT features->world state; learn temporal evolution arxiv.org/abs/2603.12655

Phuc Nguyen Duc Anh retweeted

Radiance Fields@RadianceFields·15 Mar

COLMAP just dropped V4.0, officially adding GLOMAP and several updates.

Phuc Nguyen Duc Anh retweeted

Brie Wensleydale🧀🐭@SlipperyGem·14 Mar

This is another implementation of SAM3D with multi-view. I'll be honest, SAM3D's only flaw was the lack of multi-view, and this solves it. Hopefully it gets a node too, or gets rolled into another SAM3D custom node. github.com/devinli123/MV-…

Phuc Nguyen Duc Anh retweeted

sasaki@engineer@rsasaki0109·13 Mar

DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images xiaomi-research.github.io/dggt/ Autonomous driving needs fast, scalable 4D reconstruction and re-simulation for training and evaluation, yet most methods for dynamic driving scenes still rely on per-scene optimization, known camera calibration, or short frame windows, making them slow and impractical. We revisit this problem from a feedforward perspective and note that the existing formulations, treating camera pose as a required input, limit flexibility and scalability. Instead, we reformulate pose as an output of the model, enabling reconstruction directly from sparse, unposed images and supporting an arbitrary number of views for long sequences. Our approach jointly predicts per-frame 3D Gaussian maps and camera parameters, disentangles dynamics with a lightweight dynamic head, and preserves temporal consistency with a lifespan head that modulates visibility over time. A diffusion-based rendering refinement further reduces motion/interpolation artifacts and improves novel-view quality under sparse inputs. The result is a single-pass, pose-free algorithm that achieves state-of-the-art performance and speed. Trained and evaluated on large-scale driving benchmarks (Waymo, nuScenes, Argoverse2), our method outperforms prior work both when trained on each dataset and in zero-shot transfer across datasets, and it scales well as the number of input frames increases.

Phuc Nguyen Duc Anh retweeted

Kwang Moo Yi@kwangmoo_yi·12 Mar

Cheng et al., "ReCoSplat: Autoregressive Feed-Forward Gaussian Splatting via Render-and-Compare" Feed-forward Gaussian Splats have errors. Render current estimates to incoming views, and use those together for better reconstruction. I.e., render and compare.

Phuc Nguyen Duc Anh retweeted

Zixian Liu@ZixianLiu03·9 Mar

🚀Introducing OnlineSI: Taming Large Language Model for Online 3D Understanding and Grounding! Can MLLMs truly understand an ever-changing 3D world🌍? 🔗Explore more: onlinesi.github.io

Phuc Nguyen Duc Anh retweeted

Dmytro Mishkin 🇺🇦@ducha_aiki·6 Mar

DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation Tuan Duc Ngo et 6 al. tl;dr: low-res multiview (Pi3-distilled) + high-res single view (MoGe2 ft) arxiv.org/abs/2603.03744

Phuc Nguyen Duc Anh retweeted

Chuanxia Zheng@ChuanxiaZ·7 Mar

#ICLR2026 🔥 Excited to share NOVA3R, the scene-level version of our previous Amodal3R. ✨ Key highlights: - Amodal reasoning: reconstructs occluded geometry - Physically plausible 3D with fewer duplicated structures Page: wrchen530.github.io/nova3r/ Paper: arxiv.org/pdf/2603.04179

Phuc Nguyen Duc Anh retweeted

Jon Barron@jon_barron·4 Mar

One of the more interesting and thought provoking research papers I've seen in a while. A system for reading and reimplementing NeRF papers, and it seems to work very well. Pretty easy to extrapolate out from here to what CVPR 2027 papers will look like. seemandhar.github.io/NERFIFY/

Phuc Nguyen Duc Anh retweeted

AI Bites | YouTube Channel@ai_bites·5 Mar

Recent advancements in neural visual geometry, including transformer-based models such as VGGT and Pi3, have achieved impressive accuracy on 3D reconstruction tasks. However, their reliance on full attention makes them fundamentally limited by GPU memory capacity, preventing them from scaling to large, unordered image collections. MERG3R is a training-free divide-and-conquer framework that enables geometric foundation models to operate far beyond their native memory limits. Given a large unordered set of 1,000 input images, MERG3R reconstructs accurate camera poses and a high-quality point cloud. Despite the long sequence and challenging viewpoints, the pipeline remains stable and scalable. Paper Title: MERG3R: A Divide-and-Conquer Approach to Large-Scale Neural Visual Project: leochengkx.github.io/MERG3R/ Link: arxiv.org/abs/2603.02351

Phuc Nguyen Duc Anh retweeted

MrNeRF@janusch_patas·3 Mar

OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution Abstract (excerpt): In this paper, we introduce OnlineX, a feed-forward framework that reconstructs both 3D visual appearance and language fields in an online manner using only streaming images. A key challenge in online formulation is the cumulative drift issue, which is rooted in the fundamental conflict between two opposing roles of the memory state: → an active role that constantly refreshes to capture high-frequency local geometry → a stable role that conservatively accumulates and preserves the long-term global structure. To address this, we introduce a decoupled active-to-stable state evolution paradigm. Our framework decouples the memory state into a dedicated active state and a persistent stable state, and then cohesively fuses the information from the former into the latter to achieve both fidelity and stability. Moreover, we jointly model visual appearance and language fields and incorporate an implicit Gaussian fusion module to enhance reconstruction quality.

Phuc Nguyen Duc Anh retweeted

youming.deng@denghilbert·Feb 22

We present the SOTA feed-forward 3DGS pipeline Selfi, which was accepted by #CVPR2026 Project Page: denghilbert.github.io/selfi
