Phuc Nguyen Duc Anh

112 posts

@phucnda

Ph.D. @UMDCS | Ex-resident at @VinAI_Research

Washington, DC · Joined June 2023
210 Following · 93 Followers
Phuc Nguyen Duc Anh retweeted
Zhenjun Zhao (@zhenjun_zhao)
Scal3R: Scalable Test-Time Training for Large-Scale 3D Reconstruction Tao Xie, Peishan Yang, @krahets, Yingfeng Cai, Wei Yin, Weiqiang Ren, Qian Zhang, Wei Hua, @pengsida, @gingertata, @XiaoweiZhou5 tl;dr: neural global context with lightweight sub-networks and context aggregation arxiv.org/abs/2604.08542
Phuc Nguyen Duc Anh retweeted
Ai2 (@allen_ai)
Today we're releasing WildDet3D, an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵
Phuc Nguyen Duc Anh retweeted
Daniel DeTone (@ddetone)
Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: facebookresearch.github.io/boxer/ Here we show Boxer in action on an egocentric sequence captured from smart glasses:
Phuc Nguyen Duc Anh retweeted
Anh-Quan Cao (@AnhQuanCAO)
🧵 1/7 One model. Any sensor rig. Any domain. No poses. No intrinsics. **OccAny**: #CVPR2026 paper on generalized unconstrained urban 3D occupancy. Code release: training, data, eval & viz - OccAny+ (Depth Anything 3, SAM3), OccAny (MUSt3R, SAM2) Link: valeoai.github.io/OccAny
Phuc Nguyen Duc Anh retweeted
Kwang Moo Yi (@kwangmoo_yi)
Huang et al., "UniRecGen: Unifying Multi-View 3D Reconstruction and Generation" Estimate 3D point clouds (maps) in both camera and object coordinates, which leads to better multi-view feed-forward reconstruction.
Phuc Nguyen Duc Anh retweeted
Zhenjun Zhao (@zhenjun_zhao)
VGGT-World: Transforming VGGT into an Autoregressive Geometry World Model Xiangyu Sun, Shijie Wang, Fengyi Zhang, Lin Liu, Caiyan Jia, Ziying Song, Zi Huang, Yadan Luo tl;dr: frozen VGGT features->world state; learn temporal evolution arxiv.org/abs/2603.12655
Phuc Nguyen Duc Anh retweeted
Radiance Fields (@RadianceFields)
COLMAP just dropped v4.0, officially adding GLOMAP along with several other updates.
Phuc Nguyen Duc Anh retweeted
Brie Wensleydale🧀🐭 (@SlipperyGem)
this is another implementation of SAM3D with multi-view. I'll be honest, SAM3D's only flaw was the lack of multi-view, and this solves it. Hopefully it gets a node too, or gets rolled into another SAM3D custom node. github.com/devinli123/MV-…
Phuc Nguyen Duc Anh retweeted
sasaki@engineer (@rsasaki0109)
DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images xiaomi-research.github.io/dggt/
Autonomous driving needs fast, scalable 4D reconstruction and re-simulation for training and evaluation, yet most methods for dynamic driving scenes still rely on per-scene optimization, known camera calibration, or short frame windows, making them slow and impractical. We revisit this problem from a feedforward perspective and note that the existing formulations, which treat camera pose as a required input, limit flexibility and scalability. Instead, we reformulate pose as an output of the model, enabling reconstruction directly from sparse, unposed images and supporting an arbitrary number of views for long sequences. Our approach jointly predicts per-frame 3D Gaussian maps and camera parameters, disentangles dynamics with a lightweight dynamic head, and preserves temporal consistency with a lifespan head that modulates visibility over time. A diffusion-based rendering refinement further reduces motion/interpolation artifacts and improves novel-view quality under sparse inputs. The result is a single-pass, pose-free algorithm that achieves state-of-the-art performance and speed. Trained and evaluated on large-scale driving benchmarks (Waymo, nuScenes, Argoverse2), our method outperforms prior work both when trained on each dataset and in zero-shot transfer across datasets, and it scales well as the number of input frames increases.
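The abstract only says that the lifespan head "modulates visibility over time". One simple way to picture that idea is to scale each Gaussian's opacity by a temporal weight that peaks when the Gaussian is most relevant and decays away from that time. This is an illustrative sketch under assumptions, not DGGT's actual head: the bell-shaped weight, the visibility threshold, and the names `modulated_opacity`, `visible_at`, `t_center`, and `lifespan` are all hypothetical.

```python
import math
from typing import List, Tuple


def modulated_opacity(base_opacity: float, t: float, t_center: float, lifespan: float) -> float:
    """Scale a Gaussian's opacity by a temporal bell curve centred at t_center.

    Hypothetical parameterization: `lifespan` acts like a standard deviation in time.
    """
    if lifespan <= 0.0:
        raise ValueError("lifespan must be positive")
    weight = math.exp(-0.5 * ((t - t_center) / lifespan) ** 2)
    return base_opacity * weight


def visible_at(
    gaussians: List[Tuple[float, float, float]], t: float, threshold: float = 0.01
) -> List[int]:
    """Return indices of (base_opacity, t_center, lifespan) Gaussians still visible at time t."""
    return [
        i
        for i, (opacity, t_center, lifespan) in enumerate(gaussians)
        if modulated_opacity(opacity, t, t_center, lifespan) >= threshold
    ]


gaussians = [(0.9, 0.0, 1.0), (0.9, 10.0, 1.0)]
print(visible_at(gaussians, 0.0))  # [0]
```

A smooth weight keeps opacity differentiable with respect to the time parameters, which a hard on/off window would not.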
Phuc Nguyen Duc Anh retweeted
Kwang Moo Yi (@kwangmoo_yi)
Cheng et al., "ReCoSplat: Autoregressive Feed-Forward Gaussian Splatting via Render-and-Compare" Feed-forward Gaussian Splats have errors. Render current estimates to incoming views, and use those together for better reconstruction. I.e., render and compare.
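The render-and-compare loop summarized above can be illustrated with a deliberately tiny toy: the "scene" is a list of values, a "view" observes a subset of them, and the discrepancy between the rendered estimate and the incoming observation drives the correction. This is a generic sketch under those assumptions; per the summary, ReCoSplat uses the renders together with the incoming views inside a feed-forward model, whereas here a fixed, hypothetical step size `step` stands in for that model.

```python
from typing import List, Sequence


def render(scene: Sequence[float], view: Sequence[int]) -> List[float]:
    """Toy renderer: a view simply reads the scene cells it can see."""
    return [scene[i] for i in view]


def render_and_compare_step(
    scene: Sequence[float], view: Sequence[int], observed: Sequence[float], step: float = 0.5
) -> List[float]:
    """Render the current estimate into a new view, compare with the observation, correct."""
    if len(view) != len(observed):
        raise ValueError("view and observation must have the same length")
    rendered = render(scene, view)
    updated = list(scene)
    for idx, r, o in zip(view, rendered, observed):
        # Residual between what the estimate predicts and what the view actually shows.
        residual = o - r
        updated[idx] = r + step * residual
    return updated


scene = [0.0, 0.0, 0.0, 0.0]
scene = render_and_compare_step(scene, view=[1, 2], observed=[2.0, 4.0])
print(scene)  # [0.0, 1.0, 2.0, 0.0]
```

Cells the new view cannot see are left untouched, so errors are only corrected where fresh evidence exists.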
Phuc Nguyen Duc Anh retweeted
Zixian Liu (@ZixianLiu03)
🚀Introducing OnlineSI: Taming Large Language Model for Online 3D Understanding and Grounding! Can MLLMs truly understand an ever-changing 3D world🌍? 🔗Explore more: onlinesi.github.io
Phuc Nguyen Duc Anh retweeted
Dmytro Mishkin 🇺🇦 (@ducha_aiki)
DAGE: Dual-Stream Architecture for Efficient and Fine-Grained Geometry Estimation Tuan Duc Ngo et 6 al. tl;dr: low-res multi-view (Pi3-distilled) + high-res single-view (MoGe2 ft) arxiv.org/abs/2603.03744
Phuc Nguyen Duc Anh retweeted
Jon Barron (@jon_barron)
One of the more interesting and thought-provoking research papers I've seen in a while. A system for reading and reimplementing NeRF papers, and it seems to work very well. Pretty easy to extrapolate out from here to what CVPR 2027 papers will look like. seemandhar.github.io/NERFIFY/
Phuc Nguyen Duc Anh retweeted
AI Bites | YouTube Channel
Recent advancements in neural visual geometry, including transformer-based models such as VGGT and Pi3, have achieved impressive accuracy on 3D reconstruction tasks. However, their reliance on full attention makes them fundamentally limited by GPU memory capacity, preventing them from scaling to large, unordered image collections. MERG3R is a training-free divide-and-conquer framework that enables geometric foundation models to operate far beyond their native memory limits. Given a large unordered set of 1,000 input images, MERG3R reconstructs accurate camera poses and a high-quality point cloud. Despite the long sequence and challenging viewpoints, the pipeline remains stable and scalable.
Paper Title: MERG3R: A Divide-and-Conquer Approach to Large-Scale Neural Visual
Project: leochengkx.github.io/MERG3R/
Link: arxiv.org/abs/2603.02351
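The divide-and-conquer idea can be sketched in two steps: split a large image set into chunks small enough for one forward pass, then stitch the chunk results together through images they share. The sketch rests on simplifying assumptions that do not come from MERG3R: chunks are contiguous index ranges with a fixed `overlap`, each chunk's output is reduced to camera centers, and chunks are aligned with a translation only (a real pipeline would typically need at least a similarity transform with rotation and scale). `make_chunks` and `merge_translation_only` are hypothetical names.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def make_chunks(num_images: int, chunk_size: int, overlap: int) -> List[List[int]]:
    """Split image indices into chunks of at most chunk_size, sharing `overlap` images."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    if num_images == 0:
        return []
    step = chunk_size - overlap
    chunks: List[List[int]] = []
    start = 0
    while True:
        end = min(start + chunk_size, num_images)
        chunks.append(list(range(start, end)))
        if end == num_images:
            break
        start += step
    return chunks


def merge_translation_only(global_map: Dict[int, Vec3], chunk_map: Dict[int, Vec3]) -> Dict[int, Vec3]:
    """Align a chunk to the global frame via shared images (translation only), then merge."""
    shared = [i for i in chunk_map if i in global_map]
    if not shared:
        raise ValueError("chunks must share at least one image to be aligned")
    n = len(shared)
    # Average offset between the two frames over the shared camera centers.
    offset = tuple(sum(global_map[i][k] - chunk_map[i][k] for i in shared) / n for k in range(3))
    merged = dict(global_map)
    for i, p in chunk_map.items():
        if i not in merged:
            merged[i] = (p[0] + offset[0], p[1] + offset[1], p[2] + offset[2])
    return merged


print(make_chunks(10, chunk_size=4, overlap=1))  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Only `chunk_size` images ever need to fit in memory at once; the overlap is what makes the independent reconstructions registrable to one another.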
Phuc Nguyen Duc Anh retweeted
MrNeRF (@janusch_patas)
OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution
Abstract (excerpt):
In this paper, we introduce OnlineX, a feed-forward framework that reconstructs both 3D visual appearance and language fields in an online manner using only streaming images. A key challenge in online formulation is the cumulative drift issue, which is rooted in the fundamental conflict between two opposing roles of the memory state:
→ an active role that constantly refreshes to capture high-frequency local geometry
→ a stable role that conservatively accumulates and preserves the long-term global structure
To address this, we introduce a decoupled active-to-stable state evolution paradigm. Our framework decouples the memory state into a dedicated active state and a persistent stable state, and then cohesively fuses the information from the former into the latter to achieve both fidelity and stability. Moreover, we jointly model visual appearance and language fields and incorporate an implicit Gaussian fusion module to enhance reconstruction quality.
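The active/stable split described in the OnlineX abstract can be illustrated with a toy two-state memory: the active state is overwritten by every incoming frame, while the stable state absorbs only a small fraction of it per step. This is a hedged sketch, not OnlineX's architecture: the exponential-moving-average fusion, the `fuse_rate` and `confidence` parameters, and the class name `ActiveStableMemory` are assumptions made for illustration.

```python
from typing import List, Sequence


class ActiveStableMemory:
    """Toy two-state memory (hypothetical; not the OnlineX model)."""

    def __init__(self, dim: int, fuse_rate: float = 0.1) -> None:
        if not 0.0 < fuse_rate <= 1.0:
            raise ValueError("fuse_rate must be in (0, 1]")
        self.dim = dim
        self.fuse_rate = fuse_rate
        self.active: List[float] = [0.0] * dim
        self.stable: List[float] = [0.0] * dim
        self.initialized = False

    def update(self, frame_features: Sequence[float], confidence: float = 1.0) -> List[float]:
        if len(frame_features) != self.dim:
            raise ValueError("feature size mismatch")
        # Active role: fully refreshed every frame to track local detail.
        self.active = list(frame_features)
        if not self.initialized:
            # Seed the stable state from the first frame.
            self.stable = list(self.active)
            self.initialized = True
            return list(self.stable)
        # Stable role: absorb only a small, confidence-scaled fraction of the active state.
        rate = self.fuse_rate * min(max(confidence, 0.0), 1.0)
        self.stable = [s + rate * (a - s) for s, a in zip(self.stable, self.active)]
        return list(self.stable)


mem = ActiveStableMemory(dim=2, fuse_rate=0.5)
mem.update([1.0, 2.0])          # stable seeded to [1.0, 2.0]
print(mem.update([3.0, 2.0]))   # [2.0, 2.0]
```

A small `fuse_rate` makes the stable state resistant to single-frame noise at the cost of reacting slowly, a toy version of the fidelity-versus-stability trade-off the abstract describes.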