Khiem Vuong
@kvuongdev
40 posts

Doing PhD @CMU_Robotics | Prev @Apple | Vision, Robotics
Pittsburgh, PA · Joined February 2018
239 Following · 212 Followers

Pinned Tweet
Khiem Vuong @kvuongdev ·
[1/6] Recent models like DUSt3R generalize well across viewpoints, but performance drops on aerial-ground pairs. At #CVPR2025, we propose AerialMegaDepth (aerial-megadepth.github.io), a hybrid dataset combining mesh renderings with real ground images (MegaDepth) to bridge this gap.
Khiem Vuong retweeted
Ethan Weber @ethanjohnweber ·
I made a Claude Code skill that generates conference posters 🛠️ Instead of a static PDF, it outputs a single HTML file — drag to resize columns, swap sections, adjust fonts, then give your layout back to Claude. 🔁 🔗 Skill 👉 github.com/ethanweber/pos…
Khiem Vuong retweeted
Ethan Weber @ethanjohnweber ·
🔦 LuxRemix🚦is out! 🌞 We’re happy to release our project that uses generative modeling to interactively relight indoor scenes (as images or as splats)! It was amazing seeing our incredible intern @RfLiang lead this effort. 😁 luxremix.github.io 👈
Christian Richardt @c_richardt

We’re excited to share LuxRemix: interactive light editing for indoor scenes! 🏠💡 Capture a room once, then turn individual lights on/off, change colors, and adjust intensity – all in real-time 3D from any viewpoint. 💡 luxremix.github.io 📄 arxiv.org/abs/2601.15283

Gabriele Berton @gabriberton ·
Went to buy a pair of shoes in Menlo Park and they first did a 3D reconstruction of my feet. It felt surreal. Of course I had to ask the shop assistant if he thought E2E methods would replace COLMAP one day
Khiem Vuong @kvuongdev ·
Awesome results, congrats on the release @Parskatt! Great to see AerialMegaDepth being used for both training and eval. About the spurious depths on sky: we noticed that as well, and actually provided segmentation masks to mask out the sky regions when we trained DUSt3R/MASt3R (see dataloader L87–L95: github.com/kvuong2711/aer…). The masks are already included in the HF data repo (huggingface.co/datasets/kvuon…); I should have highlighted this better 🙂. If you have the bandwidth to re-train the model (maybe before the camera-ready/final version), I would be curious to see if this fixes the problem 😄.
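The masking idea described in this reply can be sketched as a loss that simply ignores sky pixels during depth supervision. This is a minimal illustration, not the actual AerialMegaDepth dataloader code: the function name `masked_depth_loss` and its arguments are hypothetical, and it assumes a per-pixel boolean sky mask aligned with the depth map.

```python
import numpy as np

def masked_depth_loss(pred_depth, gt_depth, sky_mask):
    """Mean absolute depth error over non-sky, valid-depth pixels.

    pred_depth, gt_depth: float arrays of the same shape.
    sky_mask: boolean array, True where the pixel is sky.
    (Hypothetical sketch; the real training code may differ.)
    """
    # Exclude sky pixels (where rendered/GT depth is unreliable)
    # and pixels with no ground-truth depth at all.
    valid = ~sky_mask & (gt_depth > 0)
    if not valid.any():
        return 0.0
    return float(np.abs(pred_depth[valid] - gt_depth[valid]).mean())
```

The point of the exclusion is that rendered meshes often produce spurious finite depths for sky regions, so supervising on them teaches the model the wrong geometry; masking them out removes that signal entirely.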
Khiem Vuong @kvuongdev ·
Thanks to the inspiration from @gabriberton's mesh-based retrieval paper (from our casual chat back at CMU). I'm not sure whether using mesh models is truly scalable (constructing a high-quality colored mesh is non-trivial), but for now it's a good intermediary to glue different modalities together!
Gabriele Berton @gabriberton ·
#ResearchIdeas n1 - TLDR: a large-scale multi-modal cross-view dataset (without actually collecting new data). Here's an idea for a good CVPR-level paper; hopefully someone reads this and works on it. (1/many)
Gabriele Berton@gabriberton

My PhD is over, but I still had a few big and small research projects I thought were very promising. I don't work on computer vision anymore, so I'll post these detailed ideas here over the next weeks: feel free to work on them and claim them as yours. (1/2)

Jianyuan @jianyuan_wang ·
@kvuongdev @zhiwen_fan_ Yeah, corner cases such as AerialMegaDepth are quite challenging, and low performance there is expected. But for in-domain datasets such as CO3Dv2 or ScanNet, the results look super weird. Just hoping to double-confirm.
Khiem Vuong @kvuongdev ·
@jianyuan_wang I believe the biggest failure mode is on the ULTRRA dataset, which contains extreme ground-aerial pairs. In our paper (aerial-megadepth.github.io), we observed the poor performance of DUSt3R/MASt3R on ground-aerial and proposed a new training/finetuning dataset to mitigate this. It seems that newer models like VGGT still struggle on this, and it would be interesting to see if finetuning VGGT on our dataset helps with this scenario. Happy to chat more in person at CVPR!
Jianyuan @jianyuan_wang ·
@zhiwen_fan_ It appears that camera poses might have been evaluated in a coordinate system or convention inconsistent with the ground truth, for example, comparing camera-to-world vs. world-to-camera. That's often the case when I see such an unusually high number.
Khiem Vuong @kvuongdev ·
Ground-aerial 3D prediction is tough! @zhiwen_fan_'s work highlights the struggle for existing models (DUSt3R/MASt3R/VGGT/etc.), indicating that much work still needs to be done in this area. At #CVPR2025, we will be presenting AerialMegaDepth (aerial-megadepth.github.io), a hybrid dataset combining mesh renderings with real ground images (MegaDepth) to bridge this gap. If curious, please stop by our poster: 🗓️ Sunday 06/15 10:30AM ExHall D | Poster #59
Zhiwen (Aaron) Fan @zhiwen_fan_

Discover the right 3D Geometric Foundation Model for your task—whether it’s stereo matching, multi-view depth estimation, video depth, pose estimation, semantic understanding, or novel view synthesis. Explore more insights in our #E3DBench #FoundationModel #3D #GaussianSplatting. Project Webpage: e3dbench.github.io

Khiem Vuong retweeted
Anish Madan @anishmadan23 ·
🚨 The 2nd iteration of our @CVPR Foundational Few-Shot Object Detection Challenge is LIVE! Can your model think like an annotator? 🧠💥 Align Vision-Language Models (VLMs) with a few multi-modal examples & win 💰 cash prizes! 🔗 Challenge: eval.ai/web/challenges…