Khiem Vuong
@kvuongdev
40 posts

Doing PhD @CMU_Robotics | Prev @Apple | Vision, Robotics
Pittsburgh, PA · Joined February 2018
239 Following · 212 Followers

Pinned Tweet
Khiem Vuong @kvuongdev ·
[1/6] Recent models like DUSt3R generalize well across viewpoints, but performance drops on aerial-ground pairs. At #CVPR2025, we propose AerialMegaDepth (aerial-megadepth.github.io), a hybrid dataset combining mesh renderings with real ground images (MegaDepth) to bridge this gap.
Khiem Vuong retweeted
Ethan Weber @ethanjohnweber ·
I made a Claude Code skill that generates conference posters 🛠️ Instead of a static PDF, it outputs a single HTML file — drag to resize columns, swap sections, adjust fonts, then give your layout back to Claude. 🔁 🔗 Skill 👉 github.com/ethanweber/pos…
Khiem Vuong retweeted
Ethan Weber @ethanjohnweber ·
🔦 LuxRemix🚦is out! 🌞 We’re happy to release our project that uses generative modeling to interactively relight indoor scenes (as images or as splats)! It was amazing seeing our incredible intern @RfLiang lead this effort. 😁 luxremix.github.io 👈
Christian Richardt @c_richardt

We’re excited to share LuxRemix: interactive light editing for indoor scenes! 🏠💡 Capture a room once, then turn individual lights on/off, change colors, and adjust intensity – all in real-time 3D from any viewpoint. 💡 luxremix.github.io 📄 arxiv.org/abs/2601.15283

Gabriele Berton @gabriberton ·
Went to buy a pair of shoes in Menlo Park and they first did a 3D reconstruction of my feet. It felt surreal. Of course I had to ask the shop assistant if he thought E2E methods would replace COLMAP one day
Khiem Vuong @kvuongdev ·
Awesome results, congrats on the release @Parskatt! Great to see AerialMegaDepth being used for both training and eval. About the spurious depths on sky: we noticed that as well, and actually provided segmentation masks to mask out the sky regions when we trained DUSt3R/MASt3R (see dataloader L87–L95: github.com/kvuong2711/aer…). The masks are already included in the HF data repo (huggingface.co/datasets/kvuon…); I should have highlighted this better 🙂. If you have the bandwidth to re-train the model (maybe before the camera-ready/final version), I would be curious to see if this fixes the problem 😄.
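The masking idea described in this reply can be sketched as a loss that simply ignores sky pixels during depth supervision. This is a minimal illustration, not the actual AerialMegaDepth dataloader code: the function name `masked_depth_loss` and its arguments are hypothetical, and it assumes a per-pixel boolean sky mask aligned with the depth map.

```python
import numpy as np

def masked_depth_loss(pred_depth, gt_depth, sky_mask):
    """Mean absolute depth error over non-sky, valid-depth pixels.

    pred_depth, gt_depth: float arrays of the same shape.
    sky_mask: boolean array, True where the pixel is sky.
    (Hypothetical sketch; the real training code may differ.)
    """
    # Exclude sky pixels (where rendered/GT depth is unreliable)
    # and pixels with no ground-truth depth at all.
    valid = ~sky_mask & (gt_depth > 0)
    if not valid.any():
        return 0.0
    return float(np.abs(pred_depth[valid] - gt_depth[valid]).mean())
```

The point of the exclusion is that rendered meshes often produce spurious finite depths for sky regions, so supervising on them teaches the model the wrong geometry; masking them out removes that signal entirely.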
Khiem Vuong @kvuongdev ·
Thanks to the inspiration from @gabriberton's mesh-based retrieval paper (from our casual chat back at CMU). I'm not sure whether using mesh models is truly scalable (constructing a high-quality colored mesh is non-trivial), but for now it's a good intermediary to glue different modalities together!
Gabriele Berton @gabriberton ·
#ResearchIdeas n1 - TLDR: a large-scale multi-modal cross-view dataset (without actually collecting new data). Here's an idea for a good CVPR-level paper; hopefully someone reads this and works on it. (1/many)
Gabriele Berton@gabriberton

My PhD is over, but I still had a few big and small research projects I thought were very promising. I don't work on computer vision anymore, so I'll post these detailed ideas here over the next weeks: feel free to work on them and claim them as yours. (1/2)

Jianyuan @jianyuan_wang ·
@kvuongdev @zhiwen_fan_ Yeah, corner cases such as AerialMegaDepth are quite challenging, and low performance there is expected. But for in-domain datasets such as CO3Dv2 or ScanNet, the results look super weird. Just hoping to double-confirm.
Khiem Vuong @kvuongdev ·
@jianyuan_wang I believe the biggest failure mode is on the ULTRRA dataset, which contains extreme ground-aerial pairs. In our paper (aerial-megadepth.github.io), we observed the poor performance of DUSt3R/MASt3R on ground-aerial and proposed a new training/finetuning dataset to mitigate this. It seems that newer models like VGGT still struggle on this, and it would be interesting to see if finetuning VGGT on our dataset helps with this scenario. Happy to chat more in person at CVPR!
Jianyuan @jianyuan_wang ·
@zhiwen_fan_ It appears that camera poses might have been evaluated in a coordinate system or convention inconsistent with the ground truth, for example, comparing camera-to-world vs. world-to-camera. That's often the case when I see such an unusually high number.
Khiem Vuong @kvuongdev ·
Ground-aerial 3D prediction is tough! @zhiwen_fan_'s work highlights the struggle for existing models (DUSt3R/MASt3R/VGGT/etc.), indicating that much work still needs to be done in this area. At #CVPR2025, we will be presenting AerialMegaDepth (aerial-megadepth.github.io), a hybrid dataset combining mesh renderings with real ground images (MegaDepth) to bridge this gap. If curious, please stop by our poster: 🗓️ Sunday 06/15 10:30AM ExHall D | Poster #59
Zhiwen (Aaron) Fan @zhiwen_fan_

Discover the right 3D Geometric Foundation Model for your task—whether it’s stereo matching, multi-view depth estimation, video depth, pose estimation, semantic understanding, or novel view synthesis. Explore more insights in our #E3DBench #FoundationModel #3D #GaussianSplatting. Project Webpage: e3dbench.github.io

Khiem Vuong retweeted
Anish Madan @anishmadan23 ·
🚨 The 2nd iteration of our @CVPR Foundational Few-Shot Object Detection Challenge is LIVE! Can your model think like an annotator? 🧠💥 Align Vision-Language Models (VLMs) with a few multi-modal examples & win 💰 cash prizes! 🔗 Challenge: eval.ai/web/challenges…