@janusch_patas @jonstephens85 @NeoVroloK From what we've seen, training longer in cases like this helps a lot — significantly reducing floaters even with imperfect initialization. That said, we're not claiming it's floater-free or universally robust — results still depend on the setup.
EDGS: Eliminating Densification for Efficient Convergence of 3DGS
Contributions:
• We show that initial triangulation based on 2D correspondences can replace the incremental refinement process, fundamentally changing how 3DGS models allocate resources.
• Our method reduces the path each Gaussian must travel in parameter space. Careful initialization not only accelerates convergence but also steers optimization toward a solution with lower reconstruction error, and thus higher reconstruction quality.
• Our approach outperforms both speed-optimized and quality-focused state-of-the-art models while using only half the splats of standard 3DGS. By improving initialization rather than altering the optimization process, this method is compatible with other 3DGS acceleration techniques, making it a flexible enhancement to existing models.
@ChrisAtKIRI Also, if the content is front-facing even 16 views can be enough for a full reconstruction, so we don’t necessarily need a lot of views. But if the scene is complex, then dense views are needed, and yes, that’s when time spent on pose estimation becomes necessary.
@ChrisAtKIRI Thanks a lot for testing EDGS and sharing it!
In our experiments, we focused on evaluating quality and convergence speed under the same precomputed pose conditions as 3DGS — where the poses are given. In the paper, all our main results are based on known poses.
Ok, so, EDGS (github.com/CompVis/EDGS). Let me share some testing insight about this work. It works well on front-facing video (now I know why they emphasize this): I extracted a few views, matched them with COLMAP, and training was very fast with dense initialization. For the dense-view case, I tested with my existing data of 290 camera poses, and training took 27 minutes on a 4090. The most annoying part is matching the poses. If you want speed, you need sparse input, but sparse input cannot be easily matched with SfM. That is probably why the authors mention VGGT but haven't used it yet.
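For anyone wanting to reproduce the preprocessing described above, a minimal sketch of the frame-extraction and COLMAP pose-matching step might look like this. Paths, the 2 fps sampling rate, and output directories are placeholder assumptions; the `ffmpeg` and `colmap` subcommands and flags are the standard ones, but check against your installed versions:

```shell
# Sample frames from a front-facing video (2 fps is an assumed rate; adjust per scene)
mkdir -p frames
ffmpeg -i input_video.mp4 -vf fps=2 -qscale:v 2 frames/frame_%04d.jpg

# Standard COLMAP SfM pipeline: features -> matches -> sparse reconstruction (camera poses)
mkdir -p colmap_ws/sparse
colmap feature_extractor --database_path colmap_ws/database.db --image_path frames
colmap exhaustive_matcher --database_path colmap_ws/database.db
colmap mapper --database_path colmap_ws/database.db --image_path frames \
    --output_path colmap_ws/sparse
```

Note that with very sparse inputs, exhaustive matching can fail to register all images, which is exactly the pain point described above.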
@ChrisAtKIRI For the demo, we implemented a basic pose estimation module, but it's admittedly slow since our efforts went into improving reconstruction quality and speed, given poses. That’s actually why we mentioned VGGT as a potential integration, though we haven’t fully explored that yet.
🔥Join LMU's Computer Vision & Learning Group as Scientific Lab Coordinator! Work with Prof. Ommer on cutting-edge AI projects like Stable Diffusion. Drive research, manage grants, foster collaborations. Full-time (TV-L E13). Apply by Sept 6 to push scientific boundaries.