I created a small wrapper for the amazing Trellis 3D generation tool. It lets you generate meshes from partially occluded images. It doesn't require additional computation like inpainting, and it even needs less computation than full-image inference :)
I made awesome progress in temporal stability! Trained 300 frames in 50 minutes on an RTX 4090, using only 7.8 GiB of VRAM.
There is still room for improvement.
Next, I plan to develop a viewer. I'll also explore training longer sequences. It feels super exciting!
Clearly, the result is far from ideal, and the text has deteriorated further. However, I still believe this approach holds significant potential for novel view synthesis.
Tools:
github.com/nerfstudio-pro…
github.com/kijai/ComfyUI-…
An example of using a video diffusion model to enhance novel views generated by Gaussian Splatting. On the left are the novel views rendered in Nerfstudio; on the right is the same novel-view video enhanced with the Hunyuan Video model.