Umangi Jain

31 posts

@JainUmangi

CS PhD student @UofT

Toronto, Canada · Joined April 2019
419 Following · 191 Followers
Umangi Jain (@JainUmangi):
For evaluation, we create a 100-mesh benchmark with manually refined material groups. Material Magic Wand outperforms geometry, vision foundation embeddings, and 3D part-feature baselines. We hope this work enables faster material assignment workflows!
Umangi Jain (@JainUmangi):
For training, we curate supervision from material IDs in Objaverse. Raw data is highly imbalanced: many materials appear in only one part, while some meshes have a single material covering nearly all parts. We use data balancing to mitigate both within-mesh and across-mesh imbalance.
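The post does not spell out the balancing scheme. A minimal sketch, assuming simple inverse-frequency weighting: within a mesh, each part is weighted by one over the size of its material group, so every material group gets equal total weight; across meshes, meshes dominated by a single material are downweighted. The function names and the `floor` parameter are hypothetical, not the paper's code.

```python
from collections import Counter


def part_sampling_weights(material_ids):
    """Hypothetical within-mesh balancing: weight each part by 1 / |its material group|,
    so a material covering many parts does not dominate sampled query parts."""
    counts = Counter(material_ids)
    weights = [1.0 / counts[m] for m in material_ids]
    total = sum(weights)
    return [w / total for w in weights]


def mesh_sampling_weights(meshes, floor=1e-3):
    """Hypothetical across-mesh balancing: downweight meshes whose largest material
    group covers most parts. `floor` keeps every mesh sampleable."""
    raw = []
    for material_ids in meshes:
        largest = max(Counter(material_ids).values())
        dominance = largest / len(material_ids)  # 1.0 means one material everywhere
        raw.append(1.0 - dominance + floor)
    total = sum(raw)
    return [r / total for r in raw]


# Three wooden parts and one metal part: the lone metal part gets as much
# total weight as all wooden parts combined.
print(part_sampling_weights(["wood", "wood", "wood", "metal"]))
```

Weights like these could feed a sampler such as `random.choices(..., weights=...)`; the real pipeline may balance differently.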
Umangi Jain (@JainUmangi):
🪄 Introducing Material Magic Wand: Material-Aware Grouping of 3D Parts in Untextured Meshes, accepted to #CVPR2026. We present a tool for selecting material-consistent parts in untextured meshes. Click on one part and retrieve other parts likely to share the same material.
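One way to picture the click-to-retrieve interaction: assume each part already has a learned embedding, then return every part whose cosine similarity to the clicked part clears a threshold. The embeddings, `threshold` value, and function names below are hypothetical stand-ins, not the released tool's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def magic_wand_select(part_embeddings, clicked, threshold=0.8):
    """Return indices of parts similar enough to the clicked part (it selects itself)."""
    query = part_embeddings[clicked]
    return [i for i, e in enumerate(part_embeddings) if cosine(query, e) >= threshold]


# Toy 2-D embeddings: parts 0 and 1 look like one material, parts 2 and 3 another.
parts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
print(magic_wand_select(parts, clicked=0))  # [0, 1]
```

The threshold trades precision for recall; an interactive tool would likely expose it as a slider.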
Umangi Jain reposted
Ziyi Wu (@Dazitu_616):
📢 Introducing DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Compared to vanilla DPO, we improve paired data construction and preference label granularity, leading to better visual quality and motion strength with only 1/3 of the data. 🧵
Umangi Jain reposted
Kai He (@Kai__He):
🚀Excited to announce that our paper “CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion” has been accepted to #CVPR2025! 🌟We introduce a simple yet effective framework for controllable and consistent editing in dynamic 3D scenes. (1/5)
Umangi Jain reposted
Yash Kant (@yash2kant):
🚀 Introducing Pippo – our diffusion transformer pre-trained on 3B Human Images and post-trained with 400M high-res studio images! ✨Pippo can generate 1K resolution turnaround video from a single iPhone photo! 🧵👀 Full deep dive thread coming up next!
Quoting Aran Komatsuzaki (@arankomatsuzaki):

Meta presents Pippo: High-Resolution Multi-View Humans from a Single Image. It generates 1K-resolution, multi-view, studio-quality images from a single photo in a single forward pass.

Umangi Jain (@JainUmangi):
This enables using the graph-cut algorithm, which has been used extensively in image segmentation, to minimize an energy function and effectively partition the Gaussians into foreground and background.
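A minimal sketch of binary graph-cut segmentation over Gaussians, assuming the usual construction: a source terminal for foreground, a sink terminal for background, terminal edges carrying per-Gaussian labeling costs, and neighbor edges carrying similarity weights. It solves the s-t minimum cut with plain Edmonds-Karp max-flow for clarity; real systems typically use faster solvers (e.g., Boykov-Kolmogorov). The cost values and function name are illustrative assumptions, not GaussianCut's code.

```python
from collections import deque


def min_cut_foreground(n, fg_cost, bg_cost, edges, eps=1e-12):
    """Label n Gaussians foreground/background by an s-t minimum cut.

    fg_cost[i]: penalty for labeling i foreground (capacity i -> sink)
    bg_cost[i]: penalty for labeling i background (capacity source -> i)
    edges: (i, j, w) undirected smoothness weights between neighboring Gaussians
    Returns the set of nodes on the source (foreground) side of the cut.
    """
    S, T = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge

    for i in range(n):
        add(S, i, bg_cost[i])
        add(i, T, fg_cost[i])
    for i, j, w in edges:
        add(i, j, w)
        add(j, i, w)

    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            break  # no augmenting path: parent now holds all nodes reachable from S
        bottleneck, v = float("inf"), T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
    return {v for v in parent if v < n}


# Chain 0-1-2-3: the user marks 0 as foreground and 3 as background;
# the weak 1-2 edge is where the cut should fall.
print(min_cut_foreground(4, [0, 0.5, 0.5, 10], [10, 0.5, 0.5, 0],
                         [(0, 1, 5.0), (1, 2, 0.1), (2, 3, 5.0)]))  # {0, 1}
```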
Umangi Jain (@JainUmangi):
We propose a method for extracting objects in scenes obtained from 3DGS without modifications to the Gaussian optimization process. Our method interprets the Gaussians in 3DGS as nodes in a graph and introduces weighted edges based on proximity and perceptual similarity.
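To make "weighted edges based on proximity and perceptual similarity" concrete, here is a hypothetical sketch. It connects each Gaussian to its k nearest neighbors by center distance and multiplies two Gaussian kernels: one on spatial distance and one on color difference. Color stands in for perceptual similarity as an assumption; `k`, `sigma_pos`, and `sigma_col` are illustrative parameters, not values from the paper.

```python
import math


def build_gaussian_graph(centers, colors, k=2, sigma_pos=1.0, sigma_col=0.2):
    """Return undirected edges (i, j, w) over Gaussians, with i < j.

    Each Gaussian links to its k nearest centers (brute-force O(n^2) search).
    w = exp(-(d/sigma_pos)^2) * exp(-(c/sigma_col)^2), where d is the center
    distance and c the RGB distance (a hypothetical perceptual proxy).
    """
    edges = {}
    for i in range(len(centers)):
        neighbors = sorted(
            (math.dist(centers[i], centers[j]), j)
            for j in range(len(centers)) if j != i
        )
        for d, j in neighbors[:k]:
            c = math.dist(colors[i], colors[j])
            w = math.exp(-(d / sigma_pos) ** 2) * math.exp(-(c / sigma_col) ** 2)
            edges[(min(i, j), max(i, j))] = w  # deduplicate symmetric pairs
    return [(i, j, w) for (i, j), w in sorted(edges.items())]


# Two nearby red Gaussians get a strong edge; a distant blue one gets a near-zero edge.
print(build_gaussian_graph([(0, 0, 0), (0.1, 0, 0), (5, 0, 0)],
                           [(1, 0, 0), (1, 0, 0), (0, 0, 1)], k=1))
```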
Umangi Jain reposted
Andrea Tagliasacchi 🇨🇦:
📢📢📢 RoMo: Robust Motion Segmentation Improves Structure from Motion
romosfm.github.io
arxiv.org/pdf/2411.18650
TL;DR: boost your SfM pipeline on dynamic scenes. We use epipolar cues + SAMv2 features to find robust masks for moving objects in a zero-shot manner. 🧵👇
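As an illustration of an "epipolar cue" only (RoMo also uses SAMv2 features, which this omits), the sketch below scores point correspondences by their Sampson distance to a fundamental matrix F. It assumes F has already been estimated, for example robustly from mostly static points. Matches that violate the epipolar constraint are flagged as possibly moving. The threshold and function names are hypothetical.

```python
def matvec(M, x):
    """3x3 matrix times 3-vector."""
    return [sum(M[r][c] * x[c] for c in range(3)) for r in range(3)]


def transpose(M):
    return [[M[c][r] for c in range(3)] for r in range(3)]


def sampson_error(F, p1, p2):
    """First-order geometric error of correspondence p1 <-> p2 (pixel coords) under F."""
    x1 = [p1[0], p1[1], 1.0]
    x2 = [p2[0], p2[1], 1.0]
    Fx1 = matvec(F, x1)
    Ftx2 = matvec(transpose(F), x2)
    num = sum(a * b for a, b in zip(x2, Fx1)) ** 2  # (x2^T F x1)^2
    den = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return num / den


def flag_moving(F, matches, threshold):
    """True for matches whose epipolar residual suggests independent motion."""
    return [sampson_error(F, p1, p2) > threshold for p1, p2 in matches]


# F for a pure horizontal camera translation (identity intrinsics):
# static points keep their y coordinate, so a vertical shift is inconsistent.
F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
print(flag_moving(F, [((0.2, 0.3), (0.5, 0.3)), ((0.2, 0.3), (0.5, 0.8))], 0.01))
```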
Umangi Jain reposted
MrNeRF (@janusch_patas):
GaussianCut: Interactive Segmentation via Graph Cut for 3D Gaussian Splatting
Contributions: 1) We propose a method for graph construction from a 3DGS model that utilizes the properties of the corresponding Gaussians to obtain edge weights, and 2) based on this graph, we propose and minimize an energy function (Equation 3) that combines the user inputs with the inherent representation of the scene. Our experimental evaluations show that GaussianCut obtains high-fidelity segmentation, outperforming previous segmentation baselines.
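Equation 3 itself is not reproduced in the post. For orientation only, the textbook binary graph-cut energy used in interactive segmentation (in the style of Boykov and Jolly) has a data term driven by user input and a pairwise smoothness term over weighted edges. GaussianCut's actual Equation 3 may define or weight these terms differently.

```latex
E(\mathbf{L}) = \sum_{i \in \mathcal{V}} D_i(L_i)
  + \lambda \sum_{(i,j) \in \mathcal{E}} w_{ij}\,[L_i \neq L_j],
  \qquad L_i \in \{\mathrm{fg}, \mathrm{bg}\}
```

Here V is the set of Gaussians, E the set of graph edges with weights w_ij, D_i the cost of assigning label L_i to Gaussian i, [·] the indicator, and λ a balance between the two terms. Because the pairwise weights are non-negative, this binary energy is minimized exactly by an s-t minimum cut.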
Umangi Jain reposted
Yash Kant (@yash2kant):
📢🔍 Super excited to present Spatially Aware Multiview Diffusers (SPAD) at #CVPR2024! SPAD enables 3D consistent multi-view image generation from text or image inputs. It is trained using a high-quality Objaverse subset on 32 H100s! Code & Paper links at the end! 🧵👇
Umangi Jain reposted
Ziyi Wu (@Dazitu_616):
1/ Excited to share our #CVPR2024 work LEOD! We propose a label-efficient learning framework for object detection with event cameras, which performs on par with SOTA models with **10x fewer labels**! Paper: arxiv.org/abs/2311.17286 Code: github.com/Wuziyi616/LEOD
Umangi Jain reposted
Ashkan Mirzaei (@ashmrz10):
📢We introduce “RefFusion”, a novel inpainting method for scenes reconstructed using 3D Gaussian Splatting. 🔗reffusion.github.io TLDR: we personalize an image diffusion model to a given reference image and distill its knowledge to 3D through score distillation sampling.
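Score distillation sampling is not spelled out in the post. For reference, this is the standard SDS gradient introduced in DreamFusion. A differentiable renderer g produces an image x = g(θ) from the scene parameters θ (here, the 3D Gaussians). The rendered image is noised to x_t at timestep t with noise ε, and the diffusion model's noise prediction ε̂_φ, conditioned on y, pushes the render toward images the model finds likely. RefFusion's personalized variant may add conditioning or weighting beyond this form.

```latex
\nabla_{\theta} \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
    \big(\hat{\epsilon}_{\phi}(\mathbf{x}_t;\, y,\, t) - \epsilon\big)\,
    \frac{\partial \mathbf{x}}{\partial \theta} \right],
  \qquad \mathbf{x} = g(\theta)
```

Here w(t) is a timestep-dependent weight. The U-Net Jacobian is omitted by design, which is what makes SDS cheap enough to run inside a 3D optimization loop.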
Umangi Jain reposted
Prateek Jain (@jainprateek_):
Gecko, a new text embedding model, has been released with surprisingly strong MTEB performance despite using a 1B-sized encoder. It is equipped with MRL: nested 256- and 512-dimensional embeddings! It provides nearly SOTA performance for 256-dimensional embeddings as well. [1/2]
Quoting Jinhyuk Lee (@leejnhk):

Introducing Gecko 🦎, a new text embedding model from Google DeepMind! Distilled from LLMs, Gecko offers powerful embeddings for various NLP tasks. Gecko is now available in Google Cloud API 👉bit.ly/google-gecko-a… Paper: bit.ly/google-gecko Colab: bit.ly/google-gecko-c…

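With Matryoshka Representation Learning (MRL), a smaller embedding is obtained by keeping a prefix of the full vector. A common usage pattern is to keep the first 256 or 512 values, L2-renormalize them, and then compare with cosine similarity. This is a minimal sketch of that pattern on toy vectors; it does not call the Gecko API, and the helper names are hypothetical.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates (the nested MRL prefix) and L2-renormalize."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


def cosine_at_dim(a, b, dim):
    """Cosine similarity of two embeddings compared at a truncated dimensionality."""
    ta, tb = truncate_embedding(a, dim), truncate_embedding(b, dim)
    return sum(x * y for x, y in zip(ta, tb))


# Toy 4-D vectors: identical prefixes, opposite tails.
a = [1.0, 0.0, 5.0, 5.0]
b = [1.0, 0.0, -5.0, -5.0]
print(cosine_at_dim(a, b, 2), cosine_at_dim(a, b, 4))
```

With a real model you would pass, for example, `dim=256`. Renormalizing after truncation matters because the prefix alone is no longer unit length.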
Umangi Jain reposted
Sherwin Bahmani (@sherwinbahmani):
Happy to share our new work 🥳 TC4D: Trajectory-Conditioned Text-to-4D Generation Project page: sherwinbahmani.github.io/tc4d Code: github.com/sherwinbahmani… We show controllable motion for text-to-4D object generation and compositional text-to-4D scenes! Thanks @_akhaliq for sharing!
Quoting AK (@_akhaliq):

TC4D Trajectory-Conditioned Text-to-4D Generation Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent

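To illustrate what "trajectory-conditioned" motion can mean, the sketch below places an object's global position along a user-given path at normalized time t, together with the unit tangent that could orient it. For simplicity it assumes a piecewise-linear path traversed at constant speed; TC4D's actual trajectory parameterization and its local deformation model are not shown. The function name is hypothetical.

```python
import math


def position_on_trajectory(waypoints, t):
    """Position and unit tangent at normalized time t in [0, 1] along a polyline
    (at least two waypoints), moving at constant speed by arc length."""
    seg_lengths = [math.dist(a, b) for a, b in zip(waypoints, waypoints[1:])]
    target = min(max(t, 0.0), 1.0) * sum(seg_lengths)
    travelled = 0.0
    for i, length in enumerate(seg_lengths):
        if travelled + length >= target or i == len(seg_lengths) - 1:
            a, b = waypoints[i], waypoints[i + 1]
            s = (target - travelled) / length if length > 0 else 0.0
            pos = tuple(p + s * (q - p) for p, q in zip(a, b))
            if length > 0:
                tangent = tuple((q - p) / length for p, q in zip(a, b))
            else:
                tangent = tuple(0.0 for _ in a)  # degenerate zero-length segment
            return pos, tangent
        travelled += length


# An L-shaped path: halfway through the first leg, then halfway up the second.
path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(position_on_trajectory(path, 0.25))  # ((0.5, 0.0), (1.0, 0.0))
print(position_on_trajectory(path, 0.75))  # ((1.0, 0.5), (0.0, 1.0))
```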