Niladri Dutt
@niladridutt
161 posts

ML PhD @ucl | Ex @adobe @nvidia | @ELLISforEurope | Interested in 3D generative modelling

London, UK · Joined July 2012
460 Following · 328 Followers
Pinned Tweet
Niladri Dutt @niladridutt
🎉 LoST: Level of Semantics Tokenization (CVPR 2026) What if 3D generation didn’t start from coarse geometry, but from semantics? We introduce LoST, a semantic-first 3D tokenizer that orders tokens by semantic salience, so even very short prefixes can already decode into complete, plausible, recognizable 3D shapes 🧩 Early tokens capture the principal semantics. Later tokens refine the details. Result: generate 3D shapes starting from as few as 1 token! The goal is high-fidelity shapes even with very few tokens; as more tokens are added, semantic alignment increases. Project: lost3d.github.io Paper: arxiv.org/abs/2603.17995 @CVPR @AdobeResearch @Adobe
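The core idea in the tweet — salience-ordered tokens whose prefixes already decode into complete shapes — can be illustrated with a toy sketch. All names and numbers below are hypothetical stand-ins, not the LoST implementation:

```python
# Toy illustration of prefix decoding with salience-ordered tokens.
TOKEN_DIM = 4

# Tokens sorted by semantic salience: index 0 carries the coarse
# semantics, later tokens carry progressively finer detail.
tokens = [[float(i + j) for j in range(TOKEN_DIM)] for i in range(8)]
weights = [1.0 / (i + 1) for i in range(len(tokens))]  # decaying salience

def decode_prefix(k: int) -> list[float]:
    """Decode only the first k tokens into a complete (toy) shape latent."""
    latent = [0.0] * TOKEN_DIM
    for i in range(k):
        for d in range(TOKEN_DIM):
            latent[d] += weights[i] * tokens[i][d]
    return latent

coarse = decode_prefix(1)  # already a full latent, just coarse
fine = decode_prefix(8)    # more tokens refine the same latent
```

Because every prefix sums the same salience-weighted series, a 1-token decode and an 8-token decode produce latents of the same shape; only the level of detail differs.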
Sayan Deb Sarkar @debsarkar_sayan
✨🗼Paris week recap: 🥐🥖 + talks on CoPE-VideoLM: 🔹 @GoogleDeepMind (hosted by @GuillaumeMoing and @TengdaHan) 🔹 @valeoai (hosted by @abursuc) Loved the discussions, thanks for the invites! 👉 Link to the slides: tinyurl.com/wu6u4m3b
Sayan Deb Sarkar@debsarkar_sayan

🚀 New paper: arxiv.org/abs/2602.13191 VideoLMs are bottlenecked by a simple problem: they treat video like a stack of images. That means huge token costs, slow responses, and missed temporal details. What if we processed video the way codecs do? 🎬 Instead of dense per-frame RGB embeddings, we tokenize motion vectors + residuals and only encode sparse keyframes — turning video redundancy into a powerful inductive bias for efficient temporal reasoning. 🧵👇

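The codec analogy in the quoted thread — dense keyframes plus motion/residual deltas instead of per-frame RGB — can be sketched in a few lines. This is toy code, not the paper's tokenizer; simple frame-to-frame deltas stand in for motion vectors and residuals:

```python
# Codec-style tokenization: store sparse keyframes densely, and between
# them store only frame-to-frame deltas (a stand-in for motion vectors
# + residuals), so video redundancy is never encoded twice.

def tokenize(frames: list[list[int]], keyframe_every: int = 4):
    tokens = []
    for t, frame in enumerate(frames):
        if t % keyframe_every == 0:
            tokens.append(("key", list(frame)))  # dense keyframe
        else:
            prev = frames[t - 1]
            delta = [a - b for a, b in zip(frame, prev)]  # residual
            tokens.append(("delta", delta))
    return tokens

def detokenize(tokens):
    frames = []
    for kind, payload in tokens:
        if kind == "key":
            frames.append(list(payload))
        else:  # reapply the delta to the previously decoded frame
            frames.append([a + b for a, b in zip(frames[-1], payload)])
    return frames

video = [[i, i * 2] for i in range(6)]       # toy 2-pixel "frames"
assert detokenize(tokenize(video)) == video  # lossless round trip
```

For slowly-changing video, the delta tokens are near-zero and cheap to encode, which is exactly the redundancy the tweet describes turning into an inductive bias.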
Niladri Dutt @niladridutt
The base 3D VAE is still trained on Objaverse; this would act as an upper bound. We train the tokenizer with a semantic loss, and the AR model with a diffusion loss. The semantic loss is not directly compatible with the compared AR methods, but it can be used with other diffusion/flow-based methods. Our primary aim was to showcase the tokenizer, which offers various token decoding levels.
Alex Nichol @unixpickle
@niladridutt @DanielCohenOr1 But didn't you compare to a bunch of other models that were trained on different datasets with different AR losses etc? How do you know your semantic loss in particular wouldn't help the other methods more than your own?
Niladri Dutt retweeted
Daniel Cohen-Or @DanielCohenOr1
A really refreshing take on 3D generation: instead of building shapes from coarse geometry, build them from semantics. huggingface.co/papers/2603.17…
Niladri Dutt @niladridutt
Valid questions @unixpickle 1. We chose to generate the dataset to keep things tight with the pretrained 3D VAE. 2. One can quantize and choose a discrete AR model, but our formulation is based on this paper: arxiv.org/abs/2406.11838 3. We pretrain a network using our proposed Relational Information Distance Alignment, which lets us compute a semantic loss without decoding the latents and then rendering. This pretrained network is then used while training the tokenizer. 4. We also evaluate on Objaverse data in the supplemental, and the results are consistent. The synthetic dataset uses a different 3D VAE to maintain fairness, and we do this to focus the evaluation on high-quality shapes.
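Point 3 of the reply — computing a semantic loss without decoding latents and rendering — can be sketched as a frozen latent-to-semantic predictor used as a critic. Every function and number below is a hypothetical toy stand-in, not the paper's method:

```python
# Sketch: a small frozen predictor g maps tokenizer latents straight to
# a semantic embedding, so the semantic loss never needs the expensive
# decode-to-mesh + render path during training.
import math

def g(latent: list[float]) -> list[float]:
    """Frozen latent -> semantic-embedding predictor (toy linear map)."""
    return [sum(latent) * w for w in (0.5, -0.2, 0.1)]

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, a common form for an embedding loss."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

latent = [0.3, -0.1, 0.7, 0.2]     # tokenizer latent for one shape
target_sem = [0.6, -0.2, 0.1]      # e.g. from a pretrained encoder
loss = cosine_distance(g(latent), target_sem)
```

The gradient of this loss flows into the tokenizer through `g` while `g` itself stays frozen, which is the sense in which the pretrained network supervises tokenizer training cheaply.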
Alex Nichol @unixpickle
@DanielCohenOr1 I can't help but think that this paper is too complicated and doing too many interacting things at once: 1. Generate their own synthetic dataset 2. Use an unusual continuous AR model 3. Stack many new aux losses together 4. Roll their own eval with more synthetic data
Niladri Dutt @niladridutt
Excited to share the 3rd Workshop on Visual Concepts at @CVPR 2026! Call for papers is now open till April 1. We welcome submissions on the following topics. See our website for more info: sites.google.com/stanford.edu/v… Join us in Denver! #CVPR2026
Niladri Dutt retweeted
Joy Hsu @joycjhsu
Happy to bring back the 3rd Workshop on Visual Concepts at @CVPR 2026! Call for papers is now open. We welcome submissions on the following topics. See our website for more info: sites.google.com/stanford.edu/v… Join us in Denver!
Niladri Dutt retweeted
Remy Sabathier @RemySabathier
Excited to introduce 🎬ActionMesh, a fast model that transforms any video → high-quality animated 3D mesh! Generate animated meshes seamlessly importable into any 3D software in under a minute. 🤗Try it out: huggingface.co/spaces/faceboo… 🌐Project Page: remysabathier.github.io/actionmesh/ 📄Paper: remysabathier.github.io/actionmesh/act… 💻Code: github.com/facebookresear… #Video4D #GenAI #3DGeneration @Meta @RealityLabs @AIatMeta @ucl
Niladri Dutt @niladridutt
It even works on multi-legged animals and can handle very long motion without drift
Niladri Dutt @niladridutt
Animation brings characters to life 🎬✨ Introducing our #SIGGRAPHAsia '25 (TOG) paper: Self-Supervised Motion Fields (SMF) that can transfer motion from any source (even 2D video) to any stylized character! 🚫 No Rigging 🚫 No Templates (SMPL) ✅ Any source 🧵Thread👇
Niladri Dutt @niladridutt
Can an LLM learn professional photo retouching by solving puzzles? Our new work, MonetGPT, presented at #SIGGRAPH2025, shows it can. By solving visual puzzles, our MLLM becomes operation-aware and develops image aesthetics. Check out our blog from @AdobeResearch & code below👇 #AI #MLLM
Niladri Dutt @niladridutt
@RejaullahmdMd Hundreds of JEE 2024 solutions are already on the web, so it's easy for an LLM to memorize them during training (Gemini's cut-off date is Jan '25). Benchmarks should use unseen data, i.e. data from after the model's training cut-off date.
Md Rejaullah @RejaullahmdMd
🤯 They said AI couldn't solve it. google/gemini-2.5-pro-preview-03-25 just scored 322/360 on the fiendishly complex JEE Advanced 2024! This exam is a true test of deep reasoning. (Initial run, 2025 paper test next week for cleaner data!)
Niladri Dutt @niladridutt
@siggraph 🧵9/10 We quantitatively evaluate on the Adobe5k dataset and conduct user studies with expert and novice users. Our evaluations show that MonetGPT outperforms open-source alternatives and performs comparably to Google Photos AutoEnhance (closed-source).
Niladri Dutt @niladridutt
🧵1/10 Excited to share our @siggraph paper "MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills" 🌟 We explore how to make MLLMs operation-aware by solving visual puzzles and propose a procedural framework for image retouching #SIGGRAPH #MLLM