Niladri Dutt
@niladridutt
161 posts

ML PhD @ucl | Ex @adobe @nvidia | @ELLISforEurope | Interested in 3D generative modelling

London, UK · Joined July 2012
460 Following · 328 Followers
Pinned Tweet
Niladri Dutt @niladridutt
🎉 LoST: Level of Semantics Tokenization (CVPR 2026) What if 3D generation didn’t start from coarse geometry, but from semantics? We introduce LoST, a semantic-first 3D tokenizer that orders tokens by semantic salience, so even very short prefixes can already decode into complete, plausible, recognizable 3D shapes 🧩 Early tokens capture the principal semantics. Later tokens refine the details. Result: generate 3D shapes starting from as few as 1 token! The goal is high-fidelity shapes even with very few tokens; as more tokens are added, semantic alignment increases. Project: lost3d.github.io Paper: arxiv.org/abs/2603.17995 @CVPR @AdobeResearch @Adobe
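The core idea in the tweet — salience-ordered tokens whose prefixes already decode into complete shapes — can be illustrated with a toy sketch. All names and numbers below are hypothetical stand-ins, not the LoST implementation:

```python
# Toy illustration of prefix decoding with salience-ordered tokens.
TOKEN_DIM = 4

# Tokens sorted by semantic salience: index 0 carries the coarse
# semantics, later tokens carry progressively finer detail.
tokens = [[float(i + j) for j in range(TOKEN_DIM)] for i in range(8)]
weights = [1.0 / (i + 1) for i in range(len(tokens))]  # decaying salience

def decode_prefix(k: int) -> list[float]:
    """Decode only the first k tokens into a complete (toy) shape latent."""
    latent = [0.0] * TOKEN_DIM
    for i in range(k):
        for d in range(TOKEN_DIM):
            latent[d] += weights[i] * tokens[i][d]
    return latent

coarse = decode_prefix(1)  # already a full latent, just coarse
fine = decode_prefix(8)    # more tokens refine the same latent
```

Because every prefix sums the same salience-weighted series, a 1-token decode and an 8-token decode produce latents of the same shape; only the level of detail differs.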
Sayan Deb Sarkar @debsarkar_sayan
✨🗼Paris week recap: 🥐🥖 + talks on CoPE-VideoLM: 🔹 @GoogleDeepMind (hosted by @GuillaumeMoing and @TengdaHan) 🔹 @valeoai (hosted by @abursuc) Loved the discussions, thanks for the invites! 👉 Link to the slides: tinyurl.com/wu6u4m3b
Sayan Deb Sarkar@debsarkar_sayan

🚀 New paper: arxiv.org/abs/2602.13191 VideoLMs are bottlenecked by a simple problem: they treat video like a stack of images. That means huge token costs, slow responses, and missed temporal details. What if we processed video the way codecs do? 🎬 Instead of dense per-frame RGB embeddings, we tokenize motion vectors + residuals and only encode sparse keyframes — turning video redundancy into a powerful inductive bias for efficient temporal reasoning. 🧵👇

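The codec analogy in the quoted thread — dense keyframes plus motion/residual deltas instead of per-frame RGB — can be sketched in a few lines. This is toy code, not the paper's tokenizer; simple frame-to-frame deltas stand in for motion vectors and residuals:

```python
# Codec-style tokenization: store sparse keyframes densely, and between
# them store only frame-to-frame deltas (a stand-in for motion vectors
# + residuals), so video redundancy is never encoded twice.

def tokenize(frames: list[list[int]], keyframe_every: int = 4):
    tokens = []
    for t, frame in enumerate(frames):
        if t % keyframe_every == 0:
            tokens.append(("key", list(frame)))  # dense keyframe
        else:
            prev = frames[t - 1]
            delta = [a - b for a, b in zip(frame, prev)]  # residual
            tokens.append(("delta", delta))
    return tokens

def detokenize(tokens):
    frames = []
    for kind, payload in tokens:
        if kind == "key":
            frames.append(list(payload))
        else:  # reapply the delta to the previously decoded frame
            frames.append([a + b for a, b in zip(frames[-1], payload)])
    return frames

video = [[i, i * 2] for i in range(6)]       # toy 2-pixel "frames"
assert detokenize(tokenize(video)) == video  # lossless round trip
```

For slowly-changing video, the delta tokens are near-zero and cheap to encode, which is exactly the redundancy the tweet describes turning into an inductive bias.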
Niladri Dutt @niladridutt
The base 3D VAE is still trained on Objaverse; this would act as an upper bound. We train the tokenizer with a semantic loss, and the AR model with a diffusion loss. The semantic loss is not directly compatible with the compared AR methods, but it can be used with other diffusion/flow-based methods. Our primary aim was to showcase the tokenizer, which offers various token decoding levels.
Alex Nichol @unixpickle
@niladridutt @DanielCohenOr1 But didn't you compare to a bunch of other models that were trained on different datasets with different AR losses etc? How do you know your semantic loss in particular wouldn't help the other methods more than your own?
Niladri Dutt retweeted
Daniel Cohen-Or @DanielCohenOr1
A really refreshing take on 3D generation: instead of building shapes from coarse geometry, build them from semantics. huggingface.co/papers/2603.17…
Niladri Dutt @niladridutt
Valid questions @unixpickle 1. We chose to generate the dataset to keep things tight with the pretrained 3D VAE. 2. One can quantize and choose a discrete AR model, but our formulation is based on this paper: arxiv.org/abs/2406.11838 3. We pretrain a network using our proposed Relational Information Distance Alignment, which lets us compute a semantic loss without decoding the latents and then rendering. This pretrained network is then used while training the tokenizer. 4. We also evaluate on Objaverse data in the supplemental, and the results are consistent. The synthetic dataset uses a different 3D VAE to maintain fairness, and we do this to focus the evaluation on high-quality shapes.
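Point 3 of the reply — computing a semantic loss without decoding latents and rendering — can be sketched as a frozen latent-to-semantic predictor used as a critic. Every function and number below is a hypothetical toy stand-in, not the paper's method:

```python
# Sketch: a small frozen predictor g maps tokenizer latents straight to
# a semantic embedding, so the semantic loss never needs the expensive
# decode-to-mesh + render path during training.
import math

def g(latent: list[float]) -> list[float]:
    """Frozen latent -> semantic-embedding predictor (toy linear map)."""
    return [sum(latent) * w for w in (0.5, -0.2, 0.1)]

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, a common form for an embedding loss."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

latent = [0.3, -0.1, 0.7, 0.2]     # tokenizer latent for one shape
target_sem = [0.6, -0.2, 0.1]      # e.g. from a pretrained encoder
loss = cosine_distance(g(latent), target_sem)
```

The gradient of this loss flows into the tokenizer through `g` while `g` itself stays frozen, which is the sense in which the pretrained network supervises tokenizer training cheaply.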
Alex Nichol @unixpickle
@DanielCohenOr1 I can't help but think that this paper is too complicated and doing too many interacting things at once: 1. Generate their own synthetic dataset 2. Use an unusual continuous AR model 3. Stack many new aux losses together 4. Roll their own eval with more synthetic data
Niladri Dutt @niladridutt
Excited to share the 3rd Workshop on Visual Concepts at @CVPR 2026! Call for papers is now open till April 1. We welcome submissions on the following topics. See our website for more info: sites.google.com/stanford.edu/v… Join us in Denver! #CVPR2026
Niladri Dutt retweeted
Joy Hsu @joycjhsu
Happy to bring back the 3rd Workshop on Visual Concepts at @CVPR 2026! Call for papers is now open. We welcome submissions on the following topics. See our website for more info: sites.google.com/stanford.edu/v… Join us in Denver!
Niladri Dutt retweeted
Remy Sabathier @RemySabathier
Excited to introduce 🎬ActionMesh, a fast model that transforms any video → high-quality animated 3D mesh! Generate animated meshes seamlessly importable into any 3D software in under a minute. 🤗Try it out: huggingface.co/spaces/faceboo… 🌐Project Page: remysabathier.github.io/actionmesh/ 📄Paper: remysabathier.github.io/actionmesh/act… 💻Code: github.com/facebookresear… #Video4D #GenAI #3DGeneration @Meta @RealityLabs @AIatMeta @ucl
Niladri Dutt @niladridutt
It even works on multi-legged animals and can handle very long motion without drift
Niladri Dutt @niladridutt
Animation brings characters to life 🎬✨ Introducing our #SIGGRAPHAsia '25 (TOG) paper: Self-Supervised Motion Fields (SMF) that can transfer motion from any source (even 2D video) to any stylized character! 🚫 No Rigging 🚫 No Templates (SMPL) ✅ Any source 🧵Thread👇
Niladri Dutt @niladridutt
Can an LLM learn professional photo retouching by solving puzzles? Our new work, MonetGPT, presented at #SIGGRAPH2025, shows it can. By solving visual puzzles, our MLLM becomes operation-aware and develops image aesthetics. Check out our blog from @AdobeResearch & code below👇 #AI #MLLM
Niladri Dutt @niladridutt
@RejaullahmdMd Hundreds of JEE 2024 solutions are already on the web, so it's easy for an LLM to memorize them during training (Gemini's cut-off date is Jan '25). Benchmarks should use unseen data, i.e. data from after the model's training cut-off date.
Md Rejaullah @RejaullahmdMd
🤯 They said AI couldn't solve it. google/gemini-2.5-pro-preview-03-25 just scored 322/360 on the fiendishly complex JEE Advanced 2024! This exam is a true test of deep reasoning. (Initial run, 2025 paper test next week for cleaner data!)
Niladri Dutt @niladridutt
@siggraph 🧵9/10 We quantitatively evaluate on the Adobe5k dataset and conduct user studies with expert and novice users. Our evaluations show that MonetGPT outperforms open-source alternatives and performs comparably to Google Photos AutoEnhance (closed-source).
Niladri Dutt @niladridutt
🧵1/10 Excited to share our @siggraph paper "MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills" 🌟 We explore how to make MLLMs operation-aware by solving visual puzzles and propose a procedural framework for image retouching #SIGGRAPH #MLLM