Deva Ramanan
@RamananDeva
Professor at Carnegie Mellon University
21 posts
Joined March 2023
3 Following · 496 Followers
Deva Ramanan retweeted
Zihan Wang @Z1hanW
CRISP is accepted at ICLR 2026!!! @iclr_conf Excited to see more impact of building simulation-ready assets from monocular video on animation / robotics. Code is ready (github.com/Z1hanW/CRISP-R…), along with the cleaned-up videos, including several parkour videos clipped from YouTube.
Zihan Wang @Z1hanW

Introducing CRISP, a real-to-sim pipeline that recovers human motion and simulatable scene geometry from monocular video! CRISP builds contact-faithful 3D scenes for simulation: 8× fewer sim failures, +43% faster sim, and improved human motion! Interactive demos 👉: crisp-real2sim.github.io/CRISP-Real2Sim/ Exciting collaboration w/ @JiashunWang @jefftan969 @_Tsukasane Jessica Hodgins @shubhtuls @RamananDeva

2 replies · 13 reposts · 55 likes · 6.1K views
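The pipeline above turns raw monocular video into assets a physics simulator can consume. Below is a minimal Python sketch of that control flow, with every stage reduced to a trivial placeholder; the function and field names are hypothetical illustrations, not CRISP's actual code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SimReadyScene:
    """Hypothetical output container for a real-to-sim pipeline; field names
    are illustrative, not CRISP's actual data structures."""
    motion: np.ndarray    # (T, J, 3) recovered human joint trajectory
    geometry: np.ndarray  # (V, 3)    simulatable scene mesh vertices
    contacts: np.ndarray  # (T, J)    which joints touch the scene per frame

def crisp_style_real_to_sim(video: np.ndarray) -> SimReadyScene:
    """Control-flow sketch of the tweet's description: recover human motion
    and scene geometry from monocular frames, keeping geometry consistent
    with observed contacts so simulation doesn't fail. Every step below is
    a placeholder."""
    T = len(video)
    motion = np.zeros((T, 24, 3))     # stand-in for a video pose estimator
    geometry = np.zeros((1000, 3))    # stand-in for scene reconstruction
    contacts = motion[..., 2] < 0.05  # toy rule: joints near the ground plane
    return SimReadyScene(motion, geometry, contacts)

scene = crisp_style_real_to_sim(np.zeros((30, 64, 64, 3)))
print(scene.contacts.shape)  # (30, 24)
```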
Deva Ramanan retweeted
Nikhil Keetha @Nik__V__
MapAnything V1.1 release is live! 🚨
✅ Improved checkpoints
✅ Model factory to test & train many models
✅ Profiling
✅ New COLMAP demos & voxelization tooling
✅ WAI-format benchmarking data
Time to update the CVPR subs 😉 Comparisons to DA3 (1.1), Pi3X & more info in 🧵👇
9 replies · 91 reposts · 738 likes · 42.8K views
Deva Ramanan retweeted
Chancharik Mitra @chancharikm
🎉 Despite massive pretraining, VLAs still need to adapt to specific physical contexts. We introduce Robotic Steering, a novel fine-tuning method that uses mechanistic interpretability to surpass standard fine-tuning:
🎁 22× fewer parameters
🎁 +53% on unseen tasks
🎁 Interpretable
Thread below 👇
10 replies · 48 reposts · 273 likes · 34.7K views
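The tweet does not spell out the mechanism, so the sketch below shows generic activation steering — learning only a small additive vector on top of a frozen block — a common mechanistic-interpretability-flavored alternative to full fine-tuning that illustrates how such methods can use far fewer trainable parameters. It is not the paper's actual procedure.

```python
import torch
import torch.nn as nn

class SteeredBlock(nn.Module):
    """Generic activation steering: freeze a pretrained block and learn only a
    small additive vector on its output. Illustrative only - not the actual
    'Robotic Steering' method, which the tweet does not detail."""
    def __init__(self, block: nn.Module, hidden_dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # the backbone stays frozen
        self.steer = nn.Parameter(torch.zeros(hidden_dim))  # tiny trainable state

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x) + self.steer  # nudge activations toward the target context

# Toy usage: steering a 256-dim block adds 256 trainable parameters,
# versus ~263k if we fine-tuned the block itself.
block = nn.Sequential(nn.Linear(256, 512), nn.GELU(), nn.Linear(512, 256))
steered = SteeredBlock(block, hidden_dim=256)
trainable = sum(p.numel() for p in steered.parameters() if p.requires_grad)
print(trainable)  # 256
```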
Deva Ramanan retweeted
Jay Karhade @JayKarhade
Introducing Any4D, a unified transformer for fully feed-forward, dense, metric-scale 4D reconstruction from flexible inputs! Any4D regresses per-pixel motion + geometry across frames in one pass — 15× faster, 2–3× more accurate reconstructions ⚡📈 Details + code below 👇 Exciting collab with @Nik__V__ @YuchenZhan54250 Tanisha Gupta @akashshrm02 @smash0190 @RamananDeva
6 replies · 44 reposts · 206 likes · 47.1K views
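As a reading aid for "per-pixel motion + geometry across frames in one pass", here is a minimal interface sketch; the output names and shapes are assumptions for illustration, not Any4D's real API.

```python
import torch

def any4d_style_forward(frames: torch.Tensor) -> dict[str, torch.Tensor]:
    """Interface sketch of a fully feed-forward 4D reconstructor in the spirit
    of the tweet above: a single pass maps frames to dense per-pixel geometry
    and motion. All outputs below are placeholders."""
    T, H, W, _ = frames.shape
    return {
        "depth": torch.rand(T, H, W),           # metric depth per pixel
        "scene_flow": torch.zeros(T, H, W, 3),  # 3D motion per pixel across frames
        "intrinsics": torch.eye(3).repeat(T, 1, 1),  # per-frame camera calibration
    }

outputs = any4d_style_forward(torch.rand(4, 32, 32, 3))
print({k: tuple(v.shape) for k, v in outputs.items()})
```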
Deva Ramanan retweeted
Tarasha Khurana @tarashakhurana
CogNVS was accepted to @NeurIPSConf 2025! 🎉 We are releasing the code today for you all to try:
🆕 Code: github.com/Kaihua-Chen/co…
Paper: arxiv.org/pdf/2507.12646
With CogNVS, we reformulate dynamic novel-view synthesis as a structured inpainting task: (1) we reconstruct input views with off-the-shelf SLAM systems, (2) create self-supervised training pairs for learning to inpaint, and (3) test-time finetune on the input at inference. With @kaihuac5 and @RamananDeva.
Tarasha Khurana @tarashakhurana

Excited to share recent work with @kaihuac5 and @RamananDeva where we learn to do novel view synthesis for dynamic scenes in a self-supervised manner, only from 2D videos! webpage: cog-nvs.github.io arxiv: arxiv.org/abs/2507.12646 code (soon): github.com/Kaihua-Chen/co…

0 replies · 11 reposts · 72 likes · 10.3K views
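The three-step recipe above maps naturally onto a pipeline skeleton. Here is a runnable sketch with trivial stand-ins for the SLAM system and the inpainting network; all helper functions are hypothetical, and only the control flow follows the tweet's description.

```python
import numpy as np

# Hypothetical stand-ins for the real components (SLAM, inpainting network);
# each is reduced to a trivial placeholder so the control flow runs end to end.
def run_slam(video):                   # -> (point cloud, per-frame camera poses)
    return np.zeros((100, 3)), [np.eye(4) for _ in video]

def render_with_holes(points, pose):   # incomplete re-render of a known/novel view
    return np.zeros((32, 32, 3))

def finetune_inpainter(pairs):         # self-supervised training + test-time finetune
    return lambda partial: partial     # identity "inpainter" placeholder

def cognvs_style_pipeline(video, target_pose):
    """Skeleton of the three-step CogNVS recipe described in the tweet above."""
    # (1) Reconstruct input views with an off-the-shelf SLAM system.
    points, poses = run_slam(video)
    # (2) Create self-supervised pairs: incomplete re-renders of views we *do*
    #     have, paired with the complete frames, to learn inpainting.
    pairs = [(render_with_holes(points, p), frame) for p, frame in zip(poses, video)]
    # (3) Test-time finetune the inpainter on this video, then fill the novel view.
    inpaint = finetune_inpainter(pairs)
    return inpaint(render_with_holes(points, target_pose))

novel_view = cognvs_style_pipeline(np.zeros((8, 32, 32, 3)), np.eye(4))
print(novel_view.shape)  # (32, 32, 3)
```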
Deva Ramanan retweeted
Zhiqiu Lin @ZhiqiuLin
🎉 CameraBench has been accepted as a Spotlight (3%) @ NeurIPS 2025. Huge congrats to all collaborators at CMU, MIT-IBM, UMass, Harvard, and Adobe. CameraBench is a large-scale effort that pushes video-language models to reason about the language of camera motion just like professional cinematographers. 🌍 Our open-source dataset, models, and code are also gaining strong interest and adoption from frontier labs such as DeepMind and Kling to advance video generation research.
📄 Paper: arxiv.org/abs/2504.15376
🌐 Website: linzhiqiu.github.io/papers/camerab…
Zhiqiu Lin @ZhiqiuLin

📷 Can AI understand camera motion like a cinematographer? Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video – films, games, drone shots, vlogs, etc. Links below!
We contribute a taxonomy of motion primitives, co-designed over months with professional cinematographers, and apply rigorous quality control to label and caption all aspects of camera motion.
CameraBench shows that even the best SfMs and VLMs struggle with real-world, dynamic videos. Yet, a generative VLM post-trained on our high-quality data matches SOTA SfM (MegaSAM) in geometric understanding and outperforms SOTA VLMs (Gemini-2.5 / GPT-4o) in semantic understanding, e.g., describing how the camera moves.
📄 Paper: huggingface.co/papers/2504.15…
🌐 Website: linzhiqiu.github.io/papers/camerab…
Work led by CMU, MIT-IBM, UMass, Adobe, Harvard, Emerson with @censiyuan1, @chancharikm, @JayKarhade, @du_yilun, @gan_chuang, and @RamananDeva.

7 replies · 23 reposts · 160 likes · 25.8K views
Deva Ramanan retweeted
Nikhil Keetha @Nik__V__
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀
One universal model enables SoTA for:
🔥 Mono Depth Estimation
🔥 Multi-View SfM
🔥 Multi-View Stereo
🔥 Depth Completion
🔥 Registration
… and many more possibilities! – plus everything is metric 🎯
We release code for data processing, training, benchmarking & ablations – everything Apache 2.0! Details & Links 👇
30 replies · 129 reposts · 744 likes · 120.9K views
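"Factored" here suggests that geometry and cameras are predicted as separate pieces that compose into a metric point map. One plausible factorization is sketched below with illustrative field names (rays, depth, poses, and a single metric scale) — an assumption for exposition, not necessarily MapAnything's exact schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FactoredScene:
    """One plausible 'factored' metric scene output, per the tweet's framing.
    Field names are illustrative, not MapAnything's actual schema."""
    ray_dirs: np.ndarray  # (N, H, W, 3) unit ray directions per view
    depth: np.ndarray     # (N, H, W)    up-to-scale depth along each ray
    poses: np.ndarray     # (N, 4, 4)    camera-to-world transforms
    scale: float          # single global metric scale factor

    def metric_points(self) -> np.ndarray:
        """Compose the factors into world-frame metric 3D points."""
        pts_cam = self.ray_dirs * (self.scale * self.depth)[..., None]
        R, t = self.poses[:, :3, :3], self.poses[:, :3, 3]
        return np.einsum("nij,nhwj->nhwi", R, pts_cam) + t[:, None, None, :]

scene = FactoredScene(
    ray_dirs=np.tile([0.0, 0.0, 1.0], (2, 4, 4, 1)),
    depth=np.ones((2, 4, 4)), poses=np.tile(np.eye(4), (2, 1, 1)), scale=2.5,
)
print(scene.metric_points().shape)  # (2, 4, 4, 3)
```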
Deva Ramanan retweeted
Zhiqiu Lin @ZhiqiuLin
Fresh GPT‑o3 results on our vision‑centric #NaturalBench (NeurIPS’24) benchmark! 🎯 Its new visual chain‑of‑thought—by “zooming in” on details—cracks questions that still stump GPT‑4o. Yet vision reasoning isn’t solved: o3 can still hallucinate even after a full minute of analysis 🤯
Zhiqiu Lin @ZhiqiuLin

🚀 Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'24) that challenges vision-language models with pairs of simple questions about natural imagery. 🌍📸
Here’s what we found after testing 53 models (GPT-4o, Llama3.2, Qwen2VL, Molmo, and more):
1️⃣ All models struggle: they perform only 10-20% above chance, while human accuracy exceeds 90%! This shows that models still struggle with natural images and simple questions that humans answer easily — what we call natural adversarial samples.
2️⃣ Models appear strong on previous benchmarks like MME/ScienceQA by exploiting their strong language bias. However, even a blind ChatGPT (without vision) can outperform vision models on these benchmarks.
3️⃣ Debiasing is crucial: most models prefer "Yes" far more than "No"; correcting this bias can nearly double performance, even for GPT-4o.
📄 Paper: huggingface.co/papers/2410.14…
🌐 Arxiv: arxiv.org/abs/2410.14669
Work led by CMU & UW with @Jeande_d, @baiqil0203, @zixianma02, @simi_97k, and co-advised by @RanjayKrishna, @gneubig, and @RamananDeva.

3 replies · 21 reposts · 109 likes · 21.4K views
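Point 3️⃣ (yes-bias) is easy to illustrate: a model whose raw yes/no scores lean positive will answer "Yes" too often, and simply re-centering its decision threshold restores a balanced split. The toy recalibration below is a generic illustration, not NaturalBench's exact debiasing procedure.

```python
import numpy as np

def debiased_yes_no(yes_scores: np.ndarray) -> np.ndarray:
    """Toy yes-bias correction: instead of answering 'Yes' whenever the raw
    score is positive, shift the threshold so predictions match an assumed
    balanced yes/no base rate (NaturalBench pairs questions so answers are
    balanced). A generic recalibration sketch, not the paper's procedure."""
    threshold = np.median(yes_scores)  # assume ~50% of answers should be 'Yes'
    return yes_scores > threshold

# A model whose raw scores lean positive answers 'Yes' ~80% of the time;
# re-thresholding restores a balanced yes/no split.
scores = np.random.default_rng(0).normal(loc=0.8, scale=1.0, size=1000)
print((scores > 0).mean(), debiased_yes_no(scores).mean())  # ~0.79 vs 0.5
```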
Deva Ramanan retweeted
Khiem Vuong @kvuongdev
[1/6] Recent models like DUSt3R generalize well across viewpoints, but performance drops on aerial-ground pairs. At #CVPR2025, we propose AerialMegaDepth (aerial-megadepth.github.io), a hybrid dataset combining mesh renderings with real ground images (MegaDepth) to bridge this gap.
7 replies · 103 reposts · 552 likes · 55.6K views
Deva Ramanan retweeted
Chancharik Mitra @chancharikm
🎯 Introducing Sparse Attention Vectors (SAVs): a breakthrough method for extracting powerful multimodal features from Large Multimodal Models (LMMs). SAVs enable SOTA performance on discriminative vision-language tasks (classification, safety alignment, etc.)! Links in replies!
🔎 Using just ~20 attention heads & only few-shot examples, SAVs:
- Outperform both LoRA and few-shot baselines
- Work with image, text, & interleaved inputs
- Extract features without finetuning – ready to go at test time!
This project was a cross-collaborative effort between researchers from UC Berkeley, Carnegie Mellon University, and MIT-IBM Research (@berkeley_ai, @CMU_Robotics, @MITIBMLab). Many thanks to all of the collaborators and co-authors on this work: Brandon Huang, Tianning (Ray) Chai, @ZhiqiuLin, @ArbelleAssaf, @RogerioFeris, @leokarlin, @trevordarrell, @RamananDeva, @roeiherzig.
5 replies · 37 reposts · 146 likes · 26.4K views
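The few-shot, no-finetuning recipe above can be sketched as: score each attention head's discriminativeness on the few-shot examples, keep a sparse top-k, and classify by nearest class mean in that sparse feature space. The head-selection criterion below (between-class spread) is an assumption for illustration, not the paper's exact rule.

```python
import numpy as np

def select_heads_and_classify(train_acts, train_labels, test_acts, k=20):
    """Sketch of the SAV idea: pick a sparse set of attention heads whose
    activations best separate the few-shot classes, then classify by nearest
    class mean over those heads only - no finetuning anywhere.
    train_acts: (N, H, D) per-example, per-head activation vectors."""
    classes = np.unique(train_labels)
    means = np.stack([train_acts[train_labels == c].mean(0) for c in classes])  # (C, H, D)
    # Score each head by how far apart its class means are; keep the top k.
    spread = np.linalg.norm(means[:, None] - means[None, :], axis=-1).sum((0, 1))  # (H,)
    heads = np.argsort(spread)[-k:]
    # Nearest-class-mean classification restricted to the selected heads.
    dists = np.linalg.norm(test_acts[:, None, heads] - means[None, :, heads], axis=-1).sum(-1)
    return classes[np.argmin(dists, axis=1)]

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 32, 16))
labels = np.array([0, 1] * 4)
print(select_heads_and_classify(acts, labels, acts[:2], k=4))
```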
Deva Ramanan retweeted
Anirudh Chakravarthy @anirudhchak
Lidar Panoptic Segmentation (LPS) is crucial for the safe deployment of autonomous vehicles, but existing methods fail to consider realistic open-world environments. Our IJCV paper introduces LPS in the Open World (LiPSOW) to discover novel classes in the open world! (1/5)
1 reply · 5 reposts · 23 likes · 1.9K views
Deva Ramanan retweeted
Kangle Deng @kangle_deng
[1/2] 📢 I'll present "FlashTex: Fast Relightable Mesh Texturing with LightControlNet" at #ECCV2024’s Oral Session tomorrow! 🎉 Join me to explore how our generated textures can be properly relit in various lighting environments. ⚡ 📅 Oral: Tue, Oct 1st, 2 PM 📍 Poster: #159
1 reply · 8 reposts · 35 likes · 11.3K views
Deva Ramanan retweeted
Zhiqiu Lin @ZhiqiuLin
Sharing exciting news from Milan 🇮🇹: VQAScore (ECCV’24) was highlighted as the strongest text-to-image metric in DeepMind’s Imagen3 tech report! Imagen3 also used our GenAI-Bench (CVPR’24 SynData Best Short Paper) to evaluate compositional text-to-image generation. Catch our poster this Thursday at ECCV! #ECCV2024
Link to VQAScore: linzhiqiu.github.io/papers/vqascor…
Link to GenAI-Bench: linzhiqiu.github.io/papers/genai_b…
Imagen3 tech report: storage.googleapis.com/deepmind-media…
5 replies · 15 reposts · 73 likes · 10.1K views
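For context, VQAScore (per the VQAScore paper linked above) scores image-text alignment as the probability that a VQA model answers "Yes" when asked whether the image shows the text. The prompt wording below is approximate, and the toy stand-in model is purely illustrative.

```python
def vqascore(prob_yes_fn, image, text: str) -> float:
    """VQAScore-style alignment: the probability a VQA model answers 'Yes'
    when asked whether the image shows the text. `prob_yes_fn` stands in for
    a real VQA model's scoring call; the prompt wording is approximate."""
    question = f'Does this figure show "{text}"? Please answer yes or no.'
    return prob_yes_fn(image, question)

# Toy stand-in "model": treats the image as a caption string and scores
# word overlap with the question. Real VQAScore uses an actual VQA model.
def toy_prob_yes(image: str, question: str) -> float:
    words = set(image.lower().split())
    asked = set(question.lower().split())
    return len(words & asked) / max(len(words), 1)

print(vqascore(toy_prob_yes, "a dog on a skateboard", "dog on a skateboard"))
```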
Deva Ramanan retweeted
Tarasha Khurana @tarashakhurana
Our new work explores generating future observations (blue) given the past (gray), by leveraging large-scale pretraining of image diffusion models for video prediction, and conditioning on timestamps & invariant data modalities. w/ @RamananDeva page: cs.cmu.edu/~tkhurana/dept…
0 replies · 7 reposts · 37 likes · 6.5K views
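The tweet describes repurposing a pretrained image diffusion model for prediction by conditioning on past observations and a continuous timestamp. Below is a toy sketch of that conditioning interface only; the architecture is entirely illustrative and is not the paper's model.

```python
import torch
import torch.nn as nn

class TimestampConditionedDenoiser(nn.Module):
    """Generic sketch of the conditioning interface the tweet describes: a
    denoiser that receives past observations plus a continuous timestamp for
    the frame being predicted. Architecture is illustrative only."""
    def __init__(self, channels=3, ctx_frames=2, dim=64):
        super().__init__()
        in_ch = channels * (1 + ctx_frames)   # noisy future frame + past context
        self.time_embed = nn.Linear(1, dim)   # embed the prediction timestamp
        self.net = nn.Conv2d(in_ch, channels, 3, padding=1)
        self.film = nn.Linear(dim, channels)  # timestamp modulates the output

    def forward(self, noisy_future, past, timestamp):
        x = torch.cat([noisy_future, past.flatten(1, 2)], dim=1)
        scale = self.film(torch.relu(self.time_embed(timestamp)))  # (B, C)
        return self.net(x) * (1 + scale[:, :, None, None])

model = TimestampConditionedDenoiser()
out = model(torch.rand(1, 3, 16, 16),          # noisy future frame
            torch.rand(1, 2, 3, 16, 16),       # two past (gray) frames
            torch.tensor([[0.5]]))             # continuous prediction timestamp
print(out.shape)  # torch.Size([1, 3, 16, 16])
```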