Deva Ramanan
@RamananDeva
Professor at Carnegie Mellon University
21 posts
Joined March 2023
3 Following · 496 Followers
Deva Ramanan retweeted
Zihan Wang @Z1hanW
CRISP is accepted at ICLR 2026!!! @iclr_conf Excited to see more impact of building simulation-ready assets from monocular video on animation / robotics. Code is ready (github.com/Z1hanW/CRISP-R…), along with the cleaned-up videos, including several parkour videos clipped from YouTube.
Zihan Wang @Z1hanW

Introducing CRISP, a real-to-sim pipeline that recovers human motion and simulatable scene geometry from monocular video! CRISP builds contact-faithful 3D scenes for simulation: 8× fewer sim failures, +43% faster sim, and improved human motion! Interactive demos 👉: crisp-real2sim.github.io/CRISP-Real2Sim/ Exciting collaboration w/ @JiashunWang @jefftan969 @_Tsukasane Jessica Hodgins @shubhtuls @RamananDeva

2 replies · 13 reposts · 55 likes · 6.1K views
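The pipeline above turns raw monocular video into assets a physics simulator can consume. Below is a minimal Python sketch of that control flow, with every stage reduced to a trivial placeholder; the function and field names are hypothetical illustrations, not CRISP's actual code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SimReadyScene:
    """Hypothetical output container for a real-to-sim pipeline; field names
    are illustrative, not CRISP's actual data structures."""
    motion: np.ndarray    # (T, J, 3) recovered human joint trajectory
    geometry: np.ndarray  # (V, 3)    simulatable scene mesh vertices
    contacts: np.ndarray  # (T, J)    which joints touch the scene per frame

def crisp_style_real_to_sim(video: np.ndarray) -> SimReadyScene:
    """Control-flow sketch of the tweet's description: recover human motion
    and scene geometry from monocular frames, keeping geometry consistent
    with observed contacts so simulation doesn't fail. Every step below is
    a placeholder."""
    T = len(video)
    motion = np.zeros((T, 24, 3))     # stand-in for a video pose estimator
    geometry = np.zeros((1000, 3))    # stand-in for scene reconstruction
    contacts = motion[..., 2] < 0.05  # toy rule: joints near the ground plane
    return SimReadyScene(motion, geometry, contacts)

scene = crisp_style_real_to_sim(np.zeros((30, 64, 64, 3)))
print(scene.contacts.shape)  # (30, 24)
```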
Deva Ramanan retweeted
Nikhil Keetha @Nik__V__
MapAnything V1.1 release is live! 🚨
✅ Improved checkpoints
✅ Model factory to test & train many models
✅ Profiling
✅ New COLMAP demos & voxelization tooling
✅ WAI-format benchmarking data
Time to update the CVPR subs 😉 Comparisons to DA3 (1.1), Pi3X & more info in 🧵👇
9 replies · 91 reposts · 738 likes · 42.8K views
Deva Ramanan retweeted
Chancharik Mitra @chancharikm
🎉 Despite massive pretraining, VLAs still need to adapt to specific physical contexts. We introduce Robotic Steering, a novel fine-tuning method that uses mechanistic interpretability to surpass standard fine-tuning:
🎁 22× fewer parameters
🎁 +53% on unseen tasks
🎁 Interpretable
Thread below 👇
10 replies · 48 reposts · 273 likes · 34.7K views
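The tweet does not spell out the mechanism, so the sketch below shows generic activation steering — learning only a small additive vector on top of a frozen block — a common mechanistic-interpretability-flavored alternative to full fine-tuning that illustrates how such methods can use far fewer trainable parameters. It is not the paper's actual procedure.

```python
import torch
import torch.nn as nn

class SteeredBlock(nn.Module):
    """Generic activation steering: freeze a pretrained block and learn only a
    small additive vector on its output. Illustrative only - not the actual
    'Robotic Steering' method, which the tweet does not detail."""
    def __init__(self, block: nn.Module, hidden_dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # the backbone stays frozen
        self.steer = nn.Parameter(torch.zeros(hidden_dim))  # tiny trainable state

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x) + self.steer  # nudge activations toward the target context

# Toy usage: steering a 256-dim block adds 256 trainable parameters,
# versus ~263k if we fine-tuned the block itself.
block = nn.Sequential(nn.Linear(256, 512), nn.GELU(), nn.Linear(512, 256))
steered = SteeredBlock(block, hidden_dim=256)
trainable = sum(p.numel() for p in steered.parameters() if p.requires_grad)
print(trainable)  # 256
```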
Deva Ramanan retweeted
Jay Karhade @JayKarhade
Introducing Any4D, a unified transformer for fully feed-forward, dense, metric-scale 4D reconstruction from flexible inputs! Any4D regresses per-pixel motion + geometry across frames in one pass — 15× faster, 2–3× more accurate reconstructions ⚡📈 Details + code below 👇 Exciting collab with @Nik__V__ @YuchenZhan54250 Tanisha Gupta @akashshrm02 @smash0190 @RamananDeva
6 replies · 44 reposts · 206 likes · 47.1K views
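As a reading aid for "per-pixel motion + geometry across frames in one pass", here is a minimal interface sketch; the output names and shapes are assumptions for illustration, not Any4D's real API.

```python
import torch

def any4d_style_forward(frames: torch.Tensor) -> dict[str, torch.Tensor]:
    """Interface sketch of a fully feed-forward 4D reconstructor in the spirit
    of the tweet above: a single pass maps frames to dense per-pixel geometry
    and motion. All outputs below are placeholders."""
    T, H, W, _ = frames.shape
    return {
        "depth": torch.rand(T, H, W),           # metric depth per pixel
        "scene_flow": torch.zeros(T, H, W, 3),  # 3D motion per pixel across frames
        "intrinsics": torch.eye(3).repeat(T, 1, 1),  # per-frame camera calibration
    }

outputs = any4d_style_forward(torch.rand(4, 32, 32, 3))
print({k: tuple(v.shape) for k, v in outputs.items()})
```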
Deva Ramanan retweeted
Tarasha Khurana @tarashakhurana
CogNVS was accepted to @NeurIPSConf 2025! 🎉 We are releasing the code today for you all to try:
🆕 Code: github.com/Kaihua-Chen/co…
Paper: arxiv.org/pdf/2507.12646
With CogNVS, we reformulate dynamic novel-view synthesis as a structured inpainting task: (1) we reconstruct input views with off-the-shelf SLAM systems, (2) create self-supervised training pairs for learning to inpaint, and (3) test-time finetune on the input at inference. With @kaihuac5 and @RamananDeva.
Tarasha Khurana @tarashakhurana

Excited to share recent work with @kaihuac5 and @RamananDeva where we learn to do novel view synthesis for dynamic scenes in a self-supervised manner, only from 2D videos! webpage: cog-nvs.github.io arxiv: arxiv.org/abs/2507.12646 code (soon): github.com/Kaihua-Chen/co…

0 replies · 11 reposts · 72 likes · 10.3K views
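The three-step recipe above maps naturally onto a pipeline skeleton. Here is a runnable sketch with trivial stand-ins for the SLAM system and the inpainting network; all helper functions are hypothetical, and only the control flow follows the tweet's description.

```python
import numpy as np

# Hypothetical stand-ins for the real components (SLAM, inpainting network);
# each is reduced to a trivial placeholder so the control flow runs end to end.
def run_slam(video):                   # -> (point cloud, per-frame camera poses)
    return np.zeros((100, 3)), [np.eye(4) for _ in video]

def render_with_holes(points, pose):   # incomplete re-render of a known/novel view
    return np.zeros((32, 32, 3))

def finetune_inpainter(pairs):         # self-supervised training + test-time finetune
    return lambda partial: partial     # identity "inpainter" placeholder

def cognvs_style_pipeline(video, target_pose):
    """Skeleton of the three-step CogNVS recipe described in the tweet above."""
    # (1) Reconstruct input views with an off-the-shelf SLAM system.
    points, poses = run_slam(video)
    # (2) Create self-supervised pairs: incomplete re-renders of views we *do*
    #     have, paired with the complete frames, to learn inpainting.
    pairs = [(render_with_holes(points, p), frame) for p, frame in zip(poses, video)]
    # (3) Test-time finetune the inpainter on this video, then fill the novel view.
    inpaint = finetune_inpainter(pairs)
    return inpaint(render_with_holes(points, target_pose))

novel_view = cognvs_style_pipeline(np.zeros((8, 32, 32, 3)), np.eye(4))
print(novel_view.shape)  # (32, 32, 3)
```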
Deva Ramanan retweeted
Zhiqiu Lin @ZhiqiuLin
🎉 CameraBench has been accepted as a Spotlight (3%) @ NeurIPS 2025. Huge congrats to all collaborators at CMU, MIT-IBM, UMass, Harvard, and Adobe. CameraBench is a large-scale effort that pushes video-language models to reason about the language of camera motion just like professional cinematographers. 🌍 Our open-source dataset, models, and code are also gaining strong interest and adoption from frontier labs such as DeepMind and Kling to advance video generation research.
📄 Paper: arxiv.org/abs/2504.15376
🌐 Website: linzhiqiu.github.io/papers/camerab…
Zhiqiu Lin @ZhiqiuLin

📷 Can AI understand camera motion like a cinematographer? Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video – films, games, drone shots, vlogs, etc. Links below!
We contribute a taxonomy of motion primitives, co-designed over months with professional cinematographers, and apply rigorous quality control to label and caption all aspects of camera motion.
CameraBench shows that even the best SfMs and VLMs struggle with real-world, dynamic videos. Yet, a generative VLM post-trained on our high-quality data matches SOTA SfM (MegaSAM) in geometric understanding and outperforms SOTA VLMs (Gemini-2.5 / GPT-4o) in semantic understanding, e.g., describing how the camera moves.
📄 Paper: huggingface.co/papers/2504.15…
🌐 Website: linzhiqiu.github.io/papers/camerab…
Work led by CMU, MIT-IBM, UMass, Adobe, Harvard, Emerson with @censiyuan1, @chancharikm, @JayKarhade, @du_yilun, @gan_chuang, and @RamananDeva.

7 replies · 23 reposts · 160 likes · 25.8K views
Deva Ramanan retweeted
Nikhil Keetha @Nik__V__
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀
One universal model enables SoTA for:
🔥 Mono Depth Estimation
🔥 Multi-View SfM
🔥 Multi-View Stereo
🔥 Depth Completion
🔥 Registration
… and many more possibilities! – plus everything is metric 🎯
We release code for data processing, training, benchmarking & ablations – everything Apache 2.0! Details & Links 👇
30 replies · 129 reposts · 744 likes · 120.9K views
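"Factored" here suggests that geometry and cameras are predicted as separate pieces that compose into a metric point map. One plausible factorization is sketched below with illustrative field names (rays, depth, poses, and a single metric scale) — an assumption for exposition, not necessarily MapAnything's exact schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FactoredScene:
    """One plausible 'factored' metric scene output, per the tweet's framing.
    Field names are illustrative, not MapAnything's actual schema."""
    ray_dirs: np.ndarray  # (N, H, W, 3) unit ray directions per view
    depth: np.ndarray     # (N, H, W)    up-to-scale depth along each ray
    poses: np.ndarray     # (N, 4, 4)    camera-to-world transforms
    scale: float          # single global metric scale factor

    def metric_points(self) -> np.ndarray:
        """Compose the factors into world-frame metric 3D points."""
        pts_cam = self.ray_dirs * (self.scale * self.depth)[..., None]
        R, t = self.poses[:, :3, :3], self.poses[:, :3, 3]
        return np.einsum("nij,nhwj->nhwi", R, pts_cam) + t[:, None, None, :]

scene = FactoredScene(
    ray_dirs=np.tile([0.0, 0.0, 1.0], (2, 4, 4, 1)),
    depth=np.ones((2, 4, 4)), poses=np.tile(np.eye(4), (2, 1, 1)), scale=2.5,
)
print(scene.metric_points().shape)  # (2, 4, 4, 3)
```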
Deva Ramanan retweeted
Zhiqiu Lin @ZhiqiuLin
Fresh GPT‑o3 results on our vision‑centric #NaturalBench (NeurIPS’24) benchmark! 🎯 Its new visual chain‑of‑thought—by “zooming in” on details—cracks questions that still stump GPT‑4o. Yet vision reasoning isn’t solved: o3 can still hallucinate even after a full minute of analysis 🤯
Zhiqiu Lin @ZhiqiuLin

🚀 Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'24) that challenges vision-language models with pairs of simple questions about natural imagery. 🌍📸
Here’s what we found after testing 53 models (GPT-4o, Llama3.2, Qwen2VL, Molmo, and more):
1️⃣ All models struggle: they perform only 10-20% above chance, while human accuracy exceeds 90%! This shows that models still struggle with natural images and simple questions that humans answer easily — what we call natural adversarial samples.
2️⃣ Models appear strong on previous benchmarks like MME/ScienceQA by exploiting their strong language bias. However, even a blind ChatGPT (without vision) can outperform vision models on these benchmarks.
3️⃣ Debiasing is crucial: most models prefer "Yes" far more than "No"; correcting this bias can nearly double performance, even for GPT-4o.
📄 Paper: huggingface.co/papers/2410.14…
🌐 Arxiv: arxiv.org/abs/2410.14669
Work led by CMU & UW with @Jeande_d, @baiqil0203, @zixianma02, @simi_97k, and co-advised by @RanjayKrishna, @gneubig, and @RamananDeva.

3 replies · 21 reposts · 109 likes · 21.4K views
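Point 3️⃣ (yes-bias) is easy to illustrate: a model whose raw yes/no scores lean positive will answer "Yes" too often, and simply re-centering its decision threshold restores a balanced split. The toy recalibration below is a generic illustration, not NaturalBench's exact debiasing procedure.

```python
import numpy as np

def debiased_yes_no(yes_scores: np.ndarray) -> np.ndarray:
    """Toy yes-bias correction: instead of answering 'Yes' whenever the raw
    score is positive, shift the threshold so predictions match an assumed
    balanced yes/no base rate (NaturalBench pairs questions so answers are
    balanced). A generic recalibration sketch, not the paper's procedure."""
    threshold = np.median(yes_scores)  # assume ~50% of answers should be 'Yes'
    return yes_scores > threshold

# A model whose raw scores lean positive answers 'Yes' ~80% of the time;
# re-thresholding restores a balanced yes/no split.
scores = np.random.default_rng(0).normal(loc=0.8, scale=1.0, size=1000)
print((scores > 0).mean(), debiased_yes_no(scores).mean())  # ~0.79 vs 0.5
```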
Deva Ramanan retweeted
Khiem Vuong @kvuongdev
[1/6] Recent models like DUSt3R generalize well across viewpoints, but performance drops on aerial-ground pairs. At #CVPR2025, we propose AerialMegaDepth (aerial-megadepth.github.io), a hybrid dataset combining mesh renderings with real ground images (MegaDepth) to bridge this gap.
7 replies · 103 reposts · 552 likes · 55.6K views
Deva Ramanan retweeted
Chancharik Mitra @chancharikm
🎯 Introducing Sparse Attention Vectors (SAVs): a breakthrough method for extracting powerful multimodal features from Large Multimodal Models (LMMs). SAVs enable SOTA performance on discriminative vision-language tasks (classification, safety alignment, etc.)! Links in replies!
🔎 Using just ~20 attention heads & only few-shot examples, SAVs:
- Outperform both LoRA and few-shot baselines
- Work with image, text, & interleaved inputs
- Extract features without finetuning – ready to go at test time!
This project was a cross-collaborative effort between researchers from UC Berkeley, Carnegie Mellon University, and MIT-IBM Research (@berkeley_ai, @CMU_Robotics, @MITIBMLab). Many thanks to all of the collaborators and co-authors on this work: Brandon Huang, Tianning (Ray) Chai, @ZhiqiuLin, @ArbelleAssaf, @RogerioFeris, @leokarlin, @trevordarrell, @RamananDeva, @roeiherzig.
5 replies · 37 reposts · 146 likes · 26.4K views
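The few-shot, no-finetuning recipe above can be sketched as: score each attention head's discriminativeness on the few-shot examples, keep a sparse top-k, and classify by nearest class mean in that sparse feature space. The head-selection criterion below (between-class spread) is an assumption for illustration, not the paper's exact rule.

```python
import numpy as np

def select_heads_and_classify(train_acts, train_labels, test_acts, k=20):
    """Sketch of the SAV idea: pick a sparse set of attention heads whose
    activations best separate the few-shot classes, then classify by nearest
    class mean over those heads only - no finetuning anywhere.
    train_acts: (N, H, D) per-example, per-head activation vectors."""
    classes = np.unique(train_labels)
    means = np.stack([train_acts[train_labels == c].mean(0) for c in classes])  # (C, H, D)
    # Score each head by how far apart its class means are; keep the top k.
    spread = np.linalg.norm(means[:, None] - means[None, :], axis=-1).sum((0, 1))  # (H,)
    heads = np.argsort(spread)[-k:]
    # Nearest-class-mean classification restricted to the selected heads.
    dists = np.linalg.norm(test_acts[:, None, heads] - means[None, :, heads], axis=-1).sum(-1)
    return classes[np.argmin(dists, axis=1)]

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 32, 16))
labels = np.array([0, 1] * 4)
print(select_heads_and_classify(acts, labels, acts[:2], k=4))
```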
Deva Ramanan retweeted
Anirudh Chakravarthy @anirudhchak
Lidar Panoptic Segmentation (LPS) is crucial for the safe deployment of autonomous vehicles, but existing methods fail to consider realistic open-world environments. Our IJCV paper introduces LPS in the Open World (LiPSOW) to discover novel classes in the open world! (1/5)
1 reply · 5 reposts · 23 likes · 1.9K views
Deva Ramanan retweeted
Kangle Deng @kangle_deng
[1/2] 📢 I'll present "FlashTex: Fast Relightable Mesh Texturing with LightControlNet" at #ECCV2024’s Oral Session tomorrow! 🎉 Join me to explore how our generated textures can be properly relit in various lighting environments. ⚡ 📅 Oral: Tue, Oct 1st, 2 PM 📍 Poster: #159
1 reply · 8 reposts · 35 likes · 11.3K views
Deva Ramanan retweeted
Zhiqiu Lin @ZhiqiuLin
Sharing exciting news from Milan 🇮🇹: VQAScore (ECCV’24) was highlighted as the strongest text-to-image metric in DeepMind’s Imagen3 tech report! Imagen3 also used our GenAI-Bench (CVPR’24 SynData Best Short Paper) to evaluate compositional text-to-image generation. Catch our poster this Thursday at ECCV! #ECCV2024
Link to VQAScore: linzhiqiu.github.io/papers/vqascor…
Link to GenAI-Bench: linzhiqiu.github.io/papers/genai_b…
Imagen3 tech report: storage.googleapis.com/deepmind-media…
5 replies · 15 reposts · 73 likes · 10.1K views
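For context, VQAScore (per the VQAScore paper linked above) scores image-text alignment as the probability that a VQA model answers "Yes" when asked whether the image shows the text. The prompt wording below is approximate, and the toy stand-in model is purely illustrative.

```python
def vqascore(prob_yes_fn, image, text: str) -> float:
    """VQAScore-style alignment: the probability a VQA model answers 'Yes'
    when asked whether the image shows the text. `prob_yes_fn` stands in for
    a real VQA model's scoring call; the prompt wording is approximate."""
    question = f'Does this figure show "{text}"? Please answer yes or no.'
    return prob_yes_fn(image, question)

# Toy stand-in "model": treats the image as a caption string and scores
# word overlap with the question. Real VQAScore uses an actual VQA model.
def toy_prob_yes(image: str, question: str) -> float:
    words = set(image.lower().split())
    asked = set(question.lower().split())
    return len(words & asked) / max(len(words), 1)

print(vqascore(toy_prob_yes, "a dog on a skateboard", "dog on a skateboard"))
```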
Deva Ramanan retweeted
Tarasha Khurana @tarashakhurana
Our new work explores generating future observations (blue) given the past (gray), by leveraging large-scale pretraining of image diffusion models for video prediction, and conditioning on timestamps & invariant data modalities. w/ @RamananDeva page: cs.cmu.edu/~tkhurana/dept…
0 replies · 7 reposts · 37 likes · 6.5K views
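The tweet describes repurposing a pretrained image diffusion model for prediction by conditioning on past observations and a continuous timestamp. Below is a toy sketch of that conditioning interface only; the architecture is entirely illustrative and is not the paper's model.

```python
import torch
import torch.nn as nn

class TimestampConditionedDenoiser(nn.Module):
    """Generic sketch of the conditioning interface the tweet describes: a
    denoiser that receives past observations plus a continuous timestamp for
    the frame being predicted. Architecture is illustrative only."""
    def __init__(self, channels=3, ctx_frames=2, dim=64):
        super().__init__()
        in_ch = channels * (1 + ctx_frames)   # noisy future frame + past context
        self.time_embed = nn.Linear(1, dim)   # embed the prediction timestamp
        self.net = nn.Conv2d(in_ch, channels, 3, padding=1)
        self.film = nn.Linear(dim, channels)  # timestamp modulates the output

    def forward(self, noisy_future, past, timestamp):
        x = torch.cat([noisy_future, past.flatten(1, 2)], dim=1)
        scale = self.film(torch.relu(self.time_embed(timestamp)))  # (B, C)
        return self.net(x) * (1 + scale[:, :, None, None])

model = TimestampConditionedDenoiser()
out = model(torch.rand(1, 3, 16, 16),          # noisy future frame
            torch.rand(1, 2, 3, 16, 16),       # two past (gray) frames
            torch.tensor([[0.5]]))             # continuous prediction timestamp
print(out.shape)  # torch.Size([1, 3, 16, 16])
```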