CMU Center for Perceptual Computing and Learning

649 posts

@roboVisionCMU

The Chronicles of Smith Hall.

Pittsburgh, PA · Joined June 2019
136 Following · 2K Followers
CMU Center for Perceptual Computing and Learning retweeted
CMU Center for Perceptual Computing and Learning
TWO Best Paper Awards at ICCV!
"Generating Physically Stable and Buildable Brick Structures from Text" by Ava Pun*, Kangle Deng*, Ruixuan Liu*, Deva Ramanan, Changliu Liu, Jun-Yan Zhu
"Spatially-Varying Autofocus" by Yingsi Qin, Aswin C. Sankaranarayanan, Matthew O'Toole
#goSmithHall
CMU Center for Perceptual Computing and Learning retweeted
Yishu Li
Yishu Li@LisaYishu·
A closed door looks the same whether it pushes or pulls. Two identical-looking boxes might have different centers of mass. How should robots act when a single visual observation isn't enough? Introducing HAVE 🤖, our method that reasons about past interactions online! #CORL2025
CMU Center for Perceptual Computing and Learning retweeted
Unnat Jain
Unnat Jain@unnatjain2010·
✨New edition of our community-building workshop series!✨ Tomorrow at @CVPR, we invite speakers to share their stories, values, and approaches for navigating a crowded and evolving field, especially for early-career researchers. Cheeky title🤭: How to Stand Out in the Crowd🙋? Details & context here: sites.google.com/view/standoutcv
Anand Bhattad@anand_bhattad

In this #CVPR2025 edition of our community-building workshop series, we focus on supporting the growth of early-career researchers. Join us tomorrow (Jun 11) at 12:45 PM in Room 209 Schedule: sites.google.com/view/standoutc… We have an exciting lineup of invited talks and candid panels: @sarameghanbeery, @dimadamen, @jbhuang0604, @lealtaixe, @LerrelPinto, @lschmidt3, @shubhtuls, @gulvarol, @cvondrick, @sainingxie Co-organizing with @unnatjain2010, @ap229997, @georgiagkioxari, @akanazawa, and Lana Lazebnik @CVPR

CMU Center for Perceptual Computing and Learning retweeted
Guanya Shi
Guanya Shi@GuanyaShi·
ASAP learns diverse, agile, whole-body humanoid motions by learning a residual action model from the real world to align sim and real physics, enabling motions that were previously difficult to achieve.

It has two stages: Stage 1 pretrains a phase-based motion-tracking policy to mimic human motions in simulation. Stage 2 rolls out this policy in the real world to collect data, learns a residual action model to compensate for the dynamics mismatch, and finally fine-tunes the pretrained policy with the learned residual model.

ASAP is fully open-sourced: agile.human2humanoid.com

ASAP is not just for sim2real. It provides a general framework to align physics in training and deployment environments. To facilitate smooth transfer between different simulators, we also released HumanoidVerse, a multi-simulator humanoid learning framework: github.com/LeCAR-Lab/Huma…

A key design principle of HumanoidVerse is the separation and modularization of simulators, tasks, and algorithms, allowing switching between simulators and tasks with minimal effort. We support training & evaluation in multiple simulators (IsaacGym, IsaacSim, Genesis) by changing only ONE command line: +simulator=

Led by @TairanHe99 @WinstonGu_ @_wenlixiao @Yuanhang__Zhang. Collaborations with the @nvidia GEAR lab led by @yukez and @DrJimFan.
Tairan He@TairanHe99

🚀 Can we make a humanoid move like Cristiano Ronaldo, LeBron James and Kobe Bryant? YES! 🤖 Introducing ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills Website: agile.human2humanoid.com Code: github.com/LeCAR-Lab/ASAP

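The two-stage recipe described above (pretrain a tracking policy in simulation, then learn a residual action model from real-world rollouts to compensate for the dynamics mismatch) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the ASAP codebase: `base_policy`, `residual_model`, and `aligned_action` are hypothetical stand-ins.

```python
import numpy as np

def base_policy(state):
    # Stage 1 stand-in: a pretrained motion-tracking policy
    # (here just a bounded nonlinearity for illustration).
    return np.tanh(state)

def residual_model(state, action):
    # Stage 2 stand-in: a learned correction for the sim-to-real
    # dynamics mismatch (here a tiny linear term for illustration).
    return 0.1 * (state - action)

def aligned_action(state):
    # Deployed action = pretrained policy's action + learned residual,
    # so the behavior accounts for real-world dynamics.
    a = base_policy(state)
    return a + residual_model(state, a)

state = np.array([0.5, -0.2])
print(aligned_action(state))
```

The point of the structure is that the base policy stays frozen during data collection; only the small residual term absorbs the sim-to-real gap before fine-tuning.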
CMU Center for Perceptual Computing and Learning retweeted
Zhengyi “Zen” Luo
Zhengyi “Zen” Luo@zhengyiluo·
Should have recorded our reactions when the first successful siuuu happened! 🎉 Collecting and learning from real-world data will be incredibly important for humanoids moving forward, and we have just taken our first step, ASAP 🫡
Tairan He@TairanHe99

🚀 Can we make a humanoid move like Cristiano Ronaldo, LeBron James and Kobe Bryant? YES! 🤖 Introducing ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills Website: agile.human2humanoid.com Code: github.com/LeCAR-Lab/ASAP

CMU Center for Perceptual Computing and Learning retweeted
Mehul Agarwal
Mehul Agarwal@meh_agarwal·
🎵✨Excited to share our #NeurIPS2024 paper on personalized music video generation! We combine multimodal AI with identity protection to let listeners be co-creators, generating custom music videos that reflect both music and themselves. 🎥🔒 arxiv.org/abs/2502.02610 #CreativeAI
CMU Center for Perceptual Computing and Learning
Spot-on tips for faculty applicants from RI postdoc @unnatjain2010. Big congrats to him and UC Irvine! 🎉
Unnat Jain@unnatjain2010

Excited to share that I'll be joining University of California at Irvine as a CS faculty in '25!🌟 Faculty apps: @_krishna_murthy, @liuzhuang1234 & I share our tips: unnat.github.io/notes/Hidden_C… PhD apps: I'm looking for students in vision, robot learning, & AI4Science. Details👇

CMU Center for Perceptual Computing and Learning retweeted
Unnat Jain
Unnat Jain@unnatjain2010·
Excited to share that I'll be joining University of California at Irvine as a CS faculty in '25!🌟 Faculty apps: @_krishna_murthy, @liuzhuang1234 & I share our tips: unnat.github.io/notes/Hidden_C… PhD apps: I'm looking for students in vision, robot learning, & AI4Science. Details👇
CMU Center for Perceptual Computing and Learning retweeted
Rohan Choudhury
Rohan Choudhury@rchoudhury997·
Excited to finally release our NeurIPS 2024 (spotlight) paper! We introduce Run-Length Tokenization (RLT), a simple way to significantly speed up your vision transformer on video with no loss in performance!
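For readers curious about the intuition behind Run-Length Tokenization, the name borrows from run-length encoding: a video patch that stays visually unchanged across consecutive frames can be represented by a single token plus a run length, rather than one token per frame, shrinking the sequence the transformer processes. The sketch below is an illustrative toy, not the paper's implementation; the function name and tolerance are assumptions.

```python
import numpy as np

def run_length_tokenize(patches, tol=1e-3):
    """patches: (frames, dim) array of one spatial patch over time.
    Returns a list of (token, run_length) pairs."""
    runs = []
    current, count = patches[0], 1
    for p in patches[1:]:
        if np.linalg.norm(p - current) <= tol:
            count += 1  # patch unchanged: extend the current run
        else:
            runs.append((current, count))  # close the run
            current, count = p, 1
    runs.append((current, count))
    return runs

# A patch that is static for 3 frames, then changes once:
patches = np.array([[1.0], [1.0], [1.0], [2.0]])
runs = run_length_tokenize(patches)
print([(t.tolist(), n) for t, n in runs])  # two runs: length 3, then 1
```

Static video content thus costs a constant number of tokens regardless of duration, which is where the speedup on video comes from.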
CMU Center for Perceptual Computing and Learning retweeted
Murtaza Dalal
Murtaza Dalal@mihdalal·
Can my robot cook my food, rearrange my dresser, tidy my messy table and do so much more without ANY demos or real-world training data? Introducing ManipGen: A generalist agent for manipulation that can solve long-horizon robotics tasks entirely zero-shot, from text input! 1/N
CMU Center for Perceptual Computing and Learning retweeted
Mihir Prabhudesai
Mihir Prabhudesai@mihirp98·
1/ Happy to share VADER: Video Diffusion Alignment via Reward Gradients. We adapt foundational video diffusion models using pre-trained reward models to generate high-quality, aligned videos for various end-applications. Below we generated a short movie using VADER 😀; we used ChatGPT to write a script and an off-the-shelf AI music generator to generate the sound. Our code & weights are open-sourced: vader-vid.github.io
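VADER's subtitle, "alignment via reward gradients", points at the key mechanism: when the reward model is differentiable, its gradient can flow back into the generator's parameters directly, rather than through a high-variance policy-gradient estimate. A deliberately tiny stand-in (a scalar "generator" and a quadratic reward, both hypothetical, not VADER's models) shows the update rule:

```python
def generate(theta, z):
    # Toy "generator": maps noise z to a sample using parameter theta.
    return theta * z

def reward(x, target=1.0):
    # Toy differentiable reward: higher when x is closer to target.
    return -(x - target) ** 2

def reward_grad_step(theta, z, lr=0.1, target=1.0):
    # Chain rule: d(reward)/d(theta) = d(reward)/dx * dx/d(theta).
    x = generate(theta, z)
    dr_dx = -2.0 * (x - target)
    dx_dtheta = z
    return theta + lr * dr_dx * dx_dtheta  # gradient ascent on reward

theta = 0.0
for _ in range(200):
    theta = reward_grad_step(theta, z=1.0)
print(round(theta, 3))  # converges to 1.0, the reward maximizer
```

The same chain-rule structure, scaled up to a diffusion sampler and a learned reward model, is what lets the fine-tuning signal reach every denoising step.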