David Held

509 posts

@davheld

Associate Professor at Carnegie Mellon University | he/him

Pittsburgh, PA · Joined September 2011

623 Following · 5.7K Followers

David Held@davheld·1d

Excellent analysis by @Majumdar_Ani, recommended read!

Anirudha Majumdar@Majumdar_Ani

x.com/i/article/2033…

David Held@davheld·3d

@siddancha What is the consumer business case for this? Do you mean for some kind of robot sports competition? Or as a training partner for athletes?

Siddharth Ancha@siddancha·5d

OMG it's finally here! Humanoid lawn tennis!!! 🤖+🎾 zzk273.github.io/LATENT/ I have a feeling that robot tennis/table-tennis will be the first successful consumer application of humanoid robots even before they start doing household chores.

C Zhang@ChongZitaZhang

zzk273.github.io/LATENT/ really cool work, humanoid playing tennis against humans

David Held retweeted

Tal Daniel@TalDaniel8·6 Mar

🚀 #ICLR2026 Oral 💥 How can we design world models that capture object interactions directly from pixels? Introducing Latent Particle World Models: the first end-to-end self-supervised, object-centric world model, trained from videos, supporting action/img/lang conditioning. 1/n

David Held@davheld·26 Feb

@bowenwen_me Looking forward to it! I would also love to see a comparison to MapAnything or DepthAnything3.

Bowen Wen@bowenwen_me·26 Feb

Our paper has been accepted to CVPR 2026🎊. Code will be released very soon! Stay tuned at github.com/NVlabs/Fast-Fo…

Bowen Wen@bowenwen_me

A new milestone for real-time accurate 3D spatial computing! Introducing ⚡️Fast-FoundationStereo⚡️, a real-time zero-shot stereo depth estimation model that accelerates the original FoundationStereo by >10x with comparable quality. Details in threads 🧵 (1/N)

David Held@davheld·26 Feb

@m_wulfmeier Robot vacuuming!

Markus Wulfmeier@m_wulfmeier·25 Feb

Name #robotics use cases where 90% success rate is enough, go!

David Held@davheld·21 Feb

@bowenwen_me Any guesses when you will release the code for this method?

Bowen Wen@bowenwen_me·17 Dec

David Held@davheld·21 Feb

@vincesitzmann Clarification: you call a point cloud an "intermediate representation" but what about 3D sensors that directly record point clouds? Should we not develop methods that can learn from such data?

Vincent Sitzmann@vincesitzmann·16 Feb

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision, mapping imagery to intermediate representations (3D, flow, segmentation...), is about to go away. vincentsitzmann.com/blog/bitter_le…

David Held@davheld·21 Feb

@xwang_lk @jon_barron @TuliMathieu So "natural" just means "recorded using sensors similar to the sensors that humans have"? What about sonar (bats)? And why couldn't some alien species have a lidar sensor built in, making it "natural"?

Xin Eric Wang@xwang_lk·19 Feb

@jon_barron @TuliMathieu Agreed

Xin Eric Wang@xwang_lk·19 Feb

About a year ago, I had a discussion with several 3D researchers around a view I still hold today: 3D representations are human-engineered, non-natural abstractions. Using explicit 3D representations as an intermediate step to enhance spatial reasoning in large models may be a detour. A more direct path is to learn from the most fundamental signal we have — raw video data itself. That said, 3D research is absolutely valuable. It plays a critical role in domains where 3D representation is the end product, such as game development, digital content creation, and 3D printing. The question is less about whether 3D matters, and more about what the ultimate objective is.

Vincent Sitzmann@vincesitzmann

David Held@davheld·7 Jan

@nblqbl Now try this with humans.

Nabil Iqbal@nblqbl·7 Jan

Here is a game: ask an AI to generate an image. Ask a fresh model to describe it. Use that description to generate a new image. Repeat and see what happens. The results are strange and unsettling. To start: "An apple sits on a table." 1/N

David Held@davheld·18 Dec

@bowenwen_me Great work! How does it compare to MapAnything or Depth Anything 3?

Bowen Wen@bowenwen_me·17 Dec

Our code, pretrained weights, and pseudo-labeled internet stereo data will be released. Welcome to watch and star our repo! (8/N) Code: github.com/NVlabs/Fast-Fo… Webpage: nvlabs.github.io/Fast-Foundatio… Paper: arxiv.org/abs/2512.11130

David Held@davheld·14 Nov

@AvivTamar1 What is the max score here? What if I want to say that their ability is much greater than is reflected by their achievements?

Aviv Tamar@AvivTamar1·14 Nov

@davheld The only Nash equilibrium in this game is to give max score on all questions, no?

David Held@davheld·14 Nov

I am filling out a fellowship application letter of recommendation, and one question is: "Applicant's achievements reflect his or her ability: * - select -Not at All; Well; Moderately Well; Very Well; Extremely Well". If I select "Well" then what does that imply?

David Held@davheld·12 Nov

@AvivTamar1 What would a person mean if they said it?

Aviv Tamar@AvivTamar1·11 Nov

Discussing a research idea with ChatGPT. Response I got began with "Love this line of work—..." What does "love" mean here exactly?? So confusing.

David Held@davheld·7 Nov

What are the best slides you have seen to explain transformers to someone who has never seen them before?

David Held@davheld·31 Oct

@_wenlixiao Where can I find the paper?

Wenli Xiao@_wenlixiao·31 Oct

What if robots could improve themselves by learning from their own failures in the real-world? Introducing 𝗣𝗟𝗗 (𝗣𝗿𝗼𝗯𝗲, 𝗟𝗲𝗮𝗿𝗻, 𝗗𝗶𝘀𝘁𝗶𝗹𝗹) — a recipe that enables Vision-Language-Action (VLA) models to self-improve for high-precision manipulation tasks. PLD couples real-world residual reinforcement learning with standard supervised fine-tuning — letting robots discover, recover, and distill their own data flywheel. Quick 🧵

David Held retweeted

Alexis Hao@hao_alexis·23 Sep

Introducing FMVP: a method that adapts to natural arm motions during robot-assisted dressing. Pre-trained on vision in sim, fine-tuned with limited real-world vision+force data, and tested in a 12-user, 264-trial study, FMVP is robust across garments and motions. #CoRL2025

David Held@davheld·19 Sep

Check out our new CoRL paper on how to find shapes that cause failures for robot manipulation policies!

Divyam Goel@divyamgo10

How do we discover a robot's failure modes before deploying it in the real world? Standard benchmarks often don't capture the full picture, leaving policies vulnerable to plausible variations in object shape. Thrilled that our work, "Geometric Red-Teaming for Robotic Manipulation," has been accepted as an oral presentation at #CoRL2025! We introduce a framework to automatically find these geometric blindspots. georedteam.github.io 🧵

David Held@davheld·18 Sep

Check out our CoRL 2025 paper on learning from past interactions; we show, theoretically and experimentally, the benefit of using a generator-verifier paradigm!

Yishu Li@LisaYishu

A closed door looks the same whether it pushes or pulls. Two identical-looking boxes might have different centers of mass. How should robots act when a single visual observation isn't enough? Introducing HAVE 🤖, our method that reasons about past interactions online! #CORL2025

David Held@davheld·18 Sep

Check out our new CoRL paper on planning in continuous action spaces by searching over sequences of object transformations!

Kallol Saha@_ksaha

🚨Introducing SPOT: Search over Point Cloud Object Transformations. SPOT is a combined learning-and-planning approach that searches in the space of object transformations. Website: planning-from-point-clouds.github.io Paper: arxiv.org/abs/2509.04645 Code: github.com/kallol-saha/SP…

David Held@davheld·17 Sep

@Nik__V__ Great work!!! Where can I find inference speeds?

Nikhil Keetha@Nik__V__·17 Sep

Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀 One universal model enables SoTA for: 🔥 Mono Depth Estimation 🔥 Multi-View SfM 🔥 Multi-View Stereo 🔥 Depth Completion 🔥 Registration … and many more possibilities! – plus everything is metric 🎯 We release code for data processing, training, benchmarking & ablations – everything Apache 2.0! Details & Links 👇
