David Held

509 posts

@davheld

Associate Professor at Carnegie Mellon University | he/him

Pittsburgh, PA · Joined September 2011
623 Following · 5.7K Followers
David Held @davheld
@siddancha What is the consumer business case for this? Do you mean for some kind of robot sports competition? Or as a training partner for athletes?
David Held reposted
Tal Daniel @TalDaniel8
🚀 #ICLR2026 Oral 💥 How can we design world models that capture object interactions directly from pixels? Introducing Latent Particle World Models, the first end-to-end self-supervised, object-centric world model, trained from videos, supporting action/img/lang conditioning. 1/n
David Held @davheld
@bowenwen_me Looking forward to it! I would also love to see a comparison to MapAnything or Depth Anything 3.
David Held @davheld
@bowenwen_me Any guesses when you will release the code for this method?
Bowen Wen @bowenwen_me
A new milestone for real-time accurate 3D spatial computing! Introducing ⚡️Fast-FoundationStereo⚡️, a real-time zero-shot stereo depth estimation model that accelerates the original FoundationStereo by >10x with comparable quality. Details in threads 🧵 (1/N)
David Held @davheld
@vincesitzmann Clarification: you call a point cloud an "intermediate representation" but what about 3D sensors that directly record point clouds? Should we not develop methods that can learn from such data?
Vincent Sitzmann @vincesitzmann
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision, mapping imagery to intermediate representations (3D, flow, segmentation...), is about to go away. vincentsitzmann.com/blog/bitter_le…
David Held @davheld
@xwang_lk @jon_barron @TuliMathieu So "natural" just means "recorded using sensors similar to the sensors that humans have"? What about sonar (bats)? And why couldn't some alien species have a lidar sensor built in, making it "natural"?
Xin Eric Wang @xwang_lk
About a year ago, I had a discussion with several 3D researchers around a view I still hold today: 3D representations are human-engineered, non-natural abstractions. Using explicit 3D representations as an intermediate step to enhance spatial reasoning in large models may be a detour. A more direct path is to learn from the most fundamental signal we have — raw video data itself. That said, 3D research is absolutely valuable. It plays a critical role in domains where 3D representation is the end product, such as game development, digital content creation, and 3D printing. The question is less about whether 3D matters, and more about what the ultimate objective is.
Quoting Vincent Sitzmann @vincesitzmann (post above).
Nabil Iqbal @nblqbl
Here is a game: ask an AI to generate an image. Ask a fresh model to describe it. Use that description to generate a new image. Repeat and see what happens. The results are strange and unsettling. To start: "An apple sits on a table." 1/N
[Image attached]
David Held @davheld
@bowenwen_me Great work! How does it compare to MapAnything or Depth Anything 3?
David Held @davheld
@AvivTamar1 What is the max score here? What if I want to say that their ability is much greater than is reflected by their achievements?
Aviv Tamar @AvivTamar1
@davheld The only Nash equilibrium in this game is to give max score on all questions, no?
David Held @davheld
I am filling out a fellowship application letter of recommendation, and one question is: "Applicant's achievements reflect his or her ability," with the options Not at All; Well; Moderately Well; Very Well; Extremely Well. If I select "Well", what does that imply?
Aviv Tamar @AvivTamar1
Discussing a research idea with ChatGPT. Response I got began with "Love this line of work—..." What does "love" mean here exactly?? So confusing.
David Held @davheld
What are the best slides you have seen to explain transformers to someone who has never seen them before?
Wenli Xiao @_wenlixiao
What if robots could improve themselves by learning from their own failures in the real world? Introducing PLD (Probe, Learn, Distill): a recipe that enables Vision-Language-Action (VLA) models to self-improve for high-precision manipulation tasks. PLD couples real-world residual reinforcement learning with standard supervised fine-tuning, letting robots discover, recover, and distill their own data flywheel. Quick 🧵
David Held reposted
Alexis Hao @hao_alexis
Introducing FMVP: a method that adapts to natural arm motions during robot-assisted dressing. Pre-trained on vision in sim, fine-tuned with limited real-world vision+force data, and tested in a 12-user, 264-trial study, FMVP is robust across garments and motions. #CoRL2025
David Held @davheld
@Nik__V__ Great work!!! Where can I find inference speeds?
Nikhil Keetha @Nik__V__
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀 One universal model enables SoTA for: 🔥 Mono Depth Estimation 🔥 Multi-View SfM 🔥 Multi-View Stereo 🔥 Depth Completion 🔥 Registration … and many more possibilities! – plus everything is metric 🎯 We release code for data processing, training, benchmarking & ablations – everything Apache 2.0! Details & Links 👇