Tommy Mitchel

35 posts

@twmitchel

Senior Research Scientist @Adobe. Trying to teach machines to understand geometry without telling them about geometry. PhD in weird math from Johns Hopkins.

Joined August 2022
155 Following · 122 Followers
Tommy Mitchel retweeted
reactor
reactor@reactorworld·
For a century, video has been something you watch. World models make it something you inhabit. We're building for that shift. We're hiring: reactor.inc/careers
Replies 18 · Reposts 6 · Likes 79 · Views 4.3K
Tommy Mitchel retweeted
Evan Kim
Evan Kim@evnkimm·
How do you train compute-optimal novel view synthesis models? In our CVPR ‘26 paper Scaling View Synthesis Transformers, we uncover key design choices through scaling and careful ablations--and along the way train a new SoTA with 3x less compute. (1/n)
[image attached]
Replies 13 · Reposts 19 · Likes 166 · Views 33.2K
Tommy Mitchel retweeted
Zhenjun Zhao
Zhenjun Zhao@zhenjun_zhao·
Scaling View Synthesis Transformers Evan Kim, Hyunwoo Ryu, Thomas W. Mitchel, @vincesitzmann tl;dr: encoder-decoder + effective batch size -> scaling good! arxiv.org/abs/2602.21341
[four images attached]
Replies 0 · Reposts 15 · Likes 84 · Views 5.1K
Tommy Mitchel retweeted
Vincent Sitzmann
Vincent Sitzmann@vincesitzmann·
Excited that our paper "True Self-Supervised Novel View Synthesis is Transferable" is accepted to ICLR 2026 as an oral! We formulate novel view synthesis without relying on any concepts from multi-view geometry... mitchel.computer/xfactor/
Replies 2 · Reposts 11 · Likes 153 · Views 9.3K
Nick Sharp
Nick Sharp@nmwsharp·
@MattNiessner I disagree; the most important role of a paper is not any numerical score it achieves, but its ability to coherently communicate a new idea, so the reader can build something even better atop it. Clear presentation is hugely important. Papers are not high-score leaderboards!
Replies 2 · Reposts 0 · Likes 6 · Views 372
Matthias Niessner
Matthias Niessner@MattNiessner·
Historically, academia used presentation quality as a proxy for scientific merit. Now that AI is eliminating polish overhead, everyone is confused, often stuck in debates whether we should allow LLMs. On the bright side, we are finally forced to evaluate the actual research content rather than extrapolating value from the text and visuals.
Replies 22 · Reposts 19 · Likes 234 · Views 35.7K
Charles Qi
Charles Qi@charles_rqi·
The future of computer vision is end-to-end learning. The boundary between vision, robot learning, and control will disappear. Autonomous driving proved this works — when you have massive, diverse imitation data at scale. But most other perception-action domains (housework, factory robotics, computer use) don’t have that data yet — either no deployed hardware, or no high-quality logging. To realize general perception-action AGI, we need to find the scalable data and training recipe.
Vincent Sitzmann@vincesitzmann

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…

Replies 10 · Reposts 15 · Likes 191 · Views 30.6K
Tommy Mitchel retweeted
Vincent Sitzmann
Vincent Sitzmann@vincesitzmann·
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…
Replies 43 · Reposts 157 · Likes 1K · Views 366.9K
Takeru Miyato
Takeru Miyato@takeru_miyato·
Thrilled to receive the Google PhD Fellowship! Huge thanks to @Googleorg for supporting my research and to my supervisors, Andreas and @wellingmax, and everyone who has supported me along the way!
Google.org@Googleorg

🎉 We're excited to announce the 2025 Google PhD Fellows! @GoogleOrg is providing over $10 million to support 255 PhD students across 35 countries, fostering the next generation of research talent to strengthen the global scientific landscape. Read more: goo.gle/43wJWw8

Replies 2 · Reposts 0 · Likes 34 · Views 3.6K
Tommy Mitchel retweeted
Hansen Lillemark
Hansen Lillemark@hansenlillemark·
State of the art World Models still lack a unified world memory for representing and predicting dynamics out of their field of view. Why is that, and how can we fix it? Introducing Flow Equivariant World Models: models with memory capable of predicting out of view dynamics!🧵⬇️
Replies 17 · Reposts 104 · Likes 756 · Views 88.8K
Tommy Mitchel
Tommy Mitchel@twmitchel·
@chrisoffner3d Just train depth/pose/etc probes in the latent space of a “geometry-free” model. If it’s effective at the 3D task, its internal representations should have some 3D knowledge.
Replies 0 · Reposts 0 · Likes 0 · Views 78
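The probing recipe in the tweet above can be sketched in a few lines. This is a hypothetical toy, not code from any of the papers mentioned: the frozen "latents" here are a synthetic random projection of ground-truth depth, standing in for a real geometry-free model's representations, and the probe is a closed-form ridge regression.

```python
# Sketch of latent-space probing: fit a small probe on a frozen model's
# latents to test whether they encode a 3D quantity (depth here).
# The "encoder" is a stand-in fixed random projection, NOT an actual
# geometry-free model.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: 1000 samples of a 64-dim frozen representation that,
# by construction, mixes in a depth signal plus noise.
depth = rng.uniform(1.0, 10.0, size=(1000, 1))      # ground-truth depth
mixing = rng.normal(size=(1, 64))                    # frozen "encoder"
latents = depth @ mixing + 0.1 * rng.normal(size=(1000, 64))

# Linear probe: ridge regression from latents to depth (closed form).
lam = 1e-3
X, y = latents, depth
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Score the probe: high R^2 means the latents linearly encode depth.
pred = X @ w
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"probe R^2 = {r2:.3f}")
```

In practice you would extract latents from the frozen model on real images and probe against ground-truth depth/pose maps, scoring on held-out data; a high probe R² indicates the internal representation carries 3D information even if the model was never told about geometry.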
Chris Offner
Chris Offner@chrisoffner3d·
Asking people to put their lives in the hands of entirely non-interpretable tech products is a tall order.
Achyuta Rajaram@AchyutaBot

@vincesitzmann @chrisoffner3d @ducha_aiki @CSProfKGD @jon_barron a take is that the intermediaries, although completely useless from a “performance standpoint”, still have utility. Like, I trust Waymos more because I can easily verify that they are seeing everything on the road (via the helpful UX). Monitorability is important!

Replies 1 · Reposts 0 · Likes 6 · Views 1.3K
Tommy Mitchel retweeted
Vincent Sitzmann
Vincent Sitzmann@vincesitzmann·
Introducing XFactor: the first pose- and geometry-free method capable of true Novel View Synthesis (NVS). We re-think NVS and the concept of camera poses as a pure representation learning problem, entirely without concepts from multi-view geometry! mitchel.computer/xfactor/ (1/n)
Replies 2 · Reposts 23 · Likes 151 · Views 8.8K
Tommy Mitchel retweeted
Vincent Sitzmann
Vincent Sitzmann@vincesitzmann·
Meet us and chat with us about symmetry discovery at today's afternoon poster session at NeurIPS, East Exhibit Hall A-C #2110, where we will be presenting Neural Isometries! @twmitchel
Vincent Sitzmann@vincesitzmann

Introducing Neural Isometries where we show how to exploit equivariant ML even for transformations that are “nasty”, e.g. non-compact, projective, nonlinear, or not even a group action! arxiv.org/abs/2405.19296 Collab w/ the amazing Tommy Mitchel @twmitchel and Mike Taylor! 1/n

Replies 0 · Reposts 2 · Likes 26 · Views 3.9K
Tommy Mitchel retweeted
Vincent Sitzmann
Vincent Sitzmann@vincesitzmann·
Really happy to see this study! Always wanted to do something like this myself, if only to support calming words to grad students: current-gen generative models have nothing to do with intelligence, and AI research remains fascinating and unsolved!
Bingyi Kang@bingyikang

Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video gen model is able to learn physical laws. There are three key messages to take home: 1⃣The model generalises perfectly for in-distribution data, but fails to do out-of-distribution generalization. For combinatorial scenarios, a scaling law is observed. 2⃣The models fail to abstract general rules and instead try to mimic the closest training example. 3⃣The model prioritizes different attributes when referencing training data: color > size > velocity > shape. This work is a joint effort with our outstanding intern @YangYue_THU. Paper: arxiv.org/abs/2411.02385 Webpage: phyworld.github.io

Replies 3 · Reposts 8 · Likes 102 · Views 9.8K
Tommy Mitchel retweeted
Tommy Mitchel
Tommy Mitchel@twmitchel·
@simo_foti I will be very curious to see what you find! In any case, looking forward to chatting at NeurIPS and congratulations again on a neat paper! 🙂
Replies 1 · Reposts 0 · Likes 1 · Views 61
Simone Foti
Simone Foti@simo_foti·
@twmitchel Interesting to know! However, I am not entirely sure the blotchy artifacts come from heat diffusion per se; I'd be more inclined to think they come from its non-convergence, the spatial gradients, or potentially from the mass-vector approximation. We will investigate this further.
Replies 1 · Reposts 0 · Likes 1 · Views 32
Simone Foti
Simone Foti@simo_foti·
🚨 It's confirmed, "UV-free Texture Generation with Denoising and Geodesic Heat Diffusions" just landed at #NeurIPS2024! No more UV map struggles—just point cloud textures & heat diffusion magic. 🔥 Curious? Keep reading. Oh, and definitely turn up the audio 🎧👇
Replies 10 · Reposts 47 · Likes 317 · Views 25.1K
Tommy Mitchel
Tommy Mitchel@twmitchel·
@simo_foti We tried several vector (Laplacian) extensions of DN with field latents and got similar “paintbrush” artifacts. Unfortunately, I think high-frequency outputs are the critical limitation for DN-based approaches, though I'd be happy to be wrong about this.
Replies 1 · Reposts 0 · Likes 0 · Views 45
Simone Foti
Simone Foti@simo_foti·
@twmitchel Thanks! We performed a promising experiment with CelebA, which provides higher frequency content than ShapeNet and ABO (note that in these results UV3-TeD has not fully converged!). We believe our diffusion can handle higher frequencies, but efficiency improvements are needed.
[image attached]
Replies 1 · Reposts 0 · Likes 0 · Views 206