Tommy Mitchel

35 posts

@twmitchel

Senior Research Scientist @Adobe. Trying to teach machines to understand geometry without telling them about geometry. PhD in weird math from Johns Hopkins.

Joined August 2022

155 Following · 122 Followers

Tommy Mitchel retweeted

reactor@reactorworld·6d

For a century, video has been something you watch. World models make it something you inhabit. We're building for that shift. We're hiring: reactor.inc/careers

Tommy Mitchel retweeted

Evan Kim@evnkimm·6 Mar

How do you train compute-optimal novel view synthesis models? In our CVPR ‘26 paper Scaling View Synthesis Transformers, we uncover key design choices through scaling and careful ablations--and along the way train a new SoTA with 3x less compute. (1/n)

Tommy Mitchel retweeted

Zhenjun Zhao@zhenjun_zhao·26 Feb

Scaling View Synthesis Transformers. Evan Kim, Hyunwoo Ryu, Thomas W. Mitchel, @vincesitzmann. tl;dr: encoder-decoder + effective batch size -> scaling good! arxiv.org/abs/2602.21341

Tommy Mitchel retweeted

Vincent Sitzmann@vincesitzmann·19 Feb

Excited that our paper "True Self-Supervised Novel View Synthesis is Transferable" is accepted to ICLR 2026 as an oral! We formulate novel view synthesis without relying on any concepts from multi-view geometry... mitchel.computer/xfactor/

Tommy Mitchel@twmitchel·18 Feb

@nmwsharp @MattNiessner Well said, Nick.

Nick Sharp@nmwsharp·18 Feb

@MattNiessner I disagree; the most important role of a paper is not any numerical score it achieves, but its ability to coherently communicate a new idea, so the reader can build something even better atop it. Clear presentation is hugely important. Papers are not high-score leaderboards!

Matthias Niessner@MattNiessner·16 Feb

Historically, academia used presentation quality as a proxy for scientific merit. Now that AI is eliminating polish overhead, everyone is confused, often stuck in debates whether we should allow LLMs. On the bright side, we are finally forced to evaluate the actual research content rather than extrapolating value from the text and visuals.

Tommy Mitchel@twmitchel·18 Feb

@charles_rqi As @vincesitzmann says in the post, we make some first steps in that direction in our recent work on self-supervised NVS!

Charles Qi@charles_rqi·18 Feb

The future of computer vision is end-to-end learning. The boundary between vision, robot learning, and control will disappear. Autonomous driving proved this works — when you have massive, diverse imitation data at scale. But most other perception-action domains (housework, factory robotics, computer use) don’t have that data yet — either no deployed hardware, or no high-quality logging. To realize general perception-action AGI, we need to find out the scalable data and training recipe.

Vincent Sitzmann@vincesitzmann

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…

Tommy Mitchel retweeted

Vincent Sitzmann@vincesitzmann·16 Feb

Tommy Mitchel@twmitchel·22 Jan

@takeru_miyato @Googleorg @wellingmax Late to the party but richly deserved!

Takeru Miyato@takeru_miyato·24 Oct

Thrilled to receive the Google PhD Fellowship! Huge thanks to @Googleorg for supporting my research and to my supervisors, Andreas and @wellingmax, and everyone who has supported me along the way!

Google.org@Googleorg

🎉 We're excited to announce the 2025 Google PhD Fellows! @GoogleOrg is providing over $10 million to support 255 PhD students across 35 countries, fostering the next generation of research talent to strengthen the global scientific landscape. Read more: goo.gle/43wJWw8

Tommy Mitchel retweeted

Hansen Lillemark@hansenlillemark·15 Jan

State of the art World Models still lack a unified world memory for representing and predicting dynamics out of their field of view. Why is that, and how can we fix it? Introducing Flow Equivariant World Models: models with memory capable of predicting out of view dynamics!🧵⬇️

Tommy Mitchel@twmitchel·3 Dec

@chrisoffner3d Just train depth/pose/etc probes in the latent space of a “geometry-free” model. If it’s effective at the 3D task, its internal representations should have some 3D knowledge.

Chris Offner@chrisoffner3d·2 Dec

Asking people to put their lives in the hands of entirely non-interpretable tech products is a tall order.

Achyuta Rajaram@AchyutaBot

@vincesitzmann @chrisoffner3d @ducha_aiki @CSProfKGD @jon_barron a take is that the intermediaries, although completely useless from a “performance standpoint”, still have utility. Like I trust Waymos more because I can easily verify that they are seeing everything on the road (via the helpful UX). Monitorability is important!

Tommy Mitchel retweeted

Kosta Derpanis (sabbatical in Munich 🇩🇪)@CSProfKGD·1 Dec

Coming soon to a geometric problem near you @vincesitzmann

Tommy Mitchel retweeted

Vincent Sitzmann@vincesitzmann·4 Nov

Introducing XFactor: the first pose- and geometry-free method capable of true Novel View Synthesis (NVS). We re-think NVS and the concept of camera poses completely without concepts from multi-view geometry as a pure representation learning problem! mitchel.computer/xfactor/ (1/n)

Tommy Mitchel retweeted

Vincent Sitzmann@vincesitzmann·4 Nov

This is work with our amazing collaborator @twmitchel and my student @RyuHyunwoooo @MIT_CSAIL! 📄 Paper: arxiv.org/abs/2510.13063 💻 Code: github.com/vsitzmann/xfac… 🎥 More demos: mitchel.computer/xfactor/ (11/n)

Tommy Mitchel retweeted

Vincent Sitzmann@vincesitzmann·11 Dec

Meet us and chat with us about symmetry discovery at today's afternoon poster session at NeurIPS, East Exhibit Hall A-C #2110, where we will be presenting Neural Isometries! @twmitchel

Vincent Sitzmann@vincesitzmann

Introducing Neural Isometries where we show how to exploit equivariant ML even for transformations that are “nasty”, e.g. non-compact, projective, nonlinear, or not even a group action! arxiv.org/abs/2405.19296 Collab w/ the amazing Tommy Mitchel @twmitchel and Mike Taylor! 1/n

Tommy Mitchel retweeted

Vincent Sitzmann@vincesitzmann·5 Nov

Really happy to see this study! Always wanted to do something like this myself, if only to support calming words to grad students: current-gen generative models have nothing to do with intelligence, and AI research remains fascinating and unsolved!

Bingyi Kang@bingyikang

Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video gen model is able to learn physical laws. There are three key messages to take home: 1⃣The model generalises perfectly for in-distribution data, but fails to do out-of-distribution generalization. For combinatorial scenarios, scaling law is observed. 2⃣The models fail to abstract general rules and instead try to mimic the closest training example. 3⃣The model prioritizes different attributes when referencing training data: color > size > velocity > shape. This work is a joint effort with our outstanding intern @YangYue_THU. Paper: arxiv.org/abs/2411.02385 Webpage: phyworld.github.io

Tommy Mitchel retweeted

Noam Aigerman@AigermanNoam·22 Oct

I'll be recruiting PhD and MSc students through Mila - consider applying if you want to work at the intersection of machine learning and 3D geometry!

Mila - Institut québécois d'IA@Mila_Quebec

Mila's annual supervision request process opens on October 15 to receive MSc and PhD applications for Fall 2025 admission! Join our community! More information here mila.quebec/en/prospective…

Tommy Mitchel retweeted

Vincent Sitzmann@vincesitzmann·18 Oct

A thread and video by MIT CSAIL about our Diffusion Forcing paper!

MIT CSAIL@MIT_CSAIL

Sequence models have skyrocketed in popularity for their ability to analyze data & predict what to do next. MIT’s "Diffusion Forcing" method combines the strengths of next-token prediction (like w/ChatGPT) & video diffusion (like w/Sora), training neural networks to handle corrupted data while predicting the next steps. This flexible, reliable sequence model helps produce higher-quality artificial videos and guides more precise decision-making for robots & AI agents: bit.ly/3BK2wWC

Tommy Mitchel@twmitchel·29 Sep

@simo_foti I will be very curious to see what you find! In any case, looking forward to chatting at NeurIPS and congratulations again on a neat paper! 🙂

Simone Foti@simo_foti·29 Sep

@twmitchel Interesting to know! However I am not entirely sure the blotchy artifacts come from heat diffusion per se, I'd be more inclined to think they come from its non convergence, the spatial gradients, or potentially from the mass-vector approximation. We will further investigate this.

Simone Foti@simo_foti·27 Sep

🚨 It's confirmed, "UV-free Texture Generation with Denoising and Geodesic Heat Diffusions" just landed at #NeurIPS2024! No more UV map struggles—just point cloud textures & heat diffusion magic. 🔥 Curious? Keep reading. Oh, and definitely turn up the audio 🎧👇

Tommy Mitchel@twmitchel·29 Sep

@simo_foti We tried several vector (Laplacian) extensions of DN with field latents and got similar “paintbrush” artifacts. Unfortunately, I think high-frequency outputs are the critical limitation for DN-based approaches, though I would be happy to be wrong about this.

Simone Foti@simo_foti·29 Sep

@twmitchel Thanks! We performed a promising experiment with CelebA, which provides higher frequency content than ShapeNet and ABO (note that in these results UV3-TeD has not fully converged!). We believe our diffusion can handle higher frequencies, but efficiency improvements are needed.

Tommy Mitchel@twmitchel·29 Sep

@simo_foti Did you have any success/thoughts in this regard?
