
Tommy Mitchel
@twmitchel
Senior Research Scientist @Adobe. Trying to teach machines to understand geometry without telling them about geometry. PhD in weird math from Johns Hopkins.

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision (mapping imagery to intermediate representations such as 3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…


🎉 We're excited to announce the 2025 Google PhD Fellows! @GoogleOrg is providing over $10 million to support 255 PhD students across 35 countries, fostering the next generation of research talent to strengthen the global scientific landscape. Read more: goo.gle/43wJWw8


@vincesitzmann @chrisoffner3d @ducha_aiki @CSProfKGD @jon_barron One take is that the intermediaries, although completely useless from a “performance standpoint”, still have utility. For example, I trust Waymos more because I can easily verify that they are seeing everything on the road (via the helpful UX). Monitorability is important!

Introducing Neural Isometries, where we show how to exploit equivariant ML even for transformations that are “nasty”, e.g. non-compact, projective, nonlinear, or not even a group action! arxiv.org/abs/2405.19296 Collab w/ the amazing Tommy Mitchel @twmitchel and Mike Taylor! 1/n

Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video generation model is able to learn physical laws. There are three key messages to take home:
1⃣ The model generalises perfectly for in-distribution data, but fails at out-of-distribution generalization. For combinatorial scenarios, a scaling law is observed.
2⃣ The model fails to abstract general rules and instead tries to mimic the closest training example.
3⃣ The model prioritizes different attributes when referencing training data: color > size > velocity > shape.
This work is a joint effort with our outstanding intern @YangYue_THU. Paper: arxiv.org/abs/2411.02385 Webpage: phyworld.github.io

Mila's annual supervision request process opens on October 15 to receive MSc and PhD applications for Fall 2025 admission! Join our community! More information here mila.quebec/en/prospective…

Sequence models have skyrocketed in popularity for their ability to analyze data & predict what to do next. MIT’s "Diffusion Forcing" method combines the strengths of next-token prediction (like w/ChatGPT) & video diffusion (like w/Sora), training neural networks to handle corrupted data while predicting the next steps. This flexible, reliable sequence model helps produce higher-quality artificial videos and guides more precise decision-making for robots & AI agents: bit.ly/3BK2wWC