Alex Trevithick
@alextrevith
136 posts
Research Scientist @NVIDIAAI. PhD @UCSanDiego. 4D Vision, Machine Learning, Generative Models.

Joined October 2020
340 Following · 576 Followers
Pinned Tweet
Alex Trevithick @alextrevith
🚀 Introducing SimVS: our new method that simplifies 3D capture! 🎯 3D reconstruction assumes consistency—no dynamics or lighting changes—but reality constantly breaks this assumption. ✨ SimVS takes a set of inconsistent images and makes them consistent with a chosen frame.
Alex Trevithick retweeted
World Labs @theworldlabs
Introducing RTFM (Real-Time Frame Model): a highly efficient World Model that generates video frames in real time as you interact with it, powered by a single H100 GPU. RTFM renders persistent and 3D consistent worlds, both real and imaginary. Try our demo of RTFM today!
Alex Trevithick retweeted
Sherwin Bahmani @sherwinbahmani
📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!):
Project Page: research.nvidia.com/labs/toronto-a…
Code and Models: github.com/nv-tlabs/lyra
Paper: arxiv.org/abs/2509.19296
Alex Trevithick retweeted
Nithin Raghavan @nithin_raghavan
If you’re at SIGGRAPH 2025 in Vancouver, join us Thu 2 PM for our talk “Generative Neural Materials”! We introduce a universal neural material model for bidirectional texture functions and a complementary generative pipeline. 1/2
Alex Trevithick @alextrevith
🎥 What if 3D capture could gracefully handle moving scenes and varying illumination? 🎯Come see how video models generate exactly the data you need at our poster, SimVS! 📍CVPR, June 14th (afternoon), Poster #60.
Alex Trevithick retweeted
Bosung Kim @bosungkim17
Interactive looong-context reasoning still has a long way to go. We need progress across all axes: more data, bigger models, and smarter architectures. ∞-THOR is just the beginning: generate ∞-length trajectories, run agents online, train with feedback, and more! Let's push the limits🚀
Prithviraj (Raj) Ammanabrolu @rajammanabrolu

"Foundation" models for embodied agents are all the rage but how to actually do complex looong context reasoning? Can we scale Beyond Needle(s) in the (Embodied) Haystack? ∞-THOR is an infinite len sim framework + guide on (new) architectures/training methods for VLA models

Alex Trevithick retweeted
Hanwen Jiang @hanwenjiang1
Supervised learning has held 3D vision back for too long. Meet RayZer, a self-supervised 3D model trained with zero 3D labels:
❌ No supervision of cameras & geometry
✅ Just RGB images
And the wild part? RayZer outperforms supervised methods (since 3D labels from COLMAP are noisy).
🌐 Project: hwjiang1510.github.io/RayZer/ (1/4)
Towaki Takikawa / 瀧川永遠希
Would be super fun to try: a massive transformer (derived from pre-trained LLMs) that models compressed "3D" representations which then get fed into a view-dependent decoder model (but small enough to run on a "local" 8-GPU machine or something), trained on multiplayer game footage (N streams of video).
[image]
Jon Barron @jon_barron
Here's my 3DV talk, in chapters: 1) Intro / NeRF boilerplate. 2) Recent reconstruction work. 3) Recent generative work. 4) Radiance fields as a field. 5) Why generative video has bitter-lessoned 3D. 6) Why generative video hasn't bitter-lessoned 3D. 5 & 6 are my favorites.
[image]
Alex Trevithick @alextrevith
@yongyuanxi @jon_barron 1. Agreed. Real-world videos obey physics; generative models should therefore learn this property of the data with scaling compute and data. 2. It's like taking the "neural" in neural rendering to the limit. I wonder if a prohibitively large decoder may be necessary.
Towaki Takikawa / 瀧川永遠希
I actually do wonder if even the speed argument is temporary, given:
1. we _could_ concoct an algorithm / method for efficiently enabling much longer context windows for video generation, resulting in temporally stable worlds
2. local GPUs might not get 1000x smaller, but you _could_ have massive cloud GPUs "render" a shared world state (in some abstract latent space) to get massive amortization across users (assuming it's something like Gorilla Tag with shared state across N users), and then have local hardware solve the easier task of decoding the latent frames and forward-projecting into the future for some number of frames to account for latency (which already happens on much more local time scales: developer.nvidia.com/blog/nvidia-re…).
The "multi-view" flavors of diffusion models for 3D reconstruction kind of look like this: you _could_ do something similar, like an N-view diffusion model trained on footage of N players playing in a shared world (which I'm not sure has been done yet, but it sounds like a fun idea to try; you could probably argue this is "sort of 3D", but not 3D in the sense that an explicit 3D representation exists).
(Of course both 1 and 2 are very hard tasks, but possibly not bounded by physical limitations per se.)
Alex Trevithick @alextrevith
After finishing ICCV reviews this year...
[image]
Alex Trevithick @alextrevith
What's the difference between the OpenAI and Google image generators? Giving both of them the same image and the prompt "generate this image": Gemini is essentially the identity function, whereas OpenAI changes content. Does this indicate a continuous encoder for Gemini vs. a VQ-VAE for OpenAI?
[image] [image]
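The identity-function test described in this post can be checked numerically: regenerate the image with each model, then compare the output to the input pixel by pixel. A minimal sketch (the helper name and the toy arrays are illustrative assumptions, not from the thread):

```python
import numpy as np

def mean_abs_error(original, regenerated):
    """Mean absolute per-pixel error between an input image and a
    model's output for the prompt "generate this image".

    A near-zero error suggests the generator behaves like the
    identity, consistent with a continuous latent encoder; a large
    error suggests lossy re-synthesis, e.g. through a discrete
    VQ-VAE-style codebook."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(regenerated, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean(np.abs(a - b)))

# Toy check: an exact identity mapping gives zero error,
# while a perturbed reconstruction does not.
img = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3))
assert mean_abs_error(img, img) == 0.0
assert mean_abs_error(img, np.clip(img + 5, 0, 255)) > 0.0
```

In practice one would load the two screenshots shown in the post and compare crops of the same region, since resizing and compression alone add small nonzero error.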
Alex Trevithick retweeted
Xingyu Chen @RoverXingyu
🦣Easi3R: 4D Reconstruction Without Training! Limited 4D datasets? Take it easy. #Easi3R adapts #DUSt3R for 4D reconstruction by disentangling and repurposing its attention maps → make 4D reconstruction easier than ever! 🔗Page: easi3r.github.io
Alex Trevithick retweeted
Stan Szymanowicz @StanSzymanowicz
⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test time optimization. 🧵👇 (1/9)
Alex Trevithick retweeted
Jianyuan @jianyuan_wang
Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for:
✅ Camera Pose Estimation
✅ Multi-view Depth Estimation
✅ Dense Point Cloud Reconstruction
✅ Point Tracking
Project Page: vgg-t.github.io
Code & Weights: github.com/facebookresear…
Alex Trevithick retweeted
Jiaming Song @baaadas
As one of the people who popularized the field of diffusion models, I am excited to share something that might be the “beginning of the end” of it. IMM has a single stable training stage, a single objective, and a single network — all are what make diffusion so popular today.
Luma @LumaLabsAI

Today, we release Inductive Moment Matching (IMM): a new pre-training paradigm breaking the algorithmic ceiling of diffusion models. Higher sample quality. 10x more efficient. Single-stage, single network, stable training. Read more: lumalabs.ai/news/imm

Alex Trevithick retweeted
Jon Barron @jon_barron
I just pushed a new paper to arXiv. I realized that a lot of my previous work on robust losses and nerf-y things was dancing around something simpler: a slight tweak to the classic Box-Cox power transform that makes it much more useful and stable. It's this f(x, λ) here:
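For reference, the classic Box-Cox power transform that the post says it tweaks can be written down directly. The tweet does not give the tweaked f(x, λ), so this sketch shows only the textbook version:

```python
import numpy as np

def box_cox(x, lam):
    """Classic Box-Cox power transform (Box & Cox, 1964).

    f(x, lam) = (x**lam - 1) / lam   for lam != 0
              = log(x)               for lam == 0

    Defined for x > 0; continuous in lam, since
    (x**lam - 1) / lam -> log(x) as lam -> 0."""
    x = np.asarray(x, dtype=np.float64)
    if lam == 0.0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

# lam = 1 recovers a shifted identity; lam = 0 recovers the log.
assert np.allclose(box_cox(3.0, 1.0), 2.0)
assert np.allclose(box_cox(np.e, 0.0), 1.0)
# Small lam approaches log(x), illustrating the continuity in lam.
assert np.allclose(box_cox(5.0, 1e-8), np.log(5.0), atol=1e-6)
```

A known pain point of this classic form, which the arXiv paper presumably addresses, is numerical instability for x near zero and the restriction to positive inputs.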