Alex Trevithick
@alextrevith
136 posts
Research Scientist @NVIDIAAI. PhD @UCSanDiego. 4D Vision, Machine Learning, Generative Models.

Joined October 2020
340 Following · 576 Followers
Pinned Tweet
Alex Trevithick @alextrevith
🚀 Introducing SimVS: our new method that simplifies 3D capture! 🎯 3D reconstruction assumes consistency—no dynamics or lighting changes—but reality constantly breaks this assumption. ✨ SimVS takes a set of inconsistent images and makes them consistent with a chosen frame.
Alex Trevithick retweeted
World Labs @theworldlabs
Introducing RTFM (Real-Time Frame Model): a highly efficient World Model that generates video frames in real time as you interact with it, powered by a single H100 GPU. RTFM renders persistent and 3D consistent worlds, both real and imaginary. Try our demo of RTFM today!
Alex Trevithick retweeted
Sherwin Bahmani @sherwinbahmani
📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!):
Project Page: research.nvidia.com/labs/toronto-a…
Code and Models: github.com/nv-tlabs/lyra
Paper: arxiv.org/abs/2509.19296
Alex Trevithick retweeted
Nithin Raghavan @nithin_raghavan
If you’re at SIGGRAPH 2025 in Vancouver, join us Thu 2 PM for our talk “Generative Neural Materials”! We introduce a universal neural material model for bidirectional texture functions and a complementary generative pipeline. 1/2
Alex Trevithick @alextrevith
🎥 What if 3D capture could gracefully handle moving scenes and varying illumination? 🎯Come see how video models generate exactly the data you need at our poster, SimVS! 📍CVPR, June 14th (afternoon), Poster #60.
Alex Trevithick retweeted
Bosung Kim @bosungkim17
Interactive looong-context reasoning still has a long way to go. We need progress across all axes: more data, bigger models, and smarter architectures. ∞-THOR is just the beginning: generate ∞-length trajectories, run agents online, train with feedback, and more! Let's push the limits🚀
Prithviraj (Raj) Ammanabrolu @rajammanabrolu

"Foundation" models for embodied agents are all the rage but how to actually do complex looong context reasoning? Can we scale Beyond Needle(s) in the (Embodied) Haystack? ∞-THOR is an infinite len sim framework + guide on (new) architectures/training methods for VLA models

Alex Trevithick retweeted
Hanwen Jiang @hanwenjiang1
Supervised learning has held 3D vision back for too long. Meet RayZer, a self-supervised 3D model trained with zero 3D labels:
❌ No supervision of cameras & geometry
✅ Just RGB images
And the wild part? RayZer outperforms supervised methods (since 3D labels from COLMAP are noisy).
🌐 Project: hwjiang1510.github.io/RayZer/ (1/4)
Towaki Takikawa / 瀧川永遠希
Would be super fun to try: a massive transformer (derived from pre-trained LLMs) that models compressed "3D" representations which then get fed into a view-dependent decoder model (but small enough to run on a "local" 8-GPU machine or something), trained on multiplayer game footage (N streams of video).
[image]
Jon Barron @jon_barron
Here's my 3DV talk, in chapters: 1) Intro / NeRF boilerplate. 2) Recent reconstruction work. 3) Recent generative work. 4) Radiance fields as a field. 5) Why generative video has bitter-lessoned 3D. 6) Why generative video hasn't bitter-lessoned 3D. 5 & 6 are my favorites.
[image]
Alex Trevithick @alextrevith
@yongyuanxi @jon_barron 1. Agreed. Real-world videos obey physics; generative models should therefore learn this property of the data with scaling compute and data. 2. It's like taking the "neural" in neural rendering to the limit. I wonder if a prohibitively large decoder may be necessary.
Towaki Takikawa / 瀧川永遠希
I actually do wonder if even the speed argument is temporary, given:
1. we _could_ concoct an algorithm / method for efficiently enabling much longer context windows for video generation, resulting in temporally stable worlds
2. local GPUs might not get 1000x smaller, but you _could_ have massive cloud GPUs "render" a shared world state (in some abstract latent space) to get massive amortization across users (assuming it's something like Gorilla Tag with shared state across N users), and then have local hardware solve the easier task of decoding the latent frames and forward-projecting into the future for some number of frames to account for latency (which already happens on much more local time scales: developer.nvidia.com/blog/nvidia-re…).
The "multi-view" flavors of diffusion models for 3D reconstruction kind of look like this: you _could_ do something similar, like an N-view diffusion model trained on footage of N players playing in a shared world (which I'm not sure has been done yet, but it sounds like a fun idea to try; you could probably argue this is "sort of 3D", but not 3D in the sense that an explicit 3D representation exists).
(Of course both 1 and 2 are very hard tasks, but possibly not bounded by physical limitations per se.)
Alex Trevithick @alextrevith
After finishing ICCV reviews this year...
[image]
Alex Trevithick @alextrevith
What's the difference between the OpenAI and Google image generators? Giving both of them the same image and the prompt "generate this image": Gemini is essentially the identity function, whereas OpenAI changes content. Does this indicate a continuous encoder for Gemini vs. a VQ-VAE for OpenAI?
[image] [image]
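The identity-function test described in this post can be checked numerically: regenerate the image with each model, then compare the output to the input pixel by pixel. A minimal sketch (the helper name and the toy arrays are illustrative assumptions, not from the thread):

```python
import numpy as np

def mean_abs_error(original, regenerated):
    """Mean absolute per-pixel error between an input image and a
    model's output for the prompt "generate this image".

    A near-zero error suggests the generator behaves like the
    identity, consistent with a continuous latent encoder; a large
    error suggests lossy re-synthesis, e.g. through a discrete
    VQ-VAE-style codebook."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(regenerated, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean(np.abs(a - b)))

# Toy check: an exact identity mapping gives zero error,
# while a perturbed reconstruction does not.
img = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3))
assert mean_abs_error(img, img) == 0.0
assert mean_abs_error(img, np.clip(img + 5, 0, 255)) > 0.0
```

In practice one would load the two screenshots shown in the post and compare crops of the same region, since resizing and compression alone add small nonzero error.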
Alex Trevithick retweeted
Xingyu Chen @RoverXingyu
🦣Easi3R: 4D Reconstruction Without Training! Limited 4D datasets? Take it easy. #Easi3R adapts #DUSt3R for 4D reconstruction by disentangling and repurposing its attention maps → make 4D reconstruction easier than ever! 🔗Page: easi3r.github.io
Alex Trevithick retweeted
Stan Szymanowicz @StanSzymanowicz
⚡️ Introducing Bolt3D ⚡️ Bolt3D generates interactive 3D scenes in less than 7 seconds on a single GPU from one or more images. It features a latent diffusion model that *directly* generates 3D Gaussians of seen and unseen regions, without any test time optimization. 🧵👇 (1/9)
Alex Trevithick retweeted
Jianyuan @jianyuan_wang
Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds! No expensive optimization needed, yet delivers SOTA results for:
✅ Camera Pose Estimation
✅ Multi-view Depth Estimation
✅ Dense Point Cloud Reconstruction
✅ Point Tracking
Project Page: vgg-t.github.io
Code & Weights: github.com/facebookresear…
Alex Trevithick retweeted
Jiaming Song @baaadas
As one of the people who popularized the field of diffusion models, I am excited to share something that might be the “beginning of the end” of it. IMM has a single stable training stage, a single objective, and a single network — all are what make diffusion so popular today.
Luma @LumaLabsAI

Today, we release Inductive Moment Matching (IMM): a new pre-training paradigm breaking the algorithmic ceiling of diffusion models. Higher sample quality. 10x more efficient. Single-stage, single network, stable training. Read more: lumalabs.ai/news/imm

Alex Trevithick retweeted
Jon Barron @jon_barron
I just pushed a new paper to arXiv. I realized that a lot of my previous work on robust losses and nerf-y things was dancing around something simpler: a slight tweak to the classic Box-Cox power transform that makes it much more useful and stable. It's this f(x, λ) here:
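For reference, the classic Box-Cox power transform that the post says it tweaks can be written down directly. The tweet does not give the tweaked f(x, λ), so this sketch shows only the textbook version:

```python
import numpy as np

def box_cox(x, lam):
    """Classic Box-Cox power transform (Box & Cox, 1964).

    f(x, lam) = (x**lam - 1) / lam   for lam != 0
              = log(x)               for lam == 0

    Defined for x > 0; continuous in lam, since
    (x**lam - 1) / lam -> log(x) as lam -> 0."""
    x = np.asarray(x, dtype=np.float64)
    if lam == 0.0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

# lam = 1 recovers a shifted identity; lam = 0 recovers the log.
assert np.allclose(box_cox(3.0, 1.0), 2.0)
assert np.allclose(box_cox(np.e, 0.0), 1.0)
# Small lam approaches log(x), illustrating the continuity in lam.
assert np.allclose(box_cox(5.0, 1e-8), np.log(5.0), atol=1e-6)
```

A known pain point of this classic form, which the arXiv paper presumably addresses, is numerical instability for x near zero and the restriction to positive inputs.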