Sven Elflein

39 posts

Sven Elflein
@s_elflein

Research Intern @NVIDIA | PhD Student @UofT @VectorInst

Toronto, Canada · Joined September 2011
326 Following · 229 Followers
Sven Elflein retweeted
Zan Gojcic @ZGojcic
A new generation in AV simulation is here! We are announcing AlpaDreams, a real-time interactive generative world model for AV simulation! Just a year ago it took minutes to generate a few seconds of video; today it is real time and interactive! research.nvidia.com/labs/sil/proje…
Cristián Llull @cllullt
@s_elflein Awesome project! We are hosting the SHREC26 Track on 3D Reconstruction and your work could be a valuable addition to the track. Challenge: reconstruct objects from 2D images. After, we'll evaluate their quality using a novel feature-aware metric. See: shapevision.dcc.uchile.cl/cllull-shrec20…
Sven Elflein @s_elflein
🚀 Exciting news! We’re introducing VGG-T³: a scalable model for offline feed-forward 3D reconstruction that finally tackles the "quadratic bottleneck." Ever wanted to have VGGT reconstruct a 1,000-image scene in seconds instead of 10 minutes and use it for visual localization?
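The "quadratic bottleneck" in the VGG-T³ announcement comes from full self-attention over all image tokens: every token attends to every other token, so cost grows with the square of the total token count. A back-of-the-envelope sketch (my own illustration; the tokens-per-image value is an assumed placeholder, not a number from the paper):

```python
# Why global attention over all image tokens scales quadratically with the
# number of input images (illustrative sketch, not from the VGG-T3 paper).

def attention_pair_count(num_images, tokens_per_image=1024):
    """Number of query-key pairs a full self-attention layer must score
    when every image's tokens attend to every other image's tokens."""
    n = num_images * tokens_per_image
    return n * n

# Going from a 100-image scene to a 1,000-image scene multiplies the
# attention cost by 100x, not 10x -- the "quadratic bottleneck".
cost_100 = attention_pair_count(100)
cost_1000 = attention_pair_count(1000)
print(cost_1000 // cost_100)  # -> 100
```

This is why 10x more input images can turn seconds of compute into minutes for a global-attention model, independent of any constant-factor optimization.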
Sven Elflein retweeted
Zan Gojcic @ZGojcic
We're releasing DiffusionHarmonizer, an online diffusion enhancer bridging neural reconstruction and photorealistic simulation by correcting artifacts, and harmonizing inserted objects so they truly belong in the scene: matching shadows, lighting & color research.nvidia.com/labs/sil/proje…
Sven Elflein @s_elflein
@ducha_aiki Great question! Similar to VGGT, we observed that once the similarity of images becomes too high (or covers large areas and the model has to propagate covisibility), the reconstruction quality becomes worse. So scaling to reconstructing Rome remains an open problem for feed-forward methods.
Sven Elflein @s_elflein
6/7 ⚙️ Making it work: We find 1) it is critical to initialize from a pre-trained softmax-attention checkpoint. 2) TTT exhibits length generalization issues! Please check out the paper for more details on initialization and tricks towards closing the gap to softmax attention!
Sven Elflein retweeted
Junchen Liu @JunchenLiu77
Continual learning and online adaptation are often framed as the next frontier of AI. 🚀 Modern architectures use Test-Time Training (TTT) to memorize key-value pairs on the fly via gradient descent, or so we thought. To test this memorization hypothesis, we replaced gradient descent with gradient ASCENT. It should destroy memorization. Instead... performance was preserved, or even slightly improved. 😱 It turns out, TTT with KV Binding is secretly linear attention! Site: research.nvidia.com/labs/sil/proje…
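The "secretly linear attention" claim can be sanity-checked numerically. If the test-time loss is a pure KV-binding objective L(W) = -vᵀWk, its gradient is -vkᵀ, so each descent step simply adds the outer product vkᵀ to the fast weights, and switching to ascent only negates the accumulated matrix. The dependency-free toy below (my own sketch, not the authors' code) checks that the resulting readout Wq equals the unnormalized linear-attention output Σᵢ (kᵢ·q) vᵢ:

```python
# Toy numerical check: a fast-weight memory updated by gradient steps on a
# KV-binding objective computes exactly the (unnormalized) linear-attention
# readout sum_i (k_i . q) * v_i.

def outer(v, k):
    # Outer product v k^T as a list-of-rows matrix.
    return [[vi * kj for kj in k] for vi in v]

def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

keys = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
values = [[2.0, 1.0], [0.0, -1.0], [3.0, 0.5]]
query = [1.0, -0.5]

# TTT-style memory: the binding loss L(W) = -v^T W k has gradient -v k^T,
# so one descent step per token (lr = 1) adds the outer product v k^T.
W = [[0.0, 0.0], [0.0, 0.0]]
for k, v in zip(keys, values):
    G = outer(v, k)
    W = [[wij + gij for wij, gij in zip(wrow, grow)]
         for wrow, grow in zip(W, G)]

ttt_out = matvec(W, query)

# Linear attention (no softmax, no normalization): sum_i (k_i . q) v_i.
linear_attn = [0.0, 0.0]
for k, v in zip(keys, values):
    w = dot(k, query)
    linear_attn = [acc + w * vi for acc, vi in zip(linear_attn, v)]

print(ttt_out == linear_attn)  # -> True (values chosen to be exact in binary)
```

Under this reading, gradient ascent merely negates W and hence the readout, a sign change that downstream layers can absorb; that is one plausible explanation for the tweet's observation that flipping the update direction leaves performance intact.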
Sven Elflein retweeted
Ruilong Li @ruilong_li
PPISP is a proper way to compensate for exposure, white balance, etc. in the training views for 3DGS/NeRF. Very solid work. And day-0 support in gsplat! Thanks to the authors for bringing it in! github.com/nerfstudio-pro…
Radiance Fields@RadianceFields

Radiance field reconstruction (Gaussian splatting) quality is getting a big step up. @NVIDIAAI just released Physically-Plausible Image Signal Processing (PPISP) for Radiance Field Reconstruction. Apache 2.0 and coming to both gsplat and 3DGRUT. Code: github.com/nv-tlabs/ppisp Article: radiancefields.com/nvidia-announc… Authors: Isaac Deutsch, Nicolas Möenne-Loccoz, @ZGojcic, @gavrielstate @NVIDIAAIDev

Sven Elflein retweeted
Xindi Wu @cindy_x_wu
New #NVIDIA Paper We introduce Motive, a motion-centric, gradient-based data attribution method that traces which training videos help or hurt video generation. By isolating temporal dynamics from static appearance, Motive identifies which training videos shape motion in video generation. 🔗 research.nvidia.com/labs/sil/proje… 1/10
Sven Elflein retweeted
Zan Gojcic @ZGojcic
Our team at Nvidia Spatial Intelligence Lab is hiring PhD research interns for 2026! research.nvidia.com/labs/sil/ If you’re excited about fast video models, generative world simulators, or 3D foundation models, please reach out by email or apply directly lnkd.in/gGKU_sUr
Sven Elflein retweeted
Andrew Liao @andrewliao11
🤔🖼️ Could “slow thinking” help VLMs understand images better? We present LongPerceptualThoughts — teaching models to reason deeper for sharper visual understanding. 📅 Oct 7 @ COLM, Poster Session 1 — come chat reasoning + vision!
Andrew Liao@andrewliao11

🚀 New work: LongPerceptualThoughts. We introduce a synthetic data pipeline to fine-tune VLMs with long chains-of-thought. Goal: help VLMs "think longer" on vision tasks. +3 pts on 5 vision tasks, +11 pts on V* Bench, +2 pts on MMLU-Pro (text-only). 🌐 andrewliao11.github.io/LongPerceptual…

Sven Elflein retweeted
Jiahui Huang @huangjh_hjh
[1/N] 🎥 We've made available a powerful spatial AI tool named ViPE: Video Pose Engine, to recover camera motion, intrinsics, and dense metric depth from casual videos! Running at 3–5 FPS, ViPE handles cinematic shots, dashcams, and even 360° panoramas. 🔗 research.nvidia.com/labs/toronto-a…
Sven Elflein retweeted
Pruna AI @PrunaAI
⚡️ The hype is real: generate 5s SOTA videos at $0.06 per video with Wan 2.2 Juiced! We just optimized the Wan 2.2 video model to make it the FASTEST and CHEAPEST video generation endpoint!
• Fast: Wan 2.2 Juiced shows 1.5x-2x speed acceleration over base models for 480p and 720p videos, and generates 480p videos in just 28-33 seconds.
• Cheap: Wan 2.2 Juiced is up to 100x cheaper than competitors. Text-to-video costs $0.06 per video vs $6.00 for Veo3!
👉 Try it on @replicate now: replicate.com/wan-video/wan-…
🎯 Compare quality frame-by-frame with our Frame Arena tool: huggingface.co/spaces/PrunaAI…
⭐️ Full technical deep-dive on our blog: pruna.ai/blog/wan-2-2-v…