Xingang Pan

80 posts

@XingangP

Assistant Professor at Nanyang Technological University @NTUsg @MMLabNTU - Computer Vision, Deep Learning, Computer Graphics

Singapore · Joined May 2018
436 Following · 3.3K Followers
Xingang Pan @XingangP
Can video generative models exhibit visuospatial intelligence? 🤔 Introducing Video4Spatial, a video-only framework that tackles spatial tasks. With just video context, our model can:
🔍 Ground objects by planning geometry-consistent paths
📸 Follow camera-pose instructions for scene navigation
🌐 Generalize to long contexts & unseen outdoor scenes
A step toward video models as visual-spatial reasoners.
Project: xizaoqu.github.io/video4spatial/
arXiv: arxiv.org/pdf/2512.03040
Xingang Pan @XingangP
Introducing 📦𝗔𝗿𝘁𝗶𝗟𝗮𝘁𝗲𝗻𝘁🔧 (SIGGRAPH Asia 2025), a high-quality 3D diffusion model that explicitly models object articulation, paving the way for richer, more realistic assets in embodied AI and simulation:
– Generates fully articulated 3D objects
– Physically plausible joints & motion
– High-fidelity 3D Gaussian appearance
– Supports generation from a single real image
arXiv: arxiv.org/pdf/2510.21432
Project: chenhonghua.github.io/MyProjects/Art…
Code (coming soon): github.com/chenhonghua/Ar…
Xingang Pan reposted
AK @_akhaliq
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
Xingang Pan @XingangP
Introducing 𝗦𝗧𝗿𝗲𝗮𝗺𝟯𝗥, a new 3D geometric foundation model for efficient 3D reconstruction from streaming input. Similar to LLMs, STream3R uses causal attention during training and a KV cache at inference. There is no need to worry about post-alignment or reconstructing from scratch: you can simply add new frames and update the reconstruction incrementally. Great work by Yushi @GROS17121524 and Yihang @TheYihangLuo!
Project: nirvanalan.github.io/projects/strea…
arXiv: arxiv.org/abs/2508.10893
Code: github.com/NIRVANALAN/STr…
See a streaming reconstruction of our S-Lab lobby below!
Yushi LAN @GROS17121524

🔥Streaming-based 3D/4D Foundation Model🔥 We present STream3R, which reformulates dense 3D/4D reconstruction into a sequential registration task with **causal attention**.
- Projects: nirvanalan.github.io/projects/strea…
- Code: github.com/NIRVANALAN/STr…
- Model: huggingface.co/yslan/STream3R…
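The posts compare STream3R to LLMs: causal attention during training, a KV cache at inference, and new frames added without recomputing old ones. As a rough illustration of that general mechanism only (not STream3R's architecture or code), here is a toy single-head attention layer in Python/NumPy. The names `StreamingAttention` and `full_block_causal` are hypothetical, and the frame-level causal mask (each token sees its own frame and all earlier frames) is an assumption chosen for illustration. The check at the end confirms that streaming frames one at a time through the cache gives the same output as processing all frames at once under that mask.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries of -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class StreamingAttention:
    """Hypothetical toy single-head attention layer with a KV cache."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.scale = 1.0 / np.sqrt(dim)
        self.k_cache = np.zeros((0, dim))
        self.v_cache = np.zeros((0, dim))

    def add_frame(self, tokens):
        """Process the tokens of one new frame, shape (n_tokens, dim)."""
        q = tokens @ self.w_q
        # Append this frame's keys/values; earlier frames are never recomputed.
        self.k_cache = np.vstack([self.k_cache, tokens @ self.w_k])
        self.v_cache = np.vstack([self.v_cache, tokens @ self.w_v])
        # New queries attend to every cached token: earlier frames plus this one.
        attn = softmax(q @ self.k_cache.T * self.scale)
        return attn @ self.v_cache

def full_block_causal(layer, frames):
    """Reference: all frames at once with a frame-level causal mask."""
    x = np.vstack(frames)
    frame_id = np.concatenate([np.full(len(f), i) for i, f in enumerate(frames)])
    q, k, v = x @ layer.w_q, x @ layer.w_k, x @ layer.w_v
    scores = q @ k.T * layer.scale
    # Token i may attend to token j only if j's frame is not later than i's.
    scores[frame_id[:, None] < frame_id[None, :]] = -np.inf
    return softmax(scores) @ v

# Usage: stream three frames of 4 tokens each, then compare with the batch pass.
rng = np.random.default_rng(1)
frames = [rng.standard_normal((4, 8)) for _ in range(3)]
layer = StreamingAttention(dim=8)
streamed = np.vstack([layer.add_frame(f) for f in frames])
print(np.allclose(streamed, full_block_causal(layer, frames)))  # True
```

In this sketch each new frame costs attention over all cached tokens, so per-frame cost grows with the length of the stream; bounding or compressing the cache is a separate design question the posts do not address.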

Xingang Pan reposted
AK @_akhaliq
Grok 4 one-shots building a gemma-3-270m chatbot with transformers.js, with one-click deploy in anycoder
Xingang Pan @XingangP
Directly training video diffusion models (VDMs) on long videos poses huge memory and learning challenges. How, then, do we model long-range temporal distributions? Our ICCV 2025 work, 🎞️𝗧𝗼𝗸𝗲𝗻𝘀𝗚𝗲𝗻, offers a solution. We compress videos into a highly condensed token space, enabling a DiT to model the token distribution over much longer time ranges. A second-stage DiT then "decodes" these tokens back into long videos. While academic computing constraints currently prevent us from matching the quality of industrial VDMs, TokensGen demonstrates an efficient and scalable design. We're optimistic it will pave the way for generating exceptionally long videos, such as movies, in the future.
Project: vicky0522.github.io/tokensgen-webp…
arXiv: arxiv.org/abs/2507.15728
GitHub (to be released): github.com/Vicky0522/Toke…
Xingang Pan @XingangP
Synthesizing worlds with video diffusion models is often inconsistent — moving the camera back and forth leads to different scenes. We propose 🌐𝗪𝗼𝗿𝗹𝗱𝗠𝗲𝗺, a memory-based approach that ensures consistent world simulation without relying on explicit 3D reconstruction.
Zeqi Xiao @zeqi_xiao

While recent works like Genie 2, The Matrix, and Navigation World Models explore video generative models as world simulators, world consistency remains underexplored. In this work, we propose 🌐WorldMem🌐, introducing a memory mechanism for long-term consistent world simulation.

Xingang Pan @XingangP
The bokeh effect is so important in photography, yet existing text-to-image diffusion models do not support controlling bokeh strength. We introduce 𝗕𝗼𝗸𝗲𝗵 𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻, a T2I diffusion model that supports flexible background blur control!
Project: atfortes.github.io/projects/bokeh…
Xingang Pan reposted
Zexin He @he_zexin
🎉Excited to share Neural LightRig!🎉 It allows for accurate and fast estimation of surface normals and PBR materials from just one image. We achieve this by generating multi-light images with a diffusion model, overcoming the estimation ambiguity of inverse rendering.🚀
Page: projects.zxhezexin.com/neural-lightrig