Xingang Pan

80 posts

@XingangP

Assistant Professor at Nanyang Technological University @NTUsg @MMLabNTU - Computer Vision, Deep Learning, Computer Graphics

Singapore · Joined May 2018

436 Following · 3.3K Followers

Xingang Pan @XingangP · Jan 12

Introducing StoryMem — a memory-augmented framework for multi-shot long video storytelling. StoryMem carefully injects compact memory into the generation process with minimal overhead, enabling: • Cross-shot consistency • Smooth transitions • Narrative coherence across minutes-long videos Awesome work by Kaiwen @sze68zkw Project: kevin-thu.github.io/StoryMem/ arXiv: arxiv.org/pdf/2512.19539 Code: github.com/Kevin-thu/Stor…

Xingang Pan retweeted

AK @_akhaliq · Dec 25

StoryMem: Multi-shot Long Video Storytelling with Memory huggingface.co/papers/2512.19…

Xingang Pan retweeted

周弈帆 (Yifan Zhou) @zhouyifan1107 · Dec 19

We introduce Log-linear Sparse Attention (LLSA), a trainable sparse attention mechanism that reduces attention complexity from O(N²) to O(N log N). 📄 Paper: arxiv.org/abs/2512.16615 💻 Code: github.com/SingleZombie/L…
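
To put the O(N²) to O(N log N) claim in perspective, here is a rough back-of-the-envelope count. It is only a sketch: it ignores constant factors and does not model the actual LLSA sparsity pattern, and the helper name `attention_cost` is hypothetical. It compares N² pairwise scores with an N·log₂N budget for power-of-two sequence lengths.

```python
import math


def attention_cost(n: int) -> tuple[int, int]:
    """Rough pair counts: dense attention (N^2) vs. an N*log2(N) budget.

    Hypothetical helper: constant factors and the real sparsity pattern are
    ignored, so this is only an order-of-magnitude comparison for power-of-two N.
    """
    dense = n * n
    log_linear = n * int(math.log2(n))
    return dense, log_linear


if __name__ == "__main__":
    for n in (4096, 65536):
        dense, log_linear = attention_cost(n)
        print(f"N={n}: dense={dense:,} log-linear={log_linear:,} "
              f"ratio={dense // log_linear:,}")
```

For N = 65,536 tokens the dense count is about 4.3 × 10⁹ versus 65,536 × 16 ≈ 1.05 × 10⁶, a ratio of 4,096; real wall-clock savings depend on constants and the kernel implementation.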

Xingang Pan @XingangP · Dec 12

Can video generative models exhibit visuospatial intelligence? 🤔 Introducing Video4Spatial — a video-only framework that tackles spatial tasks. With just video context, our model can: 🔍 Ground objects by planning geometry-consistent paths 📸 Follow camera-pose instructions for scene navigation 🌐 Generalize to long contexts & unseen outdoor scenes A step toward video models as visual-spatial reasoners. Project: xizaoqu.github.io/video4spatial/ arXiv: arxiv.org/pdf/2512.03040

Xingang Pan @XingangP · Nov 19

Introducing 📦𝗔𝗿𝘁𝗶𝗟𝗮𝘁𝗲𝗻𝘁🔧 (SIGGRAPH Asia 2025) — a high-quality 3D diffusion model that explicitly models object articulation, paving the way for richer, more realistic assets in embodied AI and simulation: – Generates fully articulated 3D objects – Physically plausible joints & motion – High-fidelity 3D Gaussian appearance – Supports generation from a single real image arXiv: arxiv.org/pdf/2510.21432 Project: chenhonghua.github.io/MyProjects/Art… Code (coming soon): github.com/chenhonghua/Ar…

Xingang Pan retweeted

AK @_akhaliq · Aug 15

STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer

Xingang Pan @XingangP · Aug 15

Cool work that connects the idea of volume rendering with image diffusion!

Xiaohang Zhan @xiaohangzhan

Our paper LaRender received full marks at ICCV 2025 and was selected as oral! This paper enables control of occlusion relationships among objects and visual effects in a training-free manner for diffusion-based image generation. Project page: xiaohangzhan.github.io/projects/laren…

Xingang Pan @XingangP · Aug 15

Introducing 𝗦𝗧𝗿𝗲𝗮𝗺𝟯𝗥, a new 3D geometric foundation model for efficient 3D reconstruction from streaming input. Similar to LLMs, STream3R uses causal attention during training and a KV cache at inference. No need to worry about post-alignment or reconstructing from scratch: you can easily add new frames and update the reconstruction incrementally. Great work by Yushi @GROS17121524 and Yihang @TheYihangLuo! Project: nirvanalan.github.io/projects/strea… arXiv: arxiv.org/abs/2508.10893 Code: github.com/NIRVANALAN/STr… See a streaming reconstruction of our S-Lab lobby below!

Yushi LAN @GROS17121524

🔥Streaming-based 3D/4D Foundation Model🔥 We present STream3R, which reformulates dense 3D/4D reconstruction into a sequential registration task with **causal attention**. - Projects: nirvanalan.github.io/projects/strea… - Code: github.com/NIRVANALAN/STr… - Model: huggingface.co/yslan/STream3R…
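
The LLM analogy above (causal attention in training, KV cache at inference) can be illustrated with a generic single-head attention sketch. This is not STream3R's architecture: each "frame" is treated as one token vector, and the names `causal_attention` and `KVCache` are hypothetical. It shows why incremental updates are possible: with a causal mask, token t only attends to tokens 0..t, so appending one token at a time to a cache of past keys/values yields the same outputs as recomputing the whole sequence.

```python
import math

Vec = list[float]


def attend(q: Vec, keys: list[Vec], values: list[Vec]) -> Vec:
    """Scaled dot-product softmax attention of one query over keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def causal_attention(qs: list[Vec], ks: list[Vec], vs: list[Vec]) -> list[Vec]:
    """Full-sequence causal attention: token t sees tokens 0..t only."""
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]


class KVCache:
    """Hypothetical append-only cache used for streaming inference."""

    def __init__(self) -> None:
        self.keys: list[Vec] = []
        self.values: list[Vec] = []

    def step(self, q: Vec, k: Vec, v: Vec) -> Vec:
        # Store the new token's key/value, then attend over everything seen so far;
        # earlier outputs never need to be recomputed.
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

The design trade-off is memory for compute: the cache grows linearly with the number of tokens, but each new token costs only one attention pass over the cached entries.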

Xingang Pan retweeted

AK @_akhaliq · Aug 15

Grok 4 one-shots building a gemma-3-270m chatbot with transformers.js, with one-click deploy in anycoder

Xingang Pan @XingangP · Jul 29

Directly training Video Diffusion Models on long videos faces huge memory and learning challenges. How do we model long-range temporal distribution then? Our ICCV 2025 work, 🎞️𝗧𝗼𝗸𝗲𝗻𝘀𝗚𝗲𝗻, offers a solution. We compress videos into a highly condensed token space, enabling a DiT to model token distribution over much longer time ranges. A second-stage DiT then "decodes" these tokens back into long videos. While academic computing constraints currently prevent us from matching industrial VDM quality, TokensGen demonstrates an efficient and scalable design. We're optimistic it will pave the way for generating exceptionally long videos, like movies, in the future. Project: vicky0522.github.io/tokensgen-webp… arXiv: arxiv.org/abs/2507.15728 Github (to be released): github.com/Vicky0522/Toke…

Xingang Pan @XingangP · Apr 18

𝗪𝗼𝗿𝗹𝗱𝗠𝗲𝗺 is mainly created by @zeqi_xiao Project page: xizaoqu.github.io/worldmem/ ArXiv: arxiv.org/abs/2504.12369 Github: github.com/xizaoqu/WorldM… Demo: huggingface.co/spaces/yslan/w…

Xingang Pan @XingangP · Apr 18

Synthesizing worlds with video diffusion models is often inconsistent — moving the camera back and forth leads to different scenes. We propose 🌐𝗪𝗼𝗿𝗹𝗱𝗠𝗲𝗺, a memory-based approach that ensures consistent world simulation without relying on explicit 3D reconstruction.

Zeqi Xiao @zeqi_xiao

While recent works like Genie 2, The Matrix, and Navigation World Models explore video generative models as world simulators, world consistency remains underexplored. In this work, we propose 🌐WorldMem🌐, introducing a memory mechanism for long-term consistent world simulation.

Xingang Pan @XingangP · Mar 14

Diffusion models are sensitive to small changes in the input noise. We introduce Alias-Free Latent Diffusion Models (𝗔𝗙-𝗟𝗗𝗠) at #CVPR2025. It achieves shift-equivariance and generates consistent outputs. Project: zhouyifan.net/AF-LDM-Page/ arXiv: arxiv.org/abs/2503.09419
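
Shift-equivariance means that shifting the input shifts the output by the same amount: f(shift(x)) = shift(f(x)). The following is a generic 1D sketch of that property, not AF-LDM's architecture; the function names are hypothetical. A circular convolution is shift-equivariant by construction, while naive stride-2 subsampling without an anti-aliasing filter is not: a one-sample shift of an impulse makes it vanish from the subsampled output entirely. That kind of aliasing is what alias-free designs aim to suppress.

```python
def circular_shift(x: list[float], s: int) -> list[float]:
    """Roll x right by s positions, wrapping around at the ends."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def circular_conv(x: list[float], kernel: list[float]) -> list[float]:
    """1D circular convolution (correlation form); shift-equivariant."""
    n, k = len(x), len(kernel)
    return [sum(kernel[j] * x[(i + j) % n] for j in range(k)) for i in range(n)]


def downsample2(x: list[float]) -> list[float]:
    """Naive stride-2 subsampling with no low-pass filter (aliasing-prone)."""
    return x[::2]


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    kernel = [1, 2, 1]
    # Equivariant: convolving a shifted signal equals shifting the convolved signal.
    print(circular_conv(circular_shift(x, 3), kernel) ==
          circular_shift(circular_conv(x, kernel), 3))
    # Not equivariant: the shifted impulse falls between the kept samples.
    impulse = [1, 0, 0, 0, 0, 0, 0, 0]
    print(downsample2(impulse), downsample2(circular_shift(impulse, 1)))
```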

Xingang Pan @XingangP · Mar 14

arXiv: arxiv.org/abs/2503.08434

Xingang Pan @XingangP · Mar 14

The bokeh effect is so important in photography, yet existing text-to-image diffusion models do not support controlling bokeh strength. We introduce 𝗕𝗼𝗸𝗲𝗵 𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻, a T2I diffusion model that supports flexible background blur control! Project: atfortes.github.io/projects/bokeh…

Xingang Pan retweeted

Yihang Luo @onelineluo · Mar 11

💥 Consistent Multi-View Diffusion for 3D Enhancement 💥 Introducing our work #3DEnhancer @CVPR: a multi-view diffusion model that enhances multi-view images to improve 3D models. 📰arXiv: arxiv.org/abs/2412.18565 🔥Project: yihangluo.com/projects/3DEnh…

Xingang Pan retweeted

Zexin He @he_zexin · Dec 13

🎉Excited to share Neural LightRig!🎉 It allows for accurate and fast estimation of surface normals and PBR materials from just one image. We achieve this by generating multi-light images with a diffusion model, overcoming the estimation ambiguity of inverse rendering.🚀 Page: projects.zxhezexin.com/neural-lightrig
