Cheng Lin

Cheng Lin reposted

Zhiyang (Frank) Dou@frankzydou·11 May

Introducing ✨RigidFormer: Learning Rigid Dynamics with Transformers, our attempt to scale learning-based physical dynamics with Transformers. RigidFormer is a mesh-free, object-centric Transformer for multi-object rigid-body contact dynamics from point clouds. Learning physics with purely neural simulators, without relying on traditional physics engines, is an important and widely studied problem. Prior SOTA methods often use graph neural networks for accuracy and generalization but still struggle with efficient, high-fidelity simulation at scale. RigidFormer uses only point inputs, matches or outperforms mesh-based baselines on standard benchmarks, runs much faster, generalizes across point resolutions and datasets, and scales to 200+ objects. We also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components. Because RigidFormer is mesh-free, it does not require mesh connectivity, SDFs, or vertex-level message passing, which makes it well-suited for point-cloud observations and scalable simulation. The architecture can also be adapted to learn soft-body dynamics by replacing the rigid-body module (differentiable Kabsch alignment). 🎬See our video for more details. Many thanks to my amazing collaborators: Minghao Guo @GuoMh14, Haixu Wu @Haixu_Wu_1998, Doug Roble, Tuur Stuyck @TuurStuyck, and Wojciech Matusik @wojmatusik. Project page: people.csail.mit.edu/frankzydou/pro… Paper: people.csail.mit.edu/frankzydou/pro…
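
The post names differentiable Kabsch alignment as the rigid-body module but gives no further details. As a rough illustration only (not RigidFormer's code), here is the classical Kabsch algorithm in NumPy, under the assumption that a network predicts a position for every point of an object and the closest rigid motion is then recovered from those predictions. The function name `kabsch` and the snapping usage are hypothetical.

```python
import numpy as np


def kabsch(P, Q):
    """Least-squares rotation R and translation t such that R @ p + t ~= q.

    P, Q: (N, 3) arrays of corresponding points (rows are points).
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


# Usage (hypothetical): snap noisy per-point predictions onto the nearest rigid motion.
rng = np.random.default_rng(0)
rest = rng.normal(size=(100, 3))                # object points in their rest pose
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, 0.0, -0.2])
pred = rest @ R_true.T + t_true
pred += 0.01 * rng.normal(size=pred.shape)      # stand-in for per-point network error
R, t = kabsch(rest, pred)
rigid = rest @ R.T + t                          # all points now share one rigid motion
```

Every step here (means, SVD, matrix products) has gradients in common autodiff frameworks, which is presumably what makes such an alignment trainable end to end; note that SVD gradients can become unstable when singular values are nearly equal.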

Cheng Lin reposted

Zhiyang (Frank) Dou@frankzydou·1 May

Excited to share that our work NeuralActuator: Neural Actuation Modeling for Robot Dynamics and External Force Perception has been accepted to #RSS2026! Your robot, even a low-cost one, can feel external forces without torque or tactile sensors. TL;DR: NeuralActuator is a neural actuator model that jointly predicts 1️⃣ torque, capturing the nonlinear and time-varying current-to-torque relationship of low-cost servos; 2️⃣ external contact forces (and force-detection gates) for sensorless force perception; and 3️⃣ motor conditions that indicate each motor’s operating regime. Here is a fast-forward video clip ⬇️ We are also covering more robots, such as LeRobot-S101 and Franka Panda. More details coming soon.

Cheng Lin@_cheng_lin·28 Apr

Great Work!

SAD: Soft Anisotropic Diagrams for Differentiable Image Representation has been accepted by #SIGGRAPH2026 Check it out, and huge congrats to Lucky! @Luckyballa #SAD represents an image as a soft, anisotropic, differentiable diagram over learnable sites. Each pixel is modeled as a softmax blend over its top-K nearby sites under a site-dependent distance, yielding a differentiable partition of unity with explicit ownership and content-aligned boundaries. A GPU-friendly top-K propagation scheme keeps the cost constant per pixel, enabling fast fitting at matched or better quality. Classical geometric structures can still inspire fresh perspectives in modern visual computing. Voronoi and Power diagrams have long been elegant tools for 3D shape analysis, reconstruction, and geometric reasoning; here, related diagram ideas, with connections to Apollonius-style diagrams, are explored for image representations. Homepage: luckyiyi.github.io/SAD/ arXiv: arxiv.org/pdf/2604.21984 #SIGGRAPH2026 #SIGGRAPH #CV #Vision #Graphics #CG
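
The SAD post describes each pixel as a softmax blend over its top-K nearby sites under a site-dependent distance. A minimal NumPy sketch of that idea follows, assuming the anisotropic distance is a squared Mahalanobis distance with one 2x2 symmetric positive-definite matrix per site and a fixed temperature `tau`. Neither detail is confirmed by the post, the top-K search is brute force rather than the GPU-friendly propagation scheme it mentions, and `soft_diagram` is a hypothetical name.

```python
import numpy as np


def soft_diagram(pixels, sites, metrics, colors, k=4, tau=0.05):
    """Blend per-site colors with softmax weights over each pixel's k nearest sites.

    pixels:  (P, 2) pixel coordinates
    sites:   (S, 2) site positions (learnable in a real fitting loop)
    metrics: (S, 2, 2) symmetric positive-definite matrices, one per site (anisotropy)
    colors:  (S, C) per-site colors
    Returns blended colors (P, C), weights (P, k) and site indices (P, k).
    """
    diff = pixels[:, None, :] - sites[None, :, :]                 # (P, S, 2)
    # Assumed site-dependent distance: squared Mahalanobis distance diff^T M_s diff
    dist = np.einsum("psi,sij,psj->ps", diff, metrics, diff)      # (P, S)
    # Brute-force top-k search over all sites (k smallest distances, unordered)
    idx = np.argpartition(dist, k - 1, axis=1)[:, :k]             # (P, k)
    logits = -np.take_along_axis(dist, idx, axis=1) / tau
    logits -= logits.max(axis=1, keepdims=True)                   # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                             # weights sum to 1 per pixel
    return np.einsum("pk,pkc->pc", w, colors[idx]), w, idx


# Usage: four sites, one of them anisotropic, rendered onto an 8x8 grid.
sites = np.array([[0.2, 0.2], [0.8, 0.8], [0.2, 0.8], [0.8, 0.2]])
metrics = np.stack([np.eye(2)] * 4)
metrics[0] = np.diag([4.0, 0.25])                 # site 0's region stretches along y
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
ys, xs = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8), indexing="ij")
pixels = np.stack([xs.ravel(), ys.ravel()], axis=1)
image, w, idx = soft_diagram(pixels, sites, metrics, colors, k=2)
image = image.reshape(8, 8, 3)
```

Because the weights are a softmax over each pixel's own top-K list, they are non-negative and sum to one (a partition of unity), and the site with the largest weight gives each pixel an explicit owner.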

Cheng Lin reposted

Natalie Khalil@natalienkhalil·13 Apr

Basically

Cheng Lin@_cheng_lin·13 Mar

✨Introducing our research product FizzReel🎥 — a video re-creation tool that uses AI to understand the creative intent behind your reference, from shot design and VFX to lighting💡. Combined with powerful models like Seedance 2.0🔥, it's the future of video creation! 🚀

Cheng Lin reposted

Zhiyang (Frank) Dou@frankzydou·6 Jan

We present EgoReAct: Real-time 3D human reaction generation from streaming egocentric video. 🌟Reacting to streaming egocentric video is something humans do every day. We hope EgoReAct makes human motion more human-like. 🔎 What we found: existing ego-reaction data can be spatially inconsistent (e.g., moving reactions paired with fixed-camera videos), which breaks 3D grounding. 📷 What we built: HRD, a spatially aligned egocentric video–reaction dataset (3,500 pairs, 32 categories), plus a spatially aligned ViMo fix for fair evaluation. (Instead of collecting expensive ground-truth motion, we employ VDM to generate the egocentric videos.) 👁️⚡🏃 Our simple yet effective pipeline: motion tokenization for compact discrete codes + an autoregressive Transformer for online, strictly-causal generation. Metric depth and head dynamics further improve 3D spatial consistency. Project Page: frank-zy-dou.github.io/projects/EgoRe… ArXiv: arxiv.org/abs/2512.22808 #HumanMotion #EgocentricVision #3D #ARVR #Animation #AIGC #DeepLearning #GenerativeAI #Graphics #ComputerVision #Motion
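
The EgoReAct pipeline is summarized as motion tokenization into compact discrete codes followed by an autoregressive Transformer with strictly causal, online generation. The NumPy sketch below illustrates only those two generic ingredients: nearest-codebook quantization (as in VQ-style tokenizers) and a streaming loop in which each token is predicted from past tokens and frames only. The codebook, `predict_next`, and `toy_model` are hypothetical stand-ins, not the paper's model.

```python
import numpy as np


def quantize(frames, codebook):
    """Vector quantization: index of the nearest codebook entry for each motion frame."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)   # (T, K)
    return d.argmin(axis=1)


def causal_mask(t):
    """Attention mask for a strictly causal Transformer: step i sees only steps j <= i."""
    return np.tril(np.ones((t, t), dtype=bool))


def stream_reactions(predict_next, video_stream):
    """Online decoding: emit one motion token per incoming frame, using only the past.

    predict_next(tokens, frames) -> logits over the codebook (a stand-in for the model).
    """
    tokens, frames = [], []
    for frame in video_stream:                  # frames arrive one at a time
        frames.append(frame)
        logits = predict_next(tokens, frames)   # no access to future frames
        tokens.append(int(np.argmax(logits)))
    return tokens


# Usage with a toy codebook and a hypothetical stand-in model.
codebook = np.eye(3)
motion = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 1.1], [0.1, 0.8, 0.0]])
codes = quantize(motion, codebook)


def toy_model(tokens, frames):
    # Checks causality, then cycles through codebook entries
    assert len(frames) == len(tokens) + 1
    return np.eye(3)[len(frames) % 3]


streamed = stream_reactions(toy_model, range(4))
```

Keeping generation strictly causal is what allows reactions to be produced while the egocentric video is still streaming in, rather than after the whole clip is available.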

Cambridge, MA 🇺🇸

Cheng Lin reposted

Zhenjun Zhao@zhenjun_zhao·11 Dec

TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels Jiahao Lu, Weitao Xiong, Jiacheng Deng, Peng Li, Tianyu Huang, @frankzydou, @_cheng_lin, Sai-Kit Yeung, @YuanLiu41955461 tl;dr: arbitrary sparse 2D tracks->upsampler->dense 2D tracks; ASAP constraint arxiv.org/abs/2512.08358

Cheng Lin reposted

Zhenjun Zhao@zhenjun_zhao·7 Dec

LiteVGGT: Boosting Vanilla VGGT via Geometry-aware Cached Token Merging Zhijian Shu, @_cheng_lin, Tao Xie, Wei Yin, Ben Li, Zhiyuan Pu, @WeizeLi24, Yao Yao, Xun Cao, @gingertata, @xxlong0 tl;dr: pixel gradient+token variance->geometric importance->geometry-aware feature map->partition & group & merge tokens arxiv.org/abs/2512.04939
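
The LiteVGGT tl;dr chains pixel gradients and token variance into a geometric importance score that guides partitioning, grouping, and merging of tokens. Below is a hedged NumPy sketch of one way importance-guided merging could work: keep the highest-scoring tokens and average every remaining token into its most cosine-similar kept token. The scoring formula, `keep_ratio`, and both function names are assumptions for illustration, and the caching part of the method is not modeled.

```python
import numpy as np


def geometric_importance(grad_mag, tokens):
    """Hypothetical score: per-patch image-gradient magnitude plus per-token feature
    variance, each rescaled to [0, 1] so neither term dominates."""
    def rescale(x):
        x = x - x.min()
        return x / x.max() if x.max() > 0 else x
    return rescale(grad_mag) + rescale(tokens.var(axis=1))


def merge_tokens(tokens, importance, keep_ratio=0.5):
    """Keep the most important tokens; average every other token into its most similar kept one.

    tokens: (N, D) features; importance: (N,) scores, higher means keep.
    Returns merged tokens (M, D) and assign (N,), the output row each input token went to.
    """
    n, dim = tokens.shape
    m = max(1, int(round(n * keep_ratio)))
    order = np.argsort(-importance)
    keep, drop = order[:m], order[m:]
    unit = tokens / np.maximum(np.linalg.norm(tokens, axis=1, keepdims=True), 1e-8)
    sim = unit[drop] @ unit[keep].T                       # cosine similarity, (N - M, M)
    assign = np.empty(n, dtype=int)
    assign[keep] = np.arange(m)
    assign[drop] = sim.argmax(axis=1)
    sums = np.zeros((m, dim))
    np.add.at(sums, assign, tokens)                       # accumulate each group
    counts = np.bincount(assign, minlength=m)[:, None]
    return sums / counts, assign


# Usage: four 2-D tokens, the two most important ones absorb their near-duplicates.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.01], [0.01, 1.0]])
importance = np.array([2.0, 1.0, 0.5, 0.0])
merged, assign = merge_tokens(tokens, importance, keep_ratio=0.5)
```

The `assign` array records where each original token went, which is what a method would need in order to broadcast merged features back to the full token grid.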

Cheng Lin@_cheng_lin·4 Dec

Check out our spotlight paper at NeurIPS 2025! 🌟

Please check out our paper #MOSPA “🎧Human Motion Generation Driven by Spatial Audio” at #NeurIPS2025 (🌟Spotlight)! 😊We have released our dataset and models : ) 💡The paper tackles the challenge of spatial-audio-driven human motion generation, enabling virtual humans to respond dynamically and realistically to diverse spatial sounds: not just “what” is sounding, but also “where” and “how” it sounds in space. 💡We introduce SAM, the first comprehensive Spatial Audio-Driven Human Motion dataset, with diverse spatial audio scenarios and high-quality 3D motion pairs, providing a solid benchmark for studying human motion conditioned on spatial audio. 💡Building on this, MOSPA is a diffusion-based generative framework that fuses semantic and spatial features of the audio to synthesize diverse, realistic motions aligned with spatial audio cues, achieving state-of-the-art performance on this new task and offering a strong baseline for future research. If you work on virtual humans, spatial audio, XR, or humanoid / embodied control, this can be a good source for learning motion skills. Please come meet the team at our #NeurIPS2025 San Diego Spotlight poster! 📍 Exhibit Hall C, D, E, #4310 🕚 Fri, Dec 5 | 11 a.m.–2 p.m. PST Homepage: frank-zy-dou.github.io/projects/MOSPA… Paper: arxiv.org/abs/2507.11949 Code and Data: github.com/xsy27/Mospa-Ac… #NeurIPS #NeurIPS2025 #MOSPA #motion #Animation #SpatialAudio #VirtualHuman #Robotics #Robot #AI #Deeplearning #GenerativeAI #AIGC

🚀 We’ll be hosting a Tutorial on "3D Human Motion Generation and Simulation" at ICCV 2025 in Honolulu, Hawaii! 🌺 📅 Date: October 19, 2025 ⏰ Time: 9:00–16:00 (HST) 🔗 More details & resources: 3dmogen.github.io #AIGC #Simulation #robotics #ComputerVision #ICCV2025

Cheng Lin reposted

MrNeRF@janusch_patas·18 Nov

Here is some new footage from this paper, offering a glimpse into the future of dynamic 3D Gaussian Splatting models combined with static reconstructed scenes. Imagine this: when the lighting matches, the result becomes practically indistinguishable from reality. Just pick a scene, add characters, and record it from any angle. Apply diffusion models to instantly change the look. I firmly believe this is the future of VFX.

MrNeRF@janusch_patas

AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting Contributions: • We introduce the 3D Gaussian Splatting representation to the classical computer graphics problem of animating humans in 3D environments. • We demonstrate that our framework can be used for geometry-consistent free viewpoint rendering of monocular videos edited with new animated humans. • We introduce a novel Gaussian-aligned motion module for motion synthesis in scenes represented as 3D Gaussians. • We introduce a human scene Gaussian refinement optimization for the correct placement of human Gaussians in scenes represented using 3DGS, leading to better contact and interactions.

Cheng Lin reposted

Zhiyang (Frank) Dou@frankzydou·4 Oct

🚀 We’ll be hosting a Tutorial on "3D Human Motion Generation and Simulation" at ICCV 2025 in Honolulu, Hawaii! 🌺 🏃‍♀️🏃‍♂️🧗🏊🚴🕺🤖 📅 Date: October 19, 2025 ⏰ Time: 9:00–16:00 (HST) This tutorial brings together leading researchers to cover the foundations and latest advances in motion and interaction modeling. Topics include: 1️⃣ Human motion generation basics 2️⃣ Kinematic- and physics-based motion models 3️⃣ Controllability of motion generation 4️⃣ Human-object/scene interactions 5️⃣ Co-speech gesture synthesis 👩‍🏫 🎤 Speakers & Organizers include experts from Meta, Nvidia, ETH, MPI, Northeastern University, University of Pennsylvania, MIT, and more. In particular, we are honored to have Libin Liu, Zhengyi (Zen) Luo, Xianghui Xie, and Korrawe Karunratanakul as our speakers. Whether you’re a PhD student, researcher, or industry practitioner, we believe this tutorial will provide deep technical insights, practical methods, and exciting open problems to explore in the years ahead. 🔗 More details & resources: 3dmogen.github.io #ICCV2025 #3DHumanMotion #GenerativeAI #Simulation #robotics #ComputerVision #DeepLearning #ICCV #AI #Motion #Animation

Chuan Guo@chuan_guo92603

Cheng Lin reposted

Zhiyang (Frank) Dou@frankzydou·19 Sep

Excited that 🎧MOSPA: Human Motion Generation Driven by Spatial Audio was accepted as a #NeurIPS2025 Spotlight! ✨ We hope #MOSPA can advance motion generation toward spatial intelligence a bit more, and also inspire progress in humanoid control and embodied AI. 🎧🏃‍♀️‍➡️ We’ll be releasing the code and dataset soon — stay tuned! Project Page: frank-zy-dou.github.io/projects/MOSPA… Paper: arxiv.org/abs/2507.11949 Congrats to Shuyang and all the collaborators! #NeurIPS2025 #NeurIPS #Motion #Robotics #Humanoid #Animation #CG #CV #AIGC #DL #Deeplearning #Motion #Graphics #AI #GenerativeAI

Excited to share our latest work on 🎧spatial audio-driven human motion generation. We aim to tackle a largely underexplored yet important problem of enabling virtual humans to move naturally in response to spatial audio—capturing not just what is heard, but also where the sound is coming from. To this end, we introduce the Spatial Audio-Driven Human Motion (SAM) dataset—the first comprehensive dataset featuring paired high-quality human motion and spatial audio recordings. For benchmarking, we develop a generative framework for human MOtion generation driven by SPAtial audio, termed MOSPA, which learns to synthesize realistic and diverse human motions conditioned on spatial audio input. We hope this research could provide a foundation for future research in spatial perception, virtual characters, and embodied AI. The dataset and model will be open-sourced soon. A big thank you to our intern, Shuyang Xu, for the wonderful collaboration! Congratulations, Shuyang! Project page: frank-zy-dou.github.io/projects/MOSPA… Paper: arxiv.org/abs/2507.11949 Video: youtu.be/p_xwTDA-K0g #Animation #CG #CV #AIGC #DL #Deeplearning #Motion #Graphics #AI #GenerativeAI

Yawar Siddiqui@yawarnihal·13 Aug

Check out our #ICCV2025 paper VertexRegen! Instead of the typical incomplete meshes you get with autoregressive mesh generation, VertexRegen generates progressively more detailed meshes as the generated sequence gets longer. vertexregen.github.io Great work by xzhang.dev

Cheng Lin@_cheng_lin·14 Aug

@yawarnihal Great work! Always delivering new ideas on mesh generation!

Cheng Lin@_cheng_lin·30 Jul

@xuchenghust Thx Xu Lao!😉

Xu Cheng@xuchenghust·30 Jul

@_cheng_lin Cool👏

Cheng Lin@_cheng_lin·29 Jul

🔎Excited to share our PDT: Point Distribution Transformation with Diffusion Models (SIGGRAPH2025)! 💡While autoregressive models are now widely adopted for predicting structures, we find diffusion models can also reveal high-level structures by transforming point distributions!

Cheng Lin@_cheng_lin·30 Jul

@totoro97_ 🥰🥰

Peng Wang@totoro97_·30 Jul

@_cheng_lin Nice work!

Cheng Lin reposted

Jionghao Wang@ShaneMankiw·29 Jul

We are excited to announce our PDT: Point Distribution Transformation with Diffusion Models, to appear in SIGGRAPH 2025! Paper link: arxiv.org/abs/2507.18939 Project page: shanemankiw.github.io/PDT/

Cheng Lin@_cheng_lin·29 Jul

PDT can be used for remeshing, skeleton prediction, and feature line prediction. 🌐Webpage: shanemankiw.github.io/PDT/ 📰Paper: arxiv.org/abs/2507.18939 Thanks to John @ShaneMankiw and all the co-authors!

Cheng Lin@_cheng_lin·18 Jul

Check out Spatial Audio-Driven Human Motion Generation. This will be very useful for future interactions with virtual AI characters in spatial environments!
