AnySyn3D (@AnySyn3D) - Twitter Profile

AnySyn3D retweeted

Here are more results from #RigidFormer: predicting physical dynamics with purely neural simulators, an attempt to learn physical dynamics in a scalable manner.
🤖 1) Controllable Articulated Body Simulation: More Results. Additional Unitree G1 humanoid rollouts under controlled motion. Each sample uses a different initial state and control signal (direction and velocity).
🏺 2) Object Fragmentation. Simulating the cracking and fragmentation process of objects. Thanks @zzigakovacic for suggesting this experiment!
🎬 3) Combining RigidFormer with Diffusion-as-Shader for controllable video generation.
Note: the meshes shown here are only for visualization; the network takes point clouds as input and predicts the updated state of each point.

Zhiyang (Frank) Dou@frankzydou

Introducing ✨RigidFormer: Learning Rigid Dynamics with Transformers - our attempt to scale learning-based physical dynamics with Transformers. RigidFormer learns rigid dynamics with Transformers. It is a mesh-free, object-centric Transformer for multi-object rigid-body contact dynamics from point clouds. Learning physics with purely neural simulators, without relying on traditional physics engines, is an important and widely studied problem. Prior SOTA methods often use graph neural networks for accuracy and generalization, but still struggle with efficient, high-fidelity simulation at scale. RigidFormer uses only point inputs, matches or outperforms mesh-based baselines on standard benchmarks, runs much faster, generalizes across point resolutions and datasets, and scales to 200+ objects. We also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components. RigidFormer is mesh-free: it does not require mesh connectivity, SDFs, or vertex-level message passing, making it well-suited for point-cloud observations and scalable simulation. This architecture can also be adapted to learn soft-body dynamics by replacing the rigid-body module (differentiable Kabsch alignment). 🎬See our video for more details. Many thanks to my amazing collaborators: Minghao Guo @GuoMh14, Haixu Wu @Haixu_Wu_1998, Doug Roble, Tuur Stuyck @TuurStuyck, and Wojciech Matusik @wojmatusik. Project page: people.csail.mit.edu/frankzydou/pro… Paper: people.csail.mit.edu/frankzydou/pro…

English

1

15

95

11.2K
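
The RigidFormer post above names differentiable Kabsch alignment as its rigid-body module. As a rough illustration only, and not the authors' implementation, the sketch below shows the classic SVD-based Kabsch step in plain NumPy: given paired point sets for one object (for example its current points and per-point predicted positions), it recovers the single rotation and translation that best map one onto the other. The function name `kabsch` and the way the result would be used inside a network are assumptions; a differentiable version would need the same steps written in an autodiff framework.

```python
import numpy as np


def kabsch(source, target):
    """Least-squares rigid fit: find R, t so that R @ p + t ~ q for paired rows p, q.

    source, target: (N, 3) arrays of corresponding points (hypothetical inputs).
    """
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)

    # Center both point sets so only the rotation remains to be solved.
    src_centered = source - src_centroid
    tgt_centered = target - tgt_centroid

    # 3x3 cross-covariance between the centered sets.
    H = src_centered.T @ tgt_centered
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    t = tgt_centroid - R @ src_centroid
    return R, t
```

Applying the recovered `R` and `t` to every point of an object keeps its shape rigid even when the raw per-point predictions are slightly inconsistent with one another.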

AnySyn3D retweeted

Zhiyang (Frank) Dou@frankzydou·11 May

Introducing ✨RigidFormer: Learning Rigid Dynamics with Transformers - our attempt to scale learning-based physical dynamics with Transformers. RigidFormer learns rigid dynamics with Transformers. It is a mesh-free, object-centric Transformer for multi-object rigid-body contact dynamics from point clouds. Learning physics with purely neural simulators, without relying on traditional physics engines, is an important and widely studied problem. Prior SOTA methods often use graph neural networks for accuracy and generalization, but still struggle with efficient, high-fidelity simulation at scale. RigidFormer uses only point inputs, matches or outperforms mesh-based baselines on standard benchmarks, runs much faster, generalizes across point resolutions and datasets, and scales to 200+ objects. We also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components. RigidFormer is mesh-free: it does not require mesh connectivity, SDFs, or vertex-level message passing, making it well-suited for point-cloud observations and scalable simulation. This architecture can also be adapted to learn soft-body dynamics by replacing the rigid-body module (differentiable Kabsch alignment). 🎬See our video for more details. Many thanks to my amazing collaborators: Minghao Guo @GuoMh14, Haixu Wu @Haixu_Wu_1998, Doug Roble, Tuur Stuyck @TuurStuyck, and Wojciech Matusik @wojmatusik. Project page: people.csail.mit.edu/frankzydou/pro… Paper: people.csail.mit.edu/frankzydou/pro…

English

6

61

295

566.6K

AnySyn3D retweeted

Zhiyang (Frank) Dou@frankzydou·1 May

Excited to share that our work NeuralActuator: Neural Actuation Modeling for Robot Dynamics and External Force Perception has been accepted to #RSS2026! Your robot — even a low-cost one — can feel external forces without torque or tactile sensors. TL;DR: NeuralActuator is a neural actuator model that jointly predicts 1️⃣torque to capture the nonlinear and time-varying current–to–torque relationship of low-cost servos, 2️⃣external contact forces (and force detection gates) for sensorless force perception, 3️⃣and motor conditions that indicate each motor’s operating regime. Here is a fast-forward video clip ⬇️ We are also covering more robots like LeRobot-S101 and Franka Panda. More details coming soon.

English

8

59

326

39.5K

AnySyn3D retweeted

Lucky Iyinbor@Luckyballa·27 Apr

This January, I decided to give it a shot and wrote my first paper. Today, I am happy to share that it was accepted by #SIGGRAPH2026. SAD is a differentiable image representation with soft, anisotropic partitioning, with up to 20x faster encoding time🧵 luckyiyi.github.io/SAD/index.html

English

12

46

409

28K

AnySyn3D retweeted

Zhiyang (Frank) Dou@frankzydou·27 Apr

SAD: Soft Anisotropic Diagrams for Differentiable Image Representation has been accepted by #SIGGRAPH2026. Check it out, and huge congrats to Lucky! @Luckyballa #SAD represents an image as a soft, anisotropic, differentiable diagram over learnable sites. Each pixel is modeled as a softmax blend over its top-K nearby sites under a site-dependent distance, yielding a differentiable partition of unity with explicit ownership and content-aligned boundaries. A GPU-friendly top-K propagation scheme keeps the cost constant per pixel, enabling fast fitting at matched or better quality. Classical geometric structures can still inspire fresh perspectives in modern visual computing. Voronoi and Power diagrams have long been elegant tools for 3D shape analysis, reconstruction, and geometric reasoning; here, related diagram ideas, with connections to Apollonius-style diagrams, are explored for image representations. Homepage: luckyiyi.github.io/SAD/ arXiv: arxiv.org/pdf/2604.21984 #SIGGRAPH2026 #SIGGRAPH #CV #Vision #Graphics #CG

Cambridge, MA 🇺🇸 English

1

10

30

3.5K
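
The SAD post above describes each pixel as a softmax blend over its top-K nearby sites under a site-dependent distance. The following is a minimal NumPy sketch of that idea under stated assumptions, not the paper's method: the site-dependent distance is taken to be a per-site quadratic form (a 2x2 positive-definite matrix per site), the top-K search is brute force rather than the GPU propagation scheme the post mentions, and all names (`soft_diagram_render`, `metrics`, `temperature`) are hypothetical.

```python
import numpy as np


def soft_diagram_render(pixels, sites, metrics, colors, k=4, temperature=1.0):
    """Blend site colors per pixel with softmax weights over the k closest sites.

    pixels:  (P, 2) pixel coordinates
    sites:   (S, 2) site positions
    metrics: (S, 2, 2) per-site positive-definite matrices (assumed distance form)
    colors:  (S, 3) per-site colors
    Requires k <= S.
    """
    # Anisotropic squared distance from every pixel to every site: d^T M_s d.
    diff = pixels[:, None, :] - sites[None, :, :]               # (P, S, 2)
    dist = np.einsum("psi,sij,psj->ps", diff, metrics, diff)    # (P, S)

    # Indices of the k smallest distances per pixel (unordered within the k).
    topk = np.argpartition(dist, k - 1, axis=1)[:, :k]          # (P, k)
    dist_k = np.take_along_axis(dist, topk, axis=1)             # (P, k)

    # Softmax over negative distances; subtract the max for numerical stability.
    logits = -dist_k / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)      # rows sum to 1

    image = (weights[:, :, None] * colors[topk]).sum(axis=1)    # (P, 3)
    return image, weights, topk
```

Because the weights of each pixel sum to one, the blend is a partition of unity over the selected sites, and lowering `temperature` moves the result toward a hard, Voronoi-like assignment.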

AnySyn3D retweeted

#ICCV2025@ICCVConference·31 Mar

Coming soon: #ICCV2027 Hong Kong 🇨🇳

Yoshitomo Matsubara@yoshitomo_cs

didn't expect that it's already the season to create this email label

Filipino

0

22

214

42K

AnySyn3D retweeted

Yuan Liu@YuanLiu41955461·31 Mar

Sharing our recent work, GO-Renderer, which leverages generative models to perform high-quality controllable object rendering from sparse images without exact geometry and appearance modeling. Homepage: igl-hkust.github.io/GO-Renderer/ Paper: arxiv.org/abs/2603.23246

English

5

29

168

10.4K

AnySyn3D retweeted

Yuan Liu@YuanLiu41955461·30 Mar

Excited to share our work, Know3D, which connects LLMs' reasoning ability and knowledge to 3D generative models. This increases the controllability and plausibility of unseen parts in the generated 3D shapes. Paper: arxiv.org/abs/2603.22782 Project page: xishuxishu.github.io/Know3D.github.…

English

5

39

212

14.2K

AnySyn3D retweeted

Yuan Liu@YuanLiu41955461·20 Feb

Happy to share our work, PartSAM, which is a promptable 3D part segmentation model trained directly on large-scale 3D data. Inference codes and the pre-trained model are released! Code: github.com/czvvd/PartSAM Project page: czvvd.github.io/PartSAMPage/ Paper: arxiv.org/abs/2509.21965

English

0

26

158

8.3K

AnySyn3D retweeted

Jiahao Lu@FFzzf08·7 Mar

Why Track4World? 1️⃣ Dense world-centric tracking 2️⃣ Supports DA3/Pi3/MoGe 3️⃣ Efficient 3D correlation 4️⃣ 2D-to-3D supervision bypasses 3D GT scarcity! #ComputerVision #3DTracking #SceneFlow #OpticalFlow

Yuan Liu@YuanLiu41955461

Excited to share Track4World, feedforward 3D tracking of all pixels in the world-centric coordinate system. Code has been released; you are welcome to try it! Homepage: jiah-cloud.github.io/Track4World.gi… Code: github.com/TencentARC/Tra… Paper: arxiv.org/abs/2603.02573

English

0

3

7

1K

AnySyn3D retweeted

Yuan Liu@YuanLiu41955461·6 Mar

Excited to share Track4World, feedforward 3D tracking of all pixels in the world-centric coordinate system. Code has been released; you are welcome to try it! Homepage: jiah-cloud.github.io/Track4World.gi… Code: github.com/TencentARC/Tra… Paper: arxiv.org/abs/2603.02573

English

4

44

266

18.3K

AnySyn3D retweeted

Xiaoyan Cong@xiaoyan_cong·18 Mar

💡Introducing UMO: one unified model that unlocks motion foundation model (HY-Motion @TencentHunyuan) priors for 10+ tasks: editing, reaction generation, stylization, trajectory control, obstacle avoidance, keyframe infilling... (1/8) 🌐 Webpage: oliver-cong02.github.io/UMO.github.io/ 📄 Paper: arxiv.org/abs/2603.15975

English

5

34

163

22K

AnySyn3D@AnySyn3D·18 Mar

RT @frankzydou: We have seen many works unlock the power of pretrained models for images and videos🏞️. But what about human motion🕺💃? Can…

English

0

1

0

15

AnySyn3D retweeted

Zhiyang (Frank) Dou@frankzydou·6 Jan

We present EgoReAct: Real-time 3D human reaction generation from streaming egocentric video. 🌟Reacting to streaming egocentric video is something humans do every day. We hope EgoReAct makes human motion more human-like. 🔎 What we found: existing ego-reaction data can be spatially inconsistent (e.g., moving reactions paired with fixed-camera videos), which breaks 3D grounding. 📷 What we built: HRD, a spatially aligned egocentric video–reaction dataset (3,500 pairs, 32 categories), plus a spatially aligned ViMo fix for fair evaluation. (Instead of collecting expensive ground-truth motion, we employ VDM to generate the egocentric videos.) 👁️⚡🏃 Our simple yet effective pipeline: motion tokenization for compact discrete codes + an autoregressive Transformer for online, strictly-causal generation. Metric depth and head dynamics further improve 3D spatial consistency. Project Page: frank-zy-dou.github.io/projects/EgoRe… ArXiv: arxiv.org/abs/2512.22808 #HumanMotion #EgocentricVision #3D #ARVR #Animation #AIGC #DeepLearning #GenerativeAI #Graphics #ComputerVision #Motion

Cambridge, MA 🇺🇸 English

6

29

159

11.2K

AnySyn3D retweeted

Cheng Lin@_cheng_lin·4 Dec

Check out our Spotlight paper at NeurIPS 2025! 🌟

Zhiyang (Frank) Dou@frankzydou

Please check out our paper #MOSPA, “🎧Human Motion Generation Driven by Spatial Audio”, at #NeurIPS2025 (🌟Spotlight)! 😊We have released our dataset and models :) 💡The paper tackles the challenge of spatial-audio-driven human motion generation, enabling virtual humans to respond dynamically and realistically to diverse spatial sounds: not just “what” is sounding, but also “where” and “how” it sounds in space. 💡We introduce SAM, the first comprehensive Spatial Audio-Driven Human Motion dataset, with diverse spatial audio scenarios and high-quality 3D motion pairs, providing a solid benchmark for studying human motion conditioned on spatial audio. 💡Building on this, MOSPA is a diffusion-based generative framework that fuses semantic and spatial features of the audio to synthesize diverse, realistic motions aligned with spatial audio cues, achieving state-of-the-art performance on this new task and offering a strong baseline for future research. If you work on virtual humans, spatial audio, XR, or humanoid / embodied control, this can be a good motion skill learning source. Please come meet the team at our #NeurIPS2025 San Diego Spotlight poster! 📍 Exhibit Hall C, D, E, #4310 🕚 Fri, Dec 5 | 11 a.m.–2 p.m. PST Homepage: frank-zy-dou.github.io/projects/MOSPA… Paper: arxiv.org/abs/2507.11949 Code and Data: github.com/xsy27/Mospa-Ac… #NeurIPS #NeurIPS2025 #MOSPA #motion #Animation #SpatialAudio #VirtualHuman #Robotics #Robot #AI #Deeplearning #GenerativeAI #AIGC

English

1

5

590

AnySyn3D retweeted

Zhiyang (Frank) Dou@frankzydou·3 Dec

Please check out our paper #MOSPA, “🎧Human Motion Generation Driven by Spatial Audio”, at #NeurIPS2025 (🌟Spotlight)! 😊We have released our dataset and models :) 💡The paper tackles the challenge of spatial-audio-driven human motion generation, enabling virtual humans to respond dynamically and realistically to diverse spatial sounds: not just “what” is sounding, but also “where” and “how” it sounds in space. 💡We introduce SAM, the first comprehensive Spatial Audio-Driven Human Motion dataset, with diverse spatial audio scenarios and high-quality 3D motion pairs, providing a solid benchmark for studying human motion conditioned on spatial audio. 💡Building on this, MOSPA is a diffusion-based generative framework that fuses semantic and spatial features of the audio to synthesize diverse, realistic motions aligned with spatial audio cues, achieving state-of-the-art performance on this new task and offering a strong baseline for future research. If you work on virtual humans, spatial audio, XR, or humanoid / embodied control, this can be a good motion skill learning source. Please come meet the team at our #NeurIPS2025 San Diego Spotlight poster! 📍 Exhibit Hall C, D, E, #4310 🕚 Fri, Dec 5 | 11 a.m.–2 p.m. PST Homepage: frank-zy-dou.github.io/projects/MOSPA… Paper: arxiv.org/abs/2507.11949 Code and Data: github.com/xsy27/Mospa-Ac… #NeurIPS #NeurIPS2025 #MOSPA #motion #Animation #SpatialAudio #VirtualHuman #Robotics #Robot #AI #Deeplearning #GenerativeAI #AIGC

Zhiyang (Frank) Dou@frankzydou

Excited to share our latest work on 🎧spatial audio-driven human motion generation. We aim to tackle a largely underexplored yet important problem of enabling virtual humans to move naturally in response to spatial audio—capturing not just what is heard, but also where the sound is coming from. To this end, we introduce the Spatial Audio-Driven Human Motion (SAM) dataset—the first comprehensive dataset featuring paired high-quality human motion and spatial audio recordings. For benchmarking, we develop a generative framework for human MOtion generation driven by SPAtial audio, termed MOSPA, which learns to synthesize realistic and diverse human motions conditioned on spatial audio input. We hope this research could provide a foundation for future research in spatial perception, virtual characters, and embodied AI. The dataset and model will be open-sourced soon. A big thank you to our intern, Shuyang Xu, for the wonderful collaboration! Congratulations, Shuyang! Project page: frank-zy-dou.github.io/projects/MOSPA… Paper: arxiv.org/abs/2507.11949 Video: youtu.be/p_xwTDA-K0g #Animation #CG #CV #AIGC #DL #Deeplearning #Motion #Graphics #AI #GenerativeAI

English

0

9

19

4.7K

AnySyn3D retweeted

Zhiyang (Frank) Dou@frankzydou·3 Dec

Please check out Chen (@chenwangcw) and Chuhao (@MorPhLingXD)’s work “PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation” today (Dec 3, 2025) at #NeurIPS2025! 🕚 11:00 AM – 2:00 PM PST 📍 Exhibit Hall C, D, E — Poster #4315

Chen Wang@chenwangcw

Excited to share our #NeurIPS2025 paper: PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation. We propose a novel framework to improve the controllability and physics plausibility of video models. Project Page: cwchenwang.github.io/physctrl/ (1/n)

English

0

5

38

4K

AnySyn3D retweeted

Huaijin Pi@HuaijinPi·30 Nov

🚀 Excited to share our NeurIPS 2025 paper: CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects 🔗 Project page: phj128.github.io/page/CoDA/inde… 🔗 Code: github.com/phj128/CoDA 🔗 Paper: arxiv.org/abs/2505.21437

English

5

15

85

19.9K

AnySyn3D@AnySyn3D·30 Nov

RT @frankzydou: Please check out paper #CoDA “Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects”…

English

0

3

0

15

AnySyn3D retweeted

Zhen Liu@ItsTheZhen·18 Oct

Can LLMs design real machines — from 🚗 cars to 🏹 catapults? Can they engineer through both 🧠 agentic workflows and 🌀 reinforcement learning (RL) — learning from physical simulation instead of text alone? We treat machine design as “machine code writing”, where LLMs assemble mechanisms from standard parts. To explore this, we built 🧩 BesiegeField — a real-time, physics-based sandbox where LLMs can build, test, and evolve machines through agentic planning or RL-based self-improvement. Our findings: 1️⃣ Even top LLMs fail to build working catapults — easy for humans but highly dynamic ⚙️ and nonlinear. 2️⃣ RL helps — working designs emerge through interaction. 3️⃣ Aligning reasoning 🧩 with construction 🔩 remains a key challenge. This marks the first step toward LLMs that learn to design through action — bridging reasoning, physics, and embodiment. 🛠️🤖 🌐 Project Website: besiegefield.github.io 💻 GitHub (RL & Agentic Workflow): github.com/Godheritage/Be… 👥 Joint work w/ @Besteuler & Wenqian Zhang

English

2

17

78

18.9K
