

Zihan Wang
@Z1hanW
4D vision + Robotics | Research Intern @ Frontier AI & Robotics (FAR), Amazon


I’m so tired of writing rebuttals to this kind of “lack of novelty” review: “This paper trivially combines A, B, and C, so the algorithmic novelty is limited.” Technically, most (if not all) robotics papers are convex combinations of existing ideas. I still deeply appreciate A+B+C papers, especially when they deliver:
- New capabilities: the “trivial combination” unlocks behaviors we simply couldn’t achieve before
- Sensible & organic design: A+B+C is clearly the right composition, not some arbitrary A′+B+C′
- Nontrivial interactions: careful analysis of the dynamics, coupling, or failure modes between A, B, and C
- Rehabilitating old ideas: A was dismissed for years, but paired with modern B/C it suddenly works, and teaches us why
- System-level & “interface” insight: the contribution is not any single piece, but how the pieces talk to each other
- Scaling laws or regimes: identifying when/why A+B+C works (and when it doesn’t)
- Engineering clarity: making something actually work robustly in the real world is not “trivial”
- New problem formulations: sometimes the real novelty is in the reformulation; only under this view does A+B+C make sense

Maybe worth keeping these in mind when reviewing the next A+B+C paper : )


Introducing CRISP, a real-to-sim pipeline that recovers human motion and simulatable scene geometry from monocular video! CRISP builds contact-faithful 3D scenes for simulation: 8× fewer sim failures, +43% faster sim, and improved human motion! Interactive demos👉: crisp-real2sim.github.io/CRISP-Real2Sim/ Exciting collaboration w/ @JiashunWang @jefftan969 @_Tsukasane @ Jessica Hodgins @shubhtuls @RamananDeva


Introducing HandelBot 🎹🤖, a real-world piano-playing robot! Piano is extremely hard (even for humans!). We take a small but exciting step toward replicating this beautiful skill with HandelBot. Our insight is combining sim priors with real-world refinement & RL. w/ @haozhiq @DorsaSadigh


Introducing Any4D, a unified transformer for fully feed-forward, dense, metric-scale 4D reconstruction from flexible inputs! Any4D regresses per-pixel motion + geometry across frames in one pass — 15× faster, 2–3× more accurate reconstructions ⚡📈 Details + code below 👇 Exciting collab with @Nik__V__ @YuchenZhan54250 Tanisha Gupta @akashshrm02 @smash0190 @RamananDeva


🚀 Introducing CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation! Current humanoids face a trade-off: they are either Agile & Stiff OR Slow & Soft. CHIP breaks this barrier. We enable on-the-fly switching between Compliant (wiping 🧼, collaborative holding 📦) and Stiff (lifting dumbbells 🏋️, opening doors 🚪💪) behaviors—all while maintaining agile skills like running! 🏃💨 Website: nvlabs.github.io/CHIP/ Join me for a deep dive on how CHIP enables adaptive control for complex tasks. 🧵↓
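For intuition about what “Compliant vs. Stiff” means at the control level, here is a generic joint-space impedance law, the textbook mechanism such trade-offs are usually described with. This is a hedged illustration only, not CHIP’s actual method; the `impedance_torque` helper and its gain values are made up for the sketch:

```python
import numpy as np

def impedance_torque(q, qd, q_des, stiffness, damping_ratio=1.0):
    """Generic joint-space impedance law: tau = K (q_des - q) - D qd.

    Low stiffness K -> compliant (soft) behavior; high K -> stiff tracking.
    D is chosen for per-joint critical damping given K (illustrative choice).
    """
    K = np.asarray(stiffness, dtype=float)
    D = 2.0 * damping_ratio * np.sqrt(K)  # critically damped per joint
    return K * (np.asarray(q_des) - np.asarray(q)) - D * np.asarray(qd)

# Same tracking error, two gain settings: the "stiff" mode pushes back harder.
err = np.full(2, 0.1)                                   # 0.1 rad joint error
soft = impedance_torque(np.zeros(2), np.zeros(2), err, np.full(2, 10.0))
stiff = impedance_torque(np.zeros(2), np.zeros(2), err, np.full(2, 500.0))
```

Switching between modes then amounts to changing the gains on the fly; how CHIP decides when and how to do that (via hindsight perturbation) is what the thread and website describe.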


[1/N] Rotary Position Embeddings (RoPE) are ubiquitous across transformers that process tokens from 1D, 2D, or 3D grids, e.g. language, images, or videos. Our RayRoPE formulation extends these to multi-view transformers. Paper and code: rayrope.github.io
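For readers new to RoPE, here is a minimal NumPy sketch of the standard 1D formulation the post starts from (the `rope_1d` name is my own; RayRoPE’s multi-view extension is in the paper and is not reproduced here):

```python
import numpy as np

def rope_1d(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply 1D RoPE to token features x of shape (num_tokens, dim), dim even.

    Each consecutive pair of feature dims is rotated by angle p * theta_i,
    where theta_i = base**(-2i/dim) and p is the token's grid position.
    """
    n, d = x.shape
    assert d % 2 == 0, "feature dim must be even"
    theta = base ** (-2.0 * np.arange(d // 2) / d)   # per-pair frequencies, (d/2,)
    ang = positions[:, None] * theta[None, :]        # rotation angles, (n, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]                  # split dims into 2D pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin               # rotate each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

The property that makes RoPE useful: a dot product between a rotated query and a rotated key depends only on their relative position, so attention is translation-invariant along the grid; that is the property a multi-view formulation has to generalize across cameras.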
