Zihan Wang

56 posts

@Z1hanW

4D vision + Robotics | Research Intern @ Frontier AI & Robotics (FAR), Amazon

Joined March 2022
214 Following · 312 Followers
Zihan Wang@Z1hanW·
The most important thing is the problem formulation, and if one can identify it and make it work with so-called ‘magic glue’ at the lowest cost, that should be a ‘pro’, not a ‘con’.
Guanya Shi@GuanyaShi

I’m so tired of writing rebuttals to this kind of “lack of novelty” review: “This paper trivially combines A, B, and C, so the algorithmic novelty is limited.” Technically, most (if not all) robotics papers are convex combinations of existing ideas. I still deeply appreciate A+B+C papers—especially when they deliver:
- New capabilities: the “trivial combination” unlocks behaviors we simply couldn’t achieve before
- Sensible & organic design: A+B+C is clearly the right composition—not some arbitrary A′+B+C′
- Nontrivial interactions: careful analysis of the dynamics, coupling, or failure modes between A, B, C
- Rehabilitating old ideas: A was dismissed for years, but paired with modern B/C, it suddenly works—and teaches us why
- System-level & "interface" insight: the contribution is not any single piece, but how the pieces talk to each other
- Scaling laws or regimes: identifying when/why A+B+C works (and when it doesn’t)
- Engineering clarity: making something actually work robustly in the real world is not “trivial”
- New problem formulations: sometimes the real novelty is in the reformulation—only under this view does A+B+C make sense.
Maybe worth keeping these in mind when reviewing the next A+B+C paper : )

Adria Lopez@alopeze99·
@Z1hanW Cool work! Any plans to extend this for (moving) object interaction?
Zihan Wang@Z1hanW·
CRISP is accepted at ICLR 2026!!! @iclr_conf Excited to see more impact of building simulation-ready assets from monocular video on animation / robotics. Code is ready (github.com/Z1hanW/CRISP-R…) with the cleaned-up videos, including several parkour videos clipped from YouTube.
Zihan Wang@Z1hanW

Introducing CRISP, a real-to-sim pipeline that recovers human motion and simulatable scene geometry from monocular video! CRISP builds contact-faithful 3D scenes for simulation - 8× fewer sim failures, +43% faster sim, and improves human motion! Interactive demos👉: crisp-real2sim.github.io/CRISP-Real2Sim/ Exciting collaboration w/ @JiashunWang @jefftan969 @_Tsukasane @ Jessica Hodgins @shubhtuls @RamananDeva

Zihan Wang reposted
Ethan Weber@ethanjohnweber·
Toon3D: Seeing Cartoons from New Perspectives 🔗 toon3d.studio Riley (co-lead, @cardiacmangoes) will also be at the conference! He'll be starting a PhD at Stanford soon! 😃
Zihan Wang reposted
Haozhi Qi@HaozhiQ·
Super excited to share HandelBot! Getting robots to play the piano is quite challenging. Combining sim + real-world RL helps! Fantastic work by @amberxie_ leading this effort! 🎹🤖
Amber Xie@amberxie_

Introducing HandelBot 🎹🤖, a real-world piano playing robot! Piano is extremely hard (even for humans!). We take a small but exciting step to replicate this beautiful skill w HandelBot. Our insight is combining sim priors w real world refinement & RL. w/ @haozhiq @DorsaSadigh

Yuezhi Yang@yang_yuezhi·
Excited to share our new work at CVPR 2026: Learning Convex Decomposition via Feature Fields. We introduce the first feedforward open-world model that generates high-quality convex decompositions for any 3D shape in seconds, enabling faster simulation. 🔗research.nvidia.com/labs/sil/proje…
Zihan Wang reposted
Junyi Zhang@junyi42·
One memory can't rule them all. We present LoGeR, a new hybrid memory architecture for long-context geometric reconstruction. LoGeR enables stable reconstruction over up to 10k frames / kilometer scale, with linear-time scaling in sequence length, fully feedforward inference, and no post-optimization. Yet it matches or surpasses strong optimization-based pipelines. (1/5) @GoogleDeepMind @Berkeley_AI
Zihan Wang reposted
Wenjia Wang@WenjiaWang_HKU·
🚀 Excited to share our #CVPR2026 paper: EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents. EmbodMocap is a portable yet affordable solution requiring only two moving iPhones—no calibrated multi-view camera studio, motion capture suits, or LiDAR sensors needed. With our fully automated optimization pipeline, you can effortlessly obtain high-precision scene meshes, human interaction motions, RGBD images, and camera parameters. The captured data is ready for training human-scene reconstruction models (like TRAM, pi3, etc.) and humanoid control policies (like DeepMimic, AMP, etc.).
What you need to do:
1. Borrow or buy two iPhone 12 Pros from eBay (600 USD in total).
2. Find 2 friends, then capture the sequences.
3. Deploy our repo, run our code, and get the results!
The code and data will be released within 1 week. (Just came back to work from the Chinese Spring Festival, Happy Chinese New Year!)
📷 Project page: wenjiawang0312.github.io/projects/embod…
📷 ArXiv: arxiv.org/abs/2602.23205
📷 Code: github.com/WenjiaWang0312…
Zihan Wang reposted
Shubham Tulsiani@shubhtuls·
[1/N] Current visual geometry prediction models rely primarily on labeled 3D data. Our CVPR26 paper, Flow3r, additionally leverages unlabeled videos (using flow supervision) for scalable visual geometry learning, enabling accurate multi-view 3D reconstruction in the wild.
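To make "flow supervision" concrete, here is a minimal sketch of one common way such a loss is set up: predicted depth and relative camera pose induce a 2D correspondence field between two frames, and that induced flow is penalized against flow from an off-the-shelf estimator (e.g. RAFT). The function names and the exact loss below are illustrative assumptions, not Flow3r's actual implementation.

```python
# Minimal sketch (assumed, not Flow3r's code): supervise geometry with optical flow.
# Idea: predicted depth + relative pose induce 2D correspondences between two frames;
# penalize their disagreement with flow from an off-the-shelf estimator.
import torch

def induced_flow(depth1, K, T_12):
    """Flow from frame 1 to frame 2 induced by predicted depth and relative pose.
    depth1: (H, W) predicted depth for frame 1
    K:      (3, 3) camera intrinsics
    T_12:   (4, 4) relative pose mapping frame-1 coords to frame-2 coords
    """
    H, W = depth1.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()    # (H, W, 3) homogeneous pixels
    rays = (torch.linalg.inv(K) @ pix.reshape(-1, 3).T).T            # back-project to rays
    pts1 = rays * depth1.reshape(-1, 1)                              # 3D points in frame 1
    pts2 = (T_12[:3, :3] @ pts1.T).T + T_12[:3, 3]                   # transform to frame 2
    proj = (K @ pts2.T).T
    uv2 = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)                 # reproject to pixels
    return (uv2 - pix.reshape(-1, 3)[:, :2]).reshape(H, W, 2)        # induced 2D flow

def flow_supervision_loss(depth1, K, T_12, flow_est, valid_mask):
    """L1 between geometry-induced flow and estimated optical flow on valid pixels."""
    flow_geo = induced_flow(depth1, K, T_12)
    return (valid_mask[..., None] * (flow_geo - flow_est).abs()).mean()
```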
Zihan Wang reposted
Yuxuan Kuang@yuxuank02·
How to enable dexterous hands with manipulation capabilities that work across diverse objects, tasks, scenes, camera views, and external perturbations? Excited to share Dex4D, a method for generalizable sim-to-real dexterous manipulation via a task-agnostic point track policy and video generation planners! NO parallel grippers, NO teleop! Project page: dex4d.github.io 🧵👇 0/10 #Robotics #EmbodiedAI #Manipulation #AI #ComputerVision
Zihan Wang reposted
Zhen Wu@zhenkirito123·
Can humanoids perform agile, autonomous, long-horizon parkour—based on what they see in the world? We present Perceptive Humanoid Parkour (PHP): a framework that chains dynamic human skills using onboard depth perception for long-horizon traversal. 1/6
Zihan Wang reposted
Sirui Chen@eric_srchen·
What RL-based humanoid controllers are missing, compared with industrial robots, is precision and force control. CHIP can do both. We propose a simple recipe for building a humanoid impedance controller, which can be used for wiping, carrying large objects, and multi-robot collaboration.
Zi-ang Cao@ziang_cao

🚀 Introducing CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation! Current humanoids face a trade-off: they are either Agile & Stiff OR Slow & Soft. CHIP breaks this barrier. We enable on-the-fly switching between Compliant (wiping 🧼, collaborative holding 📦) and Stiff (lifting dumbbells 🏋️, opening doors 🚪💪) behaviors—all while maintaining agile skills like running! 🏃💨 Website: nvlabs.github.io/CHIP/ Join me for a deep dive on how CHIP enables adaptive control for complex tasks. 🧵↓
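For context, below is a minimal sketch of a generic joint-space impedance control law, the textbook idea behind compliance; it is illustrative only and not CHIP's actual recipe (CHIP learns adaptive compliance through hindsight perturbation). Lowering the stiffness gains yields compliant behavior (wiping, collaborative holding); raising them yields stiff, precise tracking. The gain values and joint count are made up for illustration.

```python
# Illustrative joint-space impedance control law (textbook form, not CHIP's recipe):
#   tau = K (q_des - q) + D (qd_des - qd) + gravity compensation
# Compliance is tuned by the stiffness/damping gains K and D.
import numpy as np

def impedance_torques(q, qd, q_des, qd_des, K, D, tau_gravity):
    """Joint torques tracking (q_des, qd_des) with stiffness K and damping D.
    q, qd:         current joint positions / velocities, shape (n,)
    q_des, qd_des: desired joint positions / velocities, shape (n,)
    K, D:          diagonal stiffness / damping gains, shape (n,)
    tau_gravity:   gravity-compensation torques from the robot model, shape (n,)
    """
    return K * (q_des - q) + D * (qd_des - qd) + tau_gravity

# "Soft" vs "stiff" behavior differs only in the gains (hypothetical values, 23-DoF example):
K_soft,  D_soft  = np.full(23, 40.0),  np.full(23, 4.0)    # compliant: wiping, co-holding
K_stiff, D_stiff = np.full(23, 400.0), np.full(23, 20.0)   # stiff: lifting, opening doors
```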

Zihan Wang reposted
Sirui Xu@xu_sirui·
Humanoids need autonomy + versatility + generalization to be truly useful. Loco-manipulation makes that hard. InterPrior is our step toward bridging the gap — one policy, no reference. Could be promising for immersive games 🎮 and real robots 🤖 🔗 sirui-xu.github.io/InterPrior 📜 arxiv.org/abs/2602.06035 [1/9]
Zihan Wang reposted
Haiwen (Haven) Feng@HavenFeng·
✨Thinking with Blender~ Meet VIGA: a multimodal agent that autonomously codes 3D/4D Blender scenes from any image, with no human, no training! @berkeley_ai #LLMs #Blender #Agent 🧵1/6
Chen Geng@gengchen01·
✨ Any static 3D assets ➡️ 4D dynamic worlds. Introducing CHORD, a universal framework for generating scene-level 4D dynamic motion from any static 3D inputs. It generalizes surprisingly well across a wide range of objects 🤯 and can even be used to learn robotics manipulation policy 🤖! Project page: yanzhelyu.github.io/chord. Dive deeper in a 🧵: 1/n
Zihan Wang@Z1hanW·
@gengchen01 great work! would love to see some results on SMPL / g1