
Philipp Wu
@philippswu
PhD @Berkeley_AI advised by @pabbeel. Previously @MetaAI @covariantai.

Really excited to release mjviser, a web-based MuJoCo viewer, powered by Viser. It has almost all the features of the native MuJoCo viewer, but runs in your browser. Load and simulate any MuJoCo model with a single uv command 👇
uvx mjviser

Humans can see in high-res, high-FPS in real time. Why can't VLMs? Introducing AutoGaze: ViTs/VLMs "gaze" only at key video regions! Up to 4-100x token savings, a 19x speedup, and scaling to 4K-res, 1K-frame videos.
📄 arxiv.org/abs/2603.12254
🌐 autogaze.github.io
🤗 huggingface.co/collections/bf…
(1/n)🧵
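AutoGaze's actual mechanism isn't shown in the post; the token-savings idea can still be illustrated with a generic top-k token-selection sketch. The function name, the saliency scores, and the keep ratio below are all illustrative assumptions, not the paper's method:

```python
import numpy as np

def select_tokens(tokens, scores, keep_ratio=0.25):
    # tokens: (N, D) patch embeddings; scores: (N,) saliency per token.
    # Keep only the highest-scoring fraction. Sorting the surviving
    # indices preserves the original token order, so positional
    # structure stays intact for the downstream transformer.
    n_keep = max(1, int(len(tokens) * keep_ratio))
    idx = np.sort(np.argsort(scores)[-n_keep:])
    return tokens[idx], idx

# Toy example: 16 patch tokens of dim 8, keep the top quarter (4 tokens),
# a 4x token saving before any attention is computed.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
scores = rng.standard_normal(16)
kept, idx = select_tokens(tokens, scores)
```

Because self-attention cost is quadratic in token count, a 4x reduction here would cut attention FLOPs roughly 16x, which is the lever behind this family of speedups.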



Introducing G1 Moves! 60 open-source motion capture clips + trained RL policies for the Unitree G1 humanoid robot. Come see live robot mocap and interactive roasts at the Dell booth at #GTC this week! huggingface.co/spaces/exptech… #DellProPrecision #DellTech #NVIDIA #Robotics

Coming soon to mjlab: heterogeneous worlds, aka every world gets its own object 👀




FPO++! We got RL on flow policies working on real robot tasks. Sim2real on humanoids trained from scratch + manipulation finetuning in sim with action chunking. Excited about this direction because we can now use RL with expressive policies to discover new behaviors!

Just shipped a major domain randomization overhaul in mjlab and I'm super excited about it! The biggest highlight is physically consistent inertia randomization. Mass, center of mass, and the inertia tensor now vary together through a pseudo-inertia parameterization, so every sample corresponds to a real rigid body. If you randomize those fields independently, you can end up with models that look fine numerically but are physically impossible. This fixes that.

You can also randomize geom sizes at runtime. In C MuJoCo this breaks the collision tree, but MuJoCo Warp does not rely on a static BVH, so we recompute the collision bounds after each size change and keep things consistent. Link lengths, link angles, geom offsets, and site poses are safe to randomize now too. mujocolab.github.io/mjlab/main/sou…
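The post doesn't include code, but the idea can be sketched with NumPy: a rigid body's mass, first mass moment, and second moment pack into a 4x4 pseudo-inertia matrix that is positive definite exactly when the parameters are physically realizable, so perturbing its Cholesky factor and multiplying back always yields a valid body. This is a minimal sketch of that standard construction, not mjlab's API; all names and the noise scale are mine:

```python
import numpy as np

def pseudo_inertia(m, com, I):
    # Pack mass m, COM, and rotational inertia I (about the body origin)
    # into the 4x4 pseudo-inertia matrix J = [[Sigma, h], [h^T, m]].
    Sigma = 0.5 * np.trace(I) * np.eye(3) - I  # second moment matrix
    h = m * np.asarray(com, dtype=float)       # first mass moment
    J = np.zeros((4, 4))
    J[:3, :3] = Sigma
    J[:3, 3] = h
    J[3, :3] = h
    J[3, 3] = m
    return J

def extract_params(J):
    # Invert the packing: recover (m, com, I) from a pseudo-inertia matrix.
    m = J[3, 3]
    com = J[:3, 3] / m
    Sigma = J[:3, :3]
    I = np.trace(Sigma) * np.eye(3) - Sigma
    return m, com, I

def randomize(m, com, I, scale=0.05, rng=None):
    # Perturb in Cholesky-factor space: J' = L' L'^T is positive definite
    # by construction, so every sample is a physically realizable rigid
    # body -- mass, COM, and inertia vary together, never independently.
    rng = rng or np.random.default_rng()
    L = np.linalg.cholesky(pseudo_inertia(m, com, I))
    L_new = np.tril(L * (1.0 + scale * rng.standard_normal(L.shape)))
    return extract_params(L_new @ L_new.T)
```

Randomizing mass, COM, and inertia tensor as separate scalar ranges can violate the positive-definiteness constraint this matrix encodes, which is exactly the "numerically fine but physically impossible" failure mode the post describes.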

VLAs (from VLMs) ❌ => WAMs (from Video Models) ✅
Why WAMs?
1️⃣ World Physics: VLMs know the internet, but Video Models implicitly model the physical laws essential for manipulation.
2️⃣ The "GPT Direction": VLAs are like BERT (they rely heavily on task-specific post-training). WAMs are like GPT (pre-train & prompt), unlocking incredible zero-shot transfer!
What I want to see in 2026:
📈 Scaling Laws: We will see much clearer scaling laws for robotics compared to VLAs.
🤝 Human-to-Robot Transfer: Unlocking massive transfer capabilities using video as a shared representation space.
🤖 Zero-Shot Mastery: Moving from short-horizon tasks to long-horizon, dexterous manipulation without task-specific demonstrations.
We recently open-sourced the checkpoints, training, and inference code. Dive into the research! 👇
📄 Paper: arxiv.org/abs/2602.15922
💻 Code: github.com/dreamzero0/dre…
🤗 HF: huggingface.co/GEAR-Dreams/Dr…

Introducing Mesh

Some exciting Friday news 🙂 We just open-sourced our system identification toolbox in MuJoCo 3.5. Get started today:
pip install "mujoco[sysid]"
mjlab v1.1 is also out, featuring a brand-new RGB-D renderer, and is now fully available on PyPI. Install with:
pip install mjlab




[Accepted to ICRA 2026!] 🚀 Introducing EgoMI: An egocentric manipulation interface that captures synchronized 6-DoF head and hand trajectories from egocentric human demonstrations! Transfers to IL policies zero-shot w/o visual augmentation or on-embodiment data. 1/n
