
Siqiao Huang
@KnightNemo_
Junior undergrad, Yao class @Tsinghua_Uni | Current intern @uwcse | ML & Robotics | World Models / WAMs / Humanoid Foundation Models | Prev. RA @mldcmu.

My lab at CMU @LeCARLab is hiring a postdoc! (Vibe-made the poster via @ChatGPTapp)

Introducing 🔁 Awesome-Loop-Models: a curated repo for keeping up with loop models! Whether you are just entering the field or have been exploring loop models for a while, this repo is built to serve as an actively updated map for mechanism analysis, architecture and algorithm design, applications, and related directions. 🧵 [1/n]


We release Diamond Maps 💎, unlocking accurate and efficient guidance for diffusion models. Our experiments show that our methods scale incredibly well. Excited to see what people will build with this!

Accurate guidance has been a notoriously hard problem, but in this work we're bringing TWO (!) solutions to the table. The recipe for success:
1️⃣ Speed: use distilled models (flow maps, mean flows, consistency models).
2️⃣ Exploration: inject stochasticity to properly explore your search space.

Because this fundamentally improves anything using flow matching and diffusion, we see a lot of potential for applications across audio, robotics, molecules, and beyond.

Paper: arxiv.org/abs/2602.05993
Code: github.com/PeterHolderrie…

Huge thanks to an amazing team: Douglas Chen, @LucaEyring, @ishin_shah, Giri Anantharaman, @electronickale, @zeynepakata, Tommi Jaakkola, @nmboffi, and @max_simchowitz. It was awesome bringing this to life together!
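
To make the recipe concrete, here is a minimal sketch of what "distilled steps + injected stochasticity" can look like in a sampler. The `flow_map` and `reward_grad` callables are hypothetical stand-ins (a distilled few-step model and a guidance gradient); this illustrates the idea, not the paper's actual algorithm.

```python
import torch

def guided_sample(flow_map, reward_grad, x, steps=8, guidance=1.0, noise=0.5):
    """Sketch: guided sampling with a distilled model.

    flow_map(x, t_hi, t_lo): one large distilled step from noise level
    t_hi down to t_lo (the "speed" ingredient). reward_grad(x): gradient
    of a reward or log-density used for guidance. Both are assumed
    interfaces, not APIs from the paper's codebase.
    """
    ts = torch.linspace(1.0, 0.0, steps + 1)
    for t_hi, t_lo in zip(ts[:-1], ts[1:]):
        x = flow_map(x, t_hi, t_lo)                  # big distilled step
        x = x + guidance * reward_grad(x)            # steer toward reward
        x = x + noise * torch.sqrt(t_lo) * torch.randn_like(x)  # explore
    return x
```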

This is what I’ve been cooking for the past 4 months. GPT Image 2 sits a massive 240 Elo above the second-place model, a jump bigger than the gaps across the rest of the leaderboard combined.
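
For a sense of scale, assuming the leaderboard uses the standard logistic Elo formula (the post doesn't say), a 240-point gap implies the top model wins roughly 80% of head-to-head comparisons:

```latex
% Expected head-to-head win rate for a 240-point Elo gap,
% under the standard logistic Elo model (an assumption here):
E = \frac{1}{1 + 10^{-240/400}} \approx 0.80
```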

Are all videos worth the same number of tokens? Whether rich in motion or visually minimal, standard 3D-grid tokenizers treat them equally. We present VideoFlexTok, which represents videos using a flexible-length, coarse-to-fine sequence of tokens.
Page: videoflextok.epfl.ch
Demo: huggingface.co/spaces/EPFL-VI…
Paper: arxiv.org/abs/2604.12887
1/n
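
To illustrate the motivating intuition (not VideoFlexTok's actual method), here's a toy heuristic that allocates a per-clip token budget from motion content; every name and threshold below is made up for the sketch:

```python
import torch

def token_budget(video, min_tokens=16, max_tokens=256, sat=0.1):
    """Toy content-adaptive budget: more tokens for high-motion clips.

    video: float tensor (T, C, H, W) in [0, 1]. The motion statistic
    (mean absolute frame difference) and the saturation point `sat`
    are illustrative choices, not the paper's tokenizer.
    """
    motion = (video[1:] - video[:-1]).abs().mean().item()
    frac = min(motion / sat, 1.0)
    return int(min_tokens + frac * (max_tokens - min_tokens))

# A static clip gets the floor; a busy clip approaches the cap.
print(token_budget(torch.zeros(8, 3, 32, 32)))  # -> 16
print(token_budget(torch.rand(8, 3, 32, 32)))   # -> 256
```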

I will be at #ICLR2026 next week, where I'll be presenting our work Vid2World (arxiv.org/abs/2505.14357). DMs are open, coffee chats are most welcome, and I would love some dinner party invitations 😍. I would be super thrilled to chat about World Models, World Action Models, Dexterous Manipulation, Representation Learning, Humanoids, etc.

Founded in Dec 2025, we're on a mission to bring general-purpose home robots into everyday life and give people back their liberty. Our first step: bridging superintelligence with the physical world. We're just getting started. 🚀 #Robotics #EmbodiedAI #AGI #Startups #DeepTech #AI

The two-stage training bottleneck in latent diffusion is solved. UNITE unifies tokenization and generation into single-stage end-to-end training, achieving FID 2.12 on ImageNet without adversarial losses or pretrained encoders. arxiv.org/abs/2603.22283
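
A minimal sketch of what "single-stage" means here, with hypothetical `encoder`/`decoder`/`denoiser` modules and a simple flow-matching-style latent loss (not UNITE's actual objective): one backward pass trains the tokenizer and the latent generator together, so no pretrained encoder or separate stage is needed.

```python
import torch

def unified_step(encoder, decoder, denoiser, x, opt):
    """One joint training step: reconstruction + latent denoising.

    encoder/decoder/denoiser are hypothetical nn.Modules; latents z
    have shape (B, D). Gradients from both losses reach the encoder.
    """
    z = encoder(x)
    recon_loss = (decoder(z) - x).pow(2).mean()      # tokenizer term
    t = torch.rand(z.size(0), 1, device=z.device)    # noise levels
    eps = torch.randn_like(z)
    z_t = (1 - t) * z + t * eps                      # linear noising path
    denoise_loss = (denoiser(z_t, t) - eps).pow(2).mean()
    loss = recon_loss + denoise_loss                 # end-to-end, one stage
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```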

Our recent findings on World Action Models (WAMs): the core advantage of WAMs is not test-time “imagination” of futures, but the training-time supervision from future video prediction. We propose Fast-WAM, which makes inference simple, fast, and policy-centric.
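
The claim has a simple architectural reading: keep a future-prediction head as training-time supervision, then drop it at inference so the deployed model is just a fast policy. A toy sketch of that pattern (all module names and losses are invented here, not the actual Fast-WAM):

```python
import torch
import torch.nn as nn

class PolicyWithFutureHead(nn.Module):
    """Toy WAM-style model: shared backbone, policy head, and a
    future-prediction head that is used only for the training loss."""
    def __init__(self, obs_dim=512, act_dim=8, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, act_dim)
        self.future_head = nn.Linear(hidden, obs_dim)  # predicts next obs

    def forward(self, obs):
        # Inference path: policy only. No future rollout, no imagination.
        return self.policy_head(self.backbone(obs))

    def training_loss(self, obs, action, next_obs):
        h = self.backbone(obs)
        bc = (self.policy_head(h) - action).pow(2).mean()
        future = (self.future_head(h) - next_obs).pow(2).mean()
        return bc + future  # future prediction supervises training only
```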


scaling off-policy teleop data is boring. it's also an uphill climb, not a flywheel. i want to see on-policy self-improving robotic models work. i want to see robots that flail around, try to do things badly, learn from mistakes, do them better on the next try, and before u know it, achieve superhuman competence at a task. i want to see robots that are goal-conditioned. ones that explore optimal methods for satisfying task requirements, not just mimicking human ones. if the success of ur robotic model depends on perpetually scaling expert demonstrations, u're in for a rude awakening a few years down the line.
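
The flywheel being asked for is, at its core, a loop like the sketch below; `env`, `buffer`, and `train` are placeholder interfaces, and real systems need much more (safe exploration, goal/reward labeling, resets), so read it as the shape of the argument, not a recipe:

```python
def self_improvement_loop(policy, env, buffer, train, rounds=10):
    """Toy on-policy loop: the robot generates its own data, keeps its
    failures, and retrains, so competence compounds without new
    expert demonstrations. All interfaces here are hypothetical."""
    for _ in range(rounds):
        rollouts = env.collect(policy)   # try the task, possibly badly
        buffer.add(rollouts)             # keep mistakes too
        policy = train(policy, buffer)   # learn from own experience
    return policy
```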
