
Siqiao Huang
@KnightNemo_
Junior undergrad, Yao class @Tsinghua_Uni | Current intern @uwcse | ML & Robotics | World Models / WAMs / Humanoid Foundation Models | Prev. RA @mldcmu.

My lab at CMU @LeCARLab is hiring a postdoc! (Vibe-made the poster via @ChatGPTapp)

Introducing 🔁 Awesome-Loop-Models: a curated repo for keeping up with loop models! Whether you are just entering the field or have been exploring loop models for a while, this repo is built to serve as an actively updated map for mechanism analysis, architecture and algorithm design, applications, and related directions. 🧵 [1/n]


We release Diamond Maps 💎, unlocking accurate and efficient guidance for diffusion models. Our experiments show that our methods scale incredibly well. Excited to see what people will build with this!

Accurate guidance has been a notoriously hard problem, but in this work we're bringing TWO (!) solutions to the table. The recipe for success:
1️⃣ Speed: use distilled models (flow maps, mean flows, consistency models).
2️⃣ Exploration: inject stochasticity to properly explore your search space.

Because this fundamentally improves anything using flow matching and diffusion, we see a lot of potential for applications across audio, robotics, molecules, and beyond.

Paper: arxiv.org/abs/2602.05993
Code: github.com/PeterHolderrie…

Huge thanks to an amazing team: Douglas Chen, @LucaEyring, @ishin_shah, Giri Anantharaman, @electronickale, @zeynepakata, Tommi Jaakkola, @nmboffi, and @max_simchowitz. It was awesome bringing this to life together!
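
To make the recipe concrete, here is a minimal sketch of what "distilled steps + injected stochasticity" can look like in a sampler. The `flow_map` and `reward_grad` callables are hypothetical stand-ins (a distilled few-step model and a guidance gradient); this illustrates the idea, not the paper's actual algorithm.

```python
import torch

def guided_sample(flow_map, reward_grad, x, steps=8, guidance=1.0, noise=0.5):
    """Sketch: guided sampling with a distilled model.

    flow_map(x, t_hi, t_lo): one large distilled step from noise level
    t_hi down to t_lo (the "speed" ingredient). reward_grad(x): gradient
    of a reward or log-density used for guidance. Both are assumed
    interfaces, not APIs from the paper's codebase.
    """
    ts = torch.linspace(1.0, 0.0, steps + 1)
    for t_hi, t_lo in zip(ts[:-1], ts[1:]):
        x = flow_map(x, t_hi, t_lo)                  # big distilled step
        x = x + guidance * reward_grad(x)            # steer toward reward
        x = x + noise * torch.sqrt(t_lo) * torch.randn_like(x)  # explore
    return x
```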

This is what I’ve been cooking for the past 4 months. GPT Image 2 sits a massive 240 Elo above the second-place model, a jump bigger than the gaps across the rest of the leaderboard combined.
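
For a sense of scale, assuming the leaderboard uses the standard logistic Elo formula (the post doesn't say), a 240-point gap implies the top model wins roughly 80% of head-to-head comparisons:

```latex
% Expected head-to-head win rate for a 240-point Elo gap,
% under the standard logistic Elo model (an assumption here):
E = \frac{1}{1 + 10^{-240/400}} \approx 0.80
```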

Are all videos worth the same number of tokens? Whether rich in motion or visually minimal, standard 3D-grid tokenizers treat them equally. We present VideoFlexTok, which represents videos using a flexible-length, coarse-to-fine sequence of tokens.
Page: videoflextok.epfl.ch
Demo: huggingface.co/spaces/EPFL-VI…
Paper: arxiv.org/abs/2604.12887
1/n
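
To illustrate the motivating intuition (not VideoFlexTok's actual method), here's a toy heuristic that allocates a per-clip token budget from motion content; every name and threshold below is made up for the sketch:

```python
import torch

def token_budget(video, min_tokens=16, max_tokens=256, sat=0.1):
    """Toy content-adaptive budget: more tokens for high-motion clips.

    video: float tensor (T, C, H, W) in [0, 1]. The motion statistic
    (mean absolute frame difference) and the saturation point `sat`
    are illustrative choices, not the paper's tokenizer.
    """
    motion = (video[1:] - video[:-1]).abs().mean().item()
    frac = min(motion / sat, 1.0)
    return int(min_tokens + frac * (max_tokens - min_tokens))

# A static clip gets the floor; a busy clip approaches the cap.
print(token_budget(torch.zeros(8, 3, 32, 32)))  # -> 16
print(token_budget(torch.rand(8, 3, 32, 32)))   # -> 256
```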

I will be at #ICLR2026 next week, where I'll be presenting our work Vid2World (arxiv.org/abs/2505.14357). DMs are open, coffee chats are most welcome, and I would love some dinner party invitations 😍. I would be super thrilled to chat about World Models, World Action Models, Dexterous Manipulation, Representation Learning, Humanoids, etc.

Founded in Dec 2025, we're on a mission to bring general-purpose home robots into everyday life and give people back their liberty. Our first step: bridging superintelligence with the physical world. We're just getting started. 🚀 #Robotics #EmbodiedAI #AGI #Startups #DeepTech #AI

The two-stage training bottleneck in latent diffusion is solved. UNITE unifies tokenization and generation into single-stage end-to-end training, achieving FID 2.12 on ImageNet without adversarial losses or pretrained encoders. arxiv.org/abs/2603.22283
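
A minimal sketch of what "single-stage" means here, with hypothetical `encoder`/`decoder`/`denoiser` modules and a simple flow-matching-style latent loss (not UNITE's actual objective): one backward pass trains the tokenizer and the latent generator together, so no pretrained encoder or separate stage is needed.

```python
import torch

def unified_step(encoder, decoder, denoiser, x, opt):
    """One joint training step: reconstruction + latent denoising.

    encoder/decoder/denoiser are hypothetical nn.Modules; latents z
    have shape (B, D). Gradients from both losses reach the encoder.
    """
    z = encoder(x)
    recon_loss = (decoder(z) - x).pow(2).mean()      # tokenizer term
    t = torch.rand(z.size(0), 1, device=z.device)    # noise levels
    eps = torch.randn_like(z)
    z_t = (1 - t) * z + t * eps                      # linear noising path
    denoise_loss = (denoiser(z_t, t) - eps).pow(2).mean()
    loss = recon_loss + denoise_loss                 # end-to-end, one stage
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```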

Our recent findings on World Action Models (WAMs): the core advantage of WAMs is not test-time “imagination” of futures, but the training-time supervision from future video prediction. We propose Fast-WAM, which makes inference simple, fast, and policy-centric.
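
The claim has a simple architectural reading: keep a future-prediction head as training-time supervision, then drop it at inference so the deployed model is just a fast policy. A toy sketch of that pattern (all module names and losses are invented here, not the actual Fast-WAM):

```python
import torch
import torch.nn as nn

class PolicyWithFutureHead(nn.Module):
    """Toy WAM-style model: shared backbone, policy head, and a
    future-prediction head that is used only for the training loss."""
    def __init__(self, obs_dim=512, act_dim=8, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, act_dim)
        self.future_head = nn.Linear(hidden, obs_dim)  # predicts next obs

    def forward(self, obs):
        # Inference path: policy only. No future rollout, no imagination.
        return self.policy_head(self.backbone(obs))

    def training_loss(self, obs, action, next_obs):
        h = self.backbone(obs)
        bc = (self.policy_head(h) - action).pow(2).mean()
        future = (self.future_head(h) - next_obs).pow(2).mean()
        return bc + future  # future prediction supervises training only
```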


scaling off-policy teleop data is boring. it's also an uphill climb, not a flywheel. i want to see on-policy self-improving robotic models work. i want to see robots that flail around, try to do things badly, learn from mistakes, do them better on the next try, and before u know it, achieve superhuman competence at a task. i want to see robots that are goal-conditioned. ones that explore optimal methods for satisfying task requirements, not just mimicking human ones. if the success of ur robotic model depends on perpetually scaling expert demonstrations, u're in for a rude awakening a few years down the line.
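
The flywheel being asked for is, at its core, a loop like the sketch below; `env`, `buffer`, and `train` are placeholder interfaces, and real systems need much more (safe exploration, goal/reward labeling, resets), so read it as the shape of the argument, not a recipe:

```python
def self_improvement_loop(policy, env, buffer, train, rounds=10):
    """Toy on-policy loop: the robot generates its own data, keeps its
    failures, and retrains, so competence compounds without new
    expert demonstrations. All interfaces here are hypothetical."""
    for _ in range(rounds):
        rollouts = env.collect(policy)   # try the task, possibly badly
        buffer.add(rollouts)             # keep mistakes too
        policy = train(policy, buffer)   # learn from own experience
    return policy
```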
