Siqiao Huang
@KnightNemo_

221 posts

Junior undergrad, Yao class @Tsinghua_Uni | Current intern @uwcse | ML & Robotics | World Models / WAMs / Humanoid Foundation Models | Prev. RA @mldcmu.

Joined August 2024
1.2K Following · 987 Followers
Siqiao Huang
Siqiao Huang@KnightNemo_·
@PiotrPadlewski I think a simple test is: given a PowerPoint snapshot, can Claude reproduce it with Python or whatever tool? In my case, this is almost impossible for Claude Code to do.
Piotr Padlewski
Piotr Padlewski@PiotrPadlewski·
I am at ICLR, DM me if you wanna chat about multimodal, or about where Claude needs to improve there
Siqiao Huang reposted
Max Simchowitz
Max Simchowitz@max_simchowitz·
Unlocking test-time scaling and search in generative models has been a fundamental challenge, one that is increasingly important as these methods are deployed in science, engineering, and robotics. It was a pleasure to work with @peholderrieth on a paradigm that finally makes this capability genuinely possible and lets you trade off training time vs. inference time according to your needs. Check it out!
Peter Holderrieth@peholderrieth

We release Diamond Maps💎 unlocking accurate and efficient guidance for diffusion models. Our experiments show that our methods scale incredibly well. Excited to see what people will build with this!

Accurate guidance has been a notoriously hard problem, but in this work, we’re bringing TWO (!) solutions to the table. The recipe for success:
1️⃣ Speed: Use distilled models (flow maps, mean flows, consistency models).
2️⃣ Exploration: Inject stochasticity to properly explore your search space.

Because this fundamentally improves anything using flow matching and diffusion, we see a lot of potential for applications across audio, robotics, molecules, and beyond.

Paper: arxiv.org/abs/2602.05993
Code: github.com/PeterHolderrie…

Huge thanks to an amazing team: Douglas Chen, @LucaEyring, @ishin_shah, Giri Anantharaman, @electronickale, @zeynepakata, Tommi Jaakkola, @nmboffi, and @max_simchowitz. It was awesome bringing this to life together!

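The quoted recipe (a cheap distilled sampler plus injected noise for exploration) can be sketched in a few lines. Everything below is a toy stand-in, not the Diamond Maps method: `one_step_sampler` and `reward` are hypothetical placeholders, and the point is only the compute trade-off, where drawing more stochastic candidates buys a better sample at inference time.

```python
import numpy as np

def one_step_sampler(z):
    # Stand-in for a distilled flow map: one forward pass from noise z to a sample.
    return np.tanh(z)

def reward(x):
    # Hypothetical guidance signal: prefer samples near 0.5.
    return -np.abs(x - 0.5).mean()

def guided_search(n_candidates=16, noise_scale=1.0, seed=0):
    # Best-of-N search: inject stochasticity via fresh noise per candidate,
    # score each cheap one-step sample, keep the best.
    rng = np.random.default_rng(seed)
    best_x, best_r = None, -np.inf
    for _ in range(n_candidates):
        z = noise_scale * rng.standard_normal(4)  # exploration
        x = one_step_sampler(z)                   # speed: single forward pass
        r = reward(x)
        if r > best_r:
            best_x, best_r = x, r
    return best_x, best_r
```

With a fixed seed, the candidate stream is identical, so widening the search can only improve the best reward — that monotonicity is the training-time vs. inference-time dial the tweet describes.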
Siqiao Huang
Siqiao Huang@KnightNemo_·
@ShuaiZhou234190 How exactly is it hard? The hard part is probably having a B200 for real-time deployment, but that applies to every real-world WAM deployment
Shuai Zhou
Shuai Zhou@ShuaiZhou234190·
Has anyone fine-tuned and deployed DreamZero on Aloha? Would like to discuss some details about deployment after fine-tuning...
Sam Gijsen
Sam Gijsen@SamCJG·
@sedielem Never got a Semanticist setup to work with neural data; wondering if there's something about natural image statistics that's particularly favourable to this kind of approach.
Sander Dieleman
Sander Dieleman@sedielem·
FlexTok/Semanticist provided an elegant recipe to learn semantically coarse-to-fine sequence representations of images. This works for video as well: preserve the temporal axis, replace the spatial axes with a semantic coarse-to-fine axis. Promising for long video generation!
Andrei Atanov@andrew_atanov

Are all videos worth the same number of tokens? Whether rich in motion or visually minimal, standard 3D-grid tokenizers treat them equally. We present VideoFlexTok, which represents videos using a flexible-length, coarse-to-fine sequence of tokens.

Page: videoflextok.epfl.ch
Demo: huggingface.co/spaces/EPFL-VI…
Paper: arxiv.org/abs/2604.12887

1/n

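The coarse-to-fine idea above can be illustrated with a deliberately simple toy — SVD components standing in for learned tokens; this is not VideoFlexTok's tokenizer — where tokens form an ordered sequence and any prefix yields a valid, progressively finer reconstruction:

```python
import numpy as np

def tokenize(frame):
    # "Token" i = i-th SVD component of the frame, ordered coarse -> fine
    # (singular values are sorted in decreasing order of importance).
    U, s, Vt = np.linalg.svd(frame, full_matrices=False)
    return [(s[i], U[:, i], Vt[i]) for i in range(len(s))]

def detokenize(tokens, shape):
    # Reconstruct from any prefix of the token sequence.
    out = np.zeros(shape)
    for s_i, u, vt in tokens:
        out += s_i * np.outer(u, vt)
    return out

rng = np.random.default_rng(0)
frame = rng.standard_normal((8, 8))
toks = tokenize(frame)
# Reconstruction error is non-increasing as more tokens are kept:
errs = [np.linalg.norm(frame - detokenize(toks[:k], frame.shape))
        for k in (1, 4, 8)]
```

The variable-length part of the tweet then falls out naturally: a near-static video can stop after a few tokens, while a motion-rich one keeps going down the same ordered sequence.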
Benhao Huang
Benhao Huang@huskydogewoof·
Siqiao is a rising star in world models and robotics, with sharp insights and exciting work. Really enjoyed chatting with him. If you’ll be at ICLR, definitely reach out and say hi 👋
Siqiao Huang@KnightNemo_

I will be at #ICLR2026 next week, where I'll be presenting our work Vid2World (arxiv.org/abs/2505.14357). DMs are open, coffee chats are much welcomed, and would love some dinner party invitations😍. I would be super thrilled to chat about World Models, World Action Models, Dexterous Manipulation, Representation Learning, Humanoids etc.

Siqiao Huang
Siqiao Huang@KnightNemo_·
I will be at #ICLR2026 next week, where I'll be presenting our work Vid2World (arxiv.org/abs/2505.14357). DMs are open, coffee chats are much welcomed, and would love some dinner party invitations😍. I would be super thrilled to chat about World Models, World Action Models, Dexterous Manipulation, Representation Learning, Humanoids etc.
Siqiao Huang
Siqiao Huang@KnightNemo_·
@chris_j_paxton For those who don’t know, the co-founders of this company include:
- first author of RDT-1&2
- first author of LBM-cotrain, OnetwoVLA, Imitation Learning Data Scaling Law
Better watch out, they are cooking! Also @wuji_global hands🙌
Siqiao Huang reposted
Max Simchowitz
Max Simchowitz@max_simchowitz·
⚠️Public Service Announcement for ICLR folks headed to Rio: Rio is beautiful, but getting a travel adapter in Brazil is a pain in the ass, and they don't use similar outlets to other major regions (US, UK, Europe, China). Buy a Brazil-specific adapter before traveling. Enjoy :)
Siqiao Huang
Siqiao Huang@KnightNemo_·
If you squint at it harder, this is JEPA (done right, with diffusion) plus a reconstruction loss. In Chinese, we joke about "质疑->理解->成为", which translates to "questioning -> understanding -> becoming"; I am in my questioning-JEPA to understanding-JEPA transition phase.😇
Aaron (Youshen) Lim@youshenlim

The two-stage training bottleneck in latent diffusion is solved. UNITE unifies tokenization and generation into single-stage end-to-end training, achieving FID 2.12 on ImageNet without adversarial losses or pretrained encoders. arxiv.org/abs/2603.22283

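The "JEPA plus reconstruction" reading above can be written down as a toy single-stage objective. This is an illustration of the framing, not UNITE's actual loss: all weights, shapes, and the EMA-target detail are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 16))                 # batch of flattened "frames"

enc = rng.standard_normal((16, 8)) * 0.1          # online encoder weights
tgt = enc + 0.01 * rng.standard_normal((16, 8))   # EMA target encoder (no grad)
pred = np.eye(8)                                  # latent predictor
dec = rng.standard_normal((8, 16)) * 0.1          # decoder weights

z = x @ enc                                       # online latents
z_tgt = x @ tgt                                   # target latents (stop-gradient)
latent_loss = np.mean((z @ pred - z_tgt) ** 2)    # JEPA-style prediction term
recon_loss = np.mean((z @ dec - x) ** 2)          # reconstruction term
loss = latent_loss + recon_loss                   # trained jointly, one stage
```

The latent term alone can collapse to trivial representations, which is roughly why JEPA skeptics exist; adding the reconstruction term anchors the latents to pixels, matching the "questioning -> understanding" arc of the tweet.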
Siqiao Huang
Siqiao Huang@KnightNemo_·
Great science-of-WAMs paper! My two cents: the main distinction between WAMs and VLAs that actually matters is representations, which are largely shaped by the data seen during pretraining, i.e., video data for WAMs, vision-language understanding for VLAs.
Hang Zhao@zhaohang0124

Our recent findings on World Action Models (WAMs): the core advantage of WAMs is not test-time “imagination” of futures, but the training-time supervision from future video prediction. We propose Fast-WAM, which makes inference simple, fast, and policy-centric.

Siqiao Huang
Siqiao Huang@KnightNemo_·
@leothecurious @xxunhuang No, the point is that a WM lets you offload the TD-learning tax, and you get parallelizable, cheap, and safe learning. But still, I think on-policy learning from scratch is a bad idea, for non-foundation models (arxiv.org/pdf/2510.14830) and foundation models (e.g. VLAs, WAMs) alike.
davinci
davinci@leothecurious·
@xxunhuang what happened to the world model revolution?
Siqiao Huang
Siqiao Huang@KnightNemo_·
@xxunhuang I think Nvidia does a good job coining them as "World Models" and "World Action Models"
Xun Huang
Xun Huang@xxunhuang·
I’ve spent a lot of time explaining the distinction between these two approaches, and this blog does an excellent job of capturing it clearly. Should we consider giving them more distinct names, such as Video World Simulators vs Video World Policies?
Anirudha Majumdar@Majumdar_Ani

x.com/i/article/2033…

Siqiao Huang
Siqiao Huang@KnightNemo_·
@electronickale Next week you’ll find yourself needing two Max plans, or switching to Codex
Yutong (Kelly) He
Yutong (Kelly) He@electronickale·
5 days into my trip to the Bay Area I’ve already upgraded my Claude subscription to max 🙂