Daniel Ho
@itsdanielho

173 posts

Director @1x_tech. World models and evals. Prev @Waymo @Google @Theteamatx @berkeley_ai

Joined October 2009
333 Following · 1.6K Followers

Pinned Tweet
Daniel Ho@itsdanielho·
Excited to share our latest work on world models as robot policies! NEO executes novel manipulation tasks from text prompts, deriving actions from text-conditioned video generation. We found strong alignment between world model generations and real rollouts, and enough controllability to steer NEO accurately. 1/n
1X@1x_tech

NEO’s Starting to Learn on Its Own

Daniel Ho reposted
Junfan Zhu 朱俊帆 🔛 GTC@junfanzhu98·
Tuned into @itsdanielho (@1x_tech) on @RoboPapers podcast geeking out over 1XWM—inspiring! "Dream success first, then reverse-engineer the actions" paradigm is 🔥 and lol it applies to non-robots too! My takes↓ 1️⃣ World Model perfectly predicted the future action, and it was extremely close to reality, due to action-conditioned video generation (on precise low-level action sequences). At execution time, Inverse Dynamics Model (IDM) back-infers actions to ensure the “dreamed perfect trajectory” can be grounded in reality. controllable + grounded + zero-shot 2️⃣ Egocentric large-scale mid-training is useful because diversified data expands distribution coverage. Scalable and low-cost. 3️⃣ Granular training (second-by-second). Use VLM for caption upsampling, from coarse task to second-by-second play-by-play. Similar to Sora fine-grained prompt engineering, but more applicable to robot control. Granularity makes the world model capture causal chains, not just spectacle. 4️⃣ Both success and failure videos are used to train the world model. Success videos reinforce correct physics, failure videos provide negative examples. This makes imagination robust: the model can generate diverse futures (including bad ones), and a value function selects the best. 5️⃣ World model evaluating world model (recursive eval) is interesting. Current 1XWM can do self-eval (model-evaluating-model): Generate multiple rollout videos; Use internal value function or visual signals to estimate success probability; Execute highest-scoring trajectory. A more advanced loop may be: Use WM rollouts as synthetic data to predict success rate for ablating training data; Retrain/improve WM; Offline policy optimization (Dreamer-style million dream iterations). Instead of directly learn policy and rely on real rollouts for eval (expensive), using World Model to do dream-time eval / in-simulation assessment can be scalable to break through the data wall and generalize exponentially. 
6️⃣ Inverse Dynamics Model (IDM) is a bridging component to translate World Model video sequences into executable low-level robot actions. It's cerebellum/translator. Given adjacent generated frames, it infers the action commands required to transition from frame A to B. World Model generates multiple rollouts with stochastic sampling, then IDM performs frame-to-frame inversion to recover action sequence and candidate trajectories, applying rejection sampling to discard some dreams where inferred actions violate kinematic constraints and ask WM regenerates. Training IDM separately is more efficient (on smaller precise data), while WM is pretrained on massive data (strong generalization). This architecture enables video prior + grounded embodiment. Instead of directly VLA End-to-End action prediction, WM + IDM "imagine-then-invert" paradigm is like "dream success first, then reverse-engineer actions", with higher visual alignment in zero-shot long-horizon tasks and easier offline evals. 👉🏻x.com/RoboPapers/sta…
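The imagine-then-invert loop in points 5️⃣ and 6️⃣ can be sketched in a few lines. This is a toy illustration, not 1X's actual system: every function name, the scalar "frames", the action limit, and the value rule are made-up placeholders standing in for video generation, the IDM, kinematic checks, and the learned value function.

```python
# Toy sketch: the world model samples candidate rollouts, the IDM recovers
# actions between adjacent frames, kinematically infeasible candidates are
# rejected (and regenerated), and a value function picks the best survivor.
# All names, limits, and scoring rules here are illustrative placeholders.
import random

random.seed(0)

ACTION_LIMIT = 1.0  # assumed per-step action magnitude limit

def world_model_rollout(prompt, horizon=5):
    # Stand-in for text-conditioned video generation: one scalar "frame" per step.
    return [random.gauss(0.0, 0.6) for _ in range(horizon)]

def inverse_dynamics(frames):
    # Stand-in IDM: infer the action taking frame t to frame t+1.
    return [b - a for a, b in zip(frames, frames[1:])]

def feasible(actions):
    # Rejection-sampling check: discard dreams that violate kinematic limits.
    return all(abs(a) <= ACTION_LIMIT for a in actions)

def value(frames):
    # Stand-in value function: prefer rollouts ending near a goal state at 0.
    return -abs(frames[-1])

def plan(prompt, n_samples=16):
    candidates = []
    while len(candidates) < n_samples:
        frames = world_model_rollout(prompt)
        actions = inverse_dynamics(frames)
        if feasible(actions):  # infeasible dreams are dropped and regenerated
            candidates.append((value(frames), actions))
    _, best_actions = max(candidates, key=lambda c: c[0])
    return best_actions

best = plan("close the sliding door")
```

The key design point the thread highlights survives even in this caricature: the policy never predicts actions directly; actions only exist as the inversion of an imagined trajectory, so the same machinery that plans can also evaluate.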
Daniel Ho@itsdanielho

Check out this @RoboPapers pod for an overview of the past year of our world model research @1x_tech! We're very excited about world model architectures to achieve truly generalizable robot policies and evaluators. NEO will be able to zero-shot tasks in homes, learn rapidly with autonomy data, and predict how freshly baked models perform. This will usher in the era of home robots.

Daniel Ho reposted
RoboPapers@RoboPapers·
Every home is different. That means that to build a useful home robot, we must be able to perform zero-shot generalization on a wide range of tasks. Humanoid company @1x_tech has a solution: world models. 1X Director of Evaluations @itsdanielho joins us on RoboPapers to talk about:
- why world models are the future for scaling robot learning
- how to use world models for robot control
- what world models unlock for evaluating robot model performance
- how we can hill-climb from here to general purpose robots
Watch Episode #61 of RoboPapers, with @micoolcho and @chris_j_paxton, now!
Daniel Ho@itsdanielho·
@RealBrayden hopefully we can have people come interact with the model soon!
Brayden@RealBrayden·
@itsdanielho Any chance there will be any early access interest forms?
Daniel Ho@itsdanielho·
World-model-based policies like the 1XWM we shared yesterday enable preference feedback during post-training and also test-time compute, because the model generates interpretable state. One of the unlocks from this new type of architecture, beyond the headlines, is below:
Jack Monas@JackMonas

One of many next steps at @1x_tech: preference learning for world-model-based policies. Given a generated starting frame, we can sample multiple video rollouts from our WM and use preference feedback to steer the model toward higher-quality behavior. This lets us fix policy failures in synthetic worlds—resolving bad NEO behaviors with generated dogs before we ever meet real ones.
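The preference-learning step Jack describes can be sketched with a toy Bradley-Terry fit: sample several rollouts, collect pairwise preferences between them, and score each rollout so the highest-scoring one can steer the policy. The rollout count, preference pairs, and update rule below are entirely illustrative, not 1X's pipeline.

```python
# Toy sketch of preference learning over sampled world-model rollouts:
# fit a Bradley-Terry score per rollout from pairwise preference feedback,
# then select the highest-scoring rollout. All data here is made up.
import math

def bradley_terry(prefs, n_items, iters=200, lr=0.1):
    # Fit scores s so that P(i preferred over j) = sigmoid(s[i] - s[j]),
    # by gradient ascent on the log-likelihood of the observed preferences.
    s = [0.0] * n_items
    for _ in range(iters):
        for winner, loser in prefs:
            p_win = 1.0 / (1.0 + math.exp(-(s[winner] - s[loser])))
            grad = 1.0 - p_win  # push winner up, loser down
            s[winner] += lr * grad
            s[loser] -= lr * grad
    return s

# Hypothetical feedback: (preferred rollout, rejected rollout) pairs
# over three sampled rollouts from the same starting frame.
prefs = [(0, 1), (0, 2), (2, 1), (0, 1)]
scores = bradley_terry(prefs, n_items=3)
best_rollout = max(range(3), key=lambda i: scores[i])
```

In the setting described above, the items would be generated video rollouts and the preferences would come from human raters; the fitted scores could then supervise a reward or value model used at both post-training and test time.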

Daniel Ho@itsdanielho·
@robotryer With preference alignment and RLHF on world models, that opens up the opportunity to train custom models for each personality and owner
Robert Moore@robotryer·
@itsdanielho What are your thoughts relative to NEO adopting an owner specific personality? We want our Neo to do things specific to our personal worlds. Sort of like scratching the dog on the tummy vs behind the ears…
Daniel Ho@itsdanielho·
@_joe_harris_ In our blog post we show side-by-side comparisons between generations and real rollouts for a bunch of tasks: 1x.tech/discover/world… Next up, we will speed up model inference to minimize latency and re-plan when conditions drift
Joe Harris@_joe_harris_·
@itsdanielho World models as policies is an interesting path. The key question: how robust is the alignment between generation and real rollout when conditions drift?
Peter Liu@peterliuposts·
One of the coolest examples we found is NEO holding up a peace sign. The WM both understands what a peace sign is and is self-aware (no hands in the starting frame), and the IDM extracts finger-level actions :)
1X@1x_tech

NEO’s Starting to Learn on Its Own

Kevin 🛸👽 plutoniangray.space
@itsdanielho @radbackwards Thanks so much for accelerating the future! As a purchaser, I can’t wait to see how well it works. A pivotal moment in history is upon us. I don’t think I’m overstating it. You should be proud to be on the team making it happen. Congratulations!
Daniel Ho@itsdanielho·
@christyjestin @ridcursion Because NEO’s embodiment is so close to human form, we found promising zero-shot transfer even without overlap in the task-specific data. For example, our data is 98.5% pick-and-place, and we tested transfer, which worked well
Chriminal@_chriminal_·
@ridcursion @itsdanielho My intuition is that the main thing you need is enough overlap between the human + robot data so the model can learn to align representations and then it can repurpose the human data for tasks where there's no robot reference. @itsdanielho curious to hear your view?
Daniel Ho@itsdanielho·
@PotEl0000 @btfdNOID @1x_tech Good question. You’re correct that our current world model work doesn’t solve these delayed, higher-level tasks. Stay tuned for orchestration work where we solve things like this!
Ept@PotEl0000·
@itsdanielho @btfdNOID @1x_tech Hey, Daniel. Love what you guys did with the world model; it’s got me SUPER excited. I do have a question: how will NEO handle delayed requests like ‘Greet my guests when they enter, answer the door, and pay the pizza delivery driver upon arrival’, i.e. requests that aren’t instant?
1X@1x_tech·
NEO’s Starting to Learn on Its Own
Daniel Ho@itsdanielho·
@ridcursion Yep! Our NEO data is 98.5% pick and place, and most tasks we show are zero-shot with no similar robot data at all
Rid@ridcursion·
@itsdanielho awesome work!! 900h human to 70h NEO is a big ratio. is the human data mostly doing the heavy lifting on task understanding while NEO data just handles the embodiment gap?
dar@radbackwards·
@byersscamm This is a really good idea. Will work on this. We need to A. reduce the time to generation, and B. add a lot of fun things to play with to the world model lab, but I’ll let the world model team know about this. I love this.
dar@radbackwards·
1X World Model Summary:
[GIF]
Daniel Ho@itsdanielho·
@Sentdex will happen sooner than you expect
Daniel Ho@itsdanielho·
@Surreal_Intel We're working on understanding that next :) For example, how to learn the precise ranking between different generations by quality or success likelihood
surreal intelligence@Surreal_Intel·
This is the cognitive stack we’ve been waiting for: imagination → action. Once the dream gets good enough, reality becomes the slow, expensive verification step. Next question: can it learn the why of failure, or only the choreography?
The Humanoid Hub@TheHumanoidHub

- NEO stands at a glass sliding door
- receives a command to close it
- "Dreams" the execution using a World Model (left)
- real NEO then "copies" the dream into physical reality (right)
