Daniel Ho

175 posts

Daniel Ho

@itsdanielho

Project Prometheus. Previously 1X, Waymo, Google X

San Francisco, CA Katılım Ekim 2009

344 Takip Edilen1.9K Takipçiler

Daniel Ho@itsdanielho·22 Mar

@karansdalal thanks Karan!

English

1.2K

Karan Dalal@karansdalal·22 Mar

@itsdanielho Congratulations, Daniel!

Français

1.4K

Daniel Ho@itsdanielho·22 Mar

A personal update: After two years at 1X, I’m moving on to something new. I joined 1X to solve general-purpose robotics, through the lens of evaluation. We bet on humanoid world models early in 2024. I’m proud of our work showing how the 1X World Model can solve the offline evaluation problem: judging policy quality by accurately predicting expected state and reward within the test-time distribution. We've then showcased how these same world models can leverage their understanding of robot manipulation to act as policies, generalizing far beyond the tasks in training data. Scaled deployments of robots in homes requires confidence in policy performance in unknown environments, and generalization across environments and skills. To my colleagues at 1X, It's been an honor working with you all. I'm inspired by the world-class team and humanoid that we’ve assembled and continue to assemble. In California, we’ve grown from a few robotics researchers in a one room office to a campus for large-scale manufacturing and research engineering. I’ve now joined the founding team at Project Prometheus as a member of technical staff. I've also moved up to San Francisco! Reach out if you'd like to grab coffee and chat AI in the physical world.

English

378

34.4K

Daniel Ho retweetledi

Junfan Zhu 朱俊帆 ✈️ CVPR@junfanzhu98·12 Şub

Tuned into @itsdanielho (@1x_tech) on @RoboPapers podcast geeking out over 1XWM—inspiring! "Dream success first, then reverse-engineer the actions" paradigm is 🔥 and lol it applies to non-robots too! My takes↓ 1️⃣ World Model perfectly predicted the future action, and it was extremely close to reality, due to action-conditioned video generation (on precise low-level action sequences). At execution time, Inverse Dynamics Model (IDM) back-infers actions to ensure the “dreamed perfect trajectory” can be grounded in reality. controllable + grounded + zero-shot 2️⃣ Egocentric large-scale mid-training is useful because diversified data expands distribution coverage. Scalable and low-cost. 3️⃣ Granular training (second-by-second). Use VLM for caption upsampling, from coarse task to second-by-second play-by-play. Similar to Sora fine-grained prompt engineering, but more applicable to robot control. Granularity makes the world model capture causal chains, not just spectacle. 4️⃣ Both success and failure videos are used to train the world model. Success videos reinforce correct physics, failure videos provide negative examples. This makes imagination robust: the model can generate diverse futures (including bad ones), and a value function selects the best. 5️⃣ World model evaluating world model (recursive eval) is interesting. Current 1XWM can do self-eval (model-evaluating-model): Generate multiple rollout videos; Use internal value function or visual signals to estimate success probability; Execute highest-scoring trajectory. A more advanced loop may be: Use WM rollouts as synthetic data to predict success rate for ablating training data; Retrain/improve WM; Offline policy optimization (Dreamer-style million dream iterations). Instead of directly learn policy and rely on real rollouts for eval (expensive), using World Model to do dream-time eval / in-simulation assessment can be scalable to break through the data wall and generalize exponentially. 6️⃣ Inverse Dynamics Model (IDM) is a bridging component to translate World Model video sequences into executable low-level robot actions. It's cerebellum/translator. Given adjacent generated frames, it infers the action commands required to transition from frame A to B. World Model generates multiple rollouts with stochastic sampling, then IDM performs frame-to-frame inversion to recover action sequence and candidate trajectories, applying rejection sampling to discard some dreams where inferred actions violate kinematic constraints and ask WM regenerates. Training IDM separately is more efficient (on smaller precise data), while WM is pretrained on massive data (strong generalization). This architecture enables video prior + grounded embodiment. Instead of directly VLA End-to-End action prediction, WM + IDM "imagine-then-invert" paradigm is like "dream success first, then reverse-engineer actions", with higher visual alignment in zero-shot long-horizon tasks and easier offline evals. 👉🏻x.com/RoboPapers/sta…

Daniel Ho@itsdanielho

Check out this @RoboPapers pod for an overview of the past year of our world model research @1x_tech! We're very excited about world model architectures to achieve truly generalizable robot policies and evaluators. NEO will be able to zero-shot tasks in homes, learn rapidly with autonomy data, and predict how freshly baked models perform. This will usher in the era of home robots.

English

6.6K

Daniel Ho@itsdanielho·10 Şub

RoboPapers@RoboPapers

Every home is different. That means that to build a useful home robot, we must be able to perform zero-shot generalization on a wide range of tasks. Humanoid company @1x_tech has a solution: world models. 1X Director of Evaluations @itsdanielho joins us on RoboPapers to talk about: - why world models are the future for scaling robot learning - how to use world models for robot control - what world models unlock for evaluating robot model performance - how we can hill-climb from here to general purpose robots Watch Episode #61 of RoboPapers, with @micoolcho and @chris_j_paxton, now!

English

12.5K

Daniel Ho retweetledi

RoboPapers@RoboPapers·4 Şub

English

27.5K

Daniel Ho@itsdanielho·2 Şub

Get ready for the @1x_tech world model RoboPapers pod drop!

RoboPapers@RoboPapers

Full episode dropping soon! Geeking out with @itsdanielho on 1X World Model 1x.tech/discover/world… Co-hosted by @micoolcho @chris_j_paxton

English

5.5K

Daniel Ho@itsdanielho·16 Oca

@RealBrayden hopefully we can have people come interact with the model soon!

English

Brayden@RealBrayden·14 Oca

@itsdanielho Any chance there will be any early acess interest forms?

English

Daniel Ho@itsdanielho·14 Oca

World model based polices like 1XWM we shared yesterday enables preference feedback during post-training and also test-time compute, because the model generates interpretable state One of the unlocks from this new type of architecture below the headlines

Jack Monas@JackMonas

One of many next steps at @1x_tech: preference learning for world-model-based policies. Given a generated starting frame, we can sample multiple video rollouts from our WM and use preference feedback to steer the model toward higher-quality behavior. This lets us fix policy failures in synthetic worlds—resolving bad NEO behaviors with generated dogs before we ever meet real ones.

English

7.6K

Daniel Ho@itsdanielho·15 Oca

@robotryer With preference alignment and RLHF on world models, that opens up the opportunity to train custom models for each personality and owner

English

580

Robert Moore@robotryer·14 Oca

@itsdanielho What are your thoughts relative to NEO adopting an owner specific personality? We want our Neo to do things specific to our personal worlds. Sort of like scratching the dog on the tummy vs behind the ears…

English

102

Daniel Ho@itsdanielho·15 Oca

@_joe_harris_ in our blog post we show side-by-side comparisons between generations and real rollouts for a bunch of tasks: 1x.tech/discover/world… Next up we will speed up model inference and minimize latency and re-plan when conditions drift

English

Joe Harris@_joe_harris_·15 Oca

@itsdanielho World models as policies is an interesting path. The key question: how robust is the alignment between generation and real rollout when conditions drift?

English

Daniel Ho@itsdanielho·12 Oca

Excited to share our latest work on world models as robot policies! NEO executes novel manipulation tasks from text prompts, deriving actions from text-conditioned video generation. We found strong alignment between world model generations and real rollouts, and sufficient controllability to control NEO accurately. 1/n

1X@1x_tech

NEO’s Starting to Learn on Its Own

English

138

14.2K

Daniel Ho@itsdanielho·13 Oca

@fatemi_michael thanks Michael!

English

126

Michael Yoo Fatemi@fatemi_michael·13 Oca

Very nice progress!

1X@1x_tech

NEO’s Starting to Learn on Its Own

English

538

Peter Liu@peterliuposts·13 Oca

One of the coolest examples we found is NEO holding up a peace sign WM both understands what a peace sign is and is self aware (no hands in starting frame) + the IDM extracts finger level actions :)

1X@1x_tech

NEO’s Starting to Learn on Its Own

English

2.2K

Daniel Ho@itsdanielho·13 Oca

@peterliuposts ✌️

QME

119

Daniel Ho@itsdanielho·13 Oca

@PlutonianGray @radbackwards thanks Kevin! excited for you to have your NEO

English

Kevin 🛸👽 plutoniangray.space@PlutonianGray·13 Oca

@itsdanielho @radbackwards Thanks so much for accelerating the future! As a purchaser, I can’t wait to see how well it works. A pivotal moment in history is upon us. I don’t think I’m overstating it. You should be proud to be on the team making it happen. Congratulations!

English

Daniel Ho@itsdanielho·13 Oca

@peterliuposts ur legendary peterliuposts!

Indonesia

Peter Liu@peterliuposts·13 Oca

@itsdanielho Legendary 🔥

Indonesia

Daniel Ho@itsdanielho·13 Oca

@christyjestin @ridcursion Because NEO’s embodiment is so close to human form, we found promising zero-shot transfer even without overlap on the task-specific data. For example we have 98.5% pick and place and tested transfer which wroked well

English

Chriminal@_chriminal_·13 Oca

@ridcursion @itsdanielho My intuition is that the main thing you need is enough overlap between the human + robot data so the model can learn to align representations and then it can repurpose the human data for tasks where there's no robot reference. @itsdanielho curious to hear your view?

English

Daniel Ho@itsdanielho·13 Oca

@PotEl0000 @btfdNOID @1x_tech Good question, you’re correct that our current world model work doesn’t solve these delayed and higher level tasks. Stay tuned for orchestration work where we solve things like this!

English

Ept@PotEl0000·13 Oca

@itsdanielho @btfdNOID @1x_tech Hey, Daniel. Love what you guys did with the world model, it’s got me SUPER excited I do have a question, how will NEO handle delayed requests like ‘Greet my guests when they enter, answer the door and pay the pizza delivery driver upon arrival’,etc requests that aren’t instant?

English

1X@1x_tech·12 Oca

NEO’s Starting to Learn on Its Own

English

297

417

3.2K

6.3M

Daniel Ho@itsdanielho·13 Oca

@ridcursion Yep! Our NEO data is 98.5% pick and place, and most tasks we show are zero-shot with no similar robot data at all

English

119

Rid@ridcursion·13 Oca

@itsdanielho awesome work!! 900h human to 70h NEO is a big ratio. is the human data mostly doing the heavy lifting on task understanding while NEO data just handles the embodiment gap?

English

Daniel Ho@itsdanielho·13 Oca

@radbackwards @byersscamm 🫡

QME

110

dar@radbackwards·13 Oca

@byersscamm This is a really good idea. Will work on this. We need to A. Reduce the time to generation B. Add a lot of fun ythjngs to play with to world model lab— but I’ll let the world model team know about this. I love this.

English

448

dar@radbackwards·12 Oca

1X World Model Summary:

GIF

English

2.5K

Daniel Ho@itsdanielho·13 Oca

@Sentdex will happen sooner than you expect

English

134

Harrison Kinsley@Sentdex·13 Oca

I have been wondering how 1x could ever possibly get enough teleop data to do what they need. This is really cool to see and I can't wait to see how it goes in deployment.

dar@radbackwards

Few notes… - This is a whole new world. We have now gone from a world where humanoid robots are constrained by tele op data collection to unlocking themselves to collect their own data by using a video backbone grounded in physics to generate pretty much any AI abilities… try and digest that a bit. - this IS NOT AGI. But this is an important step on the path there. We still have a lot of work to do in order to get to the point where NEO has fully closed the loop to truly teach itself anything you could ask. More updates soon. - now robot policies can improve alongside the rapid development in video models! This will make things move REALLY FAST. - the word model team is one of my greatest inspirations at 1X. Since the day I started: Jack, Daniel, Christina and now Lorand and Peter have been locked the fuck in for 2 years straight till 3 am every day on a completely contrarian bet that payed off big today… that is why it takes. It’s never easy.

English

Keşfet

@karansdalal @1x_tech @RoboPapers @micoolcho @chris_j_paxton @RealBrayden @robotryer @_joe_harris_