Andy Zeng

398 posts

@andyzengineer

Building robot foundation models @GeneralistAI. Prev @GoogleDeepMind, PhD @Princeton. One experiment away from magic. ✗DMs → email

Joined September 2017
581 Following · 9.1K Followers
Pinned Tweet
Andy Zeng @andyzengineer
The dark matter of robotics is “physical commonsense”.
Those tiny corrections, subtle recoveries that your hands do (and rarely notice).
It’s everywhere, and yet—hard to pin down. Second nature to us, but hard for machines.
They’re starting to emerge in our foundation models👇
Andy Zeng@andyzengineer

x.com/i/article/2016…

Andy Zeng @andyzengineer
Took time to make sure we did this right with results we could trust: blind A/B evals 📊 on success rates. Still feels crazy to me that data from random unrelated activities around the world yields strong pretraining transfer. Yet here we are. It's quite the beast. Stay tuned.
Generalist@GeneralistAI

More pretraining improves GEN-0 real-robot performance (via blind A/B evals with closed-loop rollouts). Improvements are significant in the low-data regime, but the best models thrive with both pretraining and ample post-training. See blog addendum: generalistai.com/blog/nov-04-20…
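
For readers who want the mechanics of a blind A/B eval on success rates: collect closed-loop rollouts from two checkpoints, with evaluators blind to which model produced each episode, then test whether the gap in success rate is statistically significant. The sketch below is a generic two-proportion z-test, not Generalist's actual harness; the function name and example counts are made up.

```python
import math

def ab_success_test(succ_a: int, n_a: int, succ_b: int, n_b: int):
    """Two-proportion z-test on success rates from blind A/B rollouts.

    Returns (gap, z); a positive gap means checkpoint B outperforms A.
    Assumes independent rollouts and blind assignment of episodes.
    """
    p_a, p_b = succ_a / n_a, succ_b / n_b
    p_pool = (succ_a + succ_b) / (n_a + n_b)  # pooled rate under H0: p_a == p_b
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return p_b - p_a, (p_b - p_a) / se

# Hypothetical counts, purely for illustration:
gap, z = ab_success_test(succ_a=41, n_a=100, succ_b=58, n_b=100)
print(f"success-rate gap = {gap:+.2f}, z = {z:.2f}")  # |z| > 1.96 → significant at p < 0.05
```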

Andy Zeng retweeted
Kamyar Ghasemipour @coolboi95
At Generalist, robotics is no longer limited by data. Breaking through the data wall has enabled us to become a true foundation model company, building and scaling models from the ground up for embodied intelligence.

In today’s blog post we’re excited to share details about GEN-0, our first generation of embodied foundation models:
• GEN-0 is a class of custom, natively cross-embodied model architectures built from the ground up for the decadent dexterous behaviors Generalist has become known for.
• We have scaled GEN-0 to 10B+ fully active parameters and continue to push the boundaries of scale for training and inference.
• GEN-0 foundation models are thus far pretrained on 270,000+ hours of real-world, diverse manipulation data. We collect 10,000 hours of data per week and are accelerating.
• We are now seeing that massive-scale pretraining leads to beautifully sample-efficient finetuning on downstream tasks, delivering on the fundamental promise of embodied foundation models.
• The scale of our data trove has enabled us to conduct detailed science on pretraining. Scaling Laws are alive and well in robotics.

All of the above and much more in our blog post: generalistai.com/blog/nov-04-20…
Generalist@GeneralistAI

Introducing GEN-0, our latest 10B+ foundation model for robots
⏱️ built on Harmonic Reasoning, a new architecture that can think & act seamlessly
📈 strong scaling laws: more pretraining & model size = better
🌍 unprecedented corpus of 270,000+ hrs of dexterous data
Read more 👇
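
"Strong scaling laws" here means the usual picture: evaluation loss falls as a smooth power law in pretraining data, so more data predictably buys better models. As a rough illustration of how such a curve is fit and extrapolated (the data points below are invented for the example, not GEN-0's):

```python
import numpy as np
from scipy.optimize import curve_fit

# Saturating power law commonly used in scaling-law fits:
# loss(D) ≈ a * D**(-alpha) + c, with D = amount of pretraining data (hours here).
def power_law(D, a, alpha, c):
    return a * np.power(D, -alpha) + c

# Invented (hours, validation-loss) points, for illustration only.
hours = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 2.7e5])
loss = np.array([0.92, 0.71, 0.55, 0.44, 0.37, 0.33])

(a, alpha, c), _ = curve_fit(power_law, hours, loss, p0=(5.0, 0.3, 0.1), maxfev=10_000)
print(f"loss ≈ {a:.2f} * D^-{alpha:.2f} + {c:.2f}")
print(f"extrapolated loss at 1M hours: {power_law(1e6, a, alpha, c):.3f}")
```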

Andy Zeng @andyzengineer
General dexterity involves "physical commonsense" — learning the long tail of cases like:
🤏 nudging objects to make space for fingers to grasp
🫴 placing down slipping objects to get a better grip
etc...
This👇 shares more on how we think robots🦾 can get there with models & lots of data
Generalist@GeneralistAI

Introducing GEN-0, our latest 10B+ foundation model for robots
⏱️ built on Harmonic Reasoning, a new architecture that can think & act seamlessly
📈 strong scaling laws: more pretraining & model size = better
🌍 unprecedented corpus of 270,000+ hrs of dexterous data
Read more 👇

Andy Zeng @andyzengineer
"Train a model to predict how many timesteps left until task success" - a simple, yet powerful way to get rewards from episodic BC data Lots of nuggets in the paper (including steps-to-go fn is distributionally multimodal) Kamyar's post 👇 on how it drives RL self-improvement
Kamyar Ghasemipour@coolboi95

Super excited to finally share our work on “Self-Improving Embodied Foundation Models”!! (Also accepted at NeurIPS 2025)
• Online on-robot Self-Improvement
• Self-predicted rewards and success detection
• Orders of magnitude sample-efficiency gains compared to SFT alone
• Generalization enables novel skill acquisition
🧵👇[1/11]
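
The steps-to-go idea is simple enough to sketch. Under my own assumed shapes and names (not the paper's code): label each step t of a successful length-T BC episode with T - t, train a regressor on those labels, and use the decrease in its prediction between consecutive observations as a dense reward.

```python
import torch
import torch.nn as nn

class StepsToGo(nn.Module):
    """Regressor predicting timesteps remaining until task success."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

def steps_to_go_labels(T: int) -> torch.Tensor:
    # In a successful episode of length T, the label at step t is T - t.
    return torch.arange(T - 1, -1, -1, dtype=torch.float32)

def reward(model: StepsToGo, obs_t: torch.Tensor, obs_t1: torch.Tensor) -> torch.Tensor:
    # Dense reward = predicted progress toward success between consecutive steps.
    # NB: per the tweet, steps-to-go is distributionally multimodal; a point
    # regressor like this averages across modes (the paper models a distribution).
    with torch.no_grad():
        return model(obs_t) - model(obs_t1)
```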

Andy Zeng retweeted
Danfei Xu @danfei_xu
One-shot imitation learning! Brings back good memories from eons ago (aka 2016). This is probably my favorite demo of the year (so far). The smoothness and agility of these systems speak to the quality of the full-stack system. Sometimes I tell my students that a good number of problems could be fixed with a better robot controller and overall system integration (time sync etc.) rather than fancy learning algorithms and bigger models ...
Andy Zeng@andyzengineer

This is one-shot assembly: you show examples of what to build, and the robot just does it. (see original post: generalistai.com/blog) …

Andy Zeng @andyzengineer
@tomprimozic I suspect a combination of both! as long as it's within the context buffer
Andy Zeng @andyzengineer
This is one-shot assembly: you show examples of what to build, and the robot just does it. (see original post: generalistai.com/blog)

To share more on how this works, the robot is controlled in real time by a neural network that takes in video pixels and outputs 100Hz actions. The video below is part of the raw input passed directly into the model. I also like this view (at 1x speed) because it shows more of the (I think very cool) subtle moments of dexterity near the fingertips 👌

One-shot assembly seemed like a dream even just a year ago — it's not easy. It requires both the high-level reasoning of "what to build" (recognizing the geometry of the structures presented by the human), and the low-level visuomotor control of "how to build it" (purposefully re-orienting individual pieces and nudging them together in place). While possible to manually engineer a complex system for this (e.g. w/ hierarchical control, or explicit state representations), we were curious if our own foundation model could do it all end-to-end with just some post-training data. Surprisingly, it just worked.

Nothing about the recipe is substantially different than any other demo we’ve run in the past, and we’re excited about its implications on model capabilities:
• On contextual reasoning, these models can (i) attend to task-related pixels in the peripheral view of the video inputs, and (ii) retain this knowledge in-context while ignoring irrelevant background. This is useful for generalizing to a wide range of real workflows: e.g. paying attention to what’s coming down the conveyor line, or glancing at the instructions displayed on a nearby monitor.
• On dexterity, these models can produce contact-rich "commonsense" behaviors that can be difficult to pre-program or write language instructions for, e.g. rolling a brick slightly to align its studs against the bottom of another, re-grasping to get a better grip or to move out of the way before a forceful press, or gently pushing the corners of a brick against the mat to rotate it in hand and stand it up vertically (i.e. extrinsic dexterity).

These aspects work together to form a capability that resembles fast adaptation — a hallmark of intelligence, relevant for real use cases. This has also expanded my own perspective on what's possible with robot learning, using a recipe that's repeatable for many more skills.

This milestone stands on top of the solid technical foundations we’ve built here at Generalist: hardcore controls & hardware, all in-house built models, and a data engine that "just works." We're a small group of hyper-focused engineers, and hands-down the highest talent-density team I’ve ever worked with. We're accelerating and scaling aggressively towards unlocking next-generation robot intelligence.

Building Legos is just one example, and it's clear to me that we're headed towards a future where robots can do just about anything we want them to. It's coming, and we're going to make it happen.
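
For intuition about "takes in video pixels and outputs 100Hz actions": cameras typically run slower than 100Hz, so a policy usually emits a short chunk of future actions per frame while a fixed-rate loop plays them out. The runner below is a generic sketch with hypothetical camera/policy/robot interfaces, not Generalist's actual stack.

```python
import time
import numpy as np

CONTROL_HZ = 100          # actions streamed to the robot at 100Hz
DT = 1.0 / CONTROL_HZ

def control_loop(camera, policy, robot):
    """Generic closed-loop pixels→actions runner. All three interfaces are
    hypothetical stand-ins: camera.read() → HxWx3 uint8 frame,
    policy(frame) → (k, dof) array of future actions, robot.send(a) → applies one.
    """
    chunk, idx = None, 0
    next_tick = time.monotonic()
    while True:
        if chunk is None or idx >= len(chunk):
            frame = camera.read()              # camera is slower than 100Hz...
            chunk = np.asarray(policy(frame))  # ...so the policy emits an action chunk
            idx = 0
        robot.send(chunk[idx])
        idx += 1
        next_tick += DT                        # fixed-rate scheduling, no drift
        time.sleep(max(0.0, next_tick - time.monotonic()))
```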
Andy Zeng retweeted
Kento Kawaharazuka / 河原塚 健人
🎉Advanced Robotics Best Survey Paper Award has been awarded to our survey paper "Real-World Robot Applications of Foundation Models: A Review"! We are truly grateful to everyone who contributed! Thank you @__tmats__, Andrew, @jiaxianguo07, @chris_j_paxton, and @andyzeng_ !
Kento Kawaharazuka / 河原塚 健人@KKawaharazuka

How can existing robot systems be replaced with foundation models? Check out our new survey paper on the real-world robot applications of foundation models: arxiv.org/abs/2402.05741 Thread👇

Aditya Ganapathi @adivganapathi
So excited to keep tabs on this. I had the opportunity to work with @andyzeng_ and @peteflorence at Google Brain where they convinced me that end to end learning would eventually enable robots that can generalize. A lot of my research was built upon their foundational work, and I can easily say that they are two of the smartest people I’ve ever worked with. If anyone is solving robotics in this decade, my bet is on these guys!
Andy Zeng@andyzengineer

To see emergent behaviors from low-level policies was a first for many of us on the team. They don't happen often enough yet, but it certainly feels like we're headed in the right direction. Reach out if you're interested in working together.
