Dhruv Patel

57 posts

@dhruvpatel2012

Robotics @GeorgiaTech @ICatGT | Prev - Autonomous Driving @hri_usa and @swaayatt, Robotics @iiit_hyderabad, @GoogleOSS @letsunifyai AI and Robotics 🧠 🤖

Joined December 2018
1.8K Following · 235 Followers
Pinned Tweet
Dhruv Patel @dhruvpatel2012
Presenting EgoMimic at #CoRL2024! 🎉 Effortless data collection with @meta_aria glasses—just wear & go. Our low-cost manipulator leverages this scalable data to perform grocery handling, laundry, coffee-making & more. Thrilled to be a part of this effort! egomimic.github.io
Simar Kareer@simar_kareer

Introducing EgoMimic - just wear a pair of Project Aria @meta_aria smart glasses 👓 to scale up your imitation learning datasets! Check out what our robot can do. A thread below👇

Dhruv Patel retweeted
Danfei Xu @danfei_xu
Human data becomes far more useful when robots are human-like. Excited to share a milestone from our work at GEAR: we trained a VLA model on 20k+ hours of in-the-wild human data and deployed it on robots with 22-DoF hands. Key findings:
- Near log-linear scaling between human data volume and action accuracy (R² = 0.998), predictive of real dexterous performance
- Few-shot generalization begins to emerge at this scale, with some tasks solved from a single demo
- Policies trained on humans transfer across embodiments, including lower-DoF hands
Simple recipe + scale = new capabilities. Outside of the paper, we also discovered other emergent properties, such as strong language following. More to come!
Ruijie Zheng@ruijie_zheng12

Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread on 🧵what we learned. 👇

Dhruv Patel retweeted
Simar Kareer @simar_kareer
Some properties of LLMs only emerge with scale, one of which is the ability to effectively generalize from diverse data. During my internship @physical_int, we uncovered an emergent property of VLAs: as we scale up pre-training, VLAs can naturally learn from human video data!
Dhruv Patel retweeted
Danfei Xu @danfei_xu
Most past work throws human data into a pretraining mix. EgoMimic showed that, with proper alignment, you can co-train with human data. In his internship project at Pi, @simar_kareer took this a step further and showed that human data can "post-train" VLAs. This enables robots to solve tasks seen only in human data. Stronger base models enable stronger transfer. Awesome collaboration with the amazing @SurajNair_1 @KarlPertsch et al. at Pi!
Physical Intelligence@physical_int

We discovered an emergent property of VLAs like π0/π0.5/π0.6: as we scale up pre-training, the model learns to align human videos and robot data! This gives us a simple way to leverage human videos. Once π0.5 knows how to control robots, it can naturally learn from human video.

Dhruv Patel retweeted
Danfei Xu @danfei_xu
My group (and I) will be at #NeurIPS2025 San Diego. DM if you want to chat about generative planning, TAMP, egocentric data, and basically anything robotics! Papers:
- @ryan_punamiya @LawrenceZhu22 will be presenting EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
- @yluo_y @utkarshm0410 will be presenting Generative Trajectory Stitching through Diffusion Composition (spotlight!)
- @ShuoCheng94 will be presenting Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training
Dhruv Patel retweeted
DJ Seo @djseo
Open the freezer → take out the pretzel → open the microwave → put the plate with the pretzel inside → cook → enjoy 🥨 All via telepathy, pure brain power
Dhruv Patel retweeted
Andrej Karpathy @karpathy
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing.

As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea is sufficiently "bitter lesson pilled" (meaning arranged so that it benefits from added computation for free) as a proxy for whether it's going to work or worth even pursuing. The underlying assumption is that LLMs are of course highly "bitter lesson pilled" indeed - just look at LLM scaling laws, where if you put compute on the x-axis, number go up and to the right. So it's amusing to see that Sutton, the author of the post, is not so sure that LLMs are "bitter lesson pilled" at all. They are trained on giant datasets of fundamentally human data, which is both 1) human generated and 2) finite. What do you do when you run out? How do you prevent human bias? So there you have it: bitter lesson pilled LLM researchers taken down by the author of the bitter lesson - rough!

In some sense, Dwarkesh (who represents the LLM researchers' viewpoint in the pod) and Sutton are slightly speaking past each other, because Sutton has a very different architecture in mind and LLMs break a lot of his principles. He calls himself a "classicist" and evokes Alan Turing's original concept of building a "child machine" - a system capable of learning through experience by dynamically interacting with the world. There's no giant pretraining stage of imitating internet webpages. There's also no supervised finetuning, which he points out is absent in the animal kingdom (it's a subtle point, but Sutton is right in the strong sense: animals may of course observe demonstrations, but their actions are not directly forced/"teleoperated" by other animals).

Another important note he makes is that even if you just treat pretraining as the initialization of a prior before you finetune with reinforcement learning, Sutton sees the approach as tainted with human bias and fundamentally off course - a bit like when AlphaZero (which has never seen human games of Go) beats AlphaGo (which initializes from them). In Sutton's world view, all there is is interaction with a world via reinforcement learning, where the reward functions are partially environment-specific but also intrinsically motivated, e.g. "fun", "curiosity", and related to the quality of the prediction in your world model. And the agent is always learning at test time by default; it's not trained once and then deployed thereafter. Overall, Sutton is a lot more interested in what we have in common with the animal kingdom than in what differentiates us. "If we understood a squirrel, we'd be almost done."

As for my take... First, I should say that I think Sutton was a great guest for the pod, and I like that the AI field maintains entropy of thought and that not everyone is exploiting the next local iteration of LLMs. AI has gone through too many discrete transitions of the dominant approach to lose that. I also think that his criticism of LLMs as not bitter lesson pilled is not inaccurate. Frontier LLMs are now highly complex artifacts with a lot of humanness involved at all stages: the foundation (the pretraining data) is all human text, the finetuning data is human and curated, and the reinforcement learning environment mixture is tuned by human engineers. We do not in fact have a single, clean, actually bitter lesson pilled, "turn the crank" algorithm that you could unleash upon the world and watch it learn automatically from experience alone. Does such an algorithm even exist? Finding it would of course be a huge AI breakthrough. Two "example proofs" are commonly offered to argue that such a thing is possible.

The first example is the success of AlphaZero, which learned to play Go completely from scratch with no human supervision whatsoever. But the game of Go is clearly such a simple, closed environment that it's difficult to see the analogous formulation in the messiness of reality. I love Go, but algorithmically and categorically, it is essentially a harder version of tic-tac-toe. The second example is that of animals, like squirrels. And here, personally, I am also quite hesitant about whether it's appropriate, because animals arise via a very different computational process and under different constraints than what we have practically available to us in the industry. Animal brains are nowhere near the blank slate they appear to be at birth. First, a lot of what is commonly attributed to "learning" is imo a lot more "maturation". And second, even that which clearly is "learning" and not maturation is a lot more "finetuning" on top of something clearly powerful and preexisting.

Example: a baby zebra is born and within a few dozen minutes it can run around the savannah and follow its mother. This is a highly complex sensory-motor task, and there is no way in my mind that it is achieved from scratch, tabula rasa. The brains of animals, and the billions of parameters within, have a powerful initialization encoded in the ATCGs of their DNA, trained via the "outer loop" optimization of evolution. If the baby zebra spasmed its muscles around at random, as a reinforcement learning policy would have you do at initialization, it wouldn't get very far at all. Similarly, our AIs now also have neural networks with billions of parameters. These parameters need their own rich, high-information-density supervision signal. We are not going to re-run evolution. But we do have mountains of internet documents. Yes, it is basically supervised learning, which is ~absent in the animal kingdom. But it is a way to practically gather enough soft constraints over billions of parameters to get to a point where you're not starting from scratch.

TLDR: Pretraining is our crappy evolution. It is one candidate solution to the cold start problem, to be followed later by finetuning on tasks that look more correct, e.g. within the reinforcement learning framework, as state-of-the-art frontier LLM labs now do pervasively. I still think it is worth being inspired by animals; there are multiple powerful ideas that LLM agents are algorithmically missing that can still be adapted from animal intelligence. And I still think the bitter lesson is correct, but I see it more as something platonic to pursue, not necessarily to reach, in our real world and practically speaking. I say both of these with double-digit-percent uncertainty and cheer the work of those who disagree, especially those a lot more ambitious bitter lesson wise.

So that brings us to where we are. Stated plainly, today's frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity. Thoroughly engineered by it. They are imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top. They are not platonically bitter lesson pilled, but they are perhaps "practically" bitter lesson pilled, at least compared to a lot of what came before. It seems possible to me that over time we can finetune our ghosts further and further in the direction of animals; that it's not so much a fundamental incompatibility as a matter of initialization in the intelligence space. But it's also quite possible that they diverge even further and end up permanently different: un-animal-like, but still incredibly helpful and properly world-altering. It's possible that ghosts:animals :: planes:birds.

Anyway, in summary, overall and actionably: I think this pod is solid "real talk" from Sutton to the frontier LLM researchers, who might be gear-shifted a little too much into exploit mode. Probably we are still not sufficiently bitter lesson pilled, and there is a very good chance of more powerful ideas and paradigms beyond exhaustive benchbuilding and benchmaxxing. And animals might be a good source of inspiration: intrinsic motivation, fun, curiosity, empowerment, multi-agent self-play, culture. Use your imagination.
Dwarkesh Patel@dwarkesh_sp

.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steelman of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on the fly, like all humans and, indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete. I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.
0:00:00 – Are LLMs a dead end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI

Dhruv Patel retweeted
Danfei Xu @danfei_xu
T-2 to CoRL 🇰🇷🤖 It turns out naively cotraining on human and robot data doesn't work as well as we might think. We found that, despite better empirical performance, cotrained policies tend to learn disjoint human and robot representations. This limits the policy's ability to transfer skills that exist only in human data. Introducing EgoBridge (NeurIPS'25, CoRL'25 H2R), a principled technique to align human and robot latent policy distributions with Optimal Transport.
Ryan Punamiya@ryan_punamiya

Robots struggle to learn new skills from human videos. Why? We found that naive co-training produces disjoint distributions. Our EgoBridge (NeurIPS’25) extends Optimal Transport to align human-robot latents, improving success by 44% and generalization to human-only tasks!🧵

Dhruv Patel @dhruvpatel2012
Our latest endeavour - EgoBridge (#NeurIPS2025, #CoRL2025 workshop) - extends EgoMimic to align human-robot latent spaces in an action-aware way, going beyond naive cotraining to improve scene/behavior generalization. Catch Ryan at 11:00am, H2R @corl_conf!
Ryan Punamiya@ryan_punamiya

Robots struggle to learn new skills from human videos. Why? We found that naive co-training produces disjoint distributions. Our EgoBridge (NeurIPS’25) extends Optimal Transport to align human-robot latents, improving success by 44% and generalization to human-only tasks!🧵

Dhruv Patel @dhruvpatel2012
Egocentric human videos offer a rich source to tap for advancing mobile manipulation 🚀. Led by @LawrenceZhu22, we’re excited to introduce EMMA - our next step beyond EgoMimic, taking us from tabletop setups to real-world mobile manipulation. More details in the thread below 👇
Lawrence Yunzhou Zhu@LawrenceZhu22

Can we scale up mobile manipulation with egocentric human data? Meet EMMA: Egocentric Mobile MAnipulation EMMA learns from human mobile manipulation + static robot data — no mobile teleop needed! EMMA generalizes to new scenes and scales strongly with added human data. 1/9

Dhruv Patel retweeted
Russ Tedrake @RussTedrake
TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic, to help people understand the state of the technology and to share a lot of details on how we're achieving it. youtube.com/watch?v=BEXFnr…
Chris Paxton @chris_j_paxton
If you're at ICRA looking for a robotics job, don't hesitate to talk to me. We're hiring for roles in manipulation/grasping, perception, and SLAM for humanoids.
Dhruv Patel retweeted
Danfei Xu @danfei_xu
Human Data + Robot Learning = Competence and Scalability. Submit your finished or ongoing work to our RSS workshop by May 15th! Best paper award generously sponsored by Meta.
Snehal Jauhri@SnehalJauhri

Excited to announce EgoAct🥽🤖: the 1st Workshop on Egocentric Perception & Action for Robot Learning @ #RSS2025 in LA! We’re bringing together researchers exploring how egocentric perception can drive next-gen robot learning! Full info: egoact.github.io/rss2025 @RoboticsSciSys

Brett Adcock @adcock_brett
Georgia Tech and Meta shared a new video on how they're using Meta’s Project Aria glasses to train humanoids. They developed an algorithm that leverages human data to speed up robot learning. Interesting results.
Brett Adcock @adcock_brett
Significant progress in AI and robotics this week. So, I summarized everything from Figure, xAI, Microsoft, NVIDIA, Meta, Clone Robotics, Perplexity, Humane, Stanford, and more. Here's everything you need to know and how to make sense of it:
Dhruv Patel retweeted
The Humanoid Hub @TheHumanoidHub
Georgia Tech's research uses Meta’s Project Aria glasses to train humanoid robots with egocentric data. PhD student Simar Kareer developed an algorithm that leverages human data to enhance robot learning, achieving a 400% performance boost with just 90 minutes of recordings.