Ignat Georgiev

28 posts

@imgeorgiev

Robot Learning PhD @ Georgia Tech

Atlanta, GA · Joined May 2018
85 Following · 443 Followers
Pinned Tweet
Ignat Georgiev @imgeorgiev
We have a new ICML paper! Adaptive Horizon Actor Critic (AHAC). Joint work with @krishpopdesu @xujie7979 @eric_heiden @animesh_garg. AHAC is a first-order model-based RL algorithm that learns high-dimensional tasks in minutes and outperforms PPO by 40%. 🧵(1/4)
4 replies · 63 retweets · 363 likes · 52.1K views
Ignat Georgiev retweeted
Danfei Xu @danfei_xu
Introducing EgoVerse: an ecosystem for robot learning from egocentric human data. Built and tested by 4 research labs + 3 industry partners, EgoVerse enables both science and scaling: 1300+ hrs, 240 scenes, 2000+ tasks, and growing. Dataset design, findings, and ecosystem 🧵
34 replies · 158 retweets · 857 likes · 252.1K views
Ignat Georgiev retweeted
Albert Wilcox @albertwilcoxiii
Imitation learning has seen great success, but IL policies still struggle with OOD observations. We designed a 3D backbone, Adapt3R, that can combine with your favorite IL algorithm to enable zero-shot generalization to unseen embodiments and camera viewpoints!
2 replies · 26 retweets · 206 likes · 17.6K views
Ignat Georgiev retweeted
Jeremy Collins @jerthesquare_
Robotics data is expensive and slow to collect. Robotics labs and companies spend months just to collect around 10k hours of demonstration data, all while that much video is uploaded to YouTube every 20 minutes. However, none of this video data contains action labels. How can we bridge the gap? AMPLIFY solves this problem by learning Actionless Motion Priors that unlock better sample efficiency, generalization, and scaling for robot learning.
10 replies · 71 retweets · 531 likes · 90.6K views
Ignat Georgiev @imgeorgiev
Exciting opportunity!
0 replies · 0 retweets · 4 likes · 442 views
Ignat Georgiev retweeted
Edward Johns @Ed__Johns
I have Post-Doc, PhD, and Research Assistant positions available for a new project working with me on dexterous robot learning! Come and join me, my team, and our robots, at Imperial College London! 🤖🧑‍💻🇬🇧💂🤖 See robot-learning.uk/dexterous-robo… for more info. Please share/retweet!
1 reply · 37 retweets · 215 likes · 25K views
Ignat Georgiev @imgeorgiev
@Stone_Tao Yep, fully agree with everything you mentioned. I'm not sure better rewards or task decomposition is the answer, though. One more thought: we can engineer rewards to be smoother, but even then, how do we make the dynamics smooth?
0 replies · 0 retweets · 3 likes · 330 views
Stone Tao @Stone_Tao
Very nice post! Still dissecting it a little more deeply myself, but the part on the non-trivial optimization landscape really hits home; it's something that has troubled me a lot.

Trying to approximate a non-smooth (or piecewise-smooth) function is fairly problematic because the value function can never do well around sharp transitions. I once experimented with a fairly strange function approximation problem that I documented here: blog.stoneztao.com/posts/nn-fnc-a…. Essentially, smooth neural nets (lots of tanh activations) are super good smooth-function approximators, but fail where the function has sharp/instantaneous changes. One proposed solution was to combine them with decision trees to directly figure out where the sharp change is (DTs, after all, are very good with tabular-like data, and in many ways staged rewards are like a tabular-data classification problem).

In all my PPO experiments with ManiSkill, the value function loss spikes essentially exactly when the model learns to get to the next stage of the reward function. Designing reward functions with smoother transitions between stages (something we did not do consistently in ManiSkill) makes PPO more stable.

That said, this does speak to the deeper, more complex nature of reward function design. I'd say this blog post reinforces my belief that using LLMs to scale up reward function generation is still very far away (which significantly limits the use case for dense-reward RL in robotics). But the avenue of RL + sparse rewards + demonstration data (less data than used by IL) still has strong promise.
[image attached]
3 replies · 1 retweet · 27 likes · 2.8K views
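To make the failure mode above concrete, here is a minimal sketch (mine, not from the linked post) of a small tanh MLP fit to a step function; the fit error concentrates around the discontinuity, which is exactly the sharp-transition problem described:

```python
# Minimal illustrative sketch: a smooth (tanh) MLP fit to a step function.
# All names here are illustrative; this is not code from the linked post.
import torch

torch.manual_seed(0)
x = torch.linspace(-1, 1, 512).unsqueeze(1)
y = (x > 0).float()  # piecewise-constant target with one sharp jump

net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

# The residual concentrates around the discontinuity at x = 0: smooth
# tanh features cannot represent an instantaneous change.
err = (net(x) - y).abs().squeeze()
near = err[x.squeeze().abs() < 0.05].mean()
far = err[x.squeeze().abs() > 0.2].mean()
print(f"mean |err| near jump: {near:.3f}, away from jump: {far:.3f}")
```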
Ignat Georgiev @imgeorgiev
Behavior Cloning (BC) has been the new hot thing in #Robotics for the past year. I finally sank my teeth into it and tried to decipher why it works so well on problems where RL struggles: imgeorgiev.com/2025-01-31-why… Let me know if you have other interesting perspectives!
12 replies · 27 retweets · 189 likes · 19.9K views
Ignat Georgiev @imgeorgiev
@adv8p Yeah, I had this point brought up on LinkedIn too. The DAgger problem is very much still a theoretical issue, but for some reason I don't see it materialize that often in today's world of large(r) datasets and trajectory chunking. Thoughts?
1 reply · 0 retweets · 0 likes · 127 views
underscore advait patel @_advaitpatel
@imgeorgiev This does not quite seem to be true - isn't the DAgger problem a thing? Other than that, cool post!
[image attached]
1 reply · 0 retweets · 0 likes · 251 views
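For context on the exchange above, a hypothetical sketch of the DAgger loop being referred to (Ross et al., 2011): the learner acts, an expert relabels the visited states, and the aggregated dataset corrects the covariate shift that plain BC suffers from. `env`, `policy`, `expert_action`, and `train` are illustrative placeholders, not from any specific codebase:

```python
# Hypothetical DAgger sketch (Ross et al., 2011); gym-style env assumed.
def dagger(env, policy, expert_action, train, iters=10, horizon=200):
    dataset = []
    for _ in range(iters):
        obs = env.reset()
        for _ in range(horizon):
            # Relabel states visited by the *learner* with expert actions,
            # so the training distribution matches the test distribution.
            dataset.append((obs, expert_action(obs)))
            obs, _, done, _ = env.step(policy(obs))  # act with the learner
            if done:
                break
        policy = train(policy, dataset)  # supervised fit on the aggregate
    return policy
```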
Ignat Georgiev retweeted
Utkarsh Mishra @utkarshm0410
How can robots compositionally generalize over multi-object multi-robot tasks for long-horizon planning? At #CoRL2024, we introduce Generative Factor Chaining (GFC), a diffusion-based approach that composes spatial-temporal factors into long-horizon skill plans. (1/7)
2 replies · 32 retweets · 147 likes · 24.1K views
Ignat Georgiev @imgeorgiev
@jia_xuhui Just published a paper on multi-task deep RL with large models, so I'm very interested in this opportunity!
0 replies · 0 retweets · 0 likes · 892 views
Xuhui Jia @jia_xuhui
Our team at Google DeepMind is hiring a Research intern specializing in Multi-modal Generative Models starting this fall or as soon as possible. The position may be extended, and we prefer candidates with a PhD background in diffusion modeling, image grounding, and/or deep RL.
3 replies · 23 retweets · 217 likes · 35.9K views
Ignat Georgiev @imgeorgiev
Completely forgot to give the time - Hall C 4-9, Wednesday, 7:30 - 9:00 am!
5 replies · 0 retweets · 1 like · 526 views
Ignat Georgiev @imgeorgiev
We derive a bound on this sample error and find that backpropagating through contact is the main culprit. From these insights, we propose AHAC, which dynamically adapts its horizon to avoid differentiating through contact. AHAC scales to tasks with up to 152 action dimensions and beats model-free baselines by 40%!
[image attached]
2 replies · 0 retweets · 3 likes · 755 views
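A toy sketch of the truncation idea as I read it from this tweet (not the paper's actual implementation), assuming a differentiable `sim_step(state, action)` that also reports whether contact occurred:

```python
# Toy sketch only: cut the gradient path at contact events so stiff
# contact dynamics never enter the first-order gradient. `sim_step`
# and `policy` are assumed differentiable placeholders, not AHAC code.
import torch

def contact_truncated_loss(state, policy, sim_step, max_horizon=32):
    total = torch.zeros(())
    for _ in range(max_horizon):
        state, reward, in_contact = sim_step(state, policy(state))
        total = total + reward
        if in_contact:
            # Stop gradients here: contact makes the dynamics stiff and
            # the sample gradient high-variance, so the effective horizon
            # ends rather than differentiating through the contact.
            state = state.detach()
    return -total  # minimize with a first-order optimizer (e.g. Adam)
```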
Ignat Georgiev @imgeorgiev
AHAC is a first-order RL method that uses gradients from the simulator to learn faster and better policies - outperforming PPO by 40%. Differentiable simulation is a powerful framework for scaling RL. However, even when given ground-truth dynamics, not all gradients are useful!
1 reply · 0 retweets · 3 likes · 578 views
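For readers unfamiliar with first-order policy gradients, here is a minimal sketch of the general recipe: the return is differentiated through the simulator dynamics rather than estimated with a score function as in PPO. `sim_step` and `policy` are assumed differentiable placeholders:

```python
# Minimal sketch of a first-order (analytic) policy gradient through a
# differentiable simulator. Illustrative only; not AHAC itself.
import torch

def first_order_loss(state, policy, sim_step, horizon=32, gamma=0.99):
    ret = torch.zeros(())
    for t in range(horizon):
        state, reward = sim_step(state, policy(state))
        ret = ret + (gamma ** t) * reward
    return -ret  # gradient flows through the dynamics, not a score function

# Usage sketch:
# opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
# first_order_loss(state0, policy, sim_step).backward(); opt.step()
```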
Ignat Georgiev @imgeorgiev
Excited for ICML next week! I'll be presenting Adaptive Horizon Actor Critic - a model-based RL method that learns high-dimensional tasks in minutes using differentiable simulation. Stop by Hall C 4-9 or get in touch if you want to grab a coffee some other time! More on AHAC in the 🧵
7 replies · 38 retweets · 306 likes · 25.2K views
Ignat Georgiev @imgeorgiev
This means that PWM can scale to billion-parameter models more effectively. Incredibly, multi-task PWM almost matches the performance of single-task experts like DreamerV3 and SAC. Check out the paper, code, and models at imgeorgiev.com/pwm
0 replies · 1 retweet · 8 likes · 693 views
Ignat Georgiev @imgeorgiev
We also tested PWM on 30-task and 80-task multi-task settings from dm_control and MetaWorld. After training a single large multi-task world model, we extract policies in <10 min per task using PWM. We surpass TD-MPC2 by 27% and 8% respectively, without the need for online planning! 🧵
[image attached]
1 reply · 0 retweets · 6 likes · 760 views
Ignat Georgiev @imgeorgiev
🔔 New paper - PWM: Policy Learning with Large World Models. Joint work with @VarunGiridhar3 @ncklashansen @animesh_garg. PWM is a multi-task RL method that solves 80 tasks across different embodiments in <10 min per task using world models and first-order gradient optimization 🧵
1 reply · 25 retweets · 150 likes · 19.1K views
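A hedged sketch of the recipe as described in this thread: freeze a pretrained multi-task world model and optimize a per-task policy with first-order gradients through imagined latent rollouts. The `wm.step(z, a) -> (z_next, reward)` interface is an assumption for illustration, not the released PWM API:

```python
# Illustrative sketch of policy extraction with a frozen world model,
# per the thread's description. The world-model interface is assumed.
import torch

def extract_policy(wm, policy, z0, horizon=16, iters=2000, gamma=0.99):
    # Only policy parameters are optimized; the world model stays frozen.
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(iters):
        z, ret = z0, torch.zeros(())
        for t in range(horizon):
            # Imagined rollout: latent dynamics and reward come from the
            # (frozen) world model; gradients flow back to the policy.
            z, reward = wm.step(z, policy(z))
            ret = ret + (gamma ** t) * reward
        opt.zero_grad()
        (-ret).backward()  # first-order gradient through the world model
        opt.step()
    return policy
```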