Ajay Mandlekar
@AjayMandlekar
230 posts

Simulation Research for GR00T @ NVIDIA GEAR Lab | EE PhD @Stanford

Stanford, CA · Joined November 2019
400 Following · 2.8K Followers
Pinned Tweet
Ajay Mandlekar@AjayMandlekar·
Tired of endlessly teleoperating your robot in order to train it? Introducing SkillMimicGen, a data generation system that automatically scales robot imitation learning by synthesizing demos that integrate motion planning and demo adaptation. skillgen.github.io 1/
3 replies · 31 retweets · 172 likes · 35.3K views
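For a concrete feel for the pipeline described in the pinned tweet, here is a minimal Python sketch of a generate-and-verify loop in the spirit of SkillMimicGen. Every helper name here (segment_demo, adapt_segment, plan_motion) is a hypothetical placeholder, not the released API: seed demos are split into object-centric segments, re-anchored to a randomized scene, and stitched together with planned transit motions.

```python
import random
from dataclasses import dataclass

# Hypothetical stand-ins, not the released SkillMimicGen API.

@dataclass
class Segment:
    actions: list      # poses expressed relative to a reference object
    ref_object: str    # the object this skill segment is anchored to

def segment_demo(demo):
    """Split a teleoperated demo into object-centric skill segments."""
    return demo["segments"]

def adapt_segment(seg, scene):
    """Re-anchor a segment's relative actions to the object's new pose."""
    offset = scene[seg.ref_object]
    return Segment([a + offset for a in seg.actions], seg.ref_object)

def plan_motion(traj_so_far, next_waypoint):
    """Stand-in for a collision-free transit plan between segments."""
    return [next_waypoint] if traj_so_far else []

def generate_demos(seed_demos, scenes, num_target):
    """Generate-and-verify loop: adapt seed segments to new scenes."""
    out = []
    while len(out) < num_target:
        demo, scene = random.choice(seed_demos), random.choice(scenes)
        traj = []
        for seg in segment_demo(demo):
            adapted = adapt_segment(seg, scene)
            traj += plan_motion(traj, adapted.actions[0]) + adapted.actions
        out.append(traj)  # a real system keeps only verified successes
    return out
```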
Ajay Mandlekar@AjayMandlekar·
@stepjamUK Thanks for highlighting our work! Totally agree that data is the key factor for advances in robot manipulation - we're excited to continue using simulation as an enabler for sourcing good-quality data
0 replies · 0 retweets · 2 likes · 154 views
Ajay Mandlekar retweeted
Stephen James@stepjamUK·
The reason robots still struggle with food, fabric, or flexible parts at scale isn't just hardware or models. It's data.

Rigid-body manipulation has dominated robot learning research because simulation handles it cleanly. Deformable objects don't simulate easily, so teams have been forced to collect everything by hand: slowly, expensively, and at a scale that doesn't generalise.

@nvidia and @UofT recently published SoftMimicGen: a data generation system that extends automated demonstration synthesis to deformable object manipulation for the first time. Here's why it matters.

• Scale without the headcount. SoftMimicGen generates thousands of demonstrations from a small number of human-collected seeds, dramatically reducing dataset bottlenecks.
• A long-standing gap, closed. The approach builds on methods that worked for rigid objects and brings them into domains like food handling, healthcare, and logistics.
• Simulation that actually transfers. Policies trained on generated data show strong real-world generalisation, narrowing the sim-to-real gap.

Every major advance in robot learning traces back to a data problem. Better algorithms, better architectures, better hardware - none of it fires without the training data to back it up. Collecting that data manually doesn't scale. The infrastructure to generate, manage, and deploy it does.

Great work @MasoudMoghani, @AjayMandlekar, Sean Huver, and the full NVIDIA and University of Toronto team on this piece of work.

Link to paper: arxiv.org/html/2603.2572…
3 replies · 17 retweets · 88 likes · 7.5K views
Ajay Mandlekar retweeted
William Shen@shenbokui·
Excited to introduce Uni-1, our new multimodal model that *unifies* understanding and generation. TLDR: a team of ~15 researchers is going pound-for-pound with nano banana and gpt image 🧵
Jiaming Song@baaadas

Excited to introduce Uni-1, our new *unified* multimodal model that does both understanding and generation: lumalabs.ai/uni-1 TLDR: I think Uni-1 @LumaLabsAI is > GPT Image 1.5 in many cases, and toe-to-toe with Nano Banana Pro/2. (showcase below)

21 replies · 64 retweets · 540 likes · 77K views
Ajay Mandlekar retweeted
Danfei Xu@danfei_xu·
Human data becomes far more useful when robots are human-like. Excited to share a milestone from our work at GEAR: we trained a VLA model on 20k+ hours of in-the-wild human data and deployed it on robots with 22-DoF hands.

Key findings:
- Near log-linear scaling between human data volume and action accuracy (R² = 0.998), predictive of real dexterous performance
- Few-shot generalization begins to emerge at this scale, with some tasks solved from a single demo
- Policies trained on humans transfer across embodiments, including lower-DoF hands

Simple recipe + scale = new capabilities. Beyond the paper, we also discovered other emergent properties, such as strong language following. More to come!
Ruijie Zheng@ruijie_zheng12

Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread 🧵 on what we learned 👇

2 replies · 14 retweets · 127 likes · 13.7K views
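The log-linear scaling claim above is easy to check on any (data volume, accuracy) table: fit accuracy against log(hours) and read off R². A short sketch with illustrative, synthetic numbers (not the paper's measurements):

```python
import numpy as np

# Illustrative only: synthetic numbers, not the paper's measurements.
hours    = np.array([100, 500, 2000, 8000, 20000])   # human-data volume
accuracy = np.array([0.42, 0.55, 0.66, 0.77, 0.84])  # action accuracy

# Fit accuracy ≈ a * log(hours) + b, the "log-linear" relationship claimed.
a, b = np.polyfit(np.log(hours), accuracy, 1)
pred = a * np.log(hours) + b
ss_res = ((accuracy - pred) ** 2).sum()
ss_tot = ((accuracy - accuracy.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
print(f"slope={a:.3f}, intercept={b:.3f}, R^2={r2:.3f}")
```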
Ajay Mandlekar retweeted
Ruijie Zheng@ruijie_zheng12·
Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread 🧵 on what we learned 👇
24 replies · 65 retweets · 330 likes · 95.7K views
Ajay Mandlekar retweeted
Zhengyi “Zen” Luo@zhengyiluo·
SONIC is now open-source! Generalist whole-body teleoperation for EVERYONE!

Our team has long been building comprehensive pipelines for whole-body control, kinematic planning, and teleoperation, and they will all be shared. This will be a continuous release: inference code and the model are already there; training code and GR00T integration are coming soon!

Code: github.com/NVlabs/GR00T-W…
Docs: nvlabs.github.io/GR00T-WholeBod…
Site: nvlabs.github.io/GEAR-SONIC/
35 replies · 203 retweets · 914 likes · 212.4K views
Ajay Mandlekar
Ajay Mandlekar@AjayMandlekar·
Scaling robotics via synthetic data is limited by the sim-to-real gap. Introducing Point Bridge: a framework using unified point-based representations to enable zero-shot policy transfer without complex visual alignment. Sim-to-real, simplified! pointbridge3d.github.io
Siddhant Haldar@haldar_siddhant

Robot foundation models are limited by costly real data, while simulation data is plentiful but visually mismatched to reality. We present Point Bridge, a method that enables zero-shot sim-to-real transfer for robot learning with minimal visual alignment. pointbridge3d.github.io

3 replies · 14 retweets · 93 likes · 9.8K views
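One intuition for why point-based representations help sim-to-real: back-projecting depth into a point cloud discards texture and appearance, which are the hardest things to match between a simulator and a real camera. A minimal sketch of that back-projection (standard pinhole geometry, not Point Bridge's actual pipeline):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, stride=4):
    """Back-project a depth image (meters) into a 3D point cloud.

    Point clouds discard texture/appearance, one way sim and real
    observations can share a representation (illustrative of the idea,
    not Point Bridge's actual method).
    """
    h, w = depth.shape
    # Subsample the pixel grid by `stride` to keep the cloud small.
    us, vs = np.meshgrid(np.arange(0, w, stride), np.arange(0, h, stride))
    z = depth[vs, us]
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Example: a flat 1 m plane seen by a 640x480 camera.
depth = np.ones((480, 640))
cloud = depth_to_points(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)
```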
Ajay Mandlekar retweeted
Ankur Handa@ankurhandos·
Terrific summary of LLMs in 2025. Reasoning and coding agents were the real highlight for me this year. Some personal reflections:

> I have been using coding models since TabNine, but only this year have I felt a big step change. I use Cursor regularly for my coding, and this year the vast majority of my code was written by coding models. They are not perfect and you have to nudge them often, but being able to save time writing repetitive and boilerplate code (and they write much faster) without having to look up API calls etc. is really a huge advantage.

> I switched to claude-opus-4.5 and recently let it use the terminal (which felt terrifying initially), but I gave in and realised how much time I could save by letting it run the code, look at the errors, modify the code, and repeat until it worked. Not perfect, again, but still far better than what I had imagined, and I really like it now. The command allowlist is a great way to tell the model which commands it can run directly in the terminal without asking for permission. This way I'm not the bottleneck and it can run autonomously.

> It does make subtle errors at times and writes verbose code, but it error-corrects pretty quickly, especially when it can run the code and verify. Working with physics engines means you can always ask it to test whether the scene looks physically correct: no interpenetrations, textures loading properly, etc. And these models are going to continue to get better.

> I also use it regularly on new repos I git clone, to get a summary of the code and the key functions I can call to get going quickly. I'm not a fan of the nested directory structures many repos have, so this saves me a lot of time. It prepares a new README.md for me that is often better than the README.md that ships with the code.

> I also agree with Simon here that it was truly the year of Gemini. I use it more often now, and for image generation and slide-deck generation it is indisputably better than anything else out there. I really love NotebookLM. Even when I'm reading books and come across a very dense page, I take an image and run it through NotebookLM to explain it, and it creates a nice slide deck with pretty visualizations. I still use gpt-5.2 for day-to-day stuff and queries.

> I use all three of them now; they all have different personalities, and it looks like we have reached a point where, instead of using one model for everything, we have specialized models for coding, image-gen, and day-to-day queries.
Simon Willison@simonw

Here's my enormous round-up of everything we learned about LLMs in 2025 - the third in my annual series of reviews of the past twelve months simonwillison.net/2025/Dec/31/th… This year it's divided into 26 sections! This is the table of contents:

0 replies · 4 retweets · 14 likes · 1.5K views
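The command allowlist mentioned above is easy to picture as a gate in an agent's execute loop: commands on the list run immediately, everything else waits for a human. A generic Python sketch of the idea, not any particular tool's actual configuration format:

```python
import shlex
import subprocess

# Generic sketch of the allowlist idea: commands whose first token is on
# the list run without confirmation; anything else needs a human "y".
# This is an illustration, not a specific tool's real config.
ALLOWED = {"pytest", "python", "ls", "cat", "grep"}

def run_agent_command(cmd: str) -> str:
    tokens = shlex.split(cmd)
    if not tokens:
        return "empty command"
    if tokens[0] not in ALLOWED:
        if input(f"Allow `{cmd}`? [y/N] ").strip().lower() != "y":
            return "blocked by user"
    result = subprocess.run(tokens, capture_output=True, text=True)
    return result.stdout + result.stderr  # fed back so the model can iterate

print(run_agent_command("ls"))
```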
Ajay Mandlekar retweeted
Bowen Wen@bowenwen_me·
A new milestone for real-time accurate 3D spatial computing! Introducing ⚡️Fast-FoundationStereo⚡️, a real-time zero-shot stereo depth estimation model that accelerates the original FoundationStereo by >10x with comparable quality. Details in the thread 🧵 (1/N)
16 replies · 73 retweets · 472 likes · 76.7K views
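Whatever the network, a stereo model's disparity output converts to metric depth with the textbook relation depth = focal_length * baseline / disparity. A quick sketch of that conversion (standard stereo geometry, independent of Fast-FoundationStereo itself):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Standard stereo geometry: depth = focal_length * baseline / disparity.

    Any stereo network's disparity map converts to metric depth this way;
    this is textbook geometry, not the model itself.
    """
    depth = np.full_like(disparity, np.inf, dtype=np.float64)
    valid = disparity > 0          # zero disparity means no match / infinity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Example: 64 px disparity at f=700 px and a 12 cm baseline -> ~1.31 m.
print(disparity_to_depth(np.array([[64.0]]), focal_px=700.0, baseline_m=0.12))
```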
Ajay Mandlekar@AjayMandlekar·
Sim-and-Real Co-Training is a simple and effective way to make use of large-scale synthetic simulation data for real-world manipulation. In our latest work, we develop an approach that makes it much more effective using an aligned representation space. Check it out at #NeurIPS!
Shuo Cheng@ShuoCheng94

Can large-scale sim data enable real-world generalization?🤔 In our new work, we introduce a generalizable domain adaptation setting, where policies must handle real-world situations never presented in the real training data. (1/n)

0 replies · 2 retweets · 41 likes · 4.4K views
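A minimal sketch of what one sim-and-real co-training step can look like: behavior cloning on both domains plus a feature-alignment penalty on a shared encoder. The mean-feature matching term below is a deliberately simple stand-in, an assumption, not the paper's actual alignment objective:

```python
import torch.nn.functional as F

def cotrain_step(encoder, policy, sim_batch, real_batch, opt, align_weight=0.1):
    """One co-training step on paired sim and real mini-batches.

    Behavior cloning on both domains plus a simple feature-alignment
    penalty (mean-feature matching). Illustrative stand-in only.
    """
    z_sim = encoder(sim_batch["obs"])    # shared encoder for both domains
    z_real = encoder(real_batch["obs"])
    bc = (F.mse_loss(policy(z_sim), sim_batch["actions"])
          + F.mse_loss(policy(z_real), real_batch["actions"]))
    align = F.mse_loss(z_sim.mean(0), z_real.mean(0))  # pull domain stats together
    loss = bc + align_weight * align
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```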
Ajay Mandlekar retweeted
Adithya Murali@Adithya_Murali_·
Happy to share that I’ve been selected for MIT Technology Review Innovators Under 35 Asia Pacific 2025! @techreview 🙏 Grateful to my mentors and research collaborators at @NVIDIARobotics, CMU and beyond. This recognizes some of our work on scaling robot learning with procedural simulation and low-cost robots back in the day at @CMU_Robotics. Congratulations to all the other awardees recognized this year. 🔗 tr35.mittrasia.com/awards
9 replies · 4 retweets · 80 likes · 9.7K views
Ajay Mandlekar@AjayMandlekar·
Great to see NVIDIA GR00T included in this year's report!
Nathan Benaich@nathanbenaich

🪩The one and only @stateofai 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:

1 reply · 0 retweets · 3 likes · 875 views
Ajay Mandlekar retweeted
Danfei Xu@danfei_xu·
A year ago we introduced EgoMimic. Now I'm excited to share a major update: Egocentric Human Data for Mobile Manipulation.

Robot teleop is costly, and scaling mobile manipulation teleop is borderline impossible. Can we learn mobile manipulation from human data? The answer is yes.

Our system EMMA learns from human mobile manipulation data plus static robot data, with no mobile teleop needed! We showed that the trained mobile manipulation policy generalizes to environments seen only in human data.

Most importantly, we saw a strong scaling law: with 1 hour of human data, EMMA significantly outperformed MobileALOHA trained on 1 hour of teleoperated robot data.

Check out our project lead @LawrenceZhu22's thread below 👇
Lawrence Yunzhou Zhu@LawrenceZhu22

Can we scale up mobile manipulation with egocentric human data? Meet EMMA: Egocentric Mobile MAnipulation EMMA learns from human mobile manipulation + static robot data — no mobile teleop needed! EMMA generalizes to new scenes and scales strongly with added human data. 1/9

5 replies · 37 retweets · 202 likes · 28.8K views
Ajay Mandlekar retweeted
Jun Yamada@junjungoal·
How can a closed-loop policy safely and robustly grasp novel objects in cluttered environments? We introduce Grasp-MPC: a hybrid of model-based control and data-driven approaches for generalisable and safe 6DoF closed-loop grasping. 🧵👇 (1/N)
2 replies · 7 retweets · 49 likes · 18.7K views
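A hybrid of model-based control and learned components is easy to sketch as sampling-based MPC: sample short action sequences, rank them with a learned grasp value function, execute only the first action, then replan. Illustrative only; the names and the toy value function below are placeholders, not Grasp-MPC's actual controller:

```python
import numpy as np

def grasp_mpc_step(state, value_fn, horizon=10, n_samples=64, noise=0.02):
    """One closed-loop step of a sampling-based MPC grasp controller sketch.

    Samples candidate 6-DoF action sequences, scores them with a learned
    value function, and returns only the first action (the caller replans
    every step). Illustrative, not the paper's exact controller.
    """
    candidates = np.random.randn(n_samples, horizon, 6) * noise  # 6-DoF deltas
    scores = np.array([value_fn(state, c) for c in candidates])
    best = candidates[scores.argmax()]
    return best[0]  # execute first action only, then replan

# Toy value function: prefer small, smooth motions (placeholder for a
# learned grasp-value network).
toy_value = lambda s, traj: -np.abs(traj).sum()
print(grasp_mpc_step(state=None, value_fn=toy_value))
```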
Ajay Mandlekar@AjayMandlekar·
We open-source everything: datasets, simulation environments, our data generation framework, and training code, built on our new robomimic v0.5 release! github.com/GaTech-RL2/mim…
0 replies · 0 retweets · 3 likes · 239 views
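robomimic datasets are HDF5 files, so a quick way to inspect one of the released datasets is plain h5py. The data/demo_i group layout below follows robomimic's documented convention, but treat the exact keys and the filename as assumptions and check the v0.5 docs:

```python
import h5py

# Sketch of inspecting a robomimic-style HDF5 dataset. The "data/demo_i"
# layout follows robomimic's documented convention; exact keys and the
# filename here are assumptions, so verify against the v0.5 docs.
with h5py.File("lift_demos.hdf5", "r") as f:   # hypothetical filename
    demos = list(f["data"].keys())
    print(f"{len(demos)} demos")
    demo = f["data"][demos[0]]
    print("actions:", demo["actions"].shape)    # (T, action_dim)
    for key in demo["obs"]:                     # per-modality observations
        print("obs/" + key, demo["obs"][key].shape)
```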
Ajay Mandlekar@AjayMandlekar·
Large datasets play a crucial role in modern robotics, but we don't yet understand which data is most important to collect. I'm thrilled to announce MimicLabs, a synthetic data generation approach to studying this problem! Check it out! robo-mimiclabs.github.io
Vaibhav Saxena@saxenavaibhav11

Large robot datasets are crucial for training 🤖foundation models. Yet, we lack systematic understanding of what data matters. Introducing MimicLabs ✅System to generate large synthetic robot 🦾 datasets ✅Data-composition study 🗄️ on how to collect and use large datasets 🧵1/

2 replies · 5 retweets · 40 likes · 3.6K views
Ajay Mandlekar retweeted
Jiafei Duan@DJiafei·
1/ 🚀 Announcing #GenPriors, the CoRL 2025 workshop on Generalizable Priors for Robot Manipulation! 📍 Seoul, Korea 📅 Sat 27 Sep 2025. Mark your calendars and join us for a full day of discussion on building generalist robot policies capable of performing complex manipulation tasks across a wide range of environments. 🔗 corl25-genpriors.github.io
3 replies · 11 retweets · 67 likes · 5.4K views