Ankit Goyal

212 posts


@imankitgoyal

Foundation Models for Robotics, Senior Research Scientist @ NVIDIA

Seattle, WA · Joined March 2020
674 Following · 3K Followers
Pinned Tweet
Ankit Goyal
Ankit Goyal@imankitgoyal·
What's the right architecture for a VLA? VLM + custom action heads (π₀)? VLM with special discrete action tokens (OpenVLA)? Custom design on top of the VLM (OpenVLA-OFT)? Or... VLM with ZERO modifications? Just predict action as text. The results will surprise you. VLA-0: Outperforms π₀, GR00T-N1, MolmoAct, SmolVLA. With ZERO changes to the VLM. 🧵⬇️
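The pinned tweet's core idea, predicting actions as plain text with an unmodified VLM, can be sketched in a few lines. This is a hedged illustration, not the VLA-0 implementation: the 7-DoF action, the [0, 1] normalization, and the 1000-bin discretization are all assumptions.

```python
# Hedged sketch of "actions as plain text" (not the VLA-0 code).
# Assumption: actions are normalized to [0, 1] and written as integers
# in [0, 999], so the VLM can emit them as ordinary text tokens.

def encode_action(action, bins=1000):
    """Turn a list of floats in [0, 1] into a space-separated string."""
    return " ".join(str(min(bins - 1, int(a * bins))) for a in action)

def decode_action(text, bins=1000):
    """Parse the text a VLM emits back into continuous values."""
    return [int(tok) / bins for tok in text.split()]

# Round-trip a hypothetical 7-DoF action (pose + gripper).
action = [0.5, 0.25, 0.9, 0.0, 0.75, 0.1, 1.0]
text = encode_action(action)
recovered = decode_action(text)
```

The appeal of this representation is that no new heads or special tokens are needed: the action string is ordinary text, so any off-the-shelf VLM can produce and be fine-tuned on it.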
Nishanth Kumar
Nishanth Kumar@nishanthkumar23·
PhDone and officially a robot doctor! Thanks to incredible advisors Leslie and Tomás @MIT_LISLab and my secret bonus advisor @tomssilver - I’m so so lucky to have gotten to do research with them. I’m excited to keep working on making useful generalist robots a reality 🤖🚀
Ankit Goyal
Ankit Goyal@imankitgoyal·
Evaluation is a critical bottleneck in building robot foundation models. Check out our latest work RoboLab, led by @xuningy, which addresses this exact challenge. It's a high-fidelity simulation environment for testing these models. A truly generalist policy should be able to complete these tasks zero-shot, and this benchmark highlights exactly how far we still have to go. More info 👇
Xuning Yang@xuningy

When every generalist robot model scores 95%+ on a benchmark, the numbers become meaningless. What if we built a photorealistic benchmark that never saturates and can generate new scenes and tasks with AI Workflows in minutes? We introduce RoboLab! 🧵(1/6)

Ankit Goyal retweeted
Xuning Yang
Xuning Yang@xuningy·
When every generalist robot model scores 95%+ on a benchmark, the numbers become meaningless. What if we built a photorealistic benchmark that never saturates and can generate new scenes and tasks with AI Workflows in minutes? We introduce RoboLab! 🧵(1/6)
Qixing Huang
Qixing Huang@qixing_huang·
Congratulations to @geopavlakos on winning an NSF career award (nsf.gov/awardsearch/sh…). When George was hired two years ago, we did expect him to do well. Yet, his performance has exceeded our expectations: NSF medium + Career + a paper award at ICCV + Best thesis co-mentor
Ankit Goyal
Ankit Goyal@imankitgoyal·
They also introduce Textual FAST, which outperforms plain FAST, textual time-based actions, and discrete tokens. The bigger takeaway is that robustness can come from good action representation and inference design, not just scale. (3/3)
Ankit Goyal
Ankit Goyal@imankitgoyal·
A diffusion expert drafts candidate actions, then the textual-action VLM verifies them in a single pass. Textual actions improve both performance and policy behaviour: more recovery attempts, fewer pre-grasp collisions, and better robustness under distribution shift. arxiv.org/abs/2603.18091 (2/3)
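The draft-then-verify pattern described in this tweet can be sketched abstractly. Everything below is hypothetical (the function names and the toy drafting and scoring rules are not from the paper); it only illustrates the control flow of drafting k candidates and keeping the one the verifier scores highest.

```python
import random

# Hedged sketch of draft-then-verify (hypothetical interface, not the
# paper's code): a generative "expert" drafts k candidate actions and a
# verifier scores each one; we keep the highest-scoring candidate.

def select_action(draft_fn, score_fn, k=8):
    candidates = [draft_fn() for _ in range(k)]   # expert drafts candidates
    scores = [score_fn(c) for c in candidates]    # verifier scores each draft
    best = max(range(k), key=lambda i: scores[i])
    return candidates[best]

# Toy stand-ins: draft random 2-D actions, prefer ones near the origin.
random.seed(0)
draft = lambda: [random.uniform(-1, 1), random.uniform(-1, 1)]
score = lambda a: -(a[0] ** 2 + a[1] ** 2)
chosen = select_action(draft, score)
```

The design point is that verification is cheap relative to generation: scoring k drafts in one pass costs far less than asking the VLM to generate the action itself token by token.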
Ankit Goyal
Ankit Goyal@imankitgoyal·
Shoutout to Zhao et al. for ADV, a great follow-up to VLA-0! The idea is simple and the numbers are impressive. On LIBERO, ADV gets about the same performance as Cosmos Policy, is ~10x cheaper to train by my back-of-the-envelope math, and runs at 40 Hz. A nice read if you work on VLAs. (1/3)
Roozbeh Mottaghi
Roozbeh Mottaghi@RoozbehMottaghi·
I will be wrapping up my time at Meta this week. Proud of the work and impact at FAIR, one of the world's leading AI orgs, and of what we accomplished over the past few years: Habitat 3.0 [ai.meta.com/blog/habitat-3…], a simulator for humans and robots; PARTNR [ai.meta.com/blog/machine-i…], a benchmark and a suite of models for embodied planning and reasoning (announced by Zuck [facebook.com/share/p/1A9nJh…] [threads.com/@zuck/post/DBy…]); PerceptionCache [openreview.net/pdf?id=79BOATB…], a learnable memory for planning; HomeRobot [ovmm.github.io], an open-vocabulary object manipulation framework; GO to AnyThing (GOAT) [theophilegervet.github.io/projects/goat/], a state-of-the-art navigation model tested across multiple AirBnBs; and much other unpublished work, smaller projects, and demos, such as our robot demos at the event before the White House Correspondents' Dinner [washingtonainetwork.com/2024/04/30/was…] and the 10th anniversary of FAIR Paris [threads.com/@yannlecun/pos…].
Ankit Goyal
Ankit Goyal@imankitgoyal·
@PrathamJainAI Something like a ~20GB GPU is needed for the current code (raw PyTorch).
Ankit Goyal
Ankit Goyal@imankitgoyal·
@GeeveGeorge VLA-0 should perform adequately with enough data, though empirical testing will be needed to find the best model.
Geeve George
Geeve George@GeeveGeorge·
@imankitgoyal I am experimenting with OpenVLA, MiniVLA, and now VLA-0. My goal is to teach a 4-DoF arm to react to what it sees: if it sees a human it does X, if it doesn't it does Y, to give life to the arm. Any suggestions on what works best for this?
Reza Sayar
Reza Sayar@iamRezaSayar·
@imankitgoyal Very interesting! I wonder if we could swap the fine-tuning for in-context learning and use the absolute best VLM available, like Gemini 3 Pro, etc. 👀
Ankit Goyal
Ankit Goyal@imankitgoyal·
To my friends and family in India: please raise your voice and DEMAND clean air! It is your fundamental right. Think about the youngest member of your family. What have they done to lose years of their life, just because they were born in India? Enough of this ignorance.
Ankit Goyal
Ankit Goyal@imankitgoyal·
The launch of the first humanoid for consumers, Neo-X, is truly exciting! Many are claiming this means robot learning is solved and that 1X has leapfrogged everyone else, but the real picture is much more nuanced. From a hardware and platform perspective, it looks incredibly promising. Time will tell, but I'm optimistic that if 1X is willing to ship it, it must be robust enough. Kudos to them for this effort! However, from an AI (Robot Learning) standpoint, based on everything I've seen, the robot isn't quite there yet—definitely nothing that other industry or research labs can't do. But the playbook is clear: deploy robots in homes, collect more data, and continuously improve the model, much like the Tesla Autonomy flywheel. I really liked the raw review by Joanna Stern (@JoannaStern) from The Wall Street Journal (@WSJ). As said in the article, "the next few years isn't about owning a super useful robot, it's about raising one." Couldn't agree more. Link to the WSJ review is below 👇
Nemo
Nemo@xkxxhk·
@imankitgoyal In your evaluation results you have compared vla-0 against pi05-ki without pre-training. What is that exactly? Is it off the shelf paligemma weights + uninitialised action expert fine-tuned on libero?
Homanga Bharadhwaj
Homanga Bharadhwaj@mangahomanga·
I'll be joining the faculty @JohnsHopkins late next year as a tenure-track assistant professor in @JHUCompSci Looking for PhD students to join me tackling fun problems in robot manipulation, learning from human data, understanding+predicting physical interactions, and beyond!