Ankit Goyal

212 posts


@imankitgoyal

Foundation Models for Robotics, Senior Research Scientist @ NVIDIA

Seattle, WA · Joined March 2020
674 Following · 3K Followers
Pinned Tweet
Ankit Goyal
Ankit Goyal@imankitgoyal·
What's the right architecture for a VLA? VLM + custom action heads (π₀)? VLM with special discrete action tokens (OpenVLA)? Custom design on top of the VLM (OpenVLA-OFT)? Or... VLM with ZERO modifications? Just predict action as text. The results will surprise you. VLA-0: Outperforms π₀, GR00T-N1, MolmoAct, SmolVLA. With ZERO changes to the VLM. 🧵⬇️
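The pinned tweet's core idea, predicting actions as plain text with an unmodified VLM, can be sketched in a few lines. This is a hedged illustration, not the VLA-0 implementation: the 7-DoF action, the [0, 1] normalization, and the 1000-bin discretization are all assumptions.

```python
# Hedged sketch of "actions as plain text" (not the VLA-0 code).
# Assumption: actions are normalized to [0, 1] and written as integers
# in [0, 999], so the VLM can emit them as ordinary text tokens.

def encode_action(action, bins=1000):
    """Turn a list of floats in [0, 1] into a space-separated string."""
    return " ".join(str(min(bins - 1, int(a * bins))) for a in action)

def decode_action(text, bins=1000):
    """Parse the text a VLM emits back into continuous values."""
    return [int(tok) / bins for tok in text.split()]

# Round-trip a hypothetical 7-DoF action (pose + gripper).
action = [0.5, 0.25, 0.9, 0.0, 0.75, 0.1, 1.0]
text = encode_action(action)
recovered = decode_action(text)
```

The appeal of this representation is that no new heads or special tokens are needed: the action string is ordinary text, so any off-the-shelf VLM can produce and be fine-tuned on it.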
Nishanth Kumar
Nishanth Kumar@nishanthkumar23·
PhDone and officially a robot doctor! Thanks to incredible advisors Leslie and Tomás @MIT_LISLab and my secret bonus advisor @tomssilver - I’m so so lucky to have gotten to do research with them. I’m excited to keep working on making useful generalist robots a reality 🤖🚀
Ankit Goyal
Ankit Goyal@imankitgoyal·
Evaluation is a critical bottleneck in building robot foundation models. Check out our latest work RoboLab, led by @xuningy, which addresses this exact challenge. It's a high-fidelity simulation environment for testing these models. A truly generalist policy should be able to complete these tasks zero-shot, and this benchmark highlights exactly how far we still have to go. More info 👇
Xuning Yang@xuningy

When every generalist robot model scores 95%+ on a benchmark, the numbers become meaningless. What if we built a photorealistic benchmark that never saturates and can generate new scenes and tasks with AI Workflows in minutes? We introduce RoboLab! 🧵(1/6)

Ankit Goyal retweeted
Xuning Yang
Xuning Yang@xuningy·
When every generalist robot model scores 95%+ on a benchmark, the numbers become meaningless. What if we built a photorealistic benchmark that never saturates and can generate new scenes and tasks with AI Workflows in minutes? We introduce RoboLab! 🧵(1/6)
Qixing Huang
Qixing Huang@qixing_huang·
Congratulations to @geopavlakos on winning an NSF career award (nsf.gov/awardsearch/sh…). When George was hired two years ago, we did expect him to do well. Yet, his performance has exceeded our expectations: NSF medium + Career + a paper award at ICCV + Best thesis co-mentor
Ankit Goyal
Ankit Goyal@imankitgoyal·
They also introduce Textual FAST, which outperforms plain FAST, textual time-based actions, and discrete tokens. The bigger takeaway is that robustness can come from good action representation and inference design, not just scale. (3/3)
Ankit Goyal
Ankit Goyal@imankitgoyal·
A diffusion expert drafts candidate actions, then the textual-action VLM verifies them in a single pass. Textual actions improve both performance and policy behaviour: more recovery attempts, fewer pre-grasp collisions, and better robustness under distribution shift. arxiv.org/abs/2603.18091 (2/3)
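The draft-then-verify pattern described in this tweet can be sketched abstractly. Everything below is hypothetical (the function names and the toy drafting and scoring rules are not from the paper); it only illustrates the control flow of drafting k candidates and keeping the one the verifier scores highest.

```python
import random

# Hedged sketch of draft-then-verify (hypothetical interface, not the
# paper's code): a generative "expert" drafts k candidate actions and a
# verifier scores each one; we keep the highest-scoring candidate.

def select_action(draft_fn, score_fn, k=8):
    candidates = [draft_fn() for _ in range(k)]   # expert drafts candidates
    scores = [score_fn(c) for c in candidates]    # verifier scores each draft
    best = max(range(k), key=lambda i: scores[i])
    return candidates[best]

# Toy stand-ins: draft random 2-D actions, prefer ones near the origin.
random.seed(0)
draft = lambda: [random.uniform(-1, 1), random.uniform(-1, 1)]
score = lambda a: -(a[0] ** 2 + a[1] ** 2)
chosen = select_action(draft, score)
```

The design point is that verification is cheap relative to generation: scoring k drafts in one pass costs far less than asking the VLM to generate the action itself token by token.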
Ankit Goyal
Ankit Goyal@imankitgoyal·
Shoutout to Zhao et al. for ADV, a great follow-up to VLA-0! The idea is simple and the numbers are impressive. On LIBERO, ADV gets about the same performance as Cosmos Policy, is ~10x cheaper to train by my back-of-the-envelope math, and runs at 40 Hz. A nice read if you work on VLAs. (1/3)
Roozbeh Mottaghi
Roozbeh Mottaghi@RoozbehMottaghi·
I will be wrapping up my time at Meta this week. Proud of the work and impact at FAIR, one of the world's leading AI orgs, and of what we accomplished over the past few years: Habitat 3.0 [ai.meta.com/blog/habitat-3…], a simulator for humans and robots; PARTNR [ai.meta.com/blog/machine-i…], a benchmark and a suite of models for embodied planning and reasoning (announced by Zuck [facebook.com/share/p/1A9nJh…] [threads.com/@zuck/post/DBy…]); PerceptionCache [openreview.net/pdf?id=79BOATB…], a learnable memory for planning; HomeRobot [ovmm.github.io], an open-vocabulary object manipulation framework; GO to AnyThing (GOAT) [theophilegervet.github.io/projects/goat/], a state-of-the-art navigation model tested across multiple AirBnBs; and much other unpublished work, smaller projects, and demos, such as our robot demos at the event before the White House Correspondents' Dinner [washingtonainetwork.com/2024/04/30/was…] and the 10th anniversary of FAIR Paris [threads.com/@yannlecun/pos…].
Ankit Goyal
Ankit Goyal@imankitgoyal·
@PrathamJainAI Something like a ~20GB GPU is needed for the current code (raw PyTorch).
Ankit Goyal
Ankit Goyal@imankitgoyal·
@GeeveGeorge VLA-0 should perform adequately with enough data, though empirical testing will be needed to find the best model.
Geeve George
Geeve George@GeeveGeorge·
@imankitgoyal I am experimenting with OpenVLA, MiniVLA, and now VLA-0. My goal is to teach a 4-DoF arm to react to what it sees: if it sees a human it does X, if it doesn't it does Y, to give life to the arm. Any suggestions on what works best for this?
Reza Sayar
Reza Sayar@iamRezaSayar·
@imankitgoyal Very interesting! I wonder if we could swap the fine-tuning for in-context learning and use the absolute best VLM available, like Gemini 3 Pro, etc. 👀
Ankit Goyal
Ankit Goyal@imankitgoyal·
To my friends and family in India: please raise your voice and DEMAND clean air! It is your fundamental right. Think about the youngest member of your family. What have they done to lose years of their life, just because they were born in India? Enough of this ignorance.
Ankit Goyal
Ankit Goyal@imankitgoyal·
The launch of the first humanoid for consumers, Neo-X, is truly exciting! Many are claiming this means robot learning is solved and that 1X has leapfrogged everyone else, but the real picture is much more nuanced. From a hardware and platform perspective, it looks incredibly promising. Time will tell, but I'm optimistic that if 1X is willing to ship it, it must be robust enough. Kudos to them for this effort! However, from an AI (Robot Learning) standpoint, based on everything I've seen, the robot isn't quite there yet—definitely nothing that other industry or research labs can't do. But the playbook is clear: deploy robots in homes, collect more data, and continuously improve the model, much like the Tesla Autonomy flywheel. I really liked the raw review by Joanna Stern (@JoannaStern) from The Wall Street Journal (@WSJ). As said in the article, "the next few years isn't about owning a super useful robot, it's about raising one." Couldn't agree more. Link to the WSJ review is below 👇
Nemo
Nemo@xkxxhk·
@imankitgoyal In your evaluation results you have compared vla-0 against pi05-ki without pre-training. What is that exactly? Is it off the shelf paligemma weights + uninitialised action expert fine-tuned on libero?
Homanga Bharadhwaj
Homanga Bharadhwaj@mangahomanga·
I'll be joining the faculty @JohnsHopkins late next year as a tenure-track assistant professor in @JHUCompSci Looking for PhD students to join me tackling fun problems in robot manipulation, learning from human data, understanding+predicting physical interactions, and beyond!