Nishanth Kumar

367 posts

@nishanthkumar23

AI/ML + Robots PhD Student @MIT_LISLab, intern @AIatMeta. Formerly @NVIDIAAI, @rai_inst, @brownbigai, @vicariousai and @uber.

Cambridge, MA · Joined July 2015
969 Following · 1.8K Followers
Pinned Tweet
Nishanth Kumar @nishanthkumar23
State-of-the-art robot policies often need hundreds of hours of data. What if we needed none? Introducing TiPToP: a manipulation system that zero-shots open-world tasks from pixels and language using vision foundation models and GPU-parallelized Task and Motion Planning (TAMP).
6 replies · 36 retweets · 201 likes · 62.5K views
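For readers curious how a "pixels + language, zero training data" recipe like this can hang together, here is a minimal hypothetical Python sketch of the perceive → task-plan → motion-plan decomposition that TAMP systems use. Every function name and all the toy logic below are illustrative assumptions, not TiPToP's actual architecture or API.

```python
# Hypothetical sketch of a "pixels + language -> zero-shot plan" pipeline.
# All names and toy logic are assumptions for illustration only.

def perceive(image):
    """Stand-in for a vision foundation model: open-vocabulary detection
    returning object names mapped to rough 2D positions (hard-coded here)."""
    return {"green block": (0.3, 0.1), "blue plate": (0.6, 0.2)}

def task_plan(instruction, objects):
    """Stand-in for symbolic task planning: map the instruction to a
    sequence of parameterized skills (here, naive keyword matching)."""
    if "put" in instruction.lower() and "on" in instruction.lower():
        target, surface = "green block", "blue plate"  # assume grounding worked
        return [("pick", target), ("place", target, surface)]
    raise ValueError("instruction not supported by this toy planner")

def motion_plan(skill, positions):
    """Stand-in for GPU-parallelized motion planning: a real system would
    sample and collision-check many trajectories per skill in parallel."""
    return f"trajectory for {skill}"

positions = perceive(image=None)  # no task-specific training data used
skills = task_plan("Put the green block on the blue plate", positions)
print([motion_plan(s, positions) for s in skills])
```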
Nishanth Kumar retweeted
Wenlong Huang @wenlong_huang
What representation enables open-world robot manipulation from generated videos? Introducing Dream2Flow, our recent work that bridges video generation and robot control with 3D object flow. dream2flow.github.io @Stanford #ICRA2026 1/N
12 replies · 50 retweets · 284 likes · 85.7K views
Kevin Zakka @kevin_zakka
Impending graduation is making me really sad, PhD has been so lovely 😭
8 replies · 0 retweets · 130 likes · 6.6K views
Eugene @VorobiovEugene
@nishanthkumar23 Awesome, would love to be able to try it out in sim :)
1 reply · 0 retweets · 1 like · 11 views
Nishanth Kumar @nishanthkumar23
@VorobiovEugene We’re getting some feedback from several beta testers and iterating rapidly to make it as easy to use as possible: hopefully we’ll be able to release it in the next week or so!
1 reply · 0 retweets · 1 like · 22 views
Eugene @VorobiovEugene
@nishanthkumar23 This is awesome, when are you planning to release the code?
1 reply · 0 retweets · 0 likes · 20 views
Nishanth Kumar retweeted
Lucy Cai @LucyCai9
Imagine you told a robot to "find your car keys" in your apartment and it looked around, opened a drawer, and retrieved them for you. As a step towards that, I adapted TiPToP to run on the RBY1 humanoid in our lab! Here's an example instruction it follows: "Put the green block on the blue plate and the yellow block on the magazine." TiPToP helps plan over the right arm + single torso joint, but it's easy to unlock more joints -- even the base wheels -- for more expressive, real-world tasks. Humans find objects without thinking twice. One day, robots will too! 🤖
[Quoted: the pinned TiPToP announcement]
4 replies · 13 retweets · 89 likes · 14K views
Nishanth Kumar @nishanthkumar23
@VectorWang2 Yes! We’re actually quite excited to integrate FFS directly and see how much it impacts overall execution time!
0 replies · 0 retweets · 1 like · 86 views
Nishanth Kumar @nishanthkumar23
@tonyzzhao Congrats!! Very excited for the new research and deployment milestones the team will hit in 2026! 🤖
0 replies · 0 retweets · 1 like · 105 views
Tony Zhao @tonyzzhao
We raised $165M at a $1.15B valuation to stop doing demos. 2026 is about 1) deployment and 2) research. We will start shipping Memo with our new frontier models in a few months. Our series-B is led by Coatue, with Thomas Laffont joining the board. ->🧵
114 replies · 102 retweets · 1.5K likes · 354.9K views
Nishanth Kumar @nishanthkumar23
@chris_j_paxton Thanks for the signal boost, Chris - this is a great way of putting things! We hope to keep working on finding the right tradeoff that will let us zero-shot as much as possible, ideally with zero hand-tuning as well :)
0 replies · 0 retweets · 0 likes · 53 views
Nishanth Kumar @nishanthkumar23
Good question - I’d say this system scales in a different way than end-to-end learning. It can, in some sense, scale with more test-time computation, but you’re right that it doesn’t scale as easily to deformable objects or things that are hard to simulate. Ultimately it will be important to incorporate end-to-end learning, or perhaps even replace this entire system with an end-to-end one, but I think studying systems like this one, which reason and scale at test time, can give us some useful insights towards a path forward for manipulation in general!
0 replies · 0 retweets · 0 likes · 169 views
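To make the "scales with test-time computation" point concrete: in sampling-based motion planning, throwing more parallel samples at a problem tends to yield better solutions with no new training data. The toy below is a hypothetical illustration (NumPy standing in for a GPU batch), not TiPToP's code; the obstacle, noise model, and costs are all made up.

```python
# Toy test-time scaling for motion planning: sample many candidate paths
# in a batch and keep the lowest-cost collision-free one. More samples
# means more compute and better solutions, with zero extra training data.
import numpy as np

rng = np.random.default_rng(0)

start, goal = np.zeros(2), np.array([1.0, 1.0])
obstacle, radius = np.array([0.5, 0.5]), 0.2  # toy circular obstacle


def sample_trajectories(n, horizon=20):
    """n random paths from start to goal: straight line plus noise."""
    t = np.linspace(0, 1, horizon)[None, :, None]   # (1, T, 1)
    line = start + t * (goal - start)               # (1, T, 2)
    noise = rng.normal(0.0, 0.15, (n, horizon, 2))
    noise[:, [0, -1]] = 0.0                         # pin both endpoints
    return line + noise                             # (n, T, 2)


def feasible(paths):
    """True where every waypoint clears the obstacle (batched check)."""
    d = np.linalg.norm(paths - obstacle, axis=-1)   # (n, T)
    return (d > radius).all(axis=1)


def path_length(paths):
    return np.linalg.norm(np.diff(paths, axis=1), axis=-1).sum(axis=1)


for n in (10, 100, 10_000):                         # increasing test-time compute
    paths = sample_trajectories(n)
    cost = np.where(feasible(paths), path_length(paths), np.inf)
    print(f"n={n:>6}: best feasible path length = {cost.min():.3f}")
```

With few samples, no path may clear the obstacle (the best cost stays infinite); as the batch grows, feasible solutions appear and the best one keeps improving, which is the scaling behavior the reply above describes.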
Jerry Chéng @thejerrycheng
Cool work, but how does it scale compared to end-to-end learning? Tasks like deformable object manipulation and bimanual manipulation would be difficult in this pipeline. Also, if one of the modules fails, would it still produce any meaningful output? It might end up being a big state machine in the end
1 reply · 0 retweets · 1 like · 201 views
Guanming Wang @Guanming717
@nishanthkumar23 Pick and place makes sense, as IK can be based on the predicted final pose and give actions, but for “erase”, how does it understand the continuous movement without training data?
1 reply · 0 retweets · 0 likes · 176 views
Nishanth Kumar @nishanthkumar23
@wenlong_huang Thanks so much Wenlong! Looking forward to more discussions + future work from you on integrated planning and learning!
0 replies · 0 retweets · 1 like · 120 views
Wenlong Huang @wenlong_huang
Planning is the test-time compute for robotics. Like AlphaGo and reasoning LLMs, it discovers solutions / behaviors beyond what’s in the data. Having seen the demos in person, I’m still impressed by the generality of the system. It worked surprisingly well with various instructions I came up with on the fly and with random in-the-wild objects, all zero-shot. The low-level controller developed by the team also has the best tracking I have seen on a Franka. Huge congratulations to the team!
[Quoted: the pinned TiPToP announcement]
2 replies · 6 retweets · 54 likes · 6.8K views
Nishanth Kumar @nishanthkumar23
@_jingcao Thank you for your consistent hard work, and for getting this entire system running in sim + evaluating it so quickly! I wish I'd been half as capable as you when I was an undergrad :)
0 replies · 0 retweets · 1 like · 120 views
Nishanth Kumar @nishanthkumar23
@Bw_Li1024 Thanks for the kind words Bowen! Yes - it was very interesting to see that most failures were due to our grasping module - you're welcome to try it out and help us make it better when the code is released! 😉
1 reply · 0 retweets · 1 like · 59 views
Bowen Li @Bw_Li1024
Very impressive results. I really like the failure analysis in the thread; it seems a big part to improve is grasping failures (physical contact introduces sim-to-real differences), and maybe non-prehensile behaviors beyond a TAMP model’s definition :)
[Quoted: the pinned TiPToP announcement]
1 reply · 0 retweets · 4 likes · 726 views
Nishanth Kumar @nishanthkumar23
@tomssilver Thanks so much for your comments + feedback on an earlier draft of this work! We can't wait for you to try it out super soon (we promise the code will be out ASAP!)
0 replies · 0 retweets · 1 like · 61 views
Nishanth Kumar @nishanthkumar23
@skymanaditya1 haha - thanks for trying it out! Excited for some contributions of new robots/functionality from your work! :)
1 reply · 0 retweets · 1 like · 88 views
William Shen @WillShenSaysHi
TiPToP was a fun project with great collaborators! We were surprised at how fast and general it is across objects, setups, and embodiments. Its limitations point toward combining end-to-end VLAs with planning and reasoning. Code coming soon! 🚀
[Quoted: the pinned TiPToP announcement]
1 reply · 2 retweets · 21 likes · 1.6K views
Nishanth Kumar @nishanthkumar23
@JorgeAMendez_ Thanks Jorge! Let us know any feedback, and we hope you'll try out the system for yourself! :)
0 replies · 0 retweets · 1 like · 44 views
Dylan Sam @dylanjsam
I defended my PhD thesis! Also, a very (~4 month) late life update, but I've joined @OpenAI to work on safety research and pretraining safer language models! 📈 Thank you to my advisor @zicokolter and my committee: Matt Fredrikson, @andrew_ilyas, and @furongh! 🙏
25 replies · 8 retweets · 218 likes · 20.6K views
Nishanth Kumar @nishanthkumar23
We hope you'll try TiPToP out and consider contributing! While we're excited by TiPToP's current capabilities, we also feel there's so much more to be done (check out the website for a list of things to be worked on).
🌐 Project: tiptop-robot.github.io
📄 Paper: arxiv.org/abs/2603.09971
💻 Code (coming soon; we're working hard on making it easy to run!): github.com/tiptop-robot/t…
TiPToP was a big team effort and wouldn't have been possible without @WillShenSaysHi, @sahitbot_irl, @JieWang_ZJUI, Christopher Watson, @edward_s_hu, @_jingcao, @dineshjayaraman, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. Special thanks to the folks at Penn for their help with evaluation!
1 reply · 0 retweets · 11 likes · 877 views
Nishanth Kumar @nishanthkumar23
TiPToP is far from perfect:
- Open-loop execution → no recovery from failed grasps
- Single-viewpoint perception → limited visibility
- Lacks closed-loop reactivity of VLAs
We view TiPToP as a test-time scaling and reasoning method that's ultimately complementary to large robot foundation models like VLAs. We're excited about future research to more tightly combine these paradigms!
1 reply · 0 retweets · 7 likes · 792 views
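As a rough illustration of the "closed-loop reactivity" the limitations tweet above points to, the sketch below wraps a plan in an observe/retry loop instead of executing it open-loop. All function names, the fixed skill sequence, and the failure model are hypothetical stand-ins, not a proposed TiPToP API.

```python
# Toy closed-loop execution wrapper: re-observe after every step and retry
# on failure, rather than running a single plan open-loop. All functions
# below are illustrative stand-ins.
import random

random.seed(0)

def observe():
    """Stand-in for perception: returns the current world state."""
    return {"holding": None}

def make_plan(state):
    """Stand-in for the planner: a fixed skill sequence in this toy."""
    return ["grasp cup", "move to shelf", "release"]

def execute_step(step):
    """Stand-in for execution: fails 30% of the time (e.g. a slipped grasp)."""
    return random.random() > 0.3

def run_closed_loop(max_retries=3):
    for step in make_plan(observe()):
        for attempt in range(1, max_retries + 1):
            if execute_step(step):
                print(f"{step}: ok (attempt {attempt})")
                break
            observe()  # failure detected: re-perceive before retrying
        else:
            raise RuntimeError(f"{step} failed {max_retries} times")

run_closed_loop()
```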