Haoran Geng

81 posts

Haoran Geng
@HaoranGeng2

CS PhD at @Berkeley_AI. Prev: @Stanford, @PKU1898. Robotics, RL, 3D Vision

Joined August 2021
261 Following · 2.1K Followers
Pinned Tweet
Haoran Geng@HaoranGeng2·
🤖 What if a humanoid robot could make a hamburger from raw ingredients—all the way to your plate? 🔥 Excited to announce ViTacFormer: our new pipeline for next-level dexterous manipulation with active vision + high-resolution touch. 🎯 For the first time ever, we demonstrate ~2.5 minutes of continuous, autonomous control—combining active vision, high-res touch, and the high-DoF SharpaWave robot hand to complete complex, real-world tasks. Code is fully released; check out our homepage: roboverseorg.github.io/ViTacFormerPag… paper: arxiv.org/abs/2506.15953 GitHub: github.com/RoboVerseOrg/V…
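As a rough illustration of the visuo-tactile fusion idea (the summary quoted later in this feed notes that ViTacFormer fuses visual and tactile data with cross-attention), here is a minimal sketch; the class name, token shapes, and action head are illustrative assumptions, not the released code:

import torch
import torch.nn as nn

class VisuoTactileFusion(nn.Module):
    def __init__(self, dim=256, heads=8, action_dim=22):  # action_dim: e.g. a high-DoF hand
        super().__init__()
        # Visual tokens attend to tactile tokens (queries = vision, keys/values = touch).
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.action_head = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, action_dim)
        )

    def forward(self, vis_tokens, tac_tokens):
        # vis_tokens: (B, Nv, dim) from a camera encoder; tac_tokens: (B, Nt, dim) from tactile pads.
        fused, _ = self.cross_attn(vis_tokens, tac_tokens, tac_tokens)
        fused = self.norm(vis_tokens + fused)       # residual connection
        return self.action_head(fused.mean(dim=1))  # pooled tokens -> next action

# Usage: action = VisuoTactileFusion()(torch.randn(1, 196, 256), torch.randn(1, 64, 256))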
Haoran Geng retweeted
Siyuan Huang@siyuanhuang95·
Humanoid robots shouldn't just follow pre-defined movements—they should perform! 🤖✨ Introducing UniAct: A unified model for multimodal motion generation and action streaming. Most humanoids are limited to pre-designed moves. UniAct changes the game by allowing robots to generate live action sequences from: 📝 Text instructions 🎶 Music rhythms 📉 Spatial trajectories 🔄 Cross-modal signals Whether it’s dancing to a beat or following a complex path, UniAct brings humanoid robots to life in real-time. 🚀 🔗 Project: jnnan.github.io/uniact 📄 Paper: arxiv.org/abs/2512.24321
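As a rough sketch of what a unified conditioning interface like this can look like, here is one way to project text, music, or trajectory features into a shared token space for a single motion decoder; the encoder choices, feature sizes, and names are assumptions for illustration, not the UniAct architecture:

import torch.nn as nn

class UnifiedConditioner(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        # Each modality gets its own lightweight projector into a shared token space.
        self.text_proj = nn.Linear(768, dim)   # e.g. features from a frozen text encoder
        self.music_proj = nn.Linear(128, dim)  # e.g. mel-spectrogram frames
        self.traj_proj = nn.Linear(3, dim)     # (x, y, z) waypoints

    def forward(self, feats, modality):
        # feats: (B, N, feat_dim) for the given modality.
        proj = {"text": self.text_proj, "music": self.music_proj, "traj": self.traj_proj}[modality]
        return proj(feats)  # (B, N, dim) condition tokens

A downstream autoregressive motion decoder can then stream action frames conditioned on these tokens regardless of which modality produced them.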
Haoran Geng@HaoranGeng2·
Excited to see what @SharpaRobotics has achieved recently. Proud that we shaped the ViTacFormer prototype—even before the hand was fully product-ready, we were already making a hamburger (and doing many other tasks). My takeaway: how you fuse vision + touch matters a lot. With clean, high-quality teleop data, you can get impressive generalization with as few as ~50 demos.
Simon Kalouche@simonkalouche

This is the most dexterous task I’ve seen a humanoid do so far. Fully autonomous, powered by Sharpa’s CraftNet (VTLA)—using tactile feedback to continuously fine-tune the last-millimeter interaction.
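A minimal sketch of the kind of closed-loop tactile correction the quote alludes to; the sensor and gripper interfaces are hypothetical placeholders, not Sharpa's CraftNet:

def tactile_servo_step(target_force, read_tactile, command_gripper, kp=0.002):
    """One proportional correction step: squeeze harder if contact force is low,
    back off if it is high. read_tactile/command_gripper are placeholder callables."""
    force = read_tactile()             # scalar normal force from a fingertip pad
    error = target_force - force
    command_gripper(delta=kp * error)  # millimeter-scale width adjustment
    return error

# Run at a high rate (e.g. 100 Hz) during the final approach and grasp phase.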

Haoran Geng retweeted
AK@_akhaliq·
Large Video Planner Enables Generalizable Robot Control
Haoran Geng@HaoranGeng2·
@karanjagtiani04 Human data is actually much more scalable than robot data. A large part of our training data is in-the-wild human data, which I believe is quite scalable.
Karan Jagtiani@karanjagtiani04·
@HaoranGeng2 Impressive use of video data for robot planning. Curious about scalability: how do you handle potential data bottlenecks?
Haoran Geng@HaoranGeng2·
This might be my "aha moment" of 2025: With our new robotics foundation model, Large Video Planner, we train a robot planner from large-scale video data. It works so well that we can use it directly for robot planning. Two moments really blew my mind. First: right after training, I fed in an image of my hand and my MacBook and asked the model to close the laptop—when the Apple logo appeared exactly as the lid came down, I couldn’t help but feel impressed (and excited). Second: picking up the brush—check the 3D consistency. Even the brush shadow is remarkably accurate, and the model can even infer what the Franka arm (in the corner) should look like.
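In sketch form, the "video model as planner" loop described here looks roughly like the following; every function name is a hypothetical stand-in, not the actual LVP interface:

def plan_with_video_model(video_model, hand_tracker, retargeter, image, prompt):
    # Condition a video generator on the current observation and a task prompt.
    frames = video_model.generate(image=image, text=prompt)  # predicted future video
    # Recover the human-hand motion the model "imagined", frame by frame.
    hand_traj = [hand_tracker(f) for f in frames]
    # Map each hand pose to a robot action (the retargeting step).
    robot_traj = [retargeter(h) for h in hand_traj]
    return robot_traj

# e.g. plan_with_video_model(lvp, tracker, retarget, camera.read(), "close the laptop")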
Haoran Geng@HaoranGeng2·
@bruk_phi We actually use a lot of human data: a large part of the training dataset is human data, and we also use the human hand as an intermediate representation for robot planning.
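As a toy illustration of "human hand as an intermediate representation", here is a minimal retargeting from tracked hand keypoints to a parallel-gripper command; the 21-keypoint convention is an assumption, and real retargeting to a high-DoF hand is substantially richer:

import numpy as np

def hand_to_gripper(keypoints):
    """keypoints: (21, 3) array of hand joints (wrist=0, thumb tip=4, index tip=8)."""
    wrist = keypoints[0]
    width = float(np.linalg.norm(keypoints[4] - keypoints[8]))  # thumb-index aperture
    center = (keypoints[4] + keypoints[8]) / 2.0                # grasp midpoint
    return {"position": center, "approach": center - wrist, "width": width}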
Bruk@bruk_phi·
@HaoranGeng2 What would be the advantage over using human datasets instead?
Haoran Geng@HaoranGeng2·
Because of the model’s remarkably accurate 3D consistency and human-hand prediction, we can use it directly for robotics planning—and it already pulls off a bunch of amazing dexterous tasks: zero-shot and, in some cases, first-ever. It also makes me believe even more that human-centric, in-the-wild data has huge potential to further boost robotics pretraining.
Haoran Geng@HaoranGeng2·
Been working on this for a long time—Large Video Planner is finally out. We found that large-scale video pretraining can actually teach models surprisingly accurate physics. With strong camera consistency and realistic hand shaping, we can use the model directly for robotics planning. Using our dexterous hand to pull off extremely challenging zero-shot tasks, often for the first time, made it clear just how capable this model is.
Boyuan Chen@BoyuanChen0

Introducing Large Video Planner (LVP-14B) — a robot foundation model that actually generalizes. LVP is built on video gen, not VLA. As my final work at @MIT, LVP has all its eval tasks proposed by third parties as a maximum stress test, but it excels!🤗 boyuan.space/large-video-pl…

Carlo Sferrazza@carlo_sferrazza·
Excited to share that I'll be joining @UTAustin in Fall 2026 as an Assistant Professor with @utmechengr @texas_robotics! I'm looking for PhD students interested in humanoids, dexterous manipulation, tactile sensing, and robot learning in general -- consider applying this cycle!
Haoran Geng retweeted
Siyuan Huang@siyuanhuang95·
🎉🎉🎉 We won the solo dance contest at the first World Humanoid Robot Games, in partnership with @UnitreeRobotics! Here is the full video! Training the robot to perform a long dance routine (2:30 min) with stability, smoothness, and agility was much more challenging than we expected. The robot needs to dance to the rhythm, hold its global position, move dynamically, and never fall. You cannot cherry-pick on the playing field. More technical details will be released in the future.
Qianqian Wang@QianqianWang5·
📢Thrilled to share that I'll be joining Harvard and the Kempner Institute as an Assistant Professor starting Fall 2026! I'll be recruiting students this year for the Fall 2026 admissions cycle. Hope you apply!
Kempner Institute at Harvard University@KempnerInst

We are thrilled to share the appointment of @QianqianWang5 as a #KempnerInstitute Investigator! She will bring her expertise in computer vision to @Harvard. Read the announcement: bit.ly/4mIghHy @hseas #AI #ComputerVision

Haoran Geng@HaoranGeng2·
Thank you so much, @adcock_brett, for featuring our new work, ViTacFormer! Generalizable and robust manipulation remains a long-standing and challenging goal in robot learning—we’re excited to keep pushing the boundaries. More exciting things are on the way—stay tuned! 🚀
Brett Adcock@adcock_brett

@OfficialLoganK UC Berkeley researchers introduced ViTacFormer, a unified visuo-tactile pipeline for robot manipulation. It fuses high-resolution visual and tactile data using cross-attention and enables multi-fingered hands to perform precise, long-horizon tasks.

Haoran Geng retweeted
Haoran Geng@HaoranGeng2·
We then explored the full capabilities of our system—and found it can handle super long-horizon tasks end-to-end. 🍔 Sit back and enjoy the hamburger-making policy in action!