Haoran Geng

81 posts

Haoran Geng
@HaoranGeng2

CS PhD at @Berkeley_AI. Prev: @Stanford, @PKU1898. Robotics, RL, 3D Vision

Joined August 2021
261 Following · 2.1K Followers
Pinned Tweet
Haoran Geng@HaoranGeng2·
🤖 What if a humanoid robot could make a hamburger from raw ingredients—all the way to your plate? 🔥 Excited to announce ViTacFormer: our new pipeline for next-level dexterous manipulation with active vision + high-resolution touch. 🎯 For the first time ever, we demonstrate ~2.5 minutes of continuous, autonomous control—combining active vision, high-res touch, and the high-DoF SharpaWave robot hand to complete complex, real-world tasks. Code is fully released; check out our homepage: roboverseorg.github.io/ViTacFormerPag… paper: arxiv.org/abs/2506.15953 GitHub: github.com/RoboVerseOrg/V…
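As a rough illustration of the visuo-tactile fusion idea (the summary quoted later in this feed notes that ViTacFormer fuses visual and tactile data with cross-attention), here is a minimal sketch; the class name, token shapes, and action head are illustrative assumptions, not the released code:

import torch
import torch.nn as nn

class VisuoTactileFusion(nn.Module):
    def __init__(self, dim=256, heads=8, action_dim=22):  # action_dim: e.g. a high-DoF hand
        super().__init__()
        # Visual tokens attend to tactile tokens (queries = vision, keys/values = touch).
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.action_head = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, action_dim)
        )

    def forward(self, vis_tokens, tac_tokens):
        # vis_tokens: (B, Nv, dim) from a camera encoder; tac_tokens: (B, Nt, dim) from tactile pads.
        fused, _ = self.cross_attn(vis_tokens, tac_tokens, tac_tokens)
        fused = self.norm(vis_tokens + fused)       # residual connection
        return self.action_head(fused.mean(dim=1))  # pooled tokens -> next action

# Usage: action = VisuoTactileFusion()(torch.randn(1, 196, 256), torch.randn(1, 64, 256))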
Haoran Geng retweeted
Siyuan Huang@siyuanhuang95·
Humanoid robots shouldn't just follow pre-defined movements—they should perform! 🤖✨ Introducing UniAct: A unified model for multimodal motion generation and action streaming. Most humanoids are limited to pre-designed moves. UniAct changes the game by allowing robots to generate live action sequences from: 📝 Text instructions 🎶 Music rhythms 📉 Spatial trajectories 🔄 Cross-modal signals Whether it’s dancing to a beat or following a complex path, UniAct brings humanoid robots to life in real-time. 🚀 🔗 Project: jnnan.github.io/uniact 📄 Paper: arxiv.org/abs/2512.24321
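As a rough sketch of what a unified conditioning interface like this can look like, here is one way to project text, music, or trajectory features into a shared token space for a single motion decoder; the encoder choices, feature sizes, and names are assumptions for illustration, not the UniAct architecture:

import torch.nn as nn

class UnifiedConditioner(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        # Each modality gets its own lightweight projector into a shared token space.
        self.text_proj = nn.Linear(768, dim)   # e.g. features from a frozen text encoder
        self.music_proj = nn.Linear(128, dim)  # e.g. mel-spectrogram frames
        self.traj_proj = nn.Linear(3, dim)     # (x, y, z) waypoints

    def forward(self, feats, modality):
        # feats: (B, N, feat_dim) for the given modality.
        proj = {"text": self.text_proj, "music": self.music_proj, "traj": self.traj_proj}[modality]
        return proj(feats)  # (B, N, dim) condition tokens

A downstream autoregressive motion decoder can then stream action frames conditioned on these tokens regardless of which modality produced them.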
Haoran Geng@HaoranGeng2·
Excited to see what @SharpaRobotics has achieved recently. Proud that we shaped the ViTacFormer prototype—even before the hand was fully product-ready, we were already making a hamburger (and doing many other tasks). My takeaway: how you fuse vision + touch matters a lot. With clean, high-quality teleop data, you can get impressive generalization with as few as ~50 demos.
Simon Kalouche@simonkalouche

This is the most dexterous task I’ve seen a humanoid do so far. Fully autonomous, powered by Sharpa’s CraftNet (VTLA)—using tactile feedback to continuously fine-tune the last-millimeter interaction.
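A minimal sketch of the kind of closed-loop tactile correction the quote alludes to; the sensor and gripper interfaces are hypothetical placeholders, not Sharpa's CraftNet:

def tactile_servo_step(target_force, read_tactile, command_gripper, kp=0.002):
    """One proportional correction step: squeeze harder if contact force is low,
    back off if it is high. read_tactile/command_gripper are placeholder callables."""
    force = read_tactile()             # scalar normal force from a fingertip pad
    error = target_force - force
    command_gripper(delta=kp * error)  # millimeter-scale width adjustment
    return error

# Run at a high rate (e.g. 100 Hz) during the final approach and grasp phase.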

Haoran Geng retweeted
AK@_akhaliq·
Large Video Planner Enables Generalizable Robot Control
Haoran Geng@HaoranGeng2·
@karanjagtiani04 Human data is actually much more scalable than robot data. A large part of our training data is in-the-wild human data, which I believe is quite scalable.
Karan Jagtiani@karanjagtiani04·
@HaoranGeng2 Impressive use of video data for robot planning. Curious about scalability: how do you handle potential data bottlenecks?
Haoran Geng@HaoranGeng2·
This might be my "aha moment" of 2025: With our new robotics foundation model, Large Video Planner, we train a robot planner from large-scale video data. It works so well that we can use it directly for robot planning. Two moments really blew my mind. First: right after training, I fed in an image of my hand and my MacBook and asked the model to close the laptop—when the Apple logo appeared exactly as the lid came down, I couldn’t help but feel impressed (and excited). Second: picking up the brush—check the 3D consistency. Even the brush shadow is remarkably accurate, and the model can even infer what the Franka arm (in the corner) should look like.
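In sketch form, the "video model as planner" loop described here looks roughly like the following; every function name is a hypothetical stand-in, not the actual LVP interface:

def plan_with_video_model(video_model, hand_tracker, retargeter, image, prompt):
    # Condition a video generator on the current observation and a task prompt.
    frames = video_model.generate(image=image, text=prompt)  # predicted future video
    # Recover the human-hand motion the model "imagined", frame by frame.
    hand_traj = [hand_tracker(f) for f in frames]
    # Map each hand pose to a robot action (the retargeting step).
    robot_traj = [retargeter(h) for h in hand_traj]
    return robot_traj

# e.g. plan_with_video_model(lvp, tracker, retarget, camera.read(), "close the laptop")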
Haoran Geng@HaoranGeng2·
@bruk_phi We actually use a lot of human data: a large part of the training dataset is human data, and we also use the human hand as an intermediate representation for robot planning.
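As a toy illustration of "human hand as an intermediate representation", here is a minimal retargeting from tracked hand keypoints to a parallel-gripper command; the 21-keypoint convention is an assumption, and real retargeting to a high-DoF hand is substantially richer:

import numpy as np

def hand_to_gripper(keypoints):
    """keypoints: (21, 3) array of hand joints (wrist=0, thumb tip=4, index tip=8)."""
    wrist = keypoints[0]
    width = float(np.linalg.norm(keypoints[4] - keypoints[8]))  # thumb-index aperture
    center = (keypoints[4] + keypoints[8]) / 2.0                # grasp midpoint
    return {"position": center, "approach": center - wrist, "width": width}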
Bruk@bruk_phi·
@HaoranGeng2 What would be the advantage over using human datasets instead?
Haoran Geng@HaoranGeng2·
Because of the model’s remarkably accurate 3D consistency and human-hand prediction, we can use it directly for robotics planning—and it already pulls off a bunch of amazing dexterous tasks: zero-shot and, in some cases, first-ever. It also makes me believe even more that human-centric, in-the-wild data has huge potential to further boost robotics pretraining.
Haoran Geng@HaoranGeng2·
Been working on this for a long time—Large Video Planner is finally out. We found that large-scale video pretraining can actually teach models surprisingly accurate physics. With strong camera consistency and realistic hand shaping, we can use the model directly for robotics planning. Using our dexterous hand to pull off extremely challenging zero-shot tasks, often for the first time, made it clear just how capable this model is.
Boyuan Chen@BoyuanChen0

Introducing Large Video Planner (LVP-14B) — a robot foundation model that actually generalizes. LVP is built on video gen, not VLA. As my final work at @MIT, LVP has all its eval tasks proposed by third parties as a maximum stress test, but it excels!🤗 boyuan.space/large-video-pl…

Carlo Sferrazza@carlo_sferrazza·
Excited to share that I'll be joining @UTAustin in Fall 2026 as an Assistant Professor with @utmechengr @texas_robotics! I'm looking for PhD students interested in humanoids, dexterous manipulation, tactile sensing, and robot learning in general -- consider applying this cycle!
Haoran Geng retweeted
Siyuan Huang@siyuanhuang95·
🎉🎉🎉 We won the solo dance contest at the first World Humanoid Robot Games, in partnership with @UnitreeRobotics! Here is the full video! Training the robot to perform a long dance routine (2:30 min) with stability, smoothness, and agility was much more challenging than we expected. The robot needs to dance to the rhythm, hold its global position, move dynamically, and never fall. You cannot cherry-pick on the playing field. More technical details will be released in the future.
Qianqian Wang@QianqianWang5·
📢Thrilled to share that I'll be joining Harvard and the Kempner Institute as an Assistant Professor starting Fall 2026! I'll be recruiting students this year for the Fall 2026 admissions cycle. Hope you apply!
Kempner Institute at Harvard University@KempnerInst

We are thrilled to share the appointment of @QianqianWang5 as a #KempnerInstitute Investigator! She will bring her expertise in computer vision to @Harvard. Read the announcement: bit.ly/4mIghHy @hseas #AI #ComputerVision

Haoran Geng@HaoranGeng2·
Thank you so much, @adcock_brett, for featuring our new work, ViTacFormer! Generalizable and robust manipulation remains a long-standing and challenging goal in robot learning—we’re excited to keep pushing the boundaries. More exciting things are on the way—stay tuned! 🚀
Brett Adcock@adcock_brett

@OfficialLoganK UC Berkeley researchers introduced ViTacFormer, a unified visuo-tactile pipeline for robot manipulation. It fuses high-resolution visual and tactile data using cross-attention and enables multi-fingered hands to perform precise, long-horizon tasks.

Haoran Geng retweeted
Haoran Geng@HaoranGeng2·
We then explored the full capabilities of our system—and found it can handle super long-horizon tasks end-to-end. 🍔 Sit back and enjoy the hamburger-making policy in action!