Raven Huang

33 posts

@RavenHuang4

https://t.co/oPMVub01XG

Joined September 2021
98 Following · 149 Followers
Raven Huang retweeted
Chengshu Li@ChengshuEricLi·
We are excited to release MoMaGen, a data generation method for multi-step bimanual mobile manipulation. MoMaGen turns 1 human-teleoped robot trajectory into 1000s of generated trajectories automatically.🚀 Website: momagen.github.io arXiv: arxiv.org/abs/2510.18316
Raven Huang retweeted
Fei-Fei Li@drfeifei·
(1/N) How close are we to enabling robots to solve the long-horizon, complex tasks that matter in everyday life? 🚨 We are thrilled to invite you to join the 1st BEHAVIOR Challenge @NeurIPS 2025, submission deadline: 11/15. 🏆 Prizes: 🥇 $1,000 🥈 $500 🥉 $300
Raven Huang@RavenHuang4·
@gordonhu608 Thanks for your reply! So each room has a corresponding memory entry?
Wenbo Hu@gordonhu608·
@RavenHuang4 Thanks for your interest! The environments include the whole scene (multiple rooms). We keep a temporal exploration order of the explored rooms; when the agent revisits a room, that room's memory is replaced with new observations after the agent's interaction.
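To make the memory scheme described in that reply concrete, here is a minimal sketch of a room-keyed memory with a temporal exploration order, where revisiting a room replaces its entry. The class and method names are illustrative, not taken from the 3DLLM-Mem codebase.

```python
from collections import OrderedDict

class RoomMemory:
    """Sketch: per-room memory kept in the order rooms were first explored.

    Revisiting a room replaces that room's entry with the new observations
    gathered after the agent's interaction, without changing the order in
    which rooms were originally explored.
    """

    def __init__(self):
        self._rooms = OrderedDict()  # room_id -> latest observations

    def update(self, room_id, observations):
        # Create or replace the memory entry for this room.
        self._rooms[room_id] = list(observations)

    def exploration_order(self):
        # Rooms in the order they were first explored.
        return list(self._rooms.keys())

    def recall(self, room_id):
        return self._rooms.get(room_id, [])


mem = RoomMemory()
mem.update("kitchen", ["fridge_closed", "apple_on_table"])
mem.update("bedroom", ["lamp_off"])
mem.update("kitchen", ["fridge_open", "apple_gone"])  # revisit: entry replaced
print(mem.exploration_order())   # ['kitchen', 'bedroom']
print(mem.recall("kitchen"))     # only the post-interaction observations
```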
Wenbo Hu@gordonhu608·
🤔How to maintain a long-term memory for a 3D embodied AI agent across dynamic spatial-temporal environment changes in complex tasks? 🚀Introducing 3DLLM-Mem, a memory-enhanced 3D embodied agent that incrementally builds and maintains a task-relevant long-term memory while it explores and incorporates feedback from the environment. More demos on our website. Project: 3dllm-mem.github.io Paper: arxiv.org/abs/2505.22657 #LLMs #VLMs #Multimodal #3D #memory #AgenticAI
Mingxuan Wu@jackwal97390450·
Introducing POD! Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding! Inputs: a multi-view scan of an object + casually captured, long-form human-interaction monocular videos (from your phone). Outputs: 3D part poses over time.
Raven Huang@RavenHuang4·
Can we track object part motions from a monocular video? Check out POD! With an object scan and a monocular video, we can learn an object configuration model. This could be useful for reconstructing articulated objects for robot learning.
Mingxuan Wu@jackwal97390450

Introducing POD! Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding! Inputs: a multi-view scan of an object + casually captured, long-form human-interaction monocular videos (from your phone). Outputs: 3D part poses over time.

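For readers curious what a Predict-Optimize-Distill loop looks like in outline, here is a heavily schematic sketch of one self-improving round structure. All function bodies are placeholders invented for illustration; they are not the POD implementation.

```python
def predict(pose_net, frame):
    """Placeholder: predict per-part 3D poses for one video frame."""
    return pose_net(frame)

def optimize(initial_poses, frame, object_scan):
    """Placeholder: refine the predicted poses against the observed frame
    (the real system would minimize an alignment/rendering objective)."""
    return initial_poses

def distill(pose_net, supervision):
    """Placeholder: fine-tune the predictor on its own optimized outputs."""
    return pose_net

def pod_cycle(pose_net, video_frames, object_scan, num_rounds=3):
    # Each round: predict -> optimize -> distill, so the predictor improves
    # from its own refined outputs over successive passes through the video.
    for _ in range(num_rounds):
        supervision = []
        for frame in video_frames:
            poses = predict(pose_net, frame)
            refined = optimize(poses, frame, object_scan)
            supervision.append((frame, refined))
        pose_net = distill(pose_net, supervision)
    return pose_net

# Toy usage with a dummy predictor and two placeholder frames.
dummy_net = lambda frame: {"part_0": (0.0, 0.0, 0.0)}
trained = pod_cycle(dummy_net, video_frames=["frame_0", "frame_1"], object_scan=None)
```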
Raven Huang@RavenHuang4·
Can we scale up robot data collection without a robot? We propose a pipeline to scale a robot dataset from one human demonstration. Through a real2render2real pipeline, policies trained on the generated data can be deployed directly on a real robot.
Max Fu@letian_fu

Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot hardware. Just one smartphone scan + one human hand demo video → thousands of diverse robot trajectories. Trainable by diffusion policy and VLA models as-is. Introducing: Real2Render2Real 👉 real2render2real.com

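A rough sketch of how a scan-plus-demo-to-dataset pipeline of this kind can be organized: one scan and one hand-demo video go in, many randomized synthetic episodes come out. The stage names and jitter scheme below are illustrative assumptions, not the Real2Render2Real API.

```python
from dataclasses import dataclass
import random

@dataclass
class Demo:
    scan_path: str        # smartphone scan of the object/scene
    hand_video_path: str  # one human hand demonstration video

def extract_object_trajectory(demo):
    """Placeholder: recover the demonstrated object motion from the hand video."""
    return [(0.0, 0.0, 0.0), (0.1, 0.0, 0.05)]  # toy waypoint list

def randomize_and_render(trajectory, num_variants=1000, seed=0):
    """Placeholder for render-time randomization: perturb the demonstrated
    trajectory (and, in a real pipeline, camera, lighting, and object pose)
    to produce many synthetic training episodes."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(num_variants):
        jitter = rng.uniform(-0.02, 0.02)
        episodes.append([(x + jitter, y, z) for (x, y, z) in trajectory])
    return episodes

demo = Demo("scan.ply", "hand_demo.mp4")
dataset = randomize_and_render(extract_object_trajectory(demo))
print(len(dataset), "generated episodes from one human demo")
```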
Raven Huang retweeted
Fangchen Liu@fangchenliu_·
1/N Most Vision-Language-Action models need tons of data for finetuning, and still fail for new objects and instructions. Introducing OTTER, a lightweight, easy-to-train model that uses text-aware visual features to nail unseen tasks out of the box! Here's how it works 👇
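One common way to obtain "text-aware" visual features, sketched here with toy numpy tensors, is to pool image patch features with weights given by their similarity to the instruction embedding, so the visual summary emphasizes instruction-relevant regions. This is an illustrative mechanism, not necessarily OTTER's exact architecture.

```python
import numpy as np

def text_aware_pool(patch_feats, text_feat, temperature=0.07):
    """Attention-pool patch features using cosine similarity to the text embedding."""
    patch_feats = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    text_feat = text_feat / np.linalg.norm(text_feat)
    sims = patch_feats @ text_feat                  # (num_patches,)
    weights = np.exp(sims / temperature)
    weights /= weights.sum()                        # softmax over patches
    return weights @ patch_feats                    # weighted sum -> one feature

# Toy usage: 16 patch features and one instruction embedding, both 32-d.
rng = np.random.default_rng(0)
feat = text_aware_pool(rng.normal(size=(16, 32)), rng.normal(size=32))
print(feat.shape)  # (32,)
```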
Raven Huang retweeted
Max Fu@letian_fu·
Vision-language models perform diverse tasks via in-context learning. Time for robots to do the same! Introducing In-Context Robot Transformer (ICRT): a robot policy that learns new tasks by prompting with robot trajectories, without any fine-tuning. icrt.dev [1/N]
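To make the in-context idea concrete, here is a hedged sketch of assembling a prompt from demonstration trajectories: observation-action pairs are concatenated ahead of the current observation, and the policy is expected to continue with the next action, with no fine-tuning. The token layout and names are invented for illustration, not ICRT's actual format.

```python
def build_prompt(demo_trajectories, current_obs):
    """Concatenate (observation, action) pairs from a few demonstrations,
    then append the current observation; the policy predicts the next action."""
    tokens = []
    for traj in demo_trajectories:
        for obs, action in traj:
            tokens.append(("OBS", obs))
            tokens.append(("ACT", action))
    tokens.append(("OBS", current_obs))  # model continues with the next ACT
    return tokens

# Toy usage: two short demonstrations of a pick motion, then a new observation.
demos = [
    [((0.0, 0.1), "move_down"), ((0.0, 0.0), "close_gripper")],
    [((0.2, 0.1), "move_down"), ((0.2, 0.0), "close_gripper")],
]
prompt = build_prompt(demos, current_obs=(0.4, 0.1))
print(len(prompt), "tokens in the context")
```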
Raven Huang retweeted
Max Fu@letian_fu·
Can vision and language models be extended to include touch? Yes! We will present a new touch-vision-language dataset collected in the wild and Touch-Vision-Language Models (TVLMs) trained on this dataset at #ICML2024. 🙌 1/6 tactile-vlm.github.io
Raven Huang retweeted
Kaushik Shivakumar@19kaushiks·
Wouldn’t it be nice if ChatGPT could find your missing keys for you? Our latest research from @berkeley_ai + @GoogleAI suggests that robots can use large language models (LLMs) to find hidden objects faster. 🧵👇
Raven Huang@RavenHuang4·
We evaluate the policy, trained on a combination of simulated and physical data, on 3 cables with 16 uniformly sampled targets, 7 of which lie outside the robot workspace. For all 3 cables, the policy achieves a median error under 15% of the cable length. (7/8)
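For clarity, the reported number is a normalized error: the median of the per-trial errors divided by the cable length. A small sketch of that computation, with toy numbers rather than the paper's data:

```python
from statistics import median

def normalized_median_error(errors_m, cable_length_m):
    """Median trial error expressed as a fraction of the cable length."""
    return median(errors_m) / cable_length_m

# Toy example: endpoint errors (in meters) for one cable of length 0.5 m.
errors = [0.04, 0.06, 0.07, 0.05, 0.09]
print(f"{normalized_median_error(errors, 0.5):.0%} of cable length")  # 12%
```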
Raven Huang@RavenHuang4·
Planar Robot Casting for deformable materials aims to achieve a desired final state from one dynamic launching action. Our work from @AUTOLab_Cal @Berkeley_AI learns it using a self-supervised "Real2Sim2Real" framework. Data, paper, and presentation: tinyurl.com/robotcast (1/8)