Raven Huang

33 posts

@RavenHuang4

https://t.co/oPMVub01XG

Joined September 2021
98 Following · 149 Followers
Raven Huang retweeted
Chengshu Li@ChengshuEricLi·
We are excited to release MoMaGen, a data generation method for multi-step bimanual mobile manipulation. MoMaGen turns 1 human-teleoped robot trajectory into 1000s of generated trajectories automatically.🚀 Website: momagen.github.io arXiv: arxiv.org/abs/2510.18316
Raven Huang retweeted
Fei-Fei Li@drfeifei·
(1/N) How close are we to enabling robots to solve the long-horizon, complex tasks that matter in everyday life? 🚨 We are thrilled to invite you to join the 1st BEHAVIOR Challenge @NeurIPS 2025, submission deadline: 11/15. 🏆 Prizes: 🥇 $1,000 🥈 $500 🥉 $300
Raven Huang@RavenHuang4·
@gordonhu608 Thanks for your reply! So each room has a corresponding memory entry?
Wenbo Hu@gordonhu608·
@RavenHuang4 Thanks for your interest! The environments include the whole scene (multiple rooms). We keep a temporal exploration order of the explored rooms; when the agent revisits a room, that room's memory is replaced with new observations after the agent's interaction.
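To make the memory scheme described in that reply concrete, here is a minimal sketch of a room-keyed memory with a temporal exploration order, where revisiting a room replaces its entry. The class and method names are illustrative, not taken from the 3DLLM-Mem codebase.

```python
from collections import OrderedDict

class RoomMemory:
    """Sketch: per-room memory kept in the order rooms were first explored.

    Revisiting a room replaces that room's entry with the new observations
    gathered after the agent's interaction, without changing the order in
    which rooms were originally explored.
    """

    def __init__(self):
        self._rooms = OrderedDict()  # room_id -> latest observations

    def update(self, room_id, observations):
        # Create or replace the memory entry for this room.
        self._rooms[room_id] = list(observations)

    def exploration_order(self):
        # Rooms in the order they were first explored.
        return list(self._rooms.keys())

    def recall(self, room_id):
        return self._rooms.get(room_id, [])


mem = RoomMemory()
mem.update("kitchen", ["fridge_closed", "apple_on_table"])
mem.update("bedroom", ["lamp_off"])
mem.update("kitchen", ["fridge_open", "apple_gone"])  # revisit: entry replaced
print(mem.exploration_order())   # ['kitchen', 'bedroom']
print(mem.recall("kitchen"))     # only the post-interaction observations
```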
Wenbo Hu@gordonhu608·
🤔How to maintain a long-term memory for a 3D embodied AI agent across dynamic spatial-temporal environment changes in complex tasks? 🚀Introducing 3DLLM-Mem, a memory-enhanced 3D embodied agent that incrementally builds and maintains a task-relevant long-term memory while it explores and incorporates feedback from the environment. More demos on our website. Project: 3dllm-mem.github.io Paper: arxiv.org/abs/2505.22657 #LLMs #VLMs #Multimodal #3D #memory #AgenticAI
Mingxuan Wu@jackwal97390450·
Introducing POD! Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding! Inputs: a multi-view scan of an object + casually captured, long-form human-interaction monocular videos (from your phone). Outputs: 3D part poses over time.
Raven Huang@RavenHuang4·
Can we track object part motions from a monocular video? Check out POD! With an object scan and a monocular video, we can learn an object configuration model. This could be useful for reconstructing articulated objects for robot learning.
Mingxuan Wu@jackwal97390450

Introducing POD! Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding! Inputs: a multi-view scan of an object + casually captured, long-form human-interaction monocular videos (from your phone). Outputs: 3D part poses over time.

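For readers curious what a Predict-Optimize-Distill loop looks like in outline, here is a heavily schematic sketch of one self-improving round structure. All function bodies are placeholders invented for illustration; they are not the POD implementation.

```python
def predict(pose_net, frame):
    """Placeholder: predict per-part 3D poses for one video frame."""
    return pose_net(frame)

def optimize(initial_poses, frame, object_scan):
    """Placeholder: refine the predicted poses against the observed frame
    (the real system would minimize an alignment/rendering objective)."""
    return initial_poses

def distill(pose_net, supervision):
    """Placeholder: fine-tune the predictor on its own optimized outputs."""
    return pose_net

def pod_cycle(pose_net, video_frames, object_scan, num_rounds=3):
    # Each round: predict -> optimize -> distill, so the predictor improves
    # from its own refined outputs over successive passes through the video.
    for _ in range(num_rounds):
        supervision = []
        for frame in video_frames:
            poses = predict(pose_net, frame)
            refined = optimize(poses, frame, object_scan)
            supervision.append((frame, refined))
        pose_net = distill(pose_net, supervision)
    return pose_net

# Toy usage with a dummy predictor and two placeholder frames.
dummy_net = lambda frame: {"part_0": (0.0, 0.0, 0.0)}
trained = pod_cycle(dummy_net, video_frames=["frame_0", "frame_1"], object_scan=None)
```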
Raven Huang@RavenHuang4·
Can we scale up robot data collection without a robot? We propose a pipeline to scale a robot dataset from one human demonstration. Through a real2render2real pipeline, policies trained on the generated data can be deployed directly on a real robot.
Max Fu@letian_fu

Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot hardware. Just one smartphone scan + one human hand demo video → thousands of diverse robot trajectories. Trainable by diffusion policy and VLA models as-is. Introducing: Real2Render2Real 👉 real2render2real.com

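A rough sketch of how a scan-plus-demo-to-dataset pipeline of this kind can be organized: one scan and one hand-demo video go in, many randomized synthetic episodes come out. The stage names and jitter scheme below are illustrative assumptions, not the Real2Render2Real API.

```python
from dataclasses import dataclass
import random

@dataclass
class Demo:
    scan_path: str        # smartphone scan of the object/scene
    hand_video_path: str  # one human hand demonstration video

def extract_object_trajectory(demo):
    """Placeholder: recover the demonstrated object motion from the hand video."""
    return [(0.0, 0.0, 0.0), (0.1, 0.0, 0.05)]  # toy waypoint list

def randomize_and_render(trajectory, num_variants=1000, seed=0):
    """Placeholder for render-time randomization: perturb the demonstrated
    trajectory (and, in a real pipeline, camera, lighting, and object pose)
    to produce many synthetic training episodes."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(num_variants):
        jitter = rng.uniform(-0.02, 0.02)
        episodes.append([(x + jitter, y, z) for (x, y, z) in trajectory])
    return episodes

demo = Demo("scan.ply", "hand_demo.mp4")
dataset = randomize_and_render(extract_object_trajectory(demo))
print(len(dataset), "generated episodes from one human demo")
```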
Raven Huang retweeted
Fangchen Liu@fangchenliu_·
1/N Most Vision-Language-Action models need tons of data for finetuning, and still fail for new objects and instructions. Introducing OTTER, a lightweight, easy-to-train model that uses text-aware visual features to nail unseen tasks out of the box! Here's how it works 👇
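One common way to obtain "text-aware" visual features, sketched here with toy numpy tensors, is to pool image patch features with weights given by their similarity to the instruction embedding, so the visual summary emphasizes instruction-relevant regions. This is an illustrative mechanism, not necessarily OTTER's exact architecture.

```python
import numpy as np

def text_aware_pool(patch_feats, text_feat, temperature=0.07):
    """Attention-pool patch features using cosine similarity to the text embedding."""
    patch_feats = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    text_feat = text_feat / np.linalg.norm(text_feat)
    sims = patch_feats @ text_feat                  # (num_patches,)
    weights = np.exp(sims / temperature)
    weights /= weights.sum()                        # softmax over patches
    return weights @ patch_feats                    # weighted sum -> one feature

# Toy usage: 16 patch features and one instruction embedding, both 32-d.
rng = np.random.default_rng(0)
feat = text_aware_pool(rng.normal(size=(16, 32)), rng.normal(size=32))
print(feat.shape)  # (32,)
```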
Raven Huang retweeted
Max Fu@letian_fu·
Vision-language models perform diverse tasks via in-context learning. Time for robots to do the same! Introducing In-Context Robot Transformer (ICRT): a robot policy that learns new tasks by prompting with robot trajectories, without any fine-tuning. icrt.dev [1/N]
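To make the in-context idea concrete, here is a hedged sketch of assembling a prompt from demonstration trajectories: observation-action pairs are concatenated ahead of the current observation, and the policy is expected to continue with the next action, with no fine-tuning. The token layout and names are invented for illustration, not ICRT's actual format.

```python
def build_prompt(demo_trajectories, current_obs):
    """Concatenate (observation, action) pairs from a few demonstrations,
    then append the current observation; the policy predicts the next action."""
    tokens = []
    for traj in demo_trajectories:
        for obs, action in traj:
            tokens.append(("OBS", obs))
            tokens.append(("ACT", action))
    tokens.append(("OBS", current_obs))  # model continues with the next ACT
    return tokens

# Toy usage: two short demonstrations of a pick motion, then a new observation.
demos = [
    [((0.0, 0.1), "move_down"), ((0.0, 0.0), "close_gripper")],
    [((0.2, 0.1), "move_down"), ((0.2, 0.0), "close_gripper")],
]
prompt = build_prompt(demos, current_obs=(0.4, 0.1))
print(len(prompt), "tokens in the context")
```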
Raven Huang retweeted
Max Fu@letian_fu·
Can vision and language models be extended to include touch? Yes! We will present a new touch-vision-language dataset collected in the wild and Touch-Vision-Language Models (TVLMs) trained on this dataset at #ICML2024. 🙌 1/6 tactile-vlm.github.io
Raven Huang retweeted
Kaushik Shivakumar@19kaushiks·
Wouldn’t it be nice if ChatGPT could find your missing keys for you? Our latest research from @berkeley_ai + @GoogleAI suggests that robots can use large language models (LLMs) to find hidden objects faster. 🧵👇
Raven Huang@RavenHuang4·
We evaluate the policy, trained on a combination of simulated and physical data, on 3 cables with 16 uniformly sampled targets, 7 of which lie outside the robot workspace. For all 3 cables, the policy achieves a median error under 15% of the cable length. (7/8)
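For clarity, the reported number is a normalized error: the median of the per-trial errors divided by the cable length. A small sketch of that computation, with toy numbers rather than the paper's data:

```python
from statistics import median

def normalized_median_error(errors_m, cable_length_m):
    """Median trial error expressed as a fraction of the cable length."""
    return median(errors_m) / cable_length_m

# Toy example: endpoint errors (in meters) for one cable of length 0.5 m.
errors = [0.04, 0.06, 0.07, 0.05, 0.09]
print(f"{normalized_median_error(errors, 0.5):.0%} of cable length")  # 12%
```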
Raven Huang@RavenHuang4·
Planar Robot Casting for deformable materials aims to achieve a desired final state from one dynamic launching action. Our work from @AUTOLab_Cal @Berkeley_AI learns it using a self-supervised "Real2Sim2Real" framework. Data, paper, and presentation: tinyurl.com/robotcast (1/8)