Jikai Wang
9 posts

🤖 Robots don't fail in the lab. They fail in the wild — clutter, occlusion, constantly changing environments. The real question: Can robots learn directly from these failures during deployment? How about teaching robots the way we'd teach a child — by showing them where they went wrong? 🧵👇

COLMAP 4.0 was recently released, which inspired me to dig into it and its new capabilities with @rerundotio. I want to really understand how COLMAP, and in particular pycolmap, works beyond just calling it via the CLI. So my goal is to use the low-level pycolmap API to log every part of the pipeline.

The explicit goal is to have an alternative to the SQLite database. Instead of SQLite, I want to try logging everything directly to rerun and use RRD. This gives me deep inspectability while still saving the features/matches/two-view geometry, and I can view it all directly in rerun. I think this is one of the superpowers rerun provides: data and visualizations are deeply integrated.

As I'm often working with sequential data (videos), I'm going to specifically focus on four things:

1. Monocular Video Simple: calls high-level APIs such as pycolmap.extract_features, pycolmap.match_sequential, and pycolmap.incremental_mapping. These are basically identical to the CLI options and provide a good baseline (sketched below).
2. Monocular Video Streamed: takes the above high-level APIs and breaks them down into their iterator versions, logging each component in a streamed manner. This way, I can stream the intermediate features to rerun while the extraction/matching/mapping is happening.
3. Rig with unknown calibration (this is what the video shows): probably the most interesting version and the first one I've been working on. It lets you define a rig between known sensors, such as in VR/AR devices, leading to much better reconstructions with multiple cameras. Since we don't know the calibration a priori, we have to run the reconstruction twice: once as a normal COLMAP reconstruction with no rig constraints, use that to generate the constraints, and then run it again with the newly found rig.
4. Rig with known calibration: this is the RoboCap example, where we have a pre-calibrated set of sensors, so we don't need the two reconstructions and also gain better matching between cameras, both spatially and temporally. Again, this leads to a much better reconstruction!

Along with all this, GLOMAP has become a first-class global mapper, making it super easy to use directly within pycolmap! I'm excited to do more with this and compare it to things like pycuvslam, vipe, and other alternatives.
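
For reference, here's a minimal sketch of the "Monocular Video Simple" variant: the three high-level pycolmap calls named above, with the sparse result logged to a rerun .rrd instead of relying on SQLite alone. This is my own illustration, not the author's actual code; the file paths and entity names are made up, and exact signatures may differ between pycolmap/rerun-sdk versions.

```python
# Minimal sketch (assumed paths/names; APIs per pycolmap and rerun-sdk docs).
import numpy as np
import pycolmap
import rerun as rr

database_path, image_dir, output_dir = "database.db", "images/", "sparse/"

rr.init("colmap_monocular_simple")
rr.save("reconstruction.rrd")  # everything logged below is written to this RRD

# The three high-level calls, basically mirroring the CLI defaults.
pycolmap.extract_features(database_path, image_dir)
pycolmap.match_sequential(database_path)
reconstructions = pycolmap.incremental_mapping(database_path, image_dir, output_dir)

# Log the sparse point cloud of the first reconstruction to rerun.
rec = reconstructions[0]
xyz = np.array([p.xyz for p in rec.points3D.values()])
rgb = np.array([p.color for p in rec.points3D.values()])
rr.log("sparse/points", rr.Points3D(xyz, colors=rgb))
```

The streamed variant (item 2) would replace the three one-shot calls with their iterator forms and log inside the loop, so features and matches show up in the viewer while the pipeline is still running.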

Reality of robotics: humanoid kung fu is solved before robots can open doors with RGB. Here we are. Introducing the frontier of sim2real at NVIDIA GEAR. 100% sim data. RGB input only. Code name: 𝗗𝗼𝗼𝗿𝗠𝗮𝗻. We are opening the sim-to-real door. doorman-humanoid.github.io 🧵

Jikai Wang @JwRobotics (jwroboticsvision.github.io), who led our HO-Cap project, will be graduating next year. He’s looking for full-time roles in industry. If your team needs an expert in 3D hand-object interaction and robot simulation, please reach out to Jikai!

Announcing Kaputt: a large-scale dataset for visual defect detection in retail logistics with 238,421 images across 48,376 unique items – 40x as large as current benchmarks:

If you're not labeling your own data, you're NGMI. I take this seriously, so I finished building the first version of my hand-tracking annotation app using @rerundotio and @Gradio. The combination of Rerun's callback system and Gradio integration enables a highly customizable and powerful labeling app. It supports multiple views, 2D and 3D, and keeps them time-synchronized!

The only input required is a zip file containing two or more multiview MP4 files; I handle everything else automatically. The app works with both egocentric (first-person) and exocentric (third-person) videos. Networks will occasionally make mistakes, so having the ability to correct them manually is crucial. This is a significant step towards robust and powerful hand tracking, which will provide excellent training data for dexterous robot manipulation.

The next step involves leveraging Rerun's recent updates, particularly the multi-sink support. Changes are saved directly to a file in .rrd format, which is easy to extract from since the underlying representation is PyArrow: it converts straight to Pandas, Polars, or DuckDB (sketched below). This tight integration between visuals, predictions, and data is crucial to ensure your data is precisely what you expect it to be.
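
To illustrate that last point, here's a minimal sketch of pulling saved annotations back out of an .rrd as a table via rerun's dataframe API (rerun-sdk 0.19+). The file name "annotations.rrd", the "frame" timeline, and the "/hands/**" entity path are hypothetical stand-ins for whatever the app actually logs.

```python
# Minimal sketch (illustrative names, not the app's actual code): read a saved
# .rrd recording back as tabular data; rerun stores it as PyArrow underneath.
import rerun as rr

recording = rr.dataframe.load_recording("annotations.rrd")

# Select the hand-annotation entities, indexed by the "frame" timeline.
view = recording.view(index="frame", contents="/hands/**")
reader = view.select()       # yields a pyarrow.RecordBatchReader

df = reader.read_pandas()    # Arrow -> Pandas; Polars/DuckDB ingest Arrow the same way
print(df.head())
```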
