OpenGraph Labs 🧤
@OpenGraph_Labs

70 posts

Building multimodal data infrastructure

🌍 Joined February 2025
61 Following · 2.7K Followers
OpenGraph Labs 🧤 retweeted
Junfan Zhu 朱俊帆 ✈️ CVPR
📖 Robotics World Model Reading Club #01 Summary
@BostonDynamics, @Stanford, @AGIBOTofficial, @intbotai, @BytedanceTalk, @Google, @moonlake, @Rivian, @Meta, @Samsung, @UCBerkeley, @Cruise, @encord_team, @ManycoreTech, @OpenGraph_Labs, @neuralmotion, @AMD, @nvidia, @oysterecosystem, @Zoom, @FusionFundVC, @BoostVC, @yzilabs...

Policy learning → WM
- VLA: observation → action
- WAM: latent world → future trajectory → controllable action
- The shift: reactive mapping → controllable simulation
- @nvidia GR00T (7B, high memory efficiency on Thor) ≈ DreamDojo-style WAM
- The bottleneck is NOT scale, but the missing unified interface across perception–geometry–physics–action

🧠 Representation
- Pixel space is redundant & non-geometric
- Trend → explicit 3D backbones: point cloud / mesh, object + sub-object representations, geometry-aware tracking (contact, affordance)
- Point-flow pipeline: detect → sample keypoints → track → dynamic graph
- Core tradeoff: which points & at what density (motion saliency / affordance attention)

🌍 4D Reconstruction → Unified Latent
- @GoogleDeepMind D4RT encodes video into a temporally consistent latent field: geometry + motion + visibility unified
- Outputs: point clouds, 3D tracks, full reconstruction (300× faster)
- ❗ Gap: no shared latent across vision / geometry / semantics / action / physics

⚙️ Physics Gap
- The Sim2Real gap is physics, not vision: discontinuous contact, deformable objects (∞ DoF), non-differentiable friction
- Engineering failures: brittle collision meshes, unstable contact
- Solutions: learned physics proxies, hybrid pipelines, convex decomposition (geometry → collision proxy, ~5× speedup)

🎥 Video Pretraining ≠ Interaction
- Video is a strong prior but has no counterfactuals
- Missing: force, depth, tactile, proprioception
- → Can't answer: what if the agent acts differently?

⏱️ Control ≠ Inference
- The real world is a high-frequency loop: action chunking, latent actions, FastWAM (train with rollout, infer without), KV-cache (AutoGaze)
- 👉 Control selects a feasible trajectory; it doesn't model the full future
- Thor is good, but LLM scaling ≠ robotics scaling

📉 Data
- No "robotics internet": sim / video / teleop / factory logs are fragmented, with no unified labeling or metrics
- Reality: factories use fixed primitives; generalization is often unnecessary
- Bitter lesson: data flywheel > pipelines (but robotics lacks one)

🦾 Embodiment Gap
- Manipulation → full-body intelligence: loco-manipulation + gaze + coordination
- Need cross-embodiment alignment (space, action, kinematics)

🔁 Sim2Real Pipeline
- Human data → semantics → geometry → collision proxy → sim → fine-tuning
- Unsolved: deformables, contact stability, long horizon

🧩 Papers
- VQ-VAE (discrete latents)
- VL-JEPA (predictive alignment)
- Token pruning (efficiency)
- Recursive models (depth reuse)
- Multi-path exploration (GRPO)

⚡ Infra → SLM
- Real-time stacks (LLM infra is too slow) → WMs must compress into SLMs
- The future: small, domain-specialized, grounded models

🧪 Bottlenecks
- No unified representation, no data flywheel, inference–control mismatch, fragmented physics, the embodiment gap
- Reality can't be scraped like the internet. It must be sensed, interacted with, simulated.
- 👉 Goal: jointly optimize representation + simulation + action under physics constraints

💡 Open questions: What is the minimal sufficient representation? Can video DiT become a WAM? Are vertical SLMs inevitable? When is robotics' ImageNet moment?
Junfan Zhu 朱俊帆 ✈️ CVPR@junfanzhu98

x.com/i/article/2038…

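The point-flow recipe in the summary above (detect → sample keypoints → track → dynamic graph) is compact enough to sketch. Below is a toy Python version: motion-saliency keypoint sampling plus a per-frame proximity graph. The random-walk tracks stand in for a real detector/tracker, so every name and number here is illustrative, not anyone's actual pipeline.

```python
# Toy sketch of the point-flow pipeline: sample keypoints by motion
# saliency, then build a per-frame proximity graph over the tracks.
import numpy as np
from scipy.spatial import cKDTree

def sample_keypoints(points, flows, k=64):
    """Keep the k points with the largest motion (motion saliency)."""
    saliency = np.linalg.norm(flows, axis=1)
    return points[np.argsort(-saliency)[:k]]

def dynamic_graph(tracks, radius=0.05):
    """tracks: (T, N, 3) tracked keypoints. Returns per-frame edge lists
    connecting points within `radius` -- a crude proxy for contact."""
    graphs = []
    for frame in tracks:
        tree = cKDTree(frame)
        graphs.append(sorted(tree.query_pairs(radius)))
    return graphs

# Toy demo: 200 candidate points drifting over 10 frames.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(200, 3))
flow = rng.normal(0, 0.01, size=(200, 3))
kps = sample_keypoints(pts, flow, k=32)
tracks = kps[None] + np.cumsum(rng.normal(0, 0.005, size=(10, 32, 3)), axis=0)
edges = dynamic_graph(tracks, radius=0.15)
print(f"frame 0: {len(edges[0])} edges; frame 9: {len(edges[9])} edges")
```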
OpenGraph Labs 🧤@OpenGraph_Labs·
Excited to share that @OpenGraph_Labs has been accepted into @NVIDIA’s Inception Program 🚀 Our mission is to build reliable infrastructure for multimodal data capture, powering the next generation of robotics & world models 🌎
OpenGraph Labs 🧤 retweeted
Jerry Han@JerryHan_og·
World models can predict the next frame. They can't predict the next touch. That's the gap visuo-tactile world models will close.

Is the robot gripping hard enough? Is the surface rigid or soft? When exactly does contact begin and end? Vision doesn't know. Tactile does.

We built @OpenGraph_Labs to capture what cameras miss. Egocentric RGB × 5-finger multi-taxel tactile gloves. Frame-synced. Calibrated. In-the-wild. No lab setups. No scripted pick-and-place. Just humans doing real tasks in real stores.

Watch the exact moment contact happens. The pressure map lights up in sync. Every touch. Every frame. 👇
OpenGraph Labs 🧤 retweeted
Julia Kim@_juliakeem·
Robotics & world models require real-world multi-sensory data at scale. But collecting vision, tactile, and IMU data simultaneously is much harder than it sounds.

Each sensor runs at different frequencies, latencies, and clock domains. Integrating them means dealing with hardware quirks, driver inconsistencies, and constant timestamp drift.

This is fundamentally a synchronization problem. And it gets harder as more modalities are added and tasks become longer-horizon, because temporal misalignment compounds: the model loses the causal structure of what happened and when.

We learned this the hard way building our own pipelines. That experience led us to build a unified platform for multimodal capture, one that handles time alignment, hardware abstraction, and data integrity from day one.

@OpenGraph_Labs built "SyncField", a multimodal data capture system, which:
▪️ Supports any hardware configuration (multiple cameras + tactile + IMU)
▪️ Automatically synchronizes all modalities
▪️ Outputs fully time-aligned data, ready to train on

It already powers humanoid robotics teams, data collection companies, and university research labs. If your team is collecting multimodal robotics data, we'd love to talk. (Now onboarding teams one by one.)
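For readers wondering what "automatic synchronization" minimally involves, here is a toy sketch of the core resampling step, assuming every stream already carries timestamps in one shared clock. SyncField's actual implementation is not public; the rates, offsets, and function names below are illustrative only.

```python
# Snap every camera frame to the nearest tactile and IMU samples,
# assuming all streams share one reference clock.
import numpy as np

def nearest_indices(ref_t, stream_t):
    """For each reference time, index of the nearest sample in stream_t."""
    idx = np.searchsorted(stream_t, ref_t)
    idx = np.clip(idx, 1, len(stream_t) - 1)
    left, right = stream_t[idx - 1], stream_t[idx]
    return np.where(ref_t - left < right - ref_t, idx - 1, idx)

# 30 fps camera, 100 Hz tactile, 200 Hz IMU over a 4 s toy episode.
cam_t = np.arange(0, 4, 1 / 30)
tac_t = np.arange(0, 4, 1 / 100) + 0.002   # 2 ms latency offset
imu_t = np.arange(0, 4, 1 / 200)

tac_idx = nearest_indices(cam_t, tac_t)
imu_idx = nearest_indices(cam_t, imu_t)
err_ms = np.abs(tac_t[tac_idx] - cam_t) * 1e3
print(f"max camera-to-tactile gap: {err_ms.max():.2f} ms")
```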
OpenGraph Labs 🧤 retweeted
OpenGraph Labs 🧤@OpenGraph_Labs·
Introducing SyncField, our turnkey solution for in-the-wild data collection that your team can deploy from day 1. Data quality = model quality = deployment success 🚀
Julia Kim@_juliakeem

Data can’t just be outsourced 🤯 To iterate fast, robotics teams must own their data infrastructure. Introducing SyncField: turnkey data infrastructure for in-the-wild data collection (best for UMI-style & embodied-human setups). #Robotics #UMI #DataCollection

OpenGraph Labs 🧤 retweeted
Beomsoo Son@BeomsooSon·
First prototype in progress. A glove that captures tactile, finger pose, and precise wrist pose, all synced. Existing solutions cost a fortune and fall apart fast. We’re fixing that. #robotics #tactile
OpenGraph Labs 🧤 retweeted
Jerry Han@JerryHan_og·
Are you sure your training data is actually synced?

Egocentric camera sees a hand grasping an orange, but the wrist cam shows nothing and tactile reads zero contact. Your policy is learning from broken data and doesn't even know it.

In Physical AI, multi-modal sync is everything.
→ Egocentric: 30 fps
→ Wrist: 30 fps
→ Tactile: 100 Hz

Different devices, different clocks, slightly different rates. The drift starts small. Barely noticeable frame by frame. But over a 4-minute episode, that tiny difference compounds into seconds of misalignment. And you had no way to even check. Until now.

We built the Sync Quality Dashboard. One score tells you if your data is clean. Then go deeper. Clock offset, drift rate, jitter, frame drops, per-stream correction. All visible, all measurable.

In a 4-minute episode, accumulated clock drift reached 7.5 seconds by the end of the recording. After correction: 9.0 ms. That's the difference between "roughly aligned" and "actually aligned."

Visually confirm vision-to-vision and vision-to-tactile alignment frame by frame. No more "trust me, the data is fine."

We don't just collect multi-modal demos. We ship a quality assurance layer so you can verify every episode before it touches your model. All data in @LeRobotHF format. Ready to train. Verified in sync.

Stop guessing. Start verifying.
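The drift numbers above suggest a simple mental model: each device clock is roughly an affine function of a reference clock, t_dev ≈ a·t_ref + b, so offset and drift can be fit from matched event pairs and undone. A hedged numpy sketch, with np.polyfit standing in for whatever estimator the dashboard actually uses, and drift deliberately exaggerated to reproduce the post's toy magnitudes:

```python
# Fit t_dev = a * t_ref + b from matched timestamps, then invert it.
import numpy as np

rng = np.random.default_rng(1)
t_ref = np.sort(rng.uniform(0, 240, 500))        # 4-minute episode
drift_ppm = 31_000                               # ~7.5 s over 240 s (toy)
t_dev = t_ref * (1 + drift_ppm * 1e-6) + 0.050   # drift + 50 ms offset
t_dev += rng.normal(0, 0.004, t_dev.shape)       # 4 ms timestamp jitter

a, b = np.polyfit(t_ref, t_dev, deg=1)           # slope, intercept
t_corrected = (t_dev - b) / a

print(f"uncorrected error at t=240 s: {abs(t_dev[-1] - t_ref[-1]):.3f} s")
print(f"residual after correction:   {np.abs(t_corrected - t_ref).std() * 1e3:.1f} ms")
```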
OpenGraph Labs 🧤 retweeted
Jerry Han@JerryHan_og·
VLMs see everything. Feel nothing.

VLMs annotate what looks like contact. Tactile sensors verify what actually is contact.

We ran VLM annotation on real manipulation demos. It labeled a grasp as "approach." Skipped release phases entirely. Hallucinated state transitions that never occurred. 6 out of 36 action phases wrong. 17% of your training data, corrupted.

Why? Pixels don't know when a fingertip touches a surface. Pixels don't know when grip pressure hits zero. Tactile sensors do.

So we built a pipeline that catches every error automatically. Tactile evidence validates every contact transition. Wrong labels corrected, missing phases inserted.

Not faster labeling. Truthful labeling. This is one of the core problems we're solving at @OpenGraph_Labs
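A minimal sketch of tactile-validated labeling as described above: detect contact intervals from summed taxel pressure with a hysteresis threshold, then flag VLM phase labels that contradict them. The thresholds, label schema, and frame numbers are illustrative assumptions, not OpenGraph's actual pipeline.

```python
# Flag VLM phase labels that contradict tactile contact evidence.
import numpy as np

def contact_intervals(pressure, on=0.5, off=0.2):
    """Hysteresis thresholding: contact starts above `on`, ends below `off`."""
    intervals, start, in_contact = [], None, False
    for i, p in enumerate(pressure):
        if not in_contact and p > on:
            in_contact, start = True, i
        elif in_contact and p < off:
            intervals.append((start, i))
            in_contact = False
    if in_contact:
        intervals.append((start, len(pressure)))
    return intervals

# Toy episode: the VLM says "approach" for frames 40-80, but summed
# taxel pressure shows a grasp beginning at frame 50.
pressure = np.zeros(120)
pressure[50:100] = 1.0
vlm_label, vlm_span = "approach", (40, 80)

for s, e in contact_intervals(pressure):
    overlaps = s < vlm_span[1] and e > vlm_span[0]
    if overlaps and vlm_label == "approach":
        print(f"contact in frames {s}-{e} contradicts '{vlm_label}' "
              f"over {vlm_span}; relabel as a grasp phase")
```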
OpenGraph Labs 🧤 retweeted
Jerry Han@JerryHan_og·
Robots can't learn if their eyes and hands are out of sync.

A 30 fps camera and a 1 kHz tactile sensor don't speak the same language. Multiple cameras, multiple sensors, all on different clocks at different rates. Jitter from USB and OS scheduling. Drift that compounds every second of recording.

We built a multi-modal sync pipeline that aligns all of it to ±2.5 ms. Automatically. Every frame matched. Zero sensor samples lost. No hardware triggers needed. Sensor-agnostic. Hardware-agnostic. Just plug in and record.

Physical AI needs real hand-eye coordination. Not approximate, precise. We're building this at OpenGraph. opengraphlabs.com
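Two of the failure modes named above, frame drops and jitter, can be estimated from timestamps alone once a nominal rate is known. A toy sketch under that assumption (all numbers illustrative):

```python
# Count dropped frames and measure jitter from a timestamp stream.
import numpy as np

def stream_health(t, nominal_hz):
    dt = np.diff(t)
    period = 1.0 / nominal_hz
    n_periods = int(np.round(dt / period).sum())   # elapsed nominal periods
    drops = n_periods - len(dt)                    # extra periods = drops
    steady = dt < 1.5 * period                     # exclude gaps from jitter
    return drops, dt[steady].std() * 1e3           # (count, jitter in ms)

# 4 s of 30 fps timestamps with 2 dropped frames and ~0.3 ms jitter.
t = np.arange(0, 4, 1 / 30)
t = np.delete(t, [40, 41])
t = t + np.random.default_rng(2).normal(0, 3e-4, t.size)
drops, jitter_ms = stream_health(t, 30)
print(f"dropped frames: {drops}, jitter: {jitter_ms:.2f} ms")
```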
Youngsun Wi@WiYoungsun·
Dexterous hands vary widely—so do tactile modalities. 🖐️🌈
Our vision on tactile human-to-robot transfer:
🔓 Not tied to specific hardware
♻️ Reuse human tactile demos across embodiments
Presenting TactAlign, a cross-sensor tactile alignment for cross-embodiment policy transfer.
OpenGraph Labs 🧤@OpenGraph_Labs·
Tactile feedback is critical for safe and reliable real-world robot deployment 🤚🧤🤖 Really impressive work from @WiYoungsun demonstrating how a shared latent space can bridge tactile signals from wearable gloves to robot embodiments
Youngsun Wi@WiYoungsun

Dexterous hands vary widely—so do tactile modalities. 🖐️🌈
Our vision on tactile human-to-robot transfer:
🔓 Not tied to specific hardware
♻️ Reuse human tactile demos across embodiments
Presenting TactAlign, a cross-sensor tactile alignment for cross-embodiment policy transfer.

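A minimal sketch of the shared-latent idea these two posts describe: separate encoders for glove and robot tactile signals, pulled into one embedding space with a symmetric InfoNCE loss over paired contact events. The architecture, taxel counts, and pairing scheme are assumptions for illustration; TactAlign's actual method is described in the authors' release, not here.

```python
# Align two tactile sensor families in one latent space (contrastive).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TactileEncoder(nn.Module):
    def __init__(self, n_taxels, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_taxels, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)   # unit-norm embeddings

glove_enc = TactileEncoder(n_taxels=160)   # e.g. 5-finger multi-taxel glove
robot_enc = TactileEncoder(n_taxels=24)    # a much sparser robot hand

def infonce(z_a, z_b, tau=0.07):
    """Symmetric InfoNCE: matched pairs sit on the diagonal."""
    logits = z_a @ z_b.T / tau
    labels = torch.arange(len(z_a))
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

# Toy batch of paired glove/robot readings from the same contact events.
glove = torch.randn(32, 160)
robot = torch.randn(32, 24)
loss = infonce(glove_enc(glove), robot_enc(robot))
loss.backward()
print(f"alignment loss: {loss.item():.3f}")
```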
OpenGraph Labs 🧤 retweeted
Beomsoo Son@BeomsooSon·
🏆 Won 1st Place at the AGI Hackathon at @agihouse_org with @juliakeem @JerryHan_og and @OpenGraph_Labs!

We built a "Temporal Action Segmentation Pipeline" for Physical AI.

The Problem: Robotics data today = short clips, RGB-only, lab settings. We need long-horizon, multi-modal, in-the-wild data.

Our Solution:
🎬 Input: Long manipulation video (5+ mins)
🤖 Gemini VLM → Action & phase segmentation
🎯 SAM3 → Object tracking with text prompts
🌐 Pi3 → 3D reconstruction & camera poses
📚 Skill clustering → Reusable skill library
→ Output: Structured robot training data with timestamps, masks & 3D

Humans ARE the ultimate robots 🦾

#PhysicalAI #Robotics #Hackathon #Gemini #SegmentAnything

Huge thanks to @henry_yu_01 @NomadicML @zoox @DynaRobotics
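The last stage named above, skill clustering, reduces to grouping segment embeddings into a reusable library. A toy sketch with random vectors standing in for whatever features the upstream Gemini/SAM3/Pi3 stages would actually produce:

```python
# Cluster action-segment embeddings into a reusable skill library.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
seg_embeddings = rng.normal(size=(40, 256))   # 40 segments from a long video

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(seg_embeddings)
library = {c: np.where(km.labels_ == c)[0].tolist() for c in range(4)}
for cluster, segments in library.items():
    print(f"skill {cluster}: segments {segments[:5]}...")
```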
OpenGraph Labs 🧤 retweeted
Joel Jang@jang_yoel·
Robotics right now feels like peak entropy. Everyone has a different bet on what will work, and they're all confident, which is why doing robotics research right now is so fun.

I wrote an essay on the question that's been driving our work from DreamGen → DreamZero → what’s next. My bet: human experience is the only data source that scales, world models are the right paradigm, and humanoids have the edge. joeljang.github.io/world-models-f…
OpenGraph Labs 🧤@OpenGraph_Labs·
8/ On top of this infrastructure, we’re scaling. Across 30+ industries, our tactile gloves are deployed in homes, factories, and service environments. They capture long-horizon manipulation in the real world.
OpenGraph Labs 🧤@OpenGraph_Labs·
7/ From this data, we derive higher-level physical understanding:
- Temporal action segmentation
- Object tracking with SAM3
- Voice modality aligned with actions
- Affordance points with associated touch information
- Real-time physical deformation during contact
OpenGraph Labs 🧤@OpenGraph_Labs·
Introducing the Tactile Data Engine - the frontier dataset for robotics. Tactile sensing is the new modality for next-generation robot training and world models, enabling robots to understand physical interaction beyond vision. Let’s get to the next level 🚀