Moritz Schiebold (@m_schiebold)
679 posts

Building the data stack for Physical AI at https://t.co/vVh4lJnGvd. I admire people who are building things.

Stockholm, Sweden · Joined November 2014
593 Following · 262 Followers
Moritz Schiebold retweeted
Pablo Vela @pablovelagomez1
0.32 has shipped, and it's a massive release from @rerundotio. There's a ton of cool new features, and I wanted to highlight 2 in particular:
1. OSS server streaming from disk
2. Dataset review
I walk you through them in the video, so take a look. I'll have a much longer blog post next week about the entire pipeline. With 0.32, much of the foundation is set for a unified data layer for physical data, and I'll be getting into the details of it with all that I've built over the past year. This will cover:
1. Raw Data Collection
2. Data Ingestion
3. Catalog Registration
4. Query and Review
5. Post Process
6. Training
so lots to share.
Pablo Vela@pablovelagomez1

As I alluded to in the previous post, I've given egoexo-forge datasets the same @rerundotio OSS server treatment! To start, I ingested 10 samples from each of the following datasets before scaling:
1. Assembly101
2. HOCap
3. HOT3D-Aria
4. Ego-Dex
5. Aria-Gen2 (new!)
6. HOT3D-Quest3
7. Umetrack
All follow a standardized format that is easily queryable. This means I can do things like "Find all frames where hand keypoints exist in camera view 2 but not view 1" or "Filter to samples where the hand keypoint velocity is > 0.5 m/s" across all of the ingested datasets. Still a preview, and I'll have more to share with a super exciting release that is coming soon. I'll be adding a bunch more datasets and releasing the code after the release, so stay tuned!

3 replies · 6 reposts · 52 likes · 5.1K views
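
To make that cross-dataset filtering concrete, here is a minimal sketch of what such a query could look like as SQL over DataFusion (the engine a later post in this feed says Rerun uses under the hood). The table name hands and the columns dataset, sample_id, frame, view, keypoints, and velocity_m_s are invented for illustration; this is not the actual egoexo-forge schema.

```python
# Minimal sketch, not the real egoexo-forge schema: assume the standardized
# hand-keypoint data has been exported to Parquet with hypothetical columns
# (dataset, sample_id, frame, view, keypoints, velocity_m_s).
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_parquet("hands", "egoexo_forge_hands.parquet")  # hypothetical export

# "Frames where hand keypoints exist in camera view 2 but not in view 1":
# a standard left anti-join over (dataset, sample_id, frame).
missing_in_view1 = ctx.sql("""
    SELECT h2.dataset, h2.sample_id, h2.frame
    FROM hands h2
    LEFT JOIN hands h1
      ON  h1.dataset   = h2.dataset
      AND h1.sample_id = h2.sample_id
      AND h1.frame     = h2.frame
      AND h1.view      = 1
      AND h1.keypoints IS NOT NULL
    WHERE h2.view = 2
      AND h2.keypoints IS NOT NULL
      AND h1.frame IS NULL
""")

# "Samples where the hand keypoint velocity is > 0.5 m/s":
fast_hands = ctx.sql(
    "SELECT DISTINCT dataset, sample_id FROM hands WHERE velocity_m_s > 0.5"
)
print(missing_in_view1.to_pandas().head())
print(fast_hands.to_pandas().head())
```
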
Moritz Schiebold retweeted
Nikolaus West @NikolausWest
We just shipped our biggest @rerundotio open source release ever, and our commercial product Rerun Hub is now available as private preview. I’m deeply proud of what the team has done here and very excited to share more publicly what we’ve been working on for the last year and a half. We’re building a new data layer for robot learning
Nikolaus West@NikolausWest

x.com/i/article/2054…

0 replies · 20 reposts · 43 likes · 8K views
Moritz Schiebold retweeted
Rerun @rerundotio
We’re getting together in SF next Wednesday. The first 30 spots filled up faster than expected, so we’re making room for more people later today. Grateful for the momentum. Hope to see you there!
1 reply · 3 reposts · 12 likes · 3.7K views
Moritz Schiebold retweeted
Pablo Vela @pablovelagomez1
[Same post as quoted in full above.]
Pablo Vela@pablovelagomez1

Most people think @rerundotio is a visualization tool. In reality, it's a database masquerading as a visualizer. I wanted to showcase this functionality by building a full data pipeline consisting of: ingestion → baseline method → eval → finetuning for SLAM on egocentric data. I'll eventually extend this to the rest of my ego/exo datasets, but I wanted to start with a smaller bunch of datasets first.

Rerun allows you to expose your saved .rrd files to a catalog where you store datasets. You can query, filter, and join them like any database, using DataFusion under the hood. These are the same .rrd files that are automatically generated whenever you visualize anything in Rerun and decide to save it to disk. As an example, I brought 109 VSLAM-LAB sequences across 14 datasets into the Rerun catalog, including 7Scenes, Euroc, eth3d, and others. Now I can query them with segment_table, filter_segments, and filter_contents instead of parsing CSVs and YAML files. With a strong set of ground-truth datasets for SLAM, baseline additions become nearly automatic with agents like Opus/Codex.

This unification of data and visualization is imo the largest missing piece for Physical AI. Visualization becomes a natural byproduct of having your data properly structured and queryable. The catalog API is what makes it a database, not just a viewer. I initially focused on VSLAM-LAB data, but I'll migrate all the egoexo data to this format in the coming days to really show just how useful this is.

1 reply · 10 reposts · 95 likes · 19.1K views
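
For context on where those .rrd files come from, here is a minimal sketch of logging a trajectory to disk with the open-source Rerun Python SDK. The calls below (rr.init, rr.save, rr.set_time_sequence, rr.log, rr.Points3D) match recent SDK releases, but verify against your installed version; the app id, file name, and random-walk data are placeholders.

```python
# Minimal sketch: log a trajectory to an .rrd file with the Rerun SDK.
# API names match recent open-source releases; check your version's docs.
import numpy as np
import rerun as rr

rr.init("slam_example")        # name the recording (placeholder app id)
rr.save("sequence_0001.rrd")   # stream everything logged below to disk

# Placeholder data: a small random-walk trajectory.
positions = np.cumsum(np.random.randn(100, 3) * 0.01, axis=0)
for t, pos in enumerate(positions):
    rr.set_time_sequence("frame", t)             # per-frame timeline
    rr.log("camera/trajectory", rr.Points3D([pos]))
```
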
Moritz Schiebold retweeted
Pablo Vela @pablovelagomez1
I'm putting together the first @rerundotio meetup in SF! It'll be a casual hangout with other builders working on Physical AI. We'll have a bunch of Rerun folks there, you can chat with the team, share what you're working on, and see what others are up to. We'd love to hear what you want from Rerun and the community going forward. Food and drinks on us 🍕 Luma link here - <luma.com/56u5tv7p>
2 replies · 2 reposts · 12 likes · 2.2K views
Moritz Schiebold retweeted
Pablo Vela @pablovelagomez1
I moved to SF! It seems like the right time to be in the epicenter of robotics and agents. Would love some recommendations for cool places to work out of and folks to meet. My DMs are open.
5 replies · 2 reposts · 26 likes · 1.9K views
Moritz Schiebold retweeted
Joni Rakipi @jonirakipi
april glove always knows 👀
6 replies · 15 reposts · 115 likes · 21.5K views
Moritz Schiebold retweeted
Nikolaus West @NikolausWest
There is a funny inversion with end-to-end robotics where you remove a lot of explicit perception and vision methods from inference but then still do them as part of data preparation for training. For example, to scale dataset size and diversity, lots of teams are using human data to train policies. @GeneralistAI and @sundayrobotics are famously using UMI-style grippers, and @physical_int and @Tesla_Optimus have talked about how they train on egocentric data. To turn that data into high-quality robot-like trajectories for training, you might need to do camera calibration, hand pose tracking, image segmentation and in-painting, or even full 4D reconstruction. Generalist has reported running over 10k CPUs in their data prep pipeline.

These pipelines quickly get complex and hard to debug and manage, especially when using a makeshift data layer consisting of buckets of files plus a classic database. Basics like adding columns, extracting slices, and visual debugging are hard. Combining that with scalable read/write and incremental compute is even harder. Modern data-driven robotics needs a new data layer that makes working with physical data as simple as editing a table. In the meantime, experiment loops will be slow, and millions of dollars' worth of data of questionable quality will be trained on.
Nikolaus West@NikolausWest

x.com/i/article/2049…

3 replies · 24 reposts · 221 likes · 25.4K views
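
As an illustration of why "adding columns" is painful with a bucket-of-files layer, here is a small sketch under an invented schema (one Parquet file per episode with columns t, x, y, z); every name here is hypothetical, not from any team's actual pipeline.

```python
# Illustrative only: "adding a column" with a makeshift bucket-of-files
# data layer. Hypothetical schema: one Parquet file per episode with
# columns (t, x, y, z). We derive an end-effector speed column.
import glob
import pyarrow.parquet as pq

for path in glob.glob("episodes/*.parquet"):
    df = pq.read_table(path).to_pandas()
    dt = df["t"].diff()
    dist = df[["x", "y", "z"]].diff().pow(2).sum(axis=1).pow(0.5)
    df["speed_m_s"] = (dist / dt).fillna(0.0)
    # Rewriting every file in place is the pain point: there is no
    # transactional "ALTER TABLE ADD COLUMN" over a bucket of files.
    df.to_parquet(path, index=False)
```
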
Moritz Schiebold retweeted
Nikolaus West @NikolausWest
If you’re serious about robot learning you (unfortunately) need to know about video compression. Camera streams dominate data volumes for most datasets, at 90+% even when compressed. Video is more complicated to deal with, but the size wins are too big to give up.

The unit of compression is a Group Of Pictures (GOP). In the simplest case (what you should use in robotics), GOPs start with a keyframe (I-frame) that is followed by several delta frames (P-frames). Delta frames only need to encode the difference to the previous frame, which is where the compression win comes from. That means to decode frame 15 of a 30-frame GOP, you need to feed all the preceding frames in the GOP to the decoder to get out that one frame. The GOP size controls the tradeoff between random access and compression.

Why does this matter for robot learning? Because while training, dataloader performance is dominated by fetching and decoding video. To build a streaming dataloader (you need this for large datasets), it needs to take GOPs into consideration when fetching data for a time step. Building a dataloader that doesn’t starve your GPUs is hard enough that most teams forgo flexibility. That means researchers at most of the best-funded robotics efforts currently wait around for large export jobs before training can start, after each change to the dataset mix or a hyperparameter. This situation obviously won’t last, since they all know that experiment cycle time is a key lever for fast progress and the competitive pressure is enormous. If you want to compete in this space you need both flexibility and performance.
Nikolaus West@NikolausWest

x.com/i/article/2049…

10 replies · 48 reposts · 527 likes · 46.2K views
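
To make the GOP mechanics concrete, here is a minimal sketch of keyframe-aware random access using PyAV (Python bindings for FFmpeg). Seeking with backward=True lands on the I-frame at or before the target, and decoding then rolls forward through the delta frames; timestamp handling is simplified for the example.

```python
# Minimal sketch of GOP-aware random access with PyAV: seek() lands on the
# I-frame at or before the target, then we decode forward through the
# P-frames until we reach the frame we actually want.
import av

def read_frame(path: str, target_pts: int):
    with av.open(path) as container:
        stream = container.streams.video[0]
        # backward=True / any_frame=False: jump to the preceding keyframe.
        container.seek(target_pts, stream=stream, backward=True, any_frame=False)
        for frame in container.decode(stream):
            # Every frame decoded before target_pts is pure overhead;
            # that cost is what GOP size trades against compression.
            if frame.pts is not None and frame.pts >= target_pts:
                return frame.to_ndarray(format="rgb24")
    return None
```
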
Moritz Schiebold retweeted
Nikolaus West @NikolausWest
Reading data for even the simplest VLA is much more complicated than for classic perception models, because the inputs are multimodal and the model predicts actions forward in time. It's easy to make subtle mistakes with time alignment and crossing episode boundaries that quietly kill model performance. It's also easy to make mistakes that make loading data into GPU memory slow, which starves the GPU. Debugging and fixing these issues slows you down and adds friction, and friction is the killer of robotics companies. It's also one of a hundred examples of the data layer tax that every team in Physical AI is paying daily.
Nikolaus West@NikolausWest

x.com/i/article/2049…

0 replies · 9 reposts · 26 likes · 3.4K views
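
A small sketch of the episode-boundary pitfall described above: sampling an action chunk for training without letting it read into the next episode. The flat-array layout, chunk horizon, and repeat-last-action padding are assumptions chosen for illustration, not any particular team's dataloader.

```python
# Illustrative sketch: sample action chunks without crossing episode
# boundaries. Assumes hypothetical flat arrays where episode_ids[i]
# labels which episode frame i belongs to.
import numpy as np

def sample_chunk(actions: np.ndarray, episode_ids: np.ndarray,
                 idx: int, horizon: int = 16) -> np.ndarray:
    """Return actions[idx : idx+horizon], padded so it never reads
    frames from a different episode (a classic silent VLA bug)."""
    end = idx + horizon
    # Keep only frames from the same episode as the start index...
    same_ep = episode_ids[idx:end] == episode_ids[idx]
    valid = actions[idx:end][same_ep]
    # ...and pad by repeating the last valid action (one common choice).
    pad = np.repeat(valid[-1:], horizon - len(valid), axis=0)
    return np.concatenate([valid, pad], axis=0)

episode_ids = np.array([0, 0, 0, 1, 1])
actions = np.arange(5, dtype=np.float32).reshape(5, 1)
print(sample_chunk(actions, episode_ids, idx=1, horizon=4))  # stops at ep 0's end
```
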
Moritz Schiebold retweeted
Chris Paxton @chris_j_paxton
The data layer for embodied AI data is very immature, and anyone who's worked with it for a while knows how difficult it is to address these problems! I think for many teams, what holds you back will be infra and ops, not whether you have the latest flashy modeling tricks.
Nikolaus West@NikolausWest

x.com/i/article/2049…

2 replies · 9 reposts · 54 likes · 9.1K views
Moritz Schiebold retweeted
abdel @AbdelStark
Ok so after discovering @rerundotio and taking up the challenge from @NikolausWest 😉 I went ahead and integrated Rerun into World Forge. Honestly: very clean, very easy to integrate, and very agent-friendly. So now World Forge has a proper data / events / observability layer built in by default, thanks to @rerundotio 💪🤖
abdel@AbdelStark

Introducing WorldForge: testable world-model workflows for physical AI systems. You can think of it, loosely, as "LangChain for world models".

The problem is that "world model" has become an overloaded label. Depending on context, it can mean a video generator, a cost model, a robot policy, a JEPA-style latent predictor, etc. They share almost nothing: different inputs, runtimes, failure modes. I built WorldForge to stop pretending they're interchangeable.

Front-door demo: a real @huggingface LeRobot (@LeRobotHF) diffusion_pusht policy combined with a LeWorldModel checkpoint by @lucasmaes_ for scoring. Both run locally on my MacBook in the demo video. LeWM is extremely efficient (~15M params), can plan up to 48× faster, and runs on commodity hardware.

WorldForge wires the loop: policy → candidates → score → select. Replay happens in a local TUI today, but the same loop could drive a real robot in the world.

Would love feedback from people working on world models, physical AI, robotics, ML infra, and adjacent tooling. Fully open source. Contributions very welcome. Plan in the dream, replay in the real world. github.com/AbdelStark/wor…

1 reply · 6 reposts · 21 likes · 2.7K views
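
Here is a hedged sketch of the policy → candidates → score → select loop the post describes. The policy and world_model callables are stand-ins invented for the example, not WorldForge's actual API.

```python
# Hedged sketch of a WorldForge-style loop: the policy proposes candidate
# action sequences, a world model scores them, and the best one is selected.
import numpy as np

rng = np.random.default_rng(0)

def policy(obs, n_candidates=8):
    # Stand-in for e.g. a diffusion policy: candidate action sequences
    # shaped (candidates, horizon, action_dim).
    return rng.normal(size=(n_candidates, 16, 2))

def world_model(obs, actions):
    # Stand-in scorer: a real world model would roll out predictions and
    # return e.g. estimated task reward or negative cost per candidate.
    return -np.square(actions).sum(axis=(1, 2))  # toy: prefer small actions

obs = np.zeros(4)
candidates = policy(obs)                    # policy → candidates
scores = world_model(obs, candidates)       # → score
best = candidates[int(np.argmax(scores))]   # → select
print("selected sequence shape:", best.shape)
```
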
Moritz Schiebold retweeted
Pablo Vela @pablovelagomez1
[Same post as shown in full above.]
Pablo Vela@pablovelagomez1

I've been on a SLAM/SFM kick. It's one of the more underexplored and lacking areas when it comes to human teleop/data collection, so I've brought Deep Patch Visual Odometry/SLAM over to @rerundotio and @Gradio. With this example, we now have
1. pycuvslam
2. pycolmap/glomap
3. mast3r-slam
4. dpvo/slam
all integrated into Rerun.

The question becomes: which method should be used in which situations? They all make different trade-offs, with different camera requirements and throughput/accuracy. What about when a new method comes out? Now that I have several different methods, I plan to use VSLAM-LAB for evaluation. It uses @prefix_dev to isolate all the dependencies of each of these methods and easily compare them against each other. In particular, I'll be converting the data preprocessing, algorithm outputs, and evaluation into Rerun recordings (.rrd files). This will allow both programmatic querying of anything stored in the files (which method had the highest ATE-to-FPS ratio? which dataset/sequence caused the most difficulty? etc.) and easy visual inspection, using the Rerun server to link them all together.

Another really important side effect of this is how it impacts agents. As Karpathy said:
```
LLMs are exceptionally good at looping until they meet specific goals, and this is where most of the "feel the AGI" magic is to be found. Don't tell it what to do, give it success criteria, and watch it go.
```
By having accuracy and throughput metrics deeply tied to human-inspectable artifacts, one can really accelerate agentic development with an actual understanding of how the method/data performs. I think this is another killer use case that I'll be really leaning into, to make ingestion of new datasets/methods trivial with an agent.

I'm making it my mission for folks to understand that Rerun as a visualization tool only scratches the surface of its true benefit: deep integration between data and visuals, with powerful query capabilities. I'll be focusing on the SLAM use case first and then bringing this into the full egocentric/exocentric data collection domain!

3 replies · 25 reposts · 223 likes · 34.4K views
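
As an illustration of the kind of metric query mentioned above, here is a sketch over a hypothetical results table; the schema and numbers are invented, and the real recordings would be queried through Rerun's catalog rather than an in-memory DataFrame.

```python
# Hypothetical evaluation table: one row per (method, sequence) with ATE in
# meters and throughput in FPS. Schema and values are invented for the example.
import pandas as pd

results = pd.DataFrame({
    "method":   ["dpvo", "dpvo", "mast3r-slam", "pycuvslam"],
    "sequence": ["euroc/MH_01", "7scenes/chess", "euroc/MH_01", "euroc/MH_01"],
    "ate_m":    [0.052, 0.130, 0.048, 0.061],
    "fps":      [38.0, 35.0, 4.5, 120.0],
})

# "Which method had the highest ATE-to-FPS ratio?"
results["ate_per_fps"] = results["ate_m"] / results["fps"]
print(results.sort_values("ate_per_fps", ascending=False).head())

# "Which dataset/sequence caused the most difficulty?" (highest mean ATE)
print(results.groupby("sequence")["ate_m"].mean().idxmax())
```
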
Moritz Schiebold retweeted
Paul Graham @paulg
We're in Stockholm. You know how there are some places where you think "Nice place to visit, but I wouldn't want to live there"? Stockholm is the kind of place that makes you want to live there.
488 replies · 144 reposts · 4.2K likes · 587.9K views
Moritz Schiebold retweeted
Ouster @ousterlidar
Stereo cameras and digital lidar solve different parts of the same perception problem. @Stereolabs3D ZED cameras generate dense point clouds from stereo vision, capturing color, texture, and depth in a single frame. Lidar provides centimeter-accurate 3D range data independent of …
6 replies · 22 reposts · 252 likes · 25.3K views
Moritz Schiebold retweeted
Rerun @rerundotio
Need to skim through lots of robotics data quickly? Here is a sneak peek of our upcoming data exploration view. Shipping as an experimental feature soon 🤩
1 reply · 8 reposts · 115 likes · 11.1K views
Moritz Schiebold retweeted
ud @uddupa
micro drones. we cooked a new streaming visual slam @astrm_labs.
15 replies · 29 reposts · 372 likes · 48.2K views