Omar Rayyan

@JieWang_ZJUI @snasiriany molmospaces.allen.ai/leaderboard/ms

0

1

254

Zu Wang@zuwang95·6 May

Happy to share what I’ve been working on since joining Genesis! GENE-26.5 is a one-of-a-kind, robotics-native multimodal foundation model that learns from diverse, in-the-wild data across modalities and outputs actions enabling a 54-DoF robot system to perform the most dexterous, long-horizon manipulation tasks to date—approaching human-level capability. This is the result of innovations across the full stack—data collection and processing, robot systems, model architecture, training strategies, and scalable evaluation infrastructure.

English

17

26

347

16.7K

Omar Rayyan@omarrayyann·28 Nis

QME

0

1

47

Jie Wang@JieWang_ZJUI·28 Nis

@snasiriany My complain of robotics now: no real benchmark and frontier lab do not compete against each other publicly --- RoboArena has not receive new generalist from big tech! We should make the evaluation as a service

English

0

1

254

Jie Wang@JieWang_ZJUI·28 Nis

RoboCasa365 is amazing, interesting to find this leaderboard, where we can have ready-to-use ckpt. And results are not surprising low lol Excited to see #DreamZero, #GeminiRobotics, #Generalist appearing here, we need more scalable sim-based benchmark

English

6

26

3.4K

Omar Rayyan@omarrayyann·27 Nis

@kevin_zakka @pabbeel Congrats Kevin!!

English

1

82

Kevin Zakka@kevin_zakka·26 Nis

Gave my PhD dissertation talk on Friday! It's been an incredible journey made possible by the best advisor who believed in me and gave me the freedom and support to explore. Thank you @pabbeel! And thank you to everyone who came to support and share this milestone with me 🙏

English

63

15

651

31.1K

Omar Rayyan@omarrayyann·21 Nis

@xuningy Cool work!

English

1

339

Xuning Yang@xuningy·20 Nis

When every generalist robot model scores 95%+ on a benchmark, the numbers become meaningless. What if we built a photorealistic benchmark that never saturates and can generate new scenes and tasks with AI Workflows in minutes? We introduce RoboLab! 🧵(1/6)

English

10

26

144

27.1K

Omar Rayyan retweetledi

Kevin Zakka@kevin_zakka·30 Mar

Just merged an amazing contribution by @omarrayyann to mjlab's viser viewer: checkpoint hot-swapping! You can now browse and load any checkpoint mid-session without restarting and it works with local checkpoints and W&B runs.

English

16

141

9.4K

Omar Rayyan retweetledi

RoboPapers@RoboPapers·25 Mar

Benchmarking, evaluating, and developing robotics code is difficult, and part of this is because no simulator really reflects the diversity and scale of real embodiments. Enter MolmoSpaces from AI2: a massive open ecosystem with a range of 230,000 handcrafted and procedurally-generated home environments, including 48,000 manipulable objects. Crucially, MolmoSpaces provides simulation environments which work for both navigation and manipulation. We talked to the team: @YejinKim4, @omarrayyann, and Max Argus, to tell us more. Watch Episode 69 of RoboPapers, with @micoolcho and @DJiafei, now!

English

16

74

20.8K

Omar Rayyan retweetledi

Michael Cho - Rbt/Acc@micoolcho·25 Mar

Jensen approves! Hercules efforts from @YejinKim4 @omarrayyann, Max Argus & team! This has a decent chance of becoming a super important benchmark fo robotics going forward. Check out this @RoboPapers episode with the MolmoSpaces folks.

RoboPapers@RoboPapers

Benchmarking, evaluating, and developing robotics code is difficult, and part of this is because no simulator really reflects the diversity and scale of real embodiments. Enter MolmoSpaces from AI2: a massive open ecosystem with a range of 230,000 handcrafted and procedurally-generated home environments, including 48,000 manipulable objects. Crucially, MolmoSpaces provides simulation environments which work for both navigation and manipulation. We talked to the team: @YejinKim4, @omarrayyann, and Max Argus, to tell us more. Watch Episode 69 of RoboPapers, with @micoolcho and @DJiafei, now!

English

3

7

1.1K

Omar Rayyan@omarrayyann·24 Mar

@shumochu @VilleKuosmanen @pravsels not really. you can just get the prefix from pi0.5's vlm backbone (like done in DSRL) and feed that to a bottleneck

English

Physical Intelligence@physical_int

2

48

Shumo Chu@shumochu·24 Mar

@VilleKuosmanen @pravsels Wouldn’t you need to reproduce pi star 06 with RECAP first? To my best knowledge there is no good OSS version of that.

English

3

0

3

611

Ville🤖@VilleKuosmanen·24 Mar

"RL Token" looks like a great and surprisingly simple post-training methodology for optimising robot models for dexterous tasks in the real world! Over the next few weeks, me and @pravsels will be attempting to reproduce the results (& open source the code) Stay tuned 👀

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

English

8

5

92

22.3K

Omar Rayyan@omarrayyann·11 Mar

Check out our MolmoBot release and the open-sourced foundational models trained entirely in simulated MolmoSpaces homes!

Ai2@allen_ai

Today, a step forward in open robotics - our results show that sim-to-real zero shot transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces.🧵

English

@AlberFuen @notmahi @jang_yoel github.com/allenai/molmos…

18

1.5K

Omar Rayyan@omarrayyann·28 Şub

QME

1

16

Alberto Fuentes (e/acc)@AlberFuen·28 Şub

@notmahi @omarrayyann @jang_yoel Can you give steps to run different models on the environments? Both full bench script and script to run the model in a single environment would be great. Thanks in advance!

English

0

54

Omar Rayyan retweetledi

Mahi Shafiullah 🏠🤖@notmahi·28 Şub

MolmoSpaces leaderboard is now open for submissions! When we created this benchmark for zero-shot real-to-sim eval in diverse homes, we didn’t expect things to heat up so quickly. But it did, thanks to @jang_yoel and team at GEAR toppling PI to take the crown on task-general category. Congrats 🎉 You can evaluate and submit your model to this leaderboard: molmospaces.allen.ai/leaderboard

Joel Jang@jang_yoel

𝐃𝐫𝐞𝐚𝐦𝐙𝐞𝐫𝐨 𝐢𝐬 #𝟏 𝐨𝐧 𝐛𝐨𝐭𝐡 𝐌𝐨𝐥𝐦𝐨𝐒𝐩𝐚𝐜𝐞𝐬 𝐚𝐧𝐝 𝐑𝐨𝐛𝐨𝐀𝐫𝐞𝐧𝐚 🏆 𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝘁𝗵𝗶𝘀 𝗻𝗼𝘁𝗮𝗯𝗹𝗲: DreamZero-DROID is trained 𝑓𝑟𝑜𝑚 𝑠𝑐𝑟𝑎𝑡𝑐ℎ using only the DROID dataset. No pretraining on large-scale robot data, unlike competing VLAs. This demonstrates the strength of video-model backbones for generalist robot policies (VAMs/WAMs). More broadly, training 𝑜𝑛𝑙𝑦 on real data and evaluating on (1) transparent, distributed benchmarks like 𝐑𝐨𝐛𝐨𝐀𝐫𝐞𝐧𝐚 or (2) scalable sim-benchmarks like 𝐌𝐨𝐥𝐦𝐨𝐒𝐩𝐚𝐜𝐞𝐬 is an exciting step toward fairer and more reproducible evaluation of generalist policies, one that the community can hillclimb together to measure progress. Special thanks to the Ai2 MolmoSpaces team (@notmahi @omarrayyann @YejinKim4 Max Argus) and the RoboArena team (@pranav_atreya) for helping with the set-up and getting these evaluations! Special shout out to @youliangtan @NadunRanawakaA @chuning_zhu, who led these efforts from the GEAR side :) + We also release our DreamZero-AgiBot checkpoint & post-training code to enable very efficient few-shot adaptation. Post-train on just ~30 minutes of play data for your specific robot, and see the robot do basic language following and pick-and-place 🤗(See YAM experiments in our paper for more detail). ++ We also provide the entire codebase & preprocessed dataset to replicate the DreamZero-DROID checkpoint. 🌐 dreamzero0.github.io 💻 github.com/dreamzero0/dre… RoboArena: robo-arena.github.io/leaderboard MolmoSpaces: molmospaces.allen.ai/leaderboard

English

4

40

4.6K

Omar Rayyan@omarrayyann·28 Şub

@chris_j_paxton @notmahi Also DreamZero has a context length of 8. pi models are stateless

English

0

2

153

Chris Paxton@chris_j_paxton·28 Şub

Well dreamzero is: - a much bigger model - has this clever auxiliary loss (predicting video) which probably makes its smaller amount of data go a lot farther unfortunately not enough information yet to tell. at least from the comparisons in the paper. it seems like the aux loss stuff made a huge difference (see figure here) but we don't KNOW that pi-0.5 at 14b params wouldn't do well. although it sure seems like it made a difference. i think there's a lot of work to do on the exact best data mixture.

English

3

1

43

3.6K

Omar Rayyan@omarrayyann·28 Şub

Also thanks to @youliangtan @jang_yoel for their DreamZero API. Their world action model now leads the benchmark with zero sim data. x.com/jang_yoel/stat…

Joel Jang@jang_yoel

𝐃𝐫𝐞𝐚𝐦𝐙𝐞𝐫𝐨 𝐢𝐬 #𝟏 𝐨𝐧 𝐛𝐨𝐭𝐡 𝐌𝐨𝐥𝐦𝐨𝐒𝐩𝐚𝐜𝐞𝐬 𝐚𝐧𝐝 𝐑𝐨𝐛𝐨𝐀𝐫𝐞𝐧𝐚 🏆 𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝘁𝗵𝗶𝘀 𝗻𝗼𝘁𝗮𝗯𝗹𝗲: DreamZero-DROID is trained 𝑓𝑟𝑜𝑚 𝑠𝑐𝑟𝑎𝑡𝑐ℎ using only the DROID dataset. No pretraining on large-scale robot data, unlike competing VLAs. This demonstrates the strength of video-model backbones for generalist robot policies (VAMs/WAMs). More broadly, training 𝑜𝑛𝑙𝑦 on real data and evaluating on (1) transparent, distributed benchmarks like 𝐑𝐨𝐛𝐨𝐀𝐫𝐞𝐧𝐚 or (2) scalable sim-benchmarks like 𝐌𝐨𝐥𝐦𝐨𝐒𝐩𝐚𝐜𝐞𝐬 is an exciting step toward fairer and more reproducible evaluation of generalist policies, one that the community can hillclimb together to measure progress. Special thanks to the Ai2 MolmoSpaces team (@notmahi @omarrayyann @YejinKim4 Max Argus) and the RoboArena team (@pranav_atreya) for helping with the set-up and getting these evaluations! Special shout out to @youliangtan @NadunRanawakaA @chuning_zhu, who led these efforts from the GEAR side :) + We also release our DreamZero-AgiBot checkpoint & post-training code to enable very efficient few-shot adaptation. Post-train on just ~30 minutes of play data for your specific robot, and see the robot do basic language following and pick-and-place 🤗(See YAM experiments in our paper for more detail). ++ We also provide the entire codebase & preprocessed dataset to replicate the DreamZero-DROID checkpoint. 🌐 dreamzero0.github.io 💻 github.com/dreamzero0/dre… RoboArena: robo-arena.github.io/leaderboard MolmoSpaces: molmospaces.allen.ai/leaderboard

English

8

360

Omar Rayyan@omarrayyann·28 Şub

You can get more insights than just the success rate (e.g. AR policies like DreamZero and pi0-Fast generate smoother trajectories) and cross-compare policy performance across objects.

English

0

6

195

Omar Rayyan@omarrayyann·28 Şub

MolmoSpaces-Bench leaderboard is now live! Test your generalist policies to see how they compare across tasks and environments. Feel free to reach out if you need help setting it up. molmospaces.allen.ai/leaderboard

English

5

35

1.8K

Omar Rayyan@omarrayyann·20 Şub

@YuXiang_IRVL the MJCF is here if that helps huggingface.co/datasets/allen…

English

1

7

307

Yu Xiang@YuXiang_IRVL·20 Şub

Has anyone built a URDF for the Panda arm + Robotiq 2F-85 gripper setup used in the DROID dataset? Thanks! 🙏

English

5

4

48

4.6K

Omar Rayyan@omarrayyann·16 Şub

@AlberFuen @RanjayKrishna you don't need the camera screen. are you using the "phone only" option on the app?

English

0

17

Alberto Fuentes (e/acc)@AlberFuen·16 Şub

@omarrayyann @RanjayKrishna In the middle of the teleop the camera screens Freeze for some time, then lose track of pos. Id like a way to teleop with Xbox comtroller or keyboard. Possibly the grasps are a way to go. How to use them? I was unable to do simple tasks like grab potato with the gripper

English

0

35

Ranjay Krishna@RanjayKrishna·12 Şub

The amount and diversity of robot data we need is much higher than what we can scale. We are betting on simulation! MolmoSpaces allows you to generate seemingly unlimited amounts of robot data in large diverse environments across multiple simulators.

Ai2@allen_ai

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

English

5

4

48

6.3K

Omar Rayyan@omarrayyann·16 Şub

@AlberFuen @RanjayKrishna can you elaborate on the issues you're facing. feel free to message

English

0

1

19

Alberto Fuentes (e/acc)@AlberFuen·14 Şub

@RanjayKrishna Amazing work!! somehow, it is hard to do the teleop with the TeleDex app :(

English

0

35

Omar Rayyan@omarrayyann·15 Şub

We opensource our mjcf2grasp pipeline that lets you generate and verify grasps starting from an MJCF file: github.com/allenai/molmos…

English