Omar Rayyan
@omarrayyann
97 posts

PhDing @UCLA

Joined June 2014
486 Following · 588 Followers

Pinned Tweet
Omar Rayyan@omarrayyann·
MolmoSpaces provides singular scale and diversity. We built a benchmark that puts that scale to use. MolmoSpaces-Bench evaluates zero-shot policies across thousands of environments previously unseen to them, under systematic variation, providing insights that go beyond a single success-rate percentage. More below:
Ai2@allen_ai

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

Omar Rayyan retweeted
Mahi Shafiullah 🏠🤖@notmahi·
The MolmoSpaces leaderboard is now open for submissions! When we created this benchmark for zero-shot real-to-sim eval in diverse homes, we didn’t expect things to heat up so quickly. But they did, thanks to @jang_yoel and the team at GEAR toppling PI to take the crown in the task-general category. Congrats 🎉 You can evaluate and submit your model to this leaderboard: molmospaces.allen.ai/leaderboard
Joel Jang@jang_yoel

DreamZero is #1 on both MolmoSpaces and RoboArena 🏆

What makes this notable: DreamZero-DROID is trained from scratch using only the DROID dataset. No pretraining on large-scale robot data, unlike competing VLAs. This demonstrates the strength of video-model backbones for generalist robot policies (VAMs/WAMs).

More broadly, training only on real data and evaluating on (1) transparent, distributed benchmarks like RoboArena or (2) scalable sim benchmarks like MolmoSpaces is an exciting step toward fairer and more reproducible evaluation of generalist policies, one the community can hill-climb together to measure progress.

Special thanks to the Ai2 MolmoSpaces team (@notmahi @omarrayyann @YejinKim4 Max Argus) and the RoboArena team (@pranav_atreya) for helping with the setup and running these evaluations! Special shout-out to @youliangtan @NadunRanawakaA @chuning_zhu, who led these efforts from the GEAR side :)

+ We also release our DreamZero-AgiBot checkpoint & post-training code to enable very efficient few-shot adaptation. Post-train on just ~30 minutes of play data for your specific robot, and see the robot do basic language following and pick-and-place 🤗 (see the YAM experiments in our paper for more detail).

++ We also provide the entire codebase & preprocessed dataset to replicate the DreamZero-DROID checkpoint.

🌐 dreamzero0.github.io
💻 github.com/dreamzero0/dre…
RoboArena: robo-arena.github.io/leaderboard
MolmoSpaces: molmospaces.allen.ai/leaderboard

Chris Paxton@chris_j_paxton·
Well, DreamZero is:
- a much bigger model
- using this clever auxiliary loss (predicting video), which probably makes its smaller amount of data go a lot farther

Unfortunately, there's not enough information yet to tell, at least from the comparisons in the paper. It seems like the aux-loss stuff made a huge difference (see figure here), but we don't KNOW that pi-0.5 at 14B params wouldn't do well. Although it sure seems like it made a difference. I think there's a lot of work to do on the exact best data mixture.
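The auxiliary-loss idea Chris is describing can be sketched abstractly. This is a toy illustration only, not DreamZero's actual training code; the function names and the weighting value are assumptions:

```python
# Toy sketch of an auxiliary objective folded into the main loss.
# Nothing here is DreamZero's real implementation; it only shows the
# general pattern of adding a weighted auxiliary term.

def combined_loss(action_loss: float, video_pred_loss: float,
                  aux_weight: float = 0.1) -> float:
    """Total objective = action-prediction loss + weighted auxiliary loss.

    The auxiliary term (predicting future video frames) extracts extra
    supervision from the same demonstrations, which is one reason a
    smaller dataset can go further.
    """
    return action_loss + aux_weight * video_pred_loss

# Example: the auxiliary term contributes a tenth of its raw value.
total = combined_loss(action_loss=0.8, video_pred_loss=0.5)
```

In practice the weight would be tuned, and both terms would be tensors produced by the model, but the composition is the same.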
Chris Paxton tweet media
Omar Rayyan@omarrayyann·
Also thanks to @youliangtan @jang_yoel for their DreamZero API. Their world action model now leads the benchmark with zero sim data. x.com/jang_yoel/stat…

Omar Rayyan@omarrayyann·
MolmoSpaces-Bench leaderboard is now live! Test your generalist policies to see how they compare across tasks and environments. Feel free to reach out if you need help setting it up. molmospaces.allen.ai/leaderboard
Omar Rayyan tweet media
Yu Xiang@YuXiang_IRVL·
Has anyone built a URDF for the Panda arm + Robotiq 2F-85 gripper setup used in the DROID dataset? Thanks! 🙏
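One way to start on this is a URDF that fixes the gripper base to the Panda's flange link. The fragment below is a minimal sketch only: the link/joint names and the flange offset are assumptions, and a real setup needs the official Panda and Robotiq mesh files plus the DROID-calibrated mounting transform (DROID uses a physical adapter between flange and gripper).

```xml
<!-- Sketch only: names and offsets are placeholders, not a tested setup. -->
<robot name="panda_robotiq_2f85">
  <!-- Panda links/joints from franka_description would be included here. -->
  <link name="panda_link8"/>
  <link name="robotiq_2f85_base_link"/>
  <joint name="gripper_mount" type="fixed">
    <parent link="panda_link8"/>
    <child link="robotiq_2f85_base_link"/>
    <!-- Placeholder offset; measure/calibrate for the actual DROID mount. -->
    <origin xyz="0 0 0.01" rpy="0 0 0"/>
  </joint>
</robot>
```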
Yu Xiang tweet media
Alberto Fuentes (e/acc)@AlberFuen·
@omarrayyann @RanjayKrishna In the middle of teleop, the camera screens freeze for some time, then lose track of the position. I'd like a way to teleop with an Xbox controller or keyboard. Possibly the grasps are the way to go; how do I use them? I was unable to do simple tasks like grabbing the potato with the gripper.
Ranjay Krishna@RanjayKrishna·
The amount and diversity of robot data we need far exceeds what we can collect in the real world. We are betting on simulation! MolmoSpaces lets you generate seemingly unlimited amounts of robot data in large, diverse environments across multiple simulators.
Ai2@allen_ai

Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.

Omar Rayyan@omarrayyann·
MolmoSpaces also comes with 42M+ grasps that cover 48K+ objects across 250K+ scenes, allowing large-scale functional trajectory generation in MuJoCo and IsaacSim.
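As a rough illustration of what per-object grasp annotations enable, here is a sketch of filtering annotated grasps against a gripper before generating a trajectory. The schema is a guess for illustration, not MolmoSpaces' actual annotation format:

```python
from dataclasses import dataclass

# Hypothetical grasp-annotation schema; the real MolmoSpaces format may differ.
@dataclass
class Grasp:
    object_id: str
    position: tuple[float, float, float]            # grasp point, object frame (m)
    quaternion: tuple[float, float, float, float]   # gripper orientation (wxyz)
    width: float                                    # required gripper opening (m)
    quality: float                                  # annotated quality in [0, 1]

def feasible_grasps(grasps, max_width: float, min_quality: float):
    """Keep grasps the gripper can physically reach, best quality first."""
    ok = [g for g in grasps if g.width <= max_width and g.quality >= min_quality]
    return sorted(ok, key=lambda g: g.quality, reverse=True)

grasps = [
    Grasp("mug_01", (0.0, 0.03, 0.05), (1, 0, 0, 0), width=0.07, quality=0.9),
    Grasp("mug_01", (0.0, -0.04, 0.02), (1, 0, 0, 0), width=0.10, quality=0.95),
]
# A 2F-85-style gripper opens to ~85 mm, so the second grasp is infeasible.
best = feasible_grasps(grasps, max_width=0.085, min_quality=0.5)
```

The surviving grasp poses would then seed motion planning or trajectory generation in MuJoCo or IsaacSim.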
Omar Rayyan retweeted
Jeff Cui@jeffacce·
Also check out MolmoSpaces-Bench from @omarrayyann! Our contact-anchored policies (CAPs) perform well zero-shot across diverse environments and objects. Omar is the rockstar behind our sim env for CAP, enabling us to train and evaluate multiple models in a day.
Omar Rayyan@omarrayyann

It’s hard to find true zero-shot end-to-end policies – ones that work without any fine-tuning in fully novel, simulated environments, even for single tasks! We test two policy families, the π family from @physical_int and the recent Contact-Anchored Policies (CAP) from NYU & UCB. On all our tasks, we are making steady progress – but we are nowhere close to saturation yet.

Omar Rayyan retweeted
Mahi Shafiullah 🏠🤖@notmahi·
How general are your general robotic policies? Today, we're releasing MolmoSpaces-Bench to help explore this question. You can spin up ~1k envs in ~700 unique simulated homes and within hours find out how well your zero-shot policy generalizes to these unseen scenes 🧵
Omar Rayyan@omarrayyann

MolmoSpaces provides singular scale and diversity. We built a benchmark that puts that scale to use. MolmoSpaces-Bench evaluates zero-shot policies across thousands of environments previously unseen to them under systematic variation, providing insights that go beyond a success rate % More Below:

Omar Rayyan@omarrayyann·
Another example is prompt sensitivity in language-conditioned models. On the exact same tasks, early π models fail more when given queries that are less frequent in the DROID dataset; newer π models almost entirely close this gap.
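The prompt-sensitivity measurement can be sketched like this. The rollout outcomes and function names below are illustrative inventions, not the benchmark's actual harness or results:

```python
# Sketch: run the same task under frequent vs. rare phrasings and
# compare success rates. Outcomes here are made-up booleans.

def success_rate(outcomes: list[bool]) -> float:
    return sum(outcomes) / len(outcomes)

def prompt_gap(frequent: list[bool], rare: list[bool]) -> float:
    """Drop in success when the query is phrased in a way that is rare
    in the training data; near zero means the model closed the gap."""
    return success_rate(frequent) - success_rate(rare)

# e.g. "pick up the cup" (frequent) vs. "lift the ceramic mug" (rare)
gap = prompt_gap(frequent=[True, True, True, False],
                 rare=[True, False, False, False])
```

A robust policy would show a gap near zero across many such paraphrase pairs.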
Omar Rayyan tweet media