
Introducing MolmoSpaces, a large-scale, fully open platform + benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, & 42M annotated robotic grasps—all in one ecosystem.
Dongping Chen
@Dongping0612
Ph.D. Student @UofMaryland | Incoming Intern @Adobe | Ex-intern @uwcse @RAIVNLab | Agentic AI / Multimodal / Data-Centric AI

Introducing AURORA 🌟: our new training framework that enhances multimodal language models with Perception Tokens, a game-changer for tasks requiring deep visual reasoning, such as relative depth estimation and object counting. Let's take a closer look at how it works. 🧵[1/8]


Having trouble finding a benchmark for your use case? Introducing TaskMeAnything, a benchmark-generation engine that creates VQA benchmarks on demand for assessing multimodal language models such as GPT-4o. Website: task-me-anything.org
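The "benchmarks on demand" idea can be pictured as template instantiation: pick a question template, fill it with entries from an object catalog, and derive the ground-truth answer programmatically so every generated question is automatically labeled. A minimal illustrative sketch, where the catalog, template, and all function names are hypothetical and not TaskMeAnything's actual API:

```python
# Hypothetical sketch of on-demand VQA benchmark generation via a
# question template over an object catalog. Everything here is
# illustrative, not TaskMeAnything's real interface.

# A tiny "object catalog": each entry names an object and an attribute.
CATALOG = [
    {"object": "mug", "color": "red"},
    {"object": "mug", "color": "blue"},
    {"object": "book", "color": "green"},
]

def generate_counting_questions(catalog):
    """Instantiate a counting template for every color in the catalog,
    computing the ground-truth answer directly from the catalog."""
    tasks = []
    colors = sorted({item["color"] for item in catalog})
    for color in colors:
        answer = sum(1 for item in catalog if item["color"] == color)
        tasks.append({
            "question": f"How many {color} objects are in the scene?",
            "answer": str(answer),
        })
    return tasks

benchmark = generate_counting_questions(CATALOG)
for task in benchmark:
    print(task["question"], "->", task["answer"])
```

Because answers are computed from the catalog rather than hand-labeled, the same machinery can mint arbitrarily many task instances per template, which is what makes on-demand generation scale.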
