CortexAI

15 posts

@cortexairobot

Building the world's most diverse real-world, real-workplace, and industry-scale egocentric + robot dataset.

San Francisco, CA · Joined April 2025
8 Following · 262 Followers
Pinned Tweet
CortexAI
CortexAI@cortexairobot·
We’re excited to share MolmoAct 2, a fully open-sourced robot foundation model for real-world deployment. Pushing SOTA is truly a collaborative effort. We’re grateful to be @allen_ai's exclusive data collection partners on this project, contributing:
- 700+ hours of real-world bimanual data
- Five robotics policies
- Third-party benchmarks of MolmoAct 2’s real-world fine-tuning performance 🦾
Ai2@allen_ai

Robotics models often struggle outside controlled environments. Ours is built to work in real ones. Today we're launching MolmoAct 2, which can assist with a host of chores & lab tasks, plus the MolmoAct 2-Bimanual YAM dataset—the largest open robotics dataset of its kind. 🧵

CortexAI
CortexAI@cortexairobot·
We provided over 700 hours of robot demo training data towards the MolmoAct2 project, performing tasks like folding a towel, scanning groceries, charging a smartphone, and table bussing. It still blows our mind to see robot arms successfully interacting with objects they've never seen before! 🤯
Jiafei Duan@DJiafei

One of the coolest things we did this time around: we set up in a public space and asked folks to volunteer their own personal items for MolmoAct2 to interact with. Some of the feedback and results genuinely shocked us — like how it picked up such oddly shaped objects. We're always looking for beta testers, or you can try it yourself since everything is open-source end to end. @allen_ai

CortexAI
CortexAI@cortexairobot·
@chris_j_paxton We had a blast being this third-party benchmarking firm. We believe that this helps labs across the board better evaluate their models. 🌟
Chris Paxton
Chris Paxton@chris_j_paxton·
One really cool thing from this report: retaining a third-party firm to do benchmarking. if this was done double-blind it would be essentially the best possible setup for knowing which models are best. hope to see these practices in future model releases.
Ai2@allen_ai

Robotics models often struggle outside controlled environments. Ours is built to work in real ones. Today we're launching MolmoAct 2, which can assist with a host of chores & lab tasks, plus the MolmoAct 2-Bimanual YAM dataset—the largest open robotics dataset of its kind. 🧵

CortexAI retweeted
Lucas Ngoo
Lucas Ngoo@lucasngoo·
Big release from @allen_ai! @cortexairobot supported Ai2 on 720h of bimanual robot data and ran the real-world evals for MolmoAct2. Largest open dataset of its kind + strong results vs baselines. Congrats to the Ai2 team @DJiafei @hq_fang!
Ai2@allen_ai

Robotics models often struggle outside controlled environments. Ours is built to work in real ones. Today we're launching MolmoAct 2, which can assist with a host of chores & lab tasks, plus the MolmoAct 2-Bimanual YAM dataset—the largest open robotics dataset of its kind. 🧵

CortexAI retweeted
Jiafei Duan
Jiafei Duan@DJiafei·
1/🧵 Building more with less. In the original MolmoAct, we introduced our first in-house dataset: 22 hours of teleoperated data on a single Franka. Today we're releasing the MolmoAct 2 Bimanual YAM dataset, 720 hours of high-quality data collected with @cortexairobot . Not toy tasks. Real-world tasks ready to deploy into business workflows today.
CortexAI retweeted
Ai2
Ai2@allen_ai·
We retained @cortexairobot to run a third-party real-world fine-tuning benchmark. Across trials on a broad suite of tabletop, in-the-wild, and mobile tasks, MolmoAct 2 outperformed systems including OpenVLA-OFT, π0.5, X-VLA, & Cosmos Policy.
CortexAI retweeted
Ai2
Ai2@allen_ai·
Robotics models often struggle outside controlled environments. Ours is built to work in real ones. Today we're launching MolmoAct 2, which can assist with a host of chores & lab tasks, plus the MolmoAct 2-Bimanual YAM dataset—the largest open robotics dataset of its kind. 🧵
CortexAI retweeted
Jiafei Duan
Jiafei Duan@DJiafei·
Excited to share Embodied Reasoning in Action at @CVPR 2026, a workshop and open challenge on robotic manipulation.
We’re launching two challenges:
⚡ RoboSpatial: point, fit, and place from RGB-D home scenes
⚡ PointArena: language-guided pixel pointing for VLM evaluation
💰 Up to $6K in cash prizes
📍 Denver, CO
📅 June 3–4, 2026
🗓️ Challenge deadline: May 23, 2026
Huge thanks to our sponsors: @NVIDIARobotics, @BitRobotNetwork, and @cortexairobot
Learn more: embodied-reasoning.github.io
CortexAI retweeted
Lucas Ngoo
Lucas Ngoo@lucasngoo·
We’re now deploying mobile bimanual robots in-the-wild collecting real-world data, running model evals, and capturing recovery trajectories for RL
CortexAI retweeted
Lucas Ngoo
Lucas Ngoo@lucasngoo·
1/ They pretrain a robot world model on ~44k hours of egocentric human video. Mostly RGB. No detailed action labels. So the question is: how do you learn action-conditioned dynamics from unlabeled video?

2/ Their idea is “latent actions.” They train a VAE that takes two consecutive frames (fₜ, fₜ₊₁) and compresses the transition into a small vector. That vector represents what changed between the frames. It becomes a proxy for the action.

3/ They use these latent actions to condition a video world model:
frameₜ + latent_actionₜ → frameₜ₊₁
So instead of passive next-frame prediction, the model learns transitions conditioned on action. They benchmark this and show latent actions perform close to using real hand pose labels (e.g. EgoDex).

4/ After large-scale human pretraining, they post-train on real robots. They reset the action-conditioning layer and replace latent actions with real robot controls. Since the model already learned general physics from human video, much less robot data is needed to adapt to a new embodiment.

5/ They also show that increasing the scale and diversity of human video improves generalization to unseen objects and novel action variations. Now imagine training on 100 million hours of large scale, diverse, real world workplace data. This is the future we are excited to help power at @cortexairobot.
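
To make the data flow in 2/ and 3/ concrete, here is a toy Python sketch. It is not the DreamDojo architecture: a fixed linear projection of the frame difference stands in for the learned VAE encoder, frames are short lists of floats instead of images, and every name (`encode_latent_action`, `predict_next_frame`, `PROJ`) is hypothetical. It only illustrates the interface: two consecutive frames go in, a small vector comes out, and that vector conditions the next-frame prediction.

```python
# Toy illustration only: a fixed projection stands in for the learned VAE encoder.
# PROJ maps a 4-value "frame" change to a 2-value latent action (assumed sizes).
PROJ = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

def encode_latent_action(frame_t, frame_next, proj=PROJ):
    """Compress the transition frame_t -> frame_next into a small vector."""
    diff = [b - a for a, b in zip(frame_t, frame_next)]
    return [sum(w * d for w, d in zip(row, diff)) for row in proj]

def predict_next_frame(frame_t, latent_action, proj=PROJ):
    """World-model step: frame_t + latent_action -> predicted next frame."""
    # Decode the latent back to frame space with the transpose of the projection.
    decoded = [
        sum(proj[k][i] * latent_action[k] for k in range(len(proj)))
        for i in range(len(frame_t))
    ]
    return [a + d for a, d in zip(frame_t, decoded)]

frame_t = [0.0, 1.0, 2.0, 3.0]
frame_next = [0.5, 0.0, 2.0, 3.0]  # only the first two values change

latent = encode_latent_action(frame_t, frame_next)
predicted = predict_next_frame(frame_t, latent)
```

In this toy the change lies entirely in the two dimensions the projection keeps, so the prediction reproduces `frame_next`. A learned encoder would have to discover which aspects of the transition are worth keeping.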
Jim Fan@DrJimFan

Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It's Simulation 2.0. Time for robotics to take the bitter lesson pill.

Real-world robot learning is bottlenecked by time, wear, safety, and resets. If we want Physical AI to move at pretraining speed, we need a simulator that adapts to pretraining scale with as little human engineering as possible.

Our key insights: (1) human egocentric videos are a scalable source of first-person physics; (2) latent actions make them "robot-readable" across different hardware; (3) real-time inference unlocks live teleop, policy eval, and test-time planning *inside* a dream.

We pre-train on 44K hours of human videos: cheap, abundant, and collected with zero robot-in-the-loop. Humans have already explored the combinatorics: we grasp, pour, fold, assemble, fail, retry - across cluttered scenes, shifting viewpoints, changing light, and hour-long task chains - at a scale no robot fleet could match.

The missing piece: these videos have no action labels. So we introduce latent actions: a unified representation inferred directly from videos that captures "what changed between world states" without knowing the underlying hardware. This lets us train on any first-person video as if it came with motor commands attached. As a result, DreamDojo generalizes zero-shot to objects and environments never seen in any robot training set, because humans saw them first.

Next, we post-train onto each robot to fit its specific hardware. Think of it as separating "how the world looks and behaves" from "how this particular robot actuates." The base model follows the general physical rules, then "snaps onto" the robot's unique mechanics. It's kind of like loading a new character and scene assets into Unreal Engine, but done through gradient descent and generalizes far beyond the post-training dataset.

A world simulator is only useful if it runs fast enough to close the loop. We train a real-time version of DreamDojo that runs at 10 FPS, stable for over a minute of continuous rollout. This unlocks exciting possibilities:
- Live teleoperation *inside* a dream. Connect a VR controller, stream actions into DreamDojo, and teleop a virtual robot in real time. We demo this on Unitree G1 with a PICO headset and one RTX 5090.
- Policy evaluation. You can benchmark a policy checkpoint in DreamDojo instead of the real world. The simulated success rates strongly correlate with real-world results - accurate enough to rank checkpoints without burning a single motor.
- Model-based planning. Sample multiple action proposals → simulate them all in parallel → pick the best future. Gains +17% real-world success out of the box on a fruit packing task.

We open-source everything!! Weights, code, post-training dataset, eval set, and whitepaper with tons of details to reproduce. DreamDojo is based on NVIDIA Cosmos, which is open-weight too. 2026 is the year of World Models for physical AI. We want you to build with us. Happy scaling! Links in thread:
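
The model-based planning recipe described above (sample action proposals, simulate each, pick the best future) can be sketched in a few lines of Python. This is a generic best-of-N planner under stated assumptions, not DreamDojo's code: the 1-D `simulate` function stands in for the world model, `score` for whatever measures task success, and all names and constants here are hypothetical.

```python
import random

def plan_best_of_n(state, sample_actions, simulate, score, num_proposals=16, seed=0):
    """Sample action proposals, roll each one out, and return the best-scoring one."""
    rng = random.Random(seed)
    proposals = [sample_actions(rng) for _ in range(num_proposals)]
    # A real system would batch these rollouts in parallel; this loop is sequential.
    futures = [simulate(state, actions) for actions in proposals]
    scores = [score(future) for future in futures]
    best = max(range(num_proposals), key=scores.__getitem__)
    return proposals[best], futures[best], scores[best]

# Toy stand-ins (assumptions): a 1-D state, a goal position, and 5-step action plans.
GOAL = 3.0
HORIZON = 5

def sample_actions(rng):
    """Draw one random action sequence, each step in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(HORIZON)]

def simulate(state, actions):
    """Stand-in world model: each action shifts the state by its value."""
    for action in actions:
        state = state + action
    return state

def score(final_state):
    """Higher is better: negative distance between the final state and the goal."""
    return -abs(final_state - GOAL)

best_actions, best_future, best_score = plan_best_of_n(
    0.0, sample_actions, simulate, score
)
```

Swapping `simulate` for a learned world model and `score` for a task-success estimate gives the pattern the post describes. The quality of the chosen plan then depends on how well simulated outcomes track real ones.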

CortexAI retweeted
Lucas Ngoo
Lucas Ngoo@lucasngoo·
Thanks @Fondocom for the podcast. Shared about:
> World models are the equivalent of language models for the physical world: predicting next visual frames and next robot actions.
> Scaling laws for robotics world models: Large-scale, diverse, real-world egocentric data leads to better world models, which in turn lead to better robot action predictions from the model.
> Progress comes from real-world deployment with humans in the loop: Human operators initially monitor and correct robot trajectories, and that recovery data feeds back into training to gradually increase autonomy.
CortexAI
CortexAI@cortexairobot·
Cortex AI in Times Square. Accelerating physical AI with large-scale, real-world robotics data. Thanks to @brexHQ for making it happen.
CortexAI retweeted
Y Combinator
Y Combinator@ycombinator·
Cortex AI (@cortexairobot) produces the world's most diverse egocentric and robot dataset, captured in real workplace environments. Leading research labs use Cortex AI to collect data and deploy robotics foundation models in real-world settings.