X Square Robot (@XSquareRobot)
X Square Robot Unveils New Embodied AI Model, Says Robots Will Arrive in Homes in 35 Days
Backed by Alibaba, ByteDance, Xiaomi and Meituan, X Square Robot unveiled a next-generation embodied AI foundation model for home robots and said its first deployments in everyday households will begin within 35 days.
X Square Robot on Tuesday unveiled WALL-B, a new embodied AI foundation model designed for deployment in real-world homes, marking what the company described as a major step toward bringing general-purpose robots into daily family life.
At a launch event themed "Born to Bot, Bot to Family," the company also introduced its World Unified Model (WUM) architecture, a training framework that combines vision, language, action and physical prediction within a single system from the outset. X Square said the model is intended to help robots operate in the far more unpredictable setting of a home, where tasks, layouts and interactions vary from moment to moment.
"Robots in factories and in homes are completely different. In factories, they repeat the same action 10,000 times without variation. In a home, however, they need to perform 10,000 different actions, each unique and non-repetitive. Therefore, the challenge of a truly intelligent robot lies not in repeating a single action, but in the ability to execute new, untrained movements within unstructured environments. Deploying robots in the home is one of the most significant technical hurdles of our time," said Qian Wang, founder and CEO of X Square Robot.
WALL-B is the first real-world implementation of the World Unified Model architecture. Unlike modular systems that train perception, language and control separately, X Square Robot said the World Unified Model optimizes those capabilities jointly from the very beginning. The company said this allows physical prediction, including force, friction and collision dynamics, to emerge as part of the model itself rather than being layered on afterward.
"We train all capabilities (vision, language, action, and prediction) within the same network from day one. Much like infants, who do not learn to see, move and speak in isolated, sequential stages, but instead see, move, listen and act simultaneously while receiving feedback, we have integrated all these capabilities into a unified whole," said Wang Hao, CTO of X Square.
X Square Robot said the development of WALL-B rests on two pillars. The first is a data strategy that prioritizes training on authentic, non-staged home environments to cover the “long-tail” distribution of real-world scenarios, such as misplaced objects and temporary occlusions. Unlike models trained primarily on synthetic or laboratory datasets, WALL-B is exposed to the natural clutter of lived-in spaces, including unexpected obstacles and spontaneous human activity, so that its training data reflects real-world conditions rather than a simplified version of them. The second is a physics-aware predictive mechanism that anticipates physical outcomes before an action is taken, enabling the model to plan for contact dynamics rather than merely react to them. Developing the in-house WUM architecture on physical robotic platforms reflects the company’s accumulated experience in bridging sim-to-real gaps across varied operational contexts.
Wang said the current model is still at an "intern" stage and makes errors that require remote assistance. For instance, it may mistakenly place slippers in the kitchen or pause while wiping a table to "think." However, the model operates 24 hours a day and becomes increasingly "intelligent" as each day of operation generates new data. In 35 days, on May 25, X Square Robot will officially bring its robots into everyday homes, underscoring the company’s long-term commitment to the home robotics sector.