Yi Gu

2025 has been a productive year for me as a researcher and engineering lead. In addition to my duties running the university, I managed to spend time on three exciting technical projects and made significant progress:
1: PAN: a world model built for simulation, prediction, and agentic reasoning over arbitrary time and space horizons, rather than just generating short video clips as other “world models” do. In the CWM paper (arxiv.org/abs/2507.05169), we proposed a new architecture called Generative Latent Prediction (#GLP) for structured latent-space reasoning that maintains fidelity to the physical environment. It is defined by three key components: (1) Latent Reasoning Backbone: an LLM/DM-driven module that produces structured, stateful representations conditioned on history and action; (2) Generative Supervision: a diffusion-based decoder that renders the consequences of latent transitions back into the perception space, providing explicit grounding in observable reality; and (3) Closed-loop Learning Objective: a training strategy that continually aligns simulated dynamics with real-world evidence, reducing drift and reinforcing causal consistency. At the Institute of Foundation Models (IFM) of @mbzuai, we built the PAN world model (ifm.ai/pan/) on this architecture, which moves PAN beyond correlation-driven prediction toward mechanistic understanding, enabling the model to learn how and why the environment changes rather than relying solely on abstract latent dynamics. The combination of generative grounding, stepwise verification, and action-conditioned reasoning provides robustness in settings where interpretability, causal structure, and physical consistency are essential, and allows PAN to significantly outperform existing WMs on novel and challenging benchmarks that go beyond mere short-horizon video constancy, such as Action Simulation Fidelity, Long-Horizon Consistency, and Simulative Reasoning and Planning Quality.
These capabilities are particularly relevant across domains such as personalized games, agentic and embodied robotics, and multi-physics simulation.
2: AIDO: the AI-driven Digital Organism (arxiv.org/abs/2412.06993) is an AI system that enables simulation of all biological, physiological, and clinical events occurring within a living organism. Through a digital interface, it outputs how a real biological system would respond to any expressible and actionable biological interaction, intervention, or manipulation, much as a World Model does in world simulation upon action prompting. This contrasts with existing work under the banner of “virtual cell”, which in reality focuses on function approximation in the classical machine-learning style: predicting RNA counts of N-k genes upon perturbation of k genes (where k typically equals 1 and represents an abstract, isolated, and idealistic binary “action” not actually realizable in real biological experiments). At @genbioai (genbio.ai), we are building the Virtual Cell, corresponding to the cellular level of the AIDO, as a world model of the cell that simulates biological possibilities at both the molecular level (e.g., RNA count distributions, but also other molecular phenomena such as drug interactions) and the cellular level (such as cell shape, dynamics, and function). It is built on a novel neural architecture that integrates multimodal biological data with unconventional tokenization schemes; learns representations of sequence, structure, interaction, sub-cellular units, and higher-order biological entities in a causal and hierarchical manner; leverages innovative pre-training and post-training schemes; and allows action-conditioned generation of biological outputs across scales.
Our AIDO system features in-context molecular design and holistic cell simulation platforms, and an Agent Interface that enables researchers to perform in silico experiments on the virtual-bio engine over a wide range of tasks, such as discovering new targets and simulating drug and disease mechanisms. Our system ranks No. 1 out of 97 methods on the ProteinGym benchmark, and is hosted by the Chan Zuckerberg Initiative as a Representative FM for Virtual Cell. We will soon release the agentic Virtual Cell Lab to the scientific community for simulative biological research and experiments.
3: K2 LLMs: including K2-v2 (ifm.ai/k2/), the world’s strongest fully open LLM in its class (70B), rivaling open-weight leaders and approaching the performance of models over three times its size, and K2-think (k2think.ai/k2think), the world’s fastest and most parameter-efficient reasoning LLM, post-trained from K2-v2. Both come from the @llm360 initiative and the IFM. In a world where U.S. frontier models dominate performance but remain completely closed, while Chinese open-weight systems occupy a large semi-open middle band, our K2 models represent an effort to better serve the AI community and public users with truly open-source foundation models that are transparent, reproducible, and competitive. They follow a 360-open approach: making public not just model weights, but also training data, mid-training checkpoints, logs and methodology, and fine-tuning recipes. In K2-v2 (arxiv.org/abs/2512.06201), we actively infuse domain knowledge, reasoning, long-context, and tool use throughout the training process, which explicitly prepares the model for complex reasoning tasks after post-training.
In K2-think (arxiv.org/abs/2509.07604), the key technical elements underlying the remarkable performance include: 1) long chain-of-thought supervised fine-tuning, 2) reinforcement learning with verifiable rewards, 3) agentic planning before reasoning, 4) test-time scaling, 5) speculative decoding, and 6) inference-optimized hardware. Our models punch above their weight and, with their 360-degree transparency, directly address reproducibility, auditability, and governance: the constraints that will define real-world deployment.
As we say goodbye to 2025, I’d like to thank my collaborators, developers, and students from IFM, GenBio, MBZUAI, and CMU for the wonderful collaboration. More to come in 2026: you will see bigger and more powerful K2 (LLM), PAN (WM), and AIDO releases, and more advancements in architectural and system work!






Everything you love about generative models, now powered by real physics! Announcing the Genesis project, the result of a 24-month large-scale research collaboration involving over 20 research labs: a generative physics engine able to generate 4D dynamical worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.
Genesis's physics engine is developed in pure Python, yet is 10-80x faster than existing GPU-accelerated stacks like Isaac Gym and MJX. It delivers a simulation speed ~430,000 times faster than real time, and takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX 4090 (see tutorial: genesis-world.readthedocs.io/en/latest/user…). The Genesis physics engine and simulation platform is fully open source at github.com/Genesis-Embodi…. We'll gradually roll out access to our generative framework in the near future.
Genesis implements a unified simulation framework built from scratch, integrating a wide spectrum of state-of-the-art physics solvers and allowing simulation of the whole physical world in a virtual realm with the highest realism. We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, with the goal of fully automated data generation for robotics, physical AI, and other applications.
Open Source Code: github.com/Genesis-Embodi…
Project webpage: genesis-embodied-ai.github.io
Documentation: genesis-world.readthedocs.io
1/n











Can machines understand people’s minds from multimodal inputs? We introduce a comprehensive benchmark: “MMToM-QA: Multimodal Theory of Mind Question Answering” 📜 arxiv.org/abs/2401.08743




