Yi Gu

36 posts

@YiGu025

San Diego, CA · Joined October 2022

62 Following · 77 Followers

Yi Gu retweeted

Institute of Foundation Models@IFM_MBZUAI·6d

The Institute of Foundation Models is coming to Stanford with the team behind K2 Think and PAN. On May 21, IFM is hosting its first Stanford event on how foundation models move from research to real systems, with a deep dive into IFM’s reasoning and world models.

Yi Gu retweeted

Institute of Foundation Models@IFM_MBZUAI·11 Apr

A visually convincing rollout is not the same thing as a useful world model. WR-Arena is built to test the harder question: can a model simulate futures well enough to support action, planning, and reasoning? That’s the shift from simple next-state prediction to realistic world simulation grounded in real-world utility. Paper + code are live. t.co/waRc0MJmwP t.co/ZzN76nOwoI #AI #WorldModels #Benchmarking #EmbodiedIntelligence #PhysicalAI #MachineLearning

Yi Gu retweeted

Institute of Foundation Models@IFM_MBZUAI·25 Mar

Monday at @CarnegieMellon, students and researchers explored how frontier AI systems move from the lab to real-world applications, including a hands-on look at IFM’s PAN world model. Huge thanks to everyone who joined us! Learn more about PAN: ifm.ai/pan/

Yi Gu@YiGu025·30 Jan

Glad to be a core member of PAN in this fantastic year! See cool demos here (-: ifm.ai/pan/

Eric Xing@ericxing

2025 has been a productive year for me as a researcher and engineering lead. In addition to my duty of running the university, I managed to spend time on three exciting technical projects and made significant progress:

1: PAN: a world model built for simulation, prediction, and agentic reasoning over arbitrary time/space horizons, rather than just generating short video clips as other “world models” do. In the CWM paper (arxiv.org/abs/2507.05169), we proposed a new architecture called Generative Latent Prediction (#GLP) for structured latent-space reasoning that maintains fidelity to the physical environment. It is defined by three key components: 1- Latent Reasoning Backbone: an LLM/DM-driven module that produces structured, stateful representations conditioned on history and action; 2- Generative Supervision: a diffusion-based decoder that renders the consequences of latent transitions back into the perception space, providing explicit grounding in observable reality; and 3- Closed-loop Learning Objective: a training strategy that continually aligns simulated dynamics with real-world evidence, reducing drift and reinforcing causal consistency. At the Institute of Foundation Models (IFM) of @mbzuai, we built the PAN world model (ifm.ai/pan/) on this architecture, which moves PAN beyond correlation-driven prediction toward mechanistic understanding, enabling the model to learn how and why the environment changes rather than relying solely on abstract latent dynamics. The combination of generative grounding, stepwise verification, and action-conditioned reasoning provides robustness in settings where interpretability, causal structure, and physical consistency are essential, and allows PAN to significantly outperform existing WMs on novel and challenging benchmarks that go beyond mere short-horizon video constancy, such as Action Simulation Fidelity, Long-Horizon Consistency, and Simulative Reasoning and Planning Quality. These capabilities are particularly relevant across domains such as personalized games, agentic and embodied robotics, and multi-physics simulation.

2: AIDO: the AI-driven Digital Organism (arxiv.org/abs/2412.06993) is an AI system that enables simulation of all biological, physiological, and clinical events occurring within a living organism. Through a digital interface, it outputs how a real biological system would respond to any expressible and actionable biological interaction, intervention, or manipulation, much as a world model does in world simulation upon action prompting. This contrasts with existing work under the banner of “virtual cell”, which in reality focuses on functional approximation in the classical machine learning style to predict RNA counts of N-k genes upon perturbation of k genes (where k typically equals 1 and represents an abstract, isolated, and idealized binary “action” not actually realizable in real biological experiments). At @genbioai (genbio.ai), we are building the Virtual Cell, corresponding to the cellular level of AIDO, as a world model of the cell that simulates biological possibilities at both the molecular level (e.g., RNA count distributions, but also other molecular phenomena such as drug interactions) and the cellular level (such as cell shape, dynamics, and function). It is built on a novel neural architecture that integrates multimodal biological data with unconventional tokenization schemes; learns representations of sequence, structure, interaction, sub-cellular units, and higher-order biological entities in a causal and hierarchical manner; leverages innovative pre-training and post-training schemes; and allows action-conditioned generation of biological outputs across scales. Our AIDO system features in-context molecular design and holistic cell simulation platforms, and an Agent Interface that lets researchers perform in silico experiments on the virtual-bio engine over a wide range of tasks, such as discovering new targets and simulating drug and disease mechanisms. Our system ranks No. 1 out of 97 methods on the ProteinGym benchmark and is hosted by the Chan Zuckerberg Initiative as a representative FM for the Virtual Cell. We will soon release the agentic Virtual Cell Lab to the scientific community for simulative biological research and experiments.

3: K2 LLMs: including K2-v2 (ifm.ai/k2/), the world’s strongest fully open LLM in its class (70B), which rivals open-weight leaders and approaches the performance of models over three times its size, and K2-think (k2think.ai/k2think), the world’s fastest and most parameter-efficient reasoning LLM, post-trained from K2-v2. Both come from the @llm360 initiative and the IFM. In a world where most U.S. frontier models dominate performance but remain completely closed, while Chinese open-weight systems occupy a large semi-open middle band, our K2 models represent an effort to better serve the AI community and the public with truly open-source foundation models that are transparent, reproducible, and competitive. They take a 360-open approach: making public not just model weights, but also training data, mid-training checkpoints, logs and methodology, and fine-tuning recipes. In K2-v2 (arxiv.org/abs/2512.06201), we actively infuse domain knowledge, reasoning, long-context, and tool use throughout the training process, which explicitly prepares the model for complex reasoning tasks after post-training. In K2-think (arxiv.org/abs/2509.07604), the key technical elements underlying the remarkable performance include: 1) long chain-of-thought supervised fine-tuning, 2) reinforcement learning with verifiable rewards, 3) agentic planning before reasoning, 4) test-time scaling, 5) speculative decoding, and 6) inference-optimized hardware. Our models punch above their weight and, with their 360-degree transparency, directly address reproducibility, auditability, and governance: the constraints that will define real-world deployment.

As we say goodbye to 2025, I’d like to thank my collaborators, developers, and students from IFM, GenBio, MBZUAI, and CMU for the wonderful collaboration. More to come in 2026: you will see bigger and more powerful K2 (LLM), PAN (WM), and AIDO releases, and more advances in architectural and system work!

Yi Gu retweeted

Hao AI Lab@haoailab·19 Dec

🔥CAD: Efficient Long-context Language Model Training by Core Attention Disaggregation Repo: github.com/hao-ai-lab/dis… Blog: hao-ai-lab.github.io/blogs/distca/ Training a long-context LLM model can suffer from severe workload imbalance caused by core-attention - the softmax(QK^T)V part. Core-attention disaggregation (CAD) fundamentally eliminates workload imbalance by disaggregating core-attention from the rest of the model.
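To see why core attention can unbalance a long-context training step, a rough cost model may help. The sketch below is not taken from the CAD repo. It assumes causal attention within each packed document, counts query-key pairs as a stand-in for the cost of softmax(QK^T)V, and uses made-up document lengths; the function names `attention_pairs` and `rank_cost` are hypothetical.

```python
def attention_pairs(doc_len: int) -> int:
    """Query-key pairs scored by causal attention over one document.

    Token i attends to tokens 0..i, so the total is 1 + 2 + ... + doc_len.
    """
    return doc_len * (doc_len + 1) // 2


def rank_cost(doc_lens: list[int]) -> tuple[int, int]:
    """Return (token count, attention pairs) for documents packed on one rank."""
    tokens = sum(doc_lens)
    pairs = sum(attention_pairs(n) for n in doc_lens)
    return tokens, pairs


# Two hypothetical ranks that hold the same number of tokens.
rank_a = [8192]       # one long document
rank_b = [1024] * 8   # eight short documents

tokens_a, pairs_a = rank_cost(rank_a)
tokens_b, pairs_b = rank_cost(rank_b)

# Token-wise layers (projections, MLP) see equal work on both ranks,
# but the attention proxy differs by the ratio printed below.
print(tokens_a == tokens_b)
print(pairs_a / pairs_b)
```

Under these assumptions both ranks carry the same token count, yet the rank holding the single long document scores roughly eight times as many query-key pairs. That gap is the kind of imbalance the post attributes to core attention.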

Yi Gu retweeted

LLM360@llm360·5 Dec

To mark the 2nd anniversary of LLM360, we are proud to release K2-V2: a 70B reasoning-centric foundation model that delivers frontier capabilities. As a push for "360-open" transparency, we are releasing not only weights, but the full recipe: data composition, training code, logs, and intermediate checkpoints. About K2-V2: 🧠 70B params, reasoning-optimized 🧊 512K context window 🔓 "360-Open" (Data, Logs, Checkpoints) 📈 SOTA on olympiad math and complex logic puzzles

Yi Gu retweeted

Guangyi Liu@guangyi_l·15 Nov

🚀 Excited to announce the release of PAN, a general world model I’ve been working on for years. PAN can simulate physical, agentic, and nested worlds — generating infinite interactive experiences to train and evaluate AI agents. Check out demo: ifm.mbzuai.ac.ae/pan/ 👇

Yi Gu retweeted

DailyPapers@HuggingPapers·15 Nov

MBZUAI's PAN Team unveils PAN: a new world model. This general, interactable, and long-horizon world model predicts future states through high-quality video simulation, conditioned on history and natural language actions.

Yi Gu retweeted

Zhiting Hu@ZhitingHu·14 Nov

🔥Really excited to see the release of the PAN world model, a project I have been working on over the past years. PAN is a general world model capable of simulating physical, agentic, and nested worlds, synthesizing infinite interactive experiences for training AI agents. Building on top of pretrained LLMs and video diffusion models, PAN connects language, perception, action, and latent thoughts for long-horizon simulation and reasoning. PAN shows overwhelming performance gains over JEPA-2, Cosmos-2, and other prior models. More in the thread👇 ... 1/

Yi Gu retweeted

Jiannan Xiang@szxiangjn·14 Nov

🚀 Introducing PAN, our latest general world model. 💡 Compared to traditional video generation models like Sora 2, PAN simulates worlds you can interact with, over long horizons, with natural-language actions.

Yi Gu retweeted

Maitrix.org@MaitrixOrg·6 Feb

🤖Thrilled to introduce _ReasonerAgent_: a fully open-source, ready-to-run agent that does research🧐 in a web browser and answers your queries. Use ReasonerAgent to help you: ✈️search for flights, 🛍️compile shopping options, 🗞️research news coverage, etc. 📘Check out more 👇 1/6

Yi Gu retweeted

Chuang Gan@gan_chuang·19 Dec

We’re excited to announce the official release of our Genesis Simulator! github.com/Genesis-Embodi… In 2018, I decided to shift my research focus from vision to embodied AI, driven by a fascination with creating general-purpose agents capable of interacting with the physical world and other intelligent beings with human-like flexibility, a field we refer to as embodied AGI. The core of our approach is to reverse-engineer human mental models and build robotic brains driven by generative physics engines! I recognize that many roboticists are skeptical of this approach, pointing to the difficulties of setting up simulators and addressing the sim-to-real gap. They advocate focusing solely on learning from real-world data instead. I understand the concerns, but I firmly believe that we cannot bypass physics simulators just because creating a good one is challenging! I feel fortunate to have met @zhou_xian_ two years ago, though I also feel a bit guilty for delaying his PhD graduation for so long 😀. Together with an incredible team of students and researchers who trust us, we've reached a significant milestone with this generative physical world! To be honest, there have been times when I felt this simulator might be too advanced to release, but we believe it's crucial to make it fully open-source and build a strong community around our mission! Please join the Genesis community! We hope to convince the robotics world that the "Generative Physics Simulator is all You Need!"

Zhou Xian@zhou_xian_

Everything you love about generative models, now powered by real physics! Announcing the Genesis project, the result of a 24-month large-scale research collaboration involving over 20 research labs: a generative physics engine able to generate 4D dynamical worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI applications. Genesis's physics engine is developed in pure Python, while being 10-80x faster than existing GPU-accelerated stacks like Isaac Gym and MJX. It delivers a simulation speed ~430,000 times faster than real time, and takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX4090 (see tutorial: genesis-world.readthedocs.io/en/latest/user…). The Genesis physics engine and simulation platform is fully open source at github.com/Genesis-Embodi…. We'll gradually roll out access to our generative framework in the near future. Genesis implements a unified simulation framework from scratch, integrating a wide spectrum of state-of-the-art physics solvers and allowing simulation of the whole physical world in a virtual realm with the highest realism. We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI, and other applications. Open Source Code: github.com/Genesis-Embodi… Project webpage: genesis-embodied-ai.github.io Documentation: genesis-world.readthedocs.io 1/n

Yi Gu@YiGu025·15 Dec

@ShunyuYao12 Because computer scientists are lazy. They want to make their own life easier

Shunyu Yao@ShunyuYao12·15 Dec

isn’t it a miracle that most computer systems are mostly debuggable

Yi Gu retweeted

Maitrix.org@MaitrixOrg·26 Oct

How does @AnthropicAI's Claude 3.5 Sonnet (new) perform vs the old version and other models? Decentralized Arena did a fine-grained evaluation. Here are some takeaways 👉 1⃣ Claude 3.5 Sonnet (new) improves over its old version, especially in Math (Algebra, Geometry, Probability). 🧮 2⃣ Interestingly, 3.5 Sonnet (new) lags behind its old version in chat dimensions like MT-Bench. One possible reason we found is that 3.5 Sonnet (new) tends to produce shorter responses on these queries. 🗣️ 3⃣ 3.5 Sonnet (new) generally falls behind ChatGPT-4o. Check out more results on the De-Arena leaderboards on @huggingface: huggingface.co/spaces/LLM360/…

Yi Gu retweeted

Maitrix.org@MaitrixOrg·10 Oct

With @llm360, we're thrilled to release ⚔️Decentralized Arena built on Collective LLM Intelligence 🤖🤖 An automated "Chatbot Arena" for democratic LLM benchmarking on any fine-grained dimensions, e.g.: 🧮(Math) algebra, geometry, probability ... 💡(Reasoning) symbolic, social ... 🧪(Science) biology, chemistry ... ⌨️(Coding) python, C++, SQL ... ❓any other dims you care about! 🔥95% correlation with Chatbot Arena in "Overall" dimension (on 50+ models) Key idea: 👉Chatbot Arena ⬅️Wisdom of the human crowds 👉Decentralized Arena ⬅️Wisdom of the LLM crowds With @llm360, we're working to make Decentralized Arena fully transparent and reproducible Blog: de-arena.maitrix.org Leaderboard: huggingface.co/spaces/LLM360/… More interesting results: 1/

Yi Gu retweeted

LLM360@llm360·7 Oct

📢📢 We are releasing TxT360: a globally deduplicated dataset for LLM pretraining 🌐 99 Common Crawls 📘 14 Curated Sources 👨‍🍳 recipe to easily adjust data weighting and train the most performant models Dataset: huggingface.co/datasets/LLM36… Blog: huggingface.co/spaces/LLM360/…

Yi Gu retweeted

LLM360@llm360·26 Sep

🎉🎉🎉 We are excited to share that the LLM360 team had two papers accepted to COLM: “Crystal: Illuminating LLM Abilities on Language and Code” and “LLM360: Towards Fully Transparent Open-Source LLMs”. The LLM360 team will be at @COLM_conf next week. Excited to see everyone there!

Yi Gu retweeted

Maitrix.org@MaitrixOrg·16 Aug

🥳Our work on Multimodal Theory of Mind evaluation won the #ACL2024 Outstanding Paper Award! How well can machines 🤖 form a coherent mental picture🧠 of humans from vision-language observations? Can machines understand humans' goals and beliefs? Our MMToM-QA shows models like GPT-4V are still limited, achieving a score of only 40, far behind the human score of 93. Check out the benchmark, leaderboard, and analyses: chuanyangjin.com/mmtom-qa Congrats @chuanyang_jin @tianminshu Check out more exciting projects by @MaitrixOrg: maitrix.org

Chuanyang Jin@chuanyang_jin

Can machines understand people’s minds from multimodal inputs? We introduce a comprehensive benchmark: “MMToM-QA: Multimodal Theory of Mind Question Answering” 📜 arxiv.org/abs/2401.08743

Yi Gu@YiGu025·6 Jun

@ShunyuYao12 Caption a looot of video data

Shunyu Yao@ShunyuYao12·6 Jun

If u have a lot of llm api quota expiring today, whats the most scalable and useful way to use it?

Yi Gu retweeted

LLM360@llm360·29 May

Please welcome K2-65B🏔️, the most performant fully-open LLM released to date. As a blueprint for open-source AGI, we release all model checkpoints, code, logs, and data. About K2: 🧠65 billion parameters 🪟Fully transparent & reproducible 🔓Apache 2.0 📈Outperforms Llama 2 70B
