Mingkai Deng

141 posts

@mdeng34

PhD student @LTIatCMU | MSML @mldcmu | BA Math-Stats + CS @Columbia | Working on agent models and world models

Pittsburgh, USA · Joined September 2016
323 Following · 647 Followers
Pinned Tweet
Mingkai Deng (@mdeng34)
Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.
[Image: SiRA (upper figure) and SR²AM (lower figure)]
3 replies · 41 reposts · 243 likes · 49.3K views
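The three-level structure described in the pinned post can be illustrated with a toy sketch. This is not the SiRA or SR²AM implementation: the number-line task, the `inc`/`dbl` actions, the hand-written `configurator` rule (standing in for the learned one), and all function names are assumptions made purely for illustration.

```python
import itertools
from typing import Tuple

ACTIONS = ("inc", "dbl")  # hypothetical toy actions: add 1 or double


def world_model(state: int, action: str) -> int:
    """Toy world model: predicts the next state for an action."""
    return state + 1 if action == "inc" else state * 2


def reactive_policy(state: int, goal: int) -> str:
    """System I stand-in: fixed heuristic, no lookahead."""
    return "dbl" if state * 2 <= goal else "inc"


def simulate_plan(state: int, goal: int, horizon: int) -> str:
    """System II stand-in: roll out every action sequence in the world
    model and return the first action of the best-scoring sequence."""
    best_action, best_score = ACTIONS[0], None
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s, steps_used = state, horizon
        for i, action in enumerate(seq):
            s = world_model(s, action)
            if s == goal:
                steps_used = i + 1
                break
        # Overshooting cannot be undone in this toy, so rank it last.
        distance = float("inf") if s > goal else goal - s
        score = (distance, steps_used)
        if best_score is None or score < best_score:
            best_score, best_action = score, seq[0]
    return best_action


def configurator(state: int, goal: int, max_horizon: int = 3) -> int:
    """System III stand-in: choose how far to simulate (0 = skip planning).
    A hand-written rule here; the post describes a learned component."""
    gap = goal - state
    if gap <= 1:
        return 0  # only one sensible move, so act reactively
    return min(max_horizon, gap)


def run_agent(state: int, goal: int, use_configurator: bool,
              max_steps: int = 20) -> Tuple[int, int]:
    """Return (environment steps taken, number of simulation calls)."""
    steps = sims = 0
    while state != goal and steps < max_steps:
        horizon = configurator(state, goal) if use_configurator else 0
        if horizon == 0:
            action = reactive_policy(state, goal)
        else:
            action = simulate_plan(state, goal, horizon)
            sims += 1
        state = world_model(state, action)  # toy: environment equals the model
        steps += 1
    return steps, sims
```

In this toy, going from 3 to 8 takes the reactive heuristic three steps (3 → 6 → 7 → 8), while the configured agent finds 3 → 4 → 8 in two steps using two simulation calls. It skips simulation entirely once the remaining gap is at most 1.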
Mingkai Deng (@mdeng34)
We agree with LeCun that world models are the way to AGI/ASI. However, we have different ideas on how WMs should be built and used:
1. LLMs are not doomed; they are one instance of a world model in language space
2. Generative modeling is great; it provides powerful supervision with minimal assumptions, which is more bitter-lesson-pilled
3. MPC is not all you need for agents; you need to self-regulate planning (itself an action for the WM), use the WM for learning policies, and more
More details in our paper “Critiques of World Models”: arxiv.org/abs/2507.05169
0 replies · 3 reposts · 13 likes · 472 views
Mingkai Deng retweeted
Han Guo (@HanGuo97)
LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes them to hide in the matmul’s shadow, fused into its epilogue before results leave the chip. Bonus: LLMs can write fast CODA kernels too (approaching SoLs).
[Image attached]
15 replies · 100 reposts · 664 likes · 185.6K views
Mingkai Deng retweeted
Mingkai Deng (@mdeng34)
This is a prototype using language-based world models. Stay tuned for our next steps on multimodal and physical world models. The concept of a configurator, which decides when and how deeply to engage a reasoning process, is not specific to planning, but extensible to learning and adaptation going forward.
📄 SR²AM: arxiv.org/abs/2605.22138
📄 SiRA: arxiv.org/abs/2507.23773
🌐 Project: sailing-lab.github.io/sr2am-self-reg…
💻 Code: github.com/sailing-lab/sr…
🤗 SR²AM-v0.1-8B: huggingface.co/sailing-lab/SR…
🤗 SR²AM-v1.0-30B: huggingface.co/sailing-lab/SR…
Joint work with @jinyuhou0, @larasnevess, @varad0309, @tw_killian, @waterluffy, @ericxing
1 reply · 9 reposts · 50 likes · 3.6K views
Mingkai Deng (@mdeng34)
How does self-regulated simulative reasoning perform in practice? SR²AM-v0.1-8B achieves results competitive with GPT-OSS (120B) and GLM-4.6 (355B). SR²AM-v1.0-30B is competitive with DeepSeek-V3.2 (685B) and Kimi-K2.5 (1T) at 𝟮𝟲–𝟵𝟱% fewer reasoning tokens than comparable 30/32B agentic LLMs. The key finding from RL training: the model learns to plan further ahead (+22.8% horizon) rather than more often (+2% frequency). Allocation, not compression.
[Image attached]
1 reply · 4 reposts · 32 likes · 2.1K views
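One way to read figures such as "+22.8% horizon", "+2% frequency", and "26–95% fewer reasoning tokens" is as relative changes between two measurements. The sketch below shows that arithmetic on made-up data. The trace format (one planned horizon per step, 0 meaning planning was skipped), the function names, and every number are hypothetical; the post does not give the paper's exact metric definitions.

```python
from statistics import mean
from typing import List, Tuple


def planning_stats(horizons: List[int]) -> Tuple[float, float]:
    """Return (planning frequency, mean horizon over planned steps).
    `horizons` holds the lookahead chosen at each step; 0 = no planning."""
    planned = [h for h in horizons if h > 0]
    frequency = len(planned) / len(horizons)
    mean_horizon = mean(planned) if planned else 0.0
    return frequency, mean_horizon


def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100.0


def token_reduction(baseline_tokens: float, model_tokens: float) -> float:
    """Percent fewer tokens used than a baseline."""
    return (baseline_tokens - model_tokens) / baseline_tokens * 100.0


# Hypothetical traces before and after training (not real results).
before = [0, 2, 0, 2, 2, 0, 0, 2]
after = [0, 3, 0, 3, 2, 0, 0, 2]

freq_before, hor_before = planning_stats(before)
freq_after, hor_after = planning_stats(after)

horizon_gain = relative_change(hor_before, hor_after)      # plans further ahead
frequency_gain = relative_change(freq_before, freq_after)  # plans equally often
```

In this made-up example the agent plans on the same share of steps but looks further ahead when it does, which is the "allocation, not compression" pattern the post describes.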
Mingkai Deng retweeted
Hector Liu (@waterluffy)
We have started the IFM workshop featuring our work on LLMs and World Models. We are closing the final one before summer at Stanford. If you are interested in how modern LLMs are made, our open-source projects are among the best resources you can find: ifm.ai
Institute of Foundation Models (@IFM_MBZUAI)

The Institute of Foundation Models is coming to Stanford with the team behind K2 Think and PAN. On May 21, IFM is hosting its first Stanford event on how foundation models move from research to real systems, with a deep dive into IFM’s reasoning and world models.

0 replies · 3 reposts · 13 likes · 2.1K views
Mingkai Deng (@mdeng34)
Really interesting post -- agreed that our goal should be physical AGI, and that goal-driven beats idea-driven. Though we see a couple of things differently:

1. If you pick 99% + 1 hour of demonstrated task data as your success criterion, a world model will surely look unnecessary. But physical AGI is about dealing with situations you **cannot** demonstrate ahead of time. This is not a methods debate, but a goal debate. A world model solves this problem by simulating possible outcomes and generating synthetic experience for unseen tasks.
2. One useful analogy: LLMs aren't strong just because of post-training RL. Self-supervised pretraining is arguably the source of their intelligence. World models play the same role for physical AI -- they're not a training trick you can skip with more data, but an indispensable component for understanding and reasoning.
3. Language is not just a "crutch while we don't have enough robotics data" -- it encodes institutions, social norms, and mental states that physical interaction data can't capture efficiently, regardless of scale.

This is what led us to the GLP (Generative Latent Prediction) world model architecture. It includes an enhanced LLM dynamics backbone and mixed continuous/discrete latent states. Language and physical commonsense aren't A or B, but complementary abstractions the world model should unify. PAN, a world model built on GLP, is trained on internet data but already enables open-domain action simulation that transfers to robotic policies.

More on GLP: arxiv.org/abs/2507.05169
More on PAN: arxiv.org/abs/2511.09057
0 replies · 1 repost · 2 likes · 173 views
Mingkai Deng retweeted
Jinyu Hou (@jinyuhou0)
Really thought-provoking post: the goal-driven vs. idea-driven distinction resonates a lot. It got me wondering, though: perhaps goal-driven research doesn't have to be agnostic about world models? In our recent work (arxiv.org/abs/2507.05169), we argue that the field has gotten too focused on world models as video generators, when their real value should be as reasoning engines: specifically, simulating counterfactual action outcomes to enable planning, which seems closely aligned with the zero-shot physical AGI goal outlined here. These aren't mutually exclusive with data scaling; if anything, a good world model should amplify the value of the data you already have by enabling generalization beyond its empirical coverage. Would love to hear your thoughts.
0 replies · 1 repost · 2 likes · 164 views
Mingkai Deng (@mdeng34)
Grateful to Profs @daphneipp and @841io for inviting me to give a guest lecture at @CarnegieMellon today on world models and related directions in AI. I appreciated the thoughtful questions and the chance to share some ideas from ongoing work.
0 replies · 1 repost · 9 likes · 944 views
Mingkai Deng (@mdeng34)
@DrJimFan Totally agreed. Our position paper last year, Critiques of World Models, also discussed this: arxiv.org/abs/2507.05169 We argued that WMs will be the next-gen engine for simulative reasoning and learning, and proposed the Generative Latent Prediction (GLP) architecture for WMs
0 replies · 0 reposts · 0 likes · 46 views
Mingkai Deng retweeted
LLM360 (@llm360)
To mark the 2nd anniversary of LLM360, we are proud to release K2-V2: a 70B reasoning-centric foundation model that delivers frontier capabilities. As a push for "360-open" transparency, we are releasing not only weights, but the full recipe: data composition, training code, logs, and intermediate checkpoints.
About K2-V2:
🧠 70B params, reasoning-optimized
🧊 512K context window
🔓 "360-Open" (Data, Logs, Checkpoints)
📈 SOTA on olympiad math and complex logic puzzles
[Image attached]
2 replies · 25 reposts · 55 likes · 21.8K views
Mingkai Deng retweeted
Eric Xing (@ericxing)
Now you have an alternative to the super popular but unfortunately not so transparent (you have no idea how it was trained, what data was used, is it safe …) base LLMs such as Qwen 2.5 or 3, to build your own reasoning or general-purpose LLMs through post-training, SFT, RL, etc. It is 360-open and reproducible.
MBZUAI (@mbzuai)

Today, we are releasing a new version of K2 (K2-V2), a 360-open LLM built from scratch as a superior base for reasoning adaptation, while still excelling at core LLM capabilities like conversation, knowledge retrieval, and long-context understanding. K2 fills a major gap: highly capable models with no transparency. Instead of releasing only weights, we’re sharing the full training story: dataset recipes, mid-training checkpoints, logs, code, and evaluation tools. That’s 360-open.

What’s inside:
• 70B dense transformer engineered as a reasoning-enhanced base model
• Native 512K context (extendable via RoPE scaling)
• Mid-training reasoning phase
• Strong tool-use scaffolding

What we’re open-sourcing:
• 250M+ reasoning traces (math, planning, multi-step logic)
• Full pre- & mid-training data compositions
• All mid-training checkpoints
• Training logs, code, Eval360

Performance:
• GPQA-Diamond: 55.1% mid-training → 69.3% after SFT (strongest fully open 70B model)
• KK-8 Logic Puzzles: 83%, competitive with DeepSeek-R1 & OpenAI o3-mini-high
• ArenaHard V2: 62.1%, close to Qwen3 235B
• Outperforms Qwen2.5-72B and approaches Qwen3-235B despite being smaller and fully transparent.

🔗 The Model: bit.ly/3KIYwuo
🔗 Technical Report: bit.ly/49V8h2U
🔗 Blog: bit.ly/49V7gb6

1 reply · 11 reposts · 44 likes · 9.4K views