Mingkai Deng

fronx 🐙✨@fronxer·22h

@mdeng34 @jinyuhou0 @larasnevess @varad0309 @tw_killian @waterluffy @ericxing This vaguely reminds me of active inference (Friston). Is the similarity superficial?

Mingkai Deng@mdeng34·1d

Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.
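
A minimal sketch of the loop described in this post, under heavy assumptions: it is a toy, not the SR²AM or SiRA implementation. The 1-D world, the trap, the scoring rule, and every name (`world_model`, `reactive_policy`, `configurator`, `simulative_policy`, `run_agent`) are hypothetical. The configurator here is a hand-written rule, whereas the post describes a learned one, and the world model is assumed to be perfect.

```python
from dataclasses import dataclass
from itertools import product

ACTIONS = (-1, 1, 2)  # step back, step forward, jump forward (toy action set)


@dataclass
class PlanConfig:
    simulate: bool  # whether to consult the world model at this step
    horizon: int    # how many steps ahead to roll out


def world_model(state: int, action: int) -> int:
    """Toy world model: predicts the next position (assumed perfect here)."""
    return state + action


def reactive_policy(state: int, goal: int) -> int:
    """Reactive stand-in: step toward the goal without simulating anything."""
    return 1 if goal > state else -1


def configurator(state: int, trap: int) -> PlanConfig:
    """Hand-written stand-in for a learned configurator.

    Simulate only when the trap lies one or two positions ahead;
    otherwise skip planning entirely.
    """
    if 1 <= trap - state <= 2:
        return PlanConfig(simulate=True, horizon=2)
    return PlanConfig(simulate=False, horizon=0)


def score_rollout(state: int, actions, goal: int, trap: int) -> int:
    """Score an imagined action sequence: penalize trap visits and final distance."""
    penalty = 0
    for action in actions:
        state = world_model(state, action)
        if state == trap:
            penalty -= 100
    return penalty - abs(goal - state)


def simulative_policy(state: int, goal: int, trap: int, horizon: int) -> int:
    """Enumerate action sequences in the world model; return the best first action."""
    best = max(
        product(ACTIONS, repeat=horizon),
        key=lambda seq: score_rollout(state, seq, goal, trap),
    )
    return best[0]


def run_agent(start: int, goal: int, trap: int, self_regulated: bool = True,
              max_steps: int = 20):
    """Run until the goal or the trap is reached; return (trajectory, simulations)."""
    state, trajectory, simulations = start, [start], 0
    while state != goal and state != trap and len(trajectory) <= max_steps:
        cfg = configurator(state, trap) if self_regulated else PlanConfig(False, 0)
        if cfg.simulate:
            action = simulative_policy(state, goal, trap, cfg.horizon)
            simulations += 1
        else:
            action = reactive_policy(state, goal)
        state = world_model(state, action)  # the real environment equals the model here
        trajectory.append(state)
    return trajectory, simulations
```

With start 0, goal 5, and a trap at 3, `run_agent(0, 5, 3)` returns `([0, 1, 2, 4, 5], 2)`: the agent simulates on only two of its four steps and jumps over the trap. With `self_regulated=False` it returns `([0, 1, 2, 3], 0)` and walks into the trap.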

Mingkai Deng@mdeng34·22h

We agree with LeCun that world models are the way to AGI/ASI. However, we have different ideas on how WMs should be built and used:
1. LLMs are not doomed; an LLM is one instance of a world model in language space
2. Generative modeling is great; it provides powerful supervision with minimal assumptions, which is more bitter-lesson-pilled
3. MPC is not all you need for agents; you need to self-regulate planning (itself an action for the WM), use the WM for learning policies, and more
More details in our paper “Critiques of World Models”: arxiv.org/abs/2507.05169

davinci@leothecurious·1d

just another lecun win?

Mingkai Deng@mdeng34·1d

@codecroc @jinyuhou0 @larasnevess @varad0309 @tw_killian @waterluffy @ericxing Exactly! Thank you for the apt summary

Rahul Chavan@codecroc·1d

@mdeng34 @jinyuhou0 @larasnevess @varad0309 @tw_killian @waterluffy @ericxing SiRA showing up to 124% improvement over reactive baselines suggests the bottleneck was never pure generation quality. It was the lack of consequence modeling before action selection.

Mingkai Deng retweeted

Lara Sá Neves@larasnevess·1d

SR²AM is out! Thinking longer ≠ thinking smarter. SR²AM knows which one it needs. A configurator regulates internal simulation: when to predict future states, how far, and when to skip. Result: 30B competing with 685B–1T at a fraction of the token cost. Model and code available

Mingkai Deng retweeted

Rishi Malhotra@ithinkimrishi·1d

Interesting paper from IFM on agentic reasoning

Mingkai Deng retweeted

Taylor W. Killian@tw_killian·1d

New work led by the inimitable @mdeng34 and @jinyuhou0. We took a fair bit of time thinking about whether an agent can assess how much effort it needs to spend thinking through the problems it is presented with. The resulting algorithm is one step toward a fully adaptive future!

Mingkai Deng retweeted

Han Guo@HanGuo97·2d

LLM training is built on fast MatMuls. But many surrounding ops still run as memory-bound kernels. CODA reparameterizes them to hide in the matmul’s shadow, fused into its epilogue before results leave the chip. Bonus: LLMs can write fast CODA kernels too (approaching SoLs).

Mingkai Deng retweeted

Jinyu Hou@jinyuhou0·1d

On popular benchmarks, our 30B model matches systems 20-30x its size (gpt-5.4-xhigh, DeepSeek-V3.2, Kimi-K2.5), while using up to 95% fewer reasoning tokens than comparable 30/32B agentic LLMs. The trick: don't just reason less, reason about the right things. A learned configurator decides when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is an allocation problem, not a compression problem. Model and code are openly available.

Mingkai Deng@mdeng34·1d

This is a prototype using language-based world models. Stay tuned for our next steps on multimodal and physical world models. The concept of a configurator, which decides when and how deeply to engage a reasoning process, is not specific to planning, but extensible to learning and adaptation going forward. 📄 SR²AM: arxiv.org/abs/2605.22138 📄 SiRA: arxiv.org/abs/2507.23773 🌐 Project: sailing-lab.github.io/sr2am-self-reg… 💻 Code: github.com/sailing-lab/sr… 🤗 SR²AM-v0.1-8B: huggingface.co/sailing-lab/SR… 🤗 SR²AM-v1.0-30B: huggingface.co/sailing-lab/SR… Joint work with @jinyuhou0, @larasnevess, @varad0309, @tw_killian, @waterluffy, @ericxing

Mingkai Deng@mdeng34·1d

How does self-regulated simulative reasoning perform in practice? SR²AM-v0.1-8B achieves results competitive with GPT-OSS (120B) and GLM-4.6 (355B). SR²AM-v1.0-30B is competitive with DeepSeek-V3.2 (685B) and Kimi-K2.5 (1T) at 𝟮𝟲–𝟵𝟱% fewer reasoning tokens than comparable 30/32B agentic LLMs. The key finding from RL training: the model learns to plan further ahead (+22.8% horizon) rather than more often (+2% frequency). Allocation, not compression.

Institute of Foundation Models@IFM_MBZUAI

Mingkai Deng retweeted

Hector Liu@waterluffy·5d

We have started the IFM workshops featuring our work on LLMs and World Models. We are closing with the final one before summer, at Stanford. If you are interested in how modern LLMs are made, our open-source projects are among the best resources you can find: ifm.ai

The Institute of Foundation Models is coming to Stanford with the team behind K2 Think and PAN. On May 21, IFM is hosting its first Stanford event on how foundation models move from research to real systems, with a deep dive into IFM’s reasoning and world models.

Mingkai Deng@mdeng34·Apr 11

Really interesting post -- agreed that our goals should be physical AGI, and goal-driven beats idea-driven. Though we see it differently on a couple of things:

1. If you pick 99% + 1 hour of demonstrated task data as your success criterion, world models will surely look unnecessary. But physical AGI is about dealing with situations you **cannot** demonstrate ahead of time. This is not a methods debate, but a goal debate. A world model solves this problem by simulating possible outcomes and generating synthetic experience for unseen tasks.

2. One useful analogy: LLMs aren't strong just because of post-training RL. Self-supervised pretraining is arguably the source of their intelligence. World models play the same role for physical AI -- they're not a training trick you can skip with more data, but an indispensable component for understanding and reasoning.

3. Language is not just a "crutch while we don't have enough robotics data" -- it encodes institutions, social norms, and mental states that physical interaction data can't capture efficiently, regardless of scale.

This is what led us to the GLP (Generative Latent Prediction) world model architecture. It includes an enhanced LLM dynamics backbone and mixed continuous/discrete latent states. Language and physical commonsense aren't A or B, but complementary abstractions the world model should unify. PAN, a world model built on GLP, is trained on internet data but already enables open-domain action simulation that transfers to robotic policies.

More on GLP: arxiv.org/abs/2507.05169
More on PAN: arxiv.org/abs/2511.09057

Pete Florence@peteflorence·Apr 7

x.com/i/article/2041…

Mingkai Deng retweeted

Jinyu Hou@jinyuhou0·Apr 11

Really thought-provoking post. The goal-driven vs. idea-driven distinction resonates a lot. It got me wondering, though: perhaps goal-driven research doesn't have to be agnostic about world models? In our recent work (arxiv.org/abs/2507.05169), we argue that the field has gotten too focused on world models as video generators, when their real value should be as reasoning engines: specifically, simulating counterfactual action outcomes to enable planning, which seems closely aligned with the zero-shot physical AGI goal outlined here. These aren't mutually exclusive with data scaling. If anything, a good world model should amplify the value of the data you already have by enabling generalization beyond its empirical coverage. Would love to hear your thoughts.

Mingkai Deng@mdeng34·Mar 27

Grateful to Profs @daphneipp and @841io for inviting me to give a guest lecture at @CarnegieMellon today on world models and related directions in AI. I appreciated the thoughtful questions and the chance to share some ideas from ongoing work.

Mingkai Deng@mdeng34·Feb 6

@DrJimFan Totally agreed. Our position paper last year, Critiques of World Models, also discussed this: arxiv.org/abs/2507.05169 We argued that WMs will be the next-gen engine for simulative reasoning and learning, and proposed the Generative Latent Prediction (GLP) architecture for WMs

Today, we are releasing a new version of K2 (K2-V2), a 360-open LLM built from scratch as a superior base for reasoning adaptation, while still excelling at core LLM capabilities like conversation, knowledge retrieval, and long-context understanding. K2 fills a major gap: highly capable models with no transparency. Instead of releasing only weights, we’re sharing the full training story: dataset recipes, mid-training checkpoints, logs, code, and evaluation tools. That’s 360-open.

What’s inside:
• 70B dense transformer engineered as a reasoning-enhanced base model
• Native 512K context (extendable via RoPE scaling)
• Mid-training reasoning phase
• Strong tool-use scaffolding

What we’re open-sourcing:
• 250M+ reasoning traces (math, planning, multi-step logic)
• Full pre- & mid-training data compositions
• All mid-training checkpoints
• Training logs, code, Eval360

Performance:
• GPQA-Diamond: 55.1% mid-training → 69.3% after SFT (strongest fully open 70B model)
• KK-8 Logic Puzzles: 83%, competitive with DeepSeek-R1 & OpenAI o3-mini-high
• ArenaHard V2: 62.1%, close to Qwen3 235B
• Outperforms Qwen2.5-72B and approaches Qwen3-235B despite being smaller and fully transparent.

🔗 The Model: bit.ly/3KIYwuo
🔗 Technical Report: bit.ly/49V8h2U
🔗 Blog: bit.ly/49V7gb6

Jim Fan@DrJimFan·Feb 3

x.com/i/article/2018…

Mingkai Deng retweeted

LLM360@llm360·Dec 5

To mark the 2nd anniversary of LLM360, we are proud to release K2-V2: a 70B reasoning-centric foundation model that delivers frontier capabilities. As a push for "360-open" transparency, we are releasing not only weights, but the full recipe: data composition, training code, logs, and intermediate checkpoints. About K2-V2: 🧠 70B params, reasoning-optimized 🧊 512K context window 🔓 "360-Open" (Data, Logs, Checkpoints) 📈 SOTA on olympiad math and complex logic puzzles

Mingkai Deng retweeted

Eric Xing@ericxing·Dec 5

Now you have an alternative to the super popular but unfortunately not-so-transparent base LLMs such as Qwen 2.5 or 3 (you have no idea how they were trained, what data was used, whether they are safe …) for building your own reasoning or general-purpose LLMs through post-training, SFT, RL, etc. It is 360-open and reproducible.

MBZUAI@mbzuai
