Mingkai Deng

168 posts

@mdeng34

PhD student @LTIatCMU | MSML @MLDCMU | BA Math-Stats + CS @Columbia | World models and agent models | @IFM_MBZUAI @MSFTResearch

San Francisco, USA · Joined September 2016
339 Following · 765 Followers
Pinned Tweet
Mingkai Deng
Mingkai Deng@mdeng34·
Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.
5
47
280
62.1K
Mingkai Deng retweeted
elvis
elvis@omarsar0·
// Critique of the Agent Model // Finally, a paper that tries to define what an agent is and what agency consists of. Good read overall. (great bookmark) The word agent now covers everything from a for-loop with tool calls to speculative machine superintelligence. Eric Xing and colleagues ask where automation ends, and agency begins. Drawing on Descartes and on science-fiction portrayals of autonomous beings, they analyze agent architectures along five dimensions: goal, identity, decision-making, self-regulation, and learning. The argument is that genuine agency requires these structures to hold together in a specific way. Great paper overall, providing a vocabulary for arguing about what is and is not an agent. Paper: arxiv.org/abs/2606.23991 Learn to build effective AI agents in our academy: academy.dair.ai
16
22
128
9.3K
Mingkai Deng
Mingkai Deng@mdeng34·
This is impressive! Great demonstration that we should disentangle the world model from the agent model for generalizable decision-making. In our recent paper "Critique of Agent Model" coauthored with Prof. @ericxing and @jinyuhou0, we formally analyzed existing approaches to agent modeling, and proposed the next steps for building autonomous agents. A major conclusion is that there's real, general benefit to using a world model inside an agent model, but *only if* the world model simulates faithfully. If you fine-tune the WM together with the AM, the guarantee is lost. arxiv.org/abs/2606.23991
0
1
5
230
Mingkai Deng
Mingkai Deng@mdeng34·
Fable 5 and the upcoming GPT-5.6 promise exceptional "agentic" capabilities in software engineering and scientific research. Companies like Figure AI are racing towards humanoid robots. We study a related but deeper question: what is the remaining 𝗴𝗮𝗽 between current systems and fully autonomous agents? We formally analyze today's AI agents along five axes: 𝗴𝗼𝗮𝗹, 𝗶𝗱𝗲𝗻𝘁𝗶𝘁𝘆, 𝗱𝗲𝗰𝗶𝘀𝗶𝗼𝗻-𝗺𝗮𝗸𝗶𝗻𝗴, 𝘀𝗲𝗹𝗳-𝗿𝗲𝗴𝘂𝗹𝗮𝘁𝗶𝗼𝗻, and 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴. We find that what separates these "agentic" systems from natural agents like you and me is whether capabilities arise from 𝗲𝘅𝘁𝗲𝗿𝗻𝗮𝗹 𝘀𝗰𝗮𝗳𝗳𝗼𝗹𝗱𝗶𝗻𝗴 or 𝗶𝗻𝘁𝗲𝗿𝗻𝗮𝗹 𝗶𝗻𝗶𝘁𝗶𝗮𝘁𝗶𝘃𝗲, a distinction we formalize as 𝗮𝗴𝗲𝗻𝘁𝗶𝗰 vs. 𝗮𝗴𝗲𝗻𝘁𝗶𝘃𝗲. We propose the 𝗚𝗼𝗮𝗹-𝗜𝗱𝗲𝗻𝘁𝗶𝘁𝘆-𝗖𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (𝗚𝗜𝗖) architecture for general-purpose agent models that internalize all of the following: hierarchical goals, evolving identity, simulative reasoning via a separate world model, a learned configurator for self-regulation, and self-directed learning from real + simulated experience. Better agents don't come from better harnesses; they come from models that can harness themselves.
Eric Xing@ericxing

We discussed foundational issues underlying how to make truly “agentive” systems. arxiv.org/abs/2606.23991

3
11
53
3.9K
Mingkai Deng
Mingkai Deng@mdeng34·
Amazing work! Great demonstration that we should disentangle the world model from the agent model for generalizable decision-making. In our recent paper "Critique of Agent Model" coauthored with Prof. @ericxing and @jinyuhou0, we formally analyzed existing approaches to agent modeling, and proposed the next steps for building autonomous agents. A major conclusion is that there's real, general benefit to using a world model inside an agent model, but *only if* the world model simulates faithfully. If you fine-tune the WM together with the AM, the guarantee is lost. arxiv.org/abs/2606.23991
0
1
4
191
Lester Li
Lester Li@sizhe_lester_li·
Robot learning is moving beyond policies built for one robot, one scene, one task. At MIT, we’re exploring a different path: turning video world models into embodiment-agnostic robot policies. Introducing VERA: a 14B video-to-action system that controls robots across embodiments, skills, and environments. From zero-shot pick-and-place on a real Panda arm to contact-rich cube reorientation with a 16-DoF robotic hand. Different robots. Different environments. Different tasks. Same video planner. Same weights. We’re open-sourcing everything so you can fine-tune VERA for your own robot setup too. Deep dive in the thread: 🔗 vera.csail.mit.edu 🧵 (1/7)
14
60
423
150.1K
Mingkai Deng retweeted
Jinyu Hou
Jinyu Hou@jinyuhou0·
@physical_int @nvidia This is cool. Would be interesting to see how the wm-as-a-judge generalizes to unseen tasks.
0
1
3
776
Mingkai Deng
Mingkai Deng@mdeng34·
@JunyaoShi Point well-taken. It seems the main difficulty for consistent evaluation in robotics is the same reason it has not been conquered by brute-force RL: the environment is neither scalably verifiable, infinitely fungible, nor trivially resettable
1
0
2
45
Junyao Shi
Junyao Shi@JunyaoShi·
Good clarification. I think the problem mostly shows up when people over-optimize established academic metrics (e.g. number of accepted papers) at the expense of real progress. And it’s especially an issue in robotics: long iteration loops, no agreed benchmark, so there’s a lot of room to optimize for marketing, videos, and storytelling instead. It really has become a game.
1
0
4
329
Junyao Shi
Junyao Shi@JunyaoShi·
Academia optimizes for novelty, which has become increasingly orthogonal to making things work. In practice it rewards benchmark-chasing, optics-maxing, and flag-planting. Sadly a major bitter lesson of robotics is: insights from the small-data, bad-system regime don’t transfer to the big-data, good-system one. The novelty we reward and the progress we need are pulling apart.
John Schulman@johnschulman2

PPO: rejected from NIPS 2017

5
9
164
22.3K
Mingkai Deng retweeted
Jinyu Hou
Jinyu Hou@jinyuhou0·
Excited to share our new paper, "Critique of Agent Model," with Prof. @ericxing and @mdeng34. We analyze agent architectures along five dimensions (goal, identity, decision-making, self-regulation, and learning) to examine what it takes for a system to approach genuine agency. A key finding is the distinction between agentic systems, whose competence resides in engineered workflows, and agentive systems, whose capabilities arise endogenously. Building on this, we propose the GIC architecture, which combines hierarchical goal decomposition, identity evolution, simulative reasoning grounded in a separately trained world model, learned self-regulation, and self-directed learning from both real and simulated experience, for building agents that remain auditable, controllable, and under human oversight.
Eric Xing@ericxing

With the rise of LLM systems marketed as "coding agents", "AI co-scientists", etc. that promise to drive up productivity, and at the same time an outcry of "existential" concerns about AI escaping human control with destructive power under a speculative "machine agency" against humans, there has been lots of confusion about “What is an agent?” and “What constitutes agency?” It has become essential to clarify where automation ends and agency begins. Also, recent developments in world models and action models are trending toward mixing future prediction/simulation and action/plan generation within a single architecture such as a VLM, conflating reward-driven action selection with fidelity-driven next-state prediction and undermining the reliability of both planning and simulation. In this paper we analyze agent architectures along the axes of goal, identity, decision-making, self-regulation, and learning, and argue that genuine agency requires these structures to be internalized within the system itself rather than assembled through external scaffolding. We propose a “Goal-Identity-Configurator” (GIC) architecture for a general-purpose agent model, combining hierarchical goal decomposition, identity evolution, simulative reasoning grounded in a separately trained world model, learned self-regulation, and self-directed learning from both real and simulated experience. Auditability, controllability, and safety of systems that possess greater autonomy and "agency" but remain under human oversight can be better built with the GIC architecture, which offers transparency, modularity, and checkpoints. @mdeng34, @jinyuhou0 openreview.net/forum?id=6fDZY…

0
2
8
739
Jim Fan
Jim Fan@DrJimFan·
Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and a generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, don't waste precious compute. Make no mistake. Then humans step aside and our watch begins. The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with control stack, read papers online, debate, reflect, get stuck, and try again directly on the hardware. All we did was give Codex an API to the world of atoms, and the rest is emergence. ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel improve significantly faster than fewer ones. A part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning. /goal: we all take a holiday and Jensen wouldn't even notice ;) We will be open-sourcing everything, so you can host your self-running robot lab at home too! Deep dive in the thread:
178
569
3.8K
638.2K
Mingkai Deng retweeted
Taylor W. Killian
Taylor W. Killian@tw_killian·
We're introducing a mechanism to enforce some control over an LLM's reasoning process. We developed Behavior Cues to steer models, avoiding both overthinking and speculative collapse. @ccui9 provides a great overview of the work in this thread 👇
Christopher Z. Cui@ccui9

This has been sitting on arxiv for a bit, but figured it's time to announce it properly. Introducing Behavior Cues: a way to make LLM reasoning more monitorable and controllable for scalable oversight.

0
3
13
2.1K
Mingkai Deng
Mingkai Deng@mdeng34·
This is really impressive! Agreed that significant parts of robot training decisions can and should be delegated to AutoResearch. Where we differ is in *what environment* the training occurs, and *how to decide* when to train, when to deploy, and when to retrain. We argue that fully autonomous agents should recursively self-improve in a *world-model-based simulator* to scalably capture the factors of variation; when to train, when to serve, and when to retrain should be the decision of a learned *configurator* that's part of the agent. We are currently working on these directions, more to come. More details in our recent paper “Critique of Agent Model” with Prof @ericxing and @jinyuhou0. Paper: openreview.net/forum?id=6fDZY…
0
1
4
377
Mingkai Deng
Mingkai Deng@mdeng34·
Excited to share our new paper "Critique of Agent Model" with Prof @ericxing and @jinyuhou0. Current LLM systems are often marketed as "coding agents" or other "agentic" tools, but what actually makes something an agent? Our new paper draws a line: if the goals, identity, decision-making, behavior regulation, and learning live in external scaffolding rather than the system itself, it's automation, not agency. Recent World Action Models tend to mix future prediction and action generation in a single model, but where does the world end and where does the agent begin? Our paper also argues that the World Model and the Agent Model are free to talk end-to-end, but should be trained separately. This keeps their conflicting objectives from undermining the reliability of planning and simulation. Based on the analysis, our paper proposes the Goal-Identity-Configurator (GIC) architecture for a general Agent Model *with* a World Model. The GIC Agent Model combines hierarchical goal decomposition, identity evolution, simulative reasoning, learned self-regulation, and self-directed learning from both real and simulated experience, all using a separate World Model for prediction and simulation. Paper: openreview.net/pdf?id=6fDZYJY…
Eric Xing@ericxing

With the rise of LLM systems marketed as "coding agents", "AI co-scientists", etc. that promise to drive up productivity, and at the same time an outcry of "existential" concerns about AI escaping human control with destructive power under a speculative "machine agency" against humans, there has been lots of confusion about “What is an agent?” and “What constitutes agency?” It has become essential to clarify where automation ends and agency begins. Also, recent developments in world models and action models are trending toward mixing future prediction/simulation and action/plan generation within a single architecture such as a VLM, conflating reward-driven action selection with fidelity-driven next-state prediction and undermining the reliability of both planning and simulation. In this paper we analyze agent architectures along the axes of goal, identity, decision-making, self-regulation, and learning, and argue that genuine agency requires these structures to be internalized within the system itself rather than assembled through external scaffolding. We propose a “Goal-Identity-Configurator” (GIC) architecture for a general-purpose agent model, combining hierarchical goal decomposition, identity evolution, simulative reasoning grounded in a separately trained world model, learned self-regulation, and self-directed learning from both real and simulated experience. Auditability, controllability, and safety of systems that possess greater autonomy and "agency" but remain under human oversight can be better built with the GIC architecture, which offers transparency, modularity, and checkpoints. @mdeng34, @jinyuhou0 openreview.net/forum?id=6fDZY…

0
1
9
614
Mingkai Deng
Mingkai Deng@mdeng34·
Really cool results! We've been studying the same question using agentic LLMs for demonstration. Here's the twist: instead of an external router over a hand-enumerated planner pool, we built self-regulation as the model's own decision, so it's optimized end-to-end with RL. This way, the regulation strategies *emerge* rather than being picked from a menu. After RL, our model learned to plan further ahead per invocation (+22.8% horizon) while barely planning more often (+2%) arxiv.org/abs/2605.22138
0
0
2
257
Chelsea Finn
Chelsea Finn@chelseabfinn·
How does test-time scaling impact robots? We find that larger models, more thinking, and more context help significantly for some prompts but not others. Like LLMs, we can also train a router for a better performance/latency tradeoff! Paper: jadee-dao.github.io/direct/
Jadelynn@_jadelynn

test-time compute [ttc] in robotics isn't free & isn't always worth it. smart allocation of ttc recovers frontier-level planning at a fraction of the cost! coauthor @milanganai w/ Yasmina @ajaysridhar0 Mozghan @katielulula Clark Barrett @jiajunwu_cs @chelseabfinn @drmapavone 🧵

2
19
185
23.2K
Mingkai Deng retweeted
GenBio AI
GenBio AI@genbioai·
An excellent piece in @Nature on the promise and challenges of virtual cells, including insights from GenBio AI's @ericxing and @Prof_Lundberg on why multi-scale, multimodal approaches are the future, and why world models are key to building them. Learn more → x.genbio.ai/dTHCHh
0
5
8
879
Mingkai Deng
Mingkai Deng@mdeng34·
@BetaTomorrow @drfeifei Thank you for the summary and interpretation. We agree that there are many intricacies in the mathematical foundation of world models
1
0
0
70
deep Manifold
deep Manifold@BetaTomorrow·
From the Deep Manifold view, Critiques of World Models is valuable because it moves the world-model discussion away from static representation or video generation toward actionable simulation: a world model should not merely describe the world, but internally explore possible futures for reasoning and acting. Deep Manifold would read this as boundary-conditioned numerical traversal over stacked piecewise manifolds: state, action, goal, and context define the boundary conditions; the learned model geometry supplies possible intrinsic pathways; and useful simulation emerges when these pathways stabilize into dynamic stochastic fixed points. The paper’s emphasis on hierarchical, mixed continuous/discrete, Physical–Agentic–Nested world modeling aligns with the idea that no single smooth latent space can capture the real world’s high-order nonlinearity; instead, world modeling requires layered, compositional, and federated manifold structure. In this sense, a world model is not an inner copy of reality, but a learnable numerical computation system that approximates consequence under boundary conditions. #DeepManifoldInterpretation
1
0
1
126
Mingkai Deng
Mingkai Deng@mdeng34·
Thanks for your reply. I agree that the short-term solution is not necessarily at odds with the long-term solution. For example, the program-simulator can be a helpful resource for training the model-simulator. These days, however, the long-term solution might arrive faster than expected. #PAN-v1 was already released by @IFM_MBZUAI last year, and v2 is hot in the works! Yesterday, @NVIDIAAI also released the wonderful #Cosmos3. Website: panworld.ai Tech Report: arxiv.org/abs/2511.09057
2
0
3
247
Vivi
Vivi@vivilinsv·
Spot on analysis Mingkai! The real goal is a simulator for decision-making, not eye-candy rendering. Your critique of Gaussian splats + physics engines (program-as-simulator) vs. learned hierarchical representations (model-as-simulator) is the sharpest part of the paper. The bitter-lesson angle is compelling: pure learned reps should scale better for multi-agent, long-horizon, uncertain worlds. Still, I wonder if the two aren’t complementary short-term — explicit structure gives reliable physics grounding today, while the PAN-style learned model is the endgame for true simulative reasoning. Either way, this debate just leveled up the whole field. Excited to see PAN in action.
1
0
4
282