Jinyu Hou (@jinyuhou0) - Twitter Profile

Pinned Tweet

Jinyu Hou@jinyuhou0·22 May

On popular benchmarks, our 30B model matches systems 20-30x its size (gpt-5.4-xhigh, DeepSeek-V3.2, Kimi-K2.5), while using up to 95% fewer reasoning tokens than comparable 30/32B agentic LLMs. The trick: don't just reason less, reason about the right things. A learned configurator decides when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is an allocation problem, not a compression problem. Model and code are openly available.

Mingkai Deng@mdeng34

Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.
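
The allocation idea described above (a configurator that decides when to simulate, how far ahead, and when to skip planning) can be illustrated with a toy sketch. This is not the SR²AM implementation: `toy_configurator` is a hand-written rule standing in for a learned component, and `toy_world_model`, `choose_action`, the `uncertainty` input, and the 0.2 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Plan:
    simulate: bool  # whether to consult the world model at all
    horizon: int    # how many steps ahead to simulate (0 when skipping)


def toy_configurator(uncertainty: float, max_horizon: int = 5) -> Plan:
    """Hand-written stand-in for a learned configurator (assumption)."""
    if uncertainty < 0.2:  # hypothetical threshold: easy step, skip planning
        return Plan(simulate=False, horizon=0)
    # Harder steps get a longer lookahead, capped at max_horizon.
    horizon = min(max_horizon, 1 + int(uncertainty * (max_horizon - 1)))
    return Plan(simulate=True, horizon=horizon)


def toy_world_model(state: int, action: int) -> int:
    """Toy dynamics: the state is a position on a line (assumption)."""
    return state + action


def choose_action(state: int, goal: int, actions: List[int],
                  uncertainty: float) -> Tuple[int, int]:
    """Return (chosen action, number of world-model calls spent)."""
    plan = toy_configurator(uncertainty)
    if not plan.simulate:
        return actions[0], 0  # reactive default, no simulation cost
    calls = 0
    best_action, best_dist = actions[0], float("inf")
    for action in actions:
        s = state
        for _ in range(plan.horizon):  # imagine repeating this action
            s = toy_world_model(s, action)
            calls += 1
        dist = abs(goal - s)
        if dist < best_dist:
            best_action, best_dist = action, dist
    return best_action, calls


print(choose_action(0, 3, [-1, 1], uncertainty=0.1))  # (-1, 0)
print(choose_action(0, 3, [-1, 1], uncertainty=0.9))  # (1, 8)
```

In this toy setting the low-uncertainty call spends no world-model calls and falls back to a reactive default, while the high-uncertainty call spends eight calls and finds the action that moves toward the goal. The point of the sketch is only that simulation cost is allocated per decision instead of being shortened uniformly.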


Jinyu Hou retweeted

Eric Xing@ericxing·1d

We discussed foundational issues underlying how to make truly “agentive” systems. arxiv.org/abs/2606.23991


Jinyu Hou retweeted

elvis@omarsar0·9h

// Critique of the Agent Model // Finally, a paper that tries to define what an agent is and what agency consists of. Good read overall. (great bookmark) The word agent now covers everything from a for-loop with tool calls to speculative machine superintelligence. Eric Xing and colleagues ask where automation ends, and agency begins. Drawing on Descartes and on science-fiction portrayals of autonomous beings, they analyze agent architectures along five dimensions: goal, identity, decision-making, self-regulation, and learning. The argument is that genuine agency requires these structures to hold together in a specific way. Great paper overall, providing a vocabulary for arguing about what is and is not an agent. Paper: arxiv.org/abs/2606.23991 Learn to build effective AI agents in our academy: academy.dair.ai


Jinyu Hou@jinyuhou0·1d

Our “Critique of Agent Model” paper has now been released on arXiv!

Eric Xing@ericxing

We discussed foundational issues underlying how to make truly “agentive” systems. arxiv.org/abs/2606.23991


Jinyu Hou retweeted

Eric Xing@ericxing·5d

Decoding the “Critique of Agent Model” Paper evoailabs.medium.com/decoding-the-c…


Jinyu Hou retweeted

Eric Xing@ericxing·5d

A companion paper to the critique of agent model paper. arxiv.org/abs/2507.05169


Jinyu Hou@jinyuhou0·6d

@physical_int @nvidia This is cool. Would be interesting to see how the wm-as-a-judge generalizes to unseen tasks.


Physical Intelligence@physical_int·6d

New work with @nvidia: evaluating robot policies entirely inside a world model. The policy acts, the model imagines the consequences, and the imagined evals predict real-world results. 🧵 real vs world-model rollout side by side📷
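
The evaluation loop described here (the policy acts, a world model imagines the consequences, and success is scored on the imagined outcome) might look like the following toy sketch. It is not the method from the referenced work; `policy`, `world_model`, `imagined_eval`, the one-dimensional state, and the tolerance are hypothetical and chosen only to show the shape of the loop.

```python
from typing import Callable, List


def policy(state: float) -> float:
    """Toy policy: step toward the origin, with the step clipped to [-1, 1]."""
    return -max(-1.0, min(1.0, state))


def world_model(state: float, action: float) -> float:
    """Hypothetical learned dynamics: predicts the next state (assumption)."""
    return state + 0.9 * action


def imagined_eval(policy_fn: Callable[[float], float],
                  model: Callable[[float, float], float],
                  start_states: List[float],
                  steps: int = 20,
                  tol: float = 0.1) -> float:
    """Roll the policy out inside the model and return the success rate."""
    successes = 0
    for s in start_states:
        for _ in range(steps):
            s = model(s, policy_fn(s))  # no real environment is touched
        if abs(s) <= tol:  # success: imagined final state is near the origin
            successes += 1
    return successes / len(start_states)


starts = [-3.0, -1.0, 0.5, 2.0, 4.0]
print(imagined_eval(policy, world_model, starts))            # 1.0
print(imagined_eval(lambda s: 0.0, world_model, starts))     # 0.0
```

Whether such imagined scores track real-world results depends entirely on how faithful the world model is, which is the question the original post is about.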


Jinyu Hou@jinyuhou0·17 Jun

Excited to share our new paper, "Critique of Agent Model," with Prof. @ericxing and @mdeng34. We analyze agent architectures along five dimensions of goal, identity, decision-making, self-regulation, and learning to examine what it takes for a system to approach genuine agency. A key finding is the distinction between agentic systems, whose competence resides in engineered workflows, and agentive systems, whose capabilities arise endogenously. Building on this, we propose the GIC architecture, combining hierarchical goal decomposition, identity evolution, simulative reasoning grounded in a separately trained world model, learned self-regulation, and self-directed learning from both real and simulated experience, for building agents that remain auditable, controllable, and under human oversight.

Eric Xing@ericxing

With the rise of LLM systems marketed as "coding agents", "AI co-scientists", etc. that promise to drive up productivity, and, at the same time, an outcry of "existential" concerns that AI could escape human control and wield destructive power against humans under a speculative "machine agency", there has been a lot of confusion about “What is an agent?” and “What constitutes agency?” It has become essential to clarify where automation ends and agency begins. Also, recent developments in world models and action models are trending toward mixing future prediction/simulation and action/plan generation within a single architecture such as a VLM, conflating reward-driven action selection with fidelity-driven next-state prediction and undermining the reliability of both planning and simulation. In this paper we analyze agent architectures along the axes of goal, identity, decision-making, self-regulation, and learning, and argue that genuine agency requires these structures to be internalized within the system itself rather than assembled through external scaffolding. We propose a “Goal-Identity-Configurator” (GIC) architecture for a general-purpose agent model, combining hierarchical goal decomposition, identity evolution, simulative reasoning grounded in a separately trained world model, learned self-regulation, and self-directed learning from both real and simulated experience. The auditability, controllability, and safety of systems that possess greater autonomy and "agency" but remain under human oversight can be better achieved with the GIC architecture, which offers transparency, modularity, and checkpoints. @mdeng34, @jinyuhou0 openreview.net/forum?id=6fDZY…


Jinyu Hou retweeted

Mingkai Deng@mdeng34·16 Jun

This is really impressive! Agreed that significant parts of robot training decisions can and should be delegated to AutoResearch. Where we differ is in *what environment* the training occurs and *how to decide* when to train, when to deploy, and when to retrain. We argue that fully autonomous agents should recursively self-improve in a *world-model-based simulator* to scalably capture the factors of variation; when to train, when to serve, and when to retrain should be decided by a learned *configurator* that is part of the agent. We are currently working on these directions, with more to come. More details are in our recent paper “Critique of Agent Model” with Prof @ericxing and @jinyuhou0. Paper: openreview.net/forum?id=6fDZY…


Jinyu Hou retweeted

Mingkai Deng@mdeng34·16 Jun

Excited to share our new paper "Critique of Agent Model" with Prof @ericxing and @jinyuhou0. Current LLM systems are often marketed as "coding agents" or other "agentic" tools, but what actually makes something an agent? Our new paper draws a line: if the goals, identity, decision-making, behavior regulation, and learning live in external scaffolding rather than the system itself, it's automation, not agency. Recent World Action Models tend to mix future prediction and action generation in a single model, but where does the world end and where does the agent begin? Our paper also argues that the World Model and the Agent Model are free to talk end-to-end, but should be trained separately. This prevents their conflicting objectives from undermining the reliability of planning and simulation. Based on the analysis, our paper proposes the Goal-Identity-Configurator (GIC) architecture for a general Agent Model *with* a World Model. The GIC Agent Model combines hierarchical goal decomposition, identity evolution, simulative reasoning, learned self-regulation, and self-directed learning from both real and simulated experience, all using a separate World Model for prediction and simulation. Paper: openreview.net/pdf?id=6fDZYJY…



Jinyu Hou retweeted

Eric Xing@ericxing·16 Jun

The outlook of an AI-driven Digital Organism (AIDO), such as a virtual cell (VC), has recently captivated much excitement and imagination from both the AI and biology communities, but there remain many open questions, in particular: what model presents the best path to realize an #AIDO or a #VC? In this paper we present a definition of the virtual cell based on the World Model, an architecture that recently emerged in AI and supports advanced capabilities such as action-conditioned simulation, dynamic state-evolution, counterfactual reasoning, and long-horizon planning in complex dynamic environments. When applied to biological scenarios, a world model of the virtual cell is a generative model that simulates the biological possibilities of a cell under any natural or artificial interventions and environments. Such a virtual cell world model (VCWM) contrasts with predictive foundation models for specific tasks, such as gene-expression perturbation prediction, as seen in some recent definitions of the virtual cell. At the same time, not every biological foundation model built on sequence, structure, or expression alone can be repositioned as a world model if it lacks multi- or pan-modality, stateful embedding, continuous action-conditioning, and dynamic state-transition and rollout. We present a novel architecture for #VCWM based on the GLP (generative latent prediction) framework that enables a simulated cell as an end-to-end platform. Stay tuned for the release of the first implementation of VCWM from @genbioai soon. @dasongle, @zivbj, @ElijahCole, @probablybots, @EuxhenH, @cfeinau, @deboramarks, @fabian_theis, @mmbronstein, @pkoo562 openreview.net/forum?id=hZNxD…


Jinyu Hou retweeted

Institute of Foundation Models@IFM_MBZUAI·3 Jun

1/4 Frontier LLMs are converging on adaptive reasoning. But controlling how much to think is not the same as controlling what kind of thinking to do. SR²AM introduces self-regulated simulative reasoning: an agent that simulates possible futures through a world model and learns when that simulation is worth the cost.


Jinyu Hou@jinyuhou0·25 May

@JohnRecords @vishalm4341 Thanks for the interest! Just posted all links in a separate comment so they're easier to find.


John Records@JohnRecords·24 May

@jinyuhou0 @vishalm4341 I’ve looked for the link to the models, no luck. I’m eager to see it! Please consider posting it conspicuously, perhaps in its own tweet. Thanks, mate!



Jinyu Hou@jinyuhou0·25 May

📄 Paper: arxiv.org/abs/2605.22138 💻 Code: github.com/sailing-lab/sr… 🤗 SR²AM-v0.1-8B: huggingface.co/sailing-lab/SR… 🤗 SR²AM-v1.0-30B: huggingface.co/sailing-lab/SR…


Jinyu Hou@jinyuhou0·25 May

@Vijay2050977 In SR²AM, the world model (wm) is used for simulative reasoning. Whether a dedicated world model module would be beneficial in a hybrid architecture is an interesting question for future work.


Vijay@Vijay2050977·25 May

@jinyuhou0 wm is doing this reasoning part? wm will be one of experts in MoE hybrid LLMs (transformers + mamba + wm), in near future?


Jinyu Hou@jinyuhou0·24 May

@Vijay2050977 Thanks! Small clarification though: it's not only an inference-time method. The configurator's behavior is shaped by SFT+RL training, which is where the model learns *when* to simulate and *how far ahead*.


Vijay@Vijay2050977·24 May

@jinyuhou0 Thanks, got it. Inference side. impressive nevertheless.


Jinyu Hou@jinyuhou0·24 May

@vishalm4341 Yes! Everything is in the last post of the original thread (3/3) — code and models are all open.


Vishal Mishra@vishalm4341·24 May

@jinyuhou0 Link to your model? Is it open source?


Jinyu Hou@jinyuhou0·24 May

@brzewVCE Good thing the superscript saves us then — it's SR²AM, not SRAM 😄


Przemysław Skrzypek@brzewVCE·24 May

@jinyuhou0 Fun fact: "sram" means "I'm shitting" in Polish


Jinyu Hou@jinyuhou0·24 May

Thanks for the question! To clarify: the 20-30x refers to parameter count. Our 30B model matches the *performance* of 685B-1T models, meaning we close the gap through better reasoning structure instead of raw scale. That doesn't mean a 100B version would be 3x better than frontier. Scaling helps, but the gains here are about how the model allocates reasoning. Scaling and reasoning architecture are complementary directions.
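
The parameter-count arithmetic behind the "20-30x" figure can be checked directly. The sketch below uses only the sizes quoted in this reply (30B, 685B, 1T) and the 100B model raised in the question; the variable names are illustrative.

```python
small = 30e9  # parameters in the 30B model

# Larger model sizes quoted in the reply.
larger = {"685B": 685e9, "1T": 1e12}

# Parameter-count ratio of each larger model to the 30B model.
ratios = {name: size / small for name, size in larger.items()}
for name, ratio in ratios.items():
    print(f"{name} / 30B = {ratio:.1f}x")  # 22.8x and 33.3x

# The 100B model from the question has about 3.3x the parameters of
# the 30B model; the reply's point is that this ratio does not
# translate into being "3x better" on benchmarks.
scale_up = 100e9 / small
print(f"100B / 30B = {scale_up:.1f}x")  # 3.3x
```

The ratios come out to roughly 23x and 33x, so "20-30x" reads as a rounded range over parameter counts, not a statement about capability per parameter.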


Vijay@Vijay2050977·24 May

@jinyuhou0 i've a doubt.. if you have 30x advantage why stop at benchmark matching frontier models? you can easily train a 100b model, that's 3x better than oai / anthropic..?

