Jinyu Hou

25 posts

@jinyuhou0

PhDing @LTIatCMU || MS @MLDCMU || HBSc @UofT. Interested in agents, world models, RL

Pittsburgh, PA · Joined December 2017

354 Following · 187 Followers

Pinned Tweet

Jinyu Hou@jinyuhou0·1d

On popular benchmarks, our 30B model matches systems 20-30x its size (gpt-5.4-xhigh, DeepSeek-V3.2, Kimi-K2.5), while using up to 95% fewer reasoning tokens than comparable 30/32B agentic LLMs. The trick: don't just reason less, reason about the right things. A learned configurator decides when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is an allocation problem, not a compression problem. Model and code are openly available.

Mingkai Deng@mdeng34

Frontier LLMs are converging on efficient, adaptive reasoning. Opus 4.7 lets the model decide how deeply to reason. GPT-5.5 achieves strong results with fewer reasoning tokens. We study a related but more structural question: what 𝗸𝗶𝗻𝗱 𝗼𝗳 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 should we adapt? Last year in SiRA (upper figure), we showed that simulative reasoning (System II), which uses a 𝘄𝗼𝗿𝗹𝗱 𝗺𝗼𝗱𝗲𝗹 to evaluate consequences of actions, yields up to 124% improvement over reactive baselines (System I), and that strong reasoning models (o1, o3-mini) fail as planners without this structure. In our new paper SR²AM (lower figure), we add a learned 𝗰𝗼𝗻𝗳𝗶𝗴𝘂𝗿𝗮𝘁𝗼𝗿 (System III) that self-regulates when to simulate, how far ahead, and when to skip planning entirely. Efficient reasoning is not just shorter reasoning: it is better allocation of simulation.

Jinyu Hou@jinyuhou0·13h

Congrats, Benhao! The 2.6% → 84.8% story is striking.

Benhao Huang@huskydogewoof

𝐇𝐨𝐰 𝐝𝐨 𝐰𝐞 𝐠𝐞𝐭 𝐟𝐫𝐨𝐦 𝐚 𝐬𝐭𝐚𝐧𝐝𝐚𝐫𝐝 𝐟𝐞𝐞𝐝𝐟𝐨𝐫𝐰𝐚𝐫𝐝 𝐦𝐨𝐝𝐞𝐥 𝐭𝐨 𝐚 𝐜𝐚𝐩𝐚𝐛𝐥𝐞 𝐢𝐭𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐦𝐨𝐝𝐞𝐥? On Sudoku, we traced the exact path of unlocking neural attractors:
- Feedforward → 2.6%
- Weight-tying → 32.6%
- Online Training → 74.7%
- Hierarchy → 76.5%
- Adaptive Compute → 84.8%
Each jump wasn't just a trick. It was a choice about how to shape the attractor landscape. Here is what we learned: 🧵👇 #ICML2026

Jinyu Hou retweeted

Lara Sá Neves@larasnevess·1d

SR²AM is out! Thinking longer ≠ thinking smarter. SR²AM knows which one it needs. A configurator regulates internal simulation: when to predict future states, how far, and when to skip. Result: 30B competing with 685B–1T at a fraction of the token cost. Model and code available

Mingkai Deng@mdeng34

Jinyu Hou retweeted

Benhao Huang@huskydogewoof·2d

🌀 Introducing 𝐄𝐪𝐮𝐢𝐥𝐢𝐛𝐫𝐢𝐮𝐦 𝐑𝐞𝐚𝐬𝐨𝐧𝐞𝐫𝐬 (𝐄𝐪𝐑)! Feedforward models and weight-tied models behave very differently on hard reasoning generalization. EqR pushes this difference to the extreme by learning 𝐭𝐚𝐬𝐤-𝐜𝐨𝐧𝐝𝐢𝐭𝐢𝐨𝐧𝐞𝐝 𝐧𝐞𝐮𝐫𝐚𝐥 𝐚𝐭𝐭𝐫𝐚𝐜𝐭𝐨𝐫𝐬.
• Sudoku-Extreme: 99.8%
• Maze: 93%
#ICML2026

Jinyu Hou retweeted

Mingkai Deng@mdeng34·Apr 11

Really interesting post -- agreed that our goals should be physical AGI, and goal-driven beats idea-driven. Though we see it differently on a few things:
1. If you pick 99% + 1 hour of demonstrated task data as your success criterion, a world model will surely look unnecessary. But physical AGI is about dealing with situations you **cannot** demonstrate ahead of time. This is not a methods debate, but a goal debate. A world model solves this problem by simulating possible outcomes and generating synthetic experience for unseen tasks.
2. One useful analogy: LLMs aren't strong just because of post-training RL. Self-supervised pretraining is arguably the source of their intelligence. World models play the same role for physical AI -- they're not a training trick you can skip with more data, but an indispensable component for understanding and reasoning.
3. Language is not just a "crutch while we don't have enough robotics data" -- it encodes institutions, social norms, and mental states that physical interaction data can't capture efficiently, regardless of scale.
This is what led us to the GLP (Generative Latent Prediction) world model architecture. It includes an enhanced LLM dynamics backbone and mixed continuous/discrete latent states. Language and physical commonsense aren't A or B, but complementary abstractions the world model should unify. PAN, a world model built on GLP, is trained on internet data but already enables open-domain action simulation that transfers to robotic policies.
More on GLP: arxiv.org/abs/2507.05169
More on PAN: arxiv.org/abs/2511.09057

Jinyu Hou@jinyuhou0·Apr 11

Really thought-provoking post; the goal-driven vs. idea-driven distinction resonates a lot. It got me wondering, though: perhaps goal-driven research doesn't have to be agnostic about world models? In our recent work (arxiv.org/abs/2507.05169), we argue that the field has gotten too focused on world models as video generators, when their real value should be as reasoning engines: specifically, simulating counterfactual action outcomes to enable planning, which seems closely aligned with the zero-shot physical AGI goal outlined here. These aren't mutually exclusive with data scaling. If anything, a good world model should amplify the value of the data you already have by enabling generalization beyond its empirical coverage. Would love to hear your thoughts.

Pete Florence@peteflorence·Apr 7

x.com/i/article/2041…

Jinyu Hou retweeted

LLM360@llm360·Dec 5

To mark the 2nd anniversary of LLM360, we are proud to release K2-V2: a 70B reasoning-centric foundation model that delivers frontier capabilities. As a push for "360-open" transparency, we are releasing not only weights, but the full recipe: data composition, training code, logs, and intermediate checkpoints.
About K2-V2:
🧠 70B params, reasoning-optimized
🧊 512K context window
🔓 "360-Open" (Data, Logs, Checkpoints)
📈 SOTA on olympiad math and complex logic puzzles

Jinyu Hou retweeted

Eric Xing@ericxing·Dec 5

Now you have an alternative to the super popular but unfortunately not so transparent (you have no idea how it was trained, what data was used, is it safe …) base LLMs such as Qwen 2.5 or 3, to build your own reasoning or general purpose LLMs through post-train, SFT, RL, etc. It is 360-open and reproducible.

MBZUAI@mbzuai

Today, we are releasing a new version of K2 (K2-V2), a 360-open LLM built from scratch as a superior base for reasoning adaptation, while still excelling at core LLM capabilities like conversation, knowledge retrieval, and long-context understanding. K2 fills a major gap: highly capable models with no transparency. Instead of releasing only weights, we’re sharing the full training story: dataset recipes, mid-training checkpoints, logs, code, and evaluation tools. That’s 360-open.
What’s inside:
• 70B dense transformer engineered as a reasoning-enhanced base model
• Native 512K context (extendable via RoPE scaling)
• Mid-training reasoning phase
• Strong tool-use scaffolding
What we’re open-sourcing:
• 250M+ reasoning traces (math, planning, multi-step logic)
• Full pre- & mid-training data compositions
• All mid-training checkpoints
• Training logs, code, Eval360
Performance:
• GPQA-Diamond: 55.1% mid-training → 69.3% after SFT (strongest fully open 70B model)
• KK-8 Logic Puzzles: 83%, competitive with DeepSeek-R1 & OpenAI o3-mini-high
• ArenaHard V2: 62.1%, close to Qwen3 235B
• Outperforms Qwen2.5-72B and approaches Qwen3-235B despite being smaller and fully transparent.
🔗 The Model: bit.ly/3KIYwuo
🔗 Technical Report: bit.ly/49V8h2U
🔗 Blog: bit.ly/49V7gb6

Jinyu Hou retweeted

Eric Xing@ericxing·Nov 14

In this paper we present the first full implementation of the Generative Latent Prediction (GLP) architecture for world modeling, which brings perception, state, action, and causality into a single, coherent world model that can plan, imagine, and reason through language, interaction, and thought experiments. arxiv.org/abs/2511.09057 @szxiangjn, @YiGu025, @guangyi_l, @waterluffy, @ZhitingHu

Jinyu Hou retweeted

Zhiting Hu@ZhitingHu·Nov 14

🔥Really excited to see the release of the PAN world model, a project I have been working on over the past years. PAN is a general world model capable of simulating physical, agentic, and nested worlds, synthesizing infinite interactive experiences for training AI agents. Building on top of pretrained LLMs and video diffusion models, PAN connects language, perception, action, and latent thoughts, for long-horizon simulation and reasoning. PAN shows overwhelming performance gains over JEPA-2, Cosmos-2, and other prior models. More in the thread👇 ... 1/

Jinyu Hou retweeted

Mingkai Deng@mdeng34·Jul 8

Honored to co-lead this paper with @ericxing & team
- Formally showed WM as part of optimal, general agent
- Reviewed several schools of WM towards this goal
- Outlined a new PAN architecture for general WM
Excited for the upcoming release of 27B PAN v1! arxiv.org/abs/2507.05169

Eric Xing@ericxing

I have been long arguing that a world model is NOT about generating videos, but IS about simulating all possibilities of the world to serve as a sandbox for general-purpose reasoning via thought-experiments. This paper proposes an architecture toward that arxiv.org/abs/2507.05169

Jinyu Hou retweeted

Eric Xing@ericxing·Jul 8

Jinyu Hou@jinyuhou0·Feb 12

Check out our work, which won 2nd place in the Fundamental Track!

Dawn Song@dawnsongtweets

🎉 Excited to announce the winning teams of LLM Agents MOOC Hackathon! We’re thrilled by the amazing participation and enthusiasm from the global AI community:
🌍 ~3,000 participants from 127 countries
🎓 1,100+ universities participated
💼 800+ companies represented
• Top countries represented: 🇺🇸 US, 🇮🇳 India, 🇨🇳 China
• Top schools represented: @UCBerkeley @UofIllinois @Stanford @CarnegieMellon @Northeastern
• Top companies represented by participants: @Amazon @Microsoft @Samsung @salesforce

Jinyu Hou@jinyuhou0·Feb 6

Check out our latest work -- A great effort from the team, excited to see this come to life! 👇

Maitrix.org@MaitrixOrg

🤖Thrilled to introduce _ReasonerAgent_ - A fully open source, ready-to-run agent that does research🧐 in a web browser and answers your queries
Use ReasonerAgent to help you: ✈️search for flights, 🛍️compile shopping options, 🗞️research news coverage, etc.
📘Check out more 👇 1/6

Jinyu Hou@jinyuhou0·Oct 25

@savvyRL Hi Rosanne, I applied for the MS position and sent you a message in DM.

Rosanne Liu@savvyRL·Oct 25

Potentially hiring Student Researchers to work on fundamental research from studying small-scale transformers to better understand training, to operating brain surgery on existing LLMs. Apply, and get in touch! BS/MS: google.com/about/careers/… PhD: google.com/about/careers/…

Jinyu Hou retweeted

Caleb Ellington@probablybots·Oct 20

The Contextualized Machine Learning White Paper arxiv.org/abs/2310.11340 w/ @ben_lengerich Intuition, applications, algorithms, and extensions for contextualized models: models that understand heterogeneity in real data, adapt to new environments, and are explainable by design.

Jinyu Hou retweeted

Sang Choe@sangkeun_choe·Oct 23

High-quality data is a key to successful pretrain/finetuning in the GPT era, but manual data curation is expensive💸 We tackle data quality challenges involving large models and datasets with ScAlable Meta leArning (SAMA) #NeurIPS2023💫 Arxiv: arxiv.org/abs/2310.05674 🧵 (1/n)

Jinyu Hou retweeted

Vahid Balazadeh@vahidbalazadeh·Nov 29

There's been a lot of success in causal effect estimation using machine learning. But what if point identification is impossible? Our NeurIPS 2022 paper, "Partial Identification of Treatment Effects with Implicit Generative Models," estimates bounds on causal effects instead. 🧵

Jinyu Hou@jinyuhou0·Sep 20

Working on the project was a great experience from which I learned a lot. Many thanks to @kieranrcampbell for all the guidance, and thanks to everyone else on the project for the great collaboration!

Kieran Campbell@kieranrcampbell

My group's first research paper on automated cell type assignment for highly multiplexed imaging data now published in @CellSystemsCP Paper: authors.elsevier.com/a/1dmB38YyDffJ… Tool: github.com/camlab-bioml/a… Some thoughts and updates:

Jinyu Hou retweeted

Kieran Campbell@kieranrcampbell·Feb 18

Our first research paper as a group was preprinted today: automated cell assignment for highly multiplexed imaging and proteomic data Paper: biorxiv.org/content/10.110…
