Shangmin Guo

116 posts

@ShangminGuo

Co-founder @ ConteXeed AI | Compounding agents into aligned&reliable collaborators | ex-Google DeepMind and Cohere | PhD, Edinburgh

Edinburgh, Scotland · Joined August 2018
192 Following · 256 Followers

Pinned Tweet
Shangmin Guo @ShangminGuo
Most agent systems disappoint not because they are not smart enough but because they keep making humans teach the same lesson twice. A boundary explained five times should become a default, not a recurring reminder. That missing mechanism is Harness Learning. We wrote up how we think this will define the next generation of collaboration-aligned agent systems.
Quoting ConteXeed @contexeed: x.com/i/article/2042…
Shangmin Guo @ShangminGuo
You’re describing the problem we’ve been building against. Once you treat hierarchy as an information-routing layer, the real question becomes: what is the new coordination substrate for humans, agents, and projects working together? Our answer is Harness Bus with explicit addressing, pub-sub style flow, and bridges across hierarchies. Attaching a sketch because the architecture matters here.
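The post above names three properties of the bus: explicit addressing, pub-sub flow, and bridges across hierarchies. A minimal sketch of how those pieces could fit together, assuming a `HarnessBus` class and `bridge` helper that are purely illustrative (not the actual ConteXeed implementation):

```python
from collections import defaultdict
from typing import Callable

class HarnessBus:
    """Hypothetical sketch: humans, agents, and projects are all addressable
    endpoints on a topic-based bus."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Callable):
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, sender: str, payload: dict):
        # Explicit addressing: every message carries its sender and topic.
        message = {"topic": topic, "sender": sender, "payload": payload}
        for handler in self.subscribers[topic]:
            handler(message)

def bridge(source: HarnessBus, target: HarnessBus, topic: str):
    """A bridge re-publishes a topic from one hierarchy's bus onto another's."""
    source.subscribe(
        topic,
        lambda msg: target.publish(topic, msg["sender"], msg["payload"]),
    )

# Usage: an agent on team A's bus raises a review request that
# a subscriber on team B's bus receives via the bridge.
team_a, team_b = HarnessBus(), HarnessBus()
bridge(team_a, team_b, "review.request")
received = []
team_b.subscribe("review.request", received.append)
team_a.publish("review.request", sender="agent:planner", payload={"pr": 42})
```

The design choice the sketch illustrates: because delivery is routed by topic rather than by org chart, the hierarchy becomes an information-routing layer, and bridges are where two hierarchies agree to share a channel.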
[attached image]
Shangmin Guo @ShangminGuo
People still haven’t realised this: the agent harness is no longer just a static layer around the model. It is starting to behave like a living organism. As August Schleicher wrote of languages: “they rose, and developed themselves... they grew old, and died out.” The same is now becoming true of harnesses. Some will adapt. Some will ossify. Some will quietly die.
Quoting ConteXeed @contexeed: x.com/i/article/2042…
Shangmin Guo @ShangminGuo
We’re seeing the same failure mode: if you have to ask for something twice, the system didn’t learn. This signal matters because repeat correction shows where human feedback never becomes default behaviour. The miss is no longer about one bad run. It reveals that the system has no way to inherit the feedback. Our view is that the next step is a harness that can learn from human feedback and evolve the system’s default policy: what it prioritises, when it asks for review, what it escalates, and how it behaves by default. We’ve been calling this harness learning: x.com/contexeed/stat…
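The mechanism the post describes — repeated human correction becoming default behaviour — can be pictured with a toy sketch. The class name, the promotion threshold, and the example rule below are all illustrative assumptions, not the system the thread describes:

```python
from collections import Counter

class LearningHarness:
    """Toy sketch: corrections a human has to repeat get promoted
    into the harness's default policy."""
    def __init__(self, promote_after: int = 2):
        self.corrections = Counter()   # rule -> times a human repeated it
        self.defaults = set()          # rules promoted into default policy
        self.promote_after = promote_after

    def record_correction(self, rule: str):
        self.corrections[rule] += 1
        # "If you have to ask for something twice, the system didn't learn":
        # once a correction recurs, it stops being a reminder and becomes policy.
        if self.corrections[rule] >= self.promote_after:
            self.defaults.add(rule)

    def applies_by_default(self, rule: str) -> bool:
        return rule in self.defaults

# Usage: the first correction is feedback; the second makes it a default.
harness = LearningHarness()
harness.record_correction("ask for review before deploying")
assert not harness.applies_by_default("ask for review before deploying")
harness.record_correction("ask for review before deploying")
assert harness.applies_by_default("ask for review before deploying")
```

The point of the sketch is the state transition: feedback that only shapes one run is a reminder, while feedback that changes what `applies_by_default` returns has been inherited by the system.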
Shangmin Guo @ShangminGuo
@contexeed A stronger model inside a brittle harness is still a brittle system.
Shangmin Guo reposted
ConteXeed @contexeed
The agent market is measuring the wrong thing. It keeps asking whether a model can complete a task. The real question is whether a system can stop wasting human coordination. That is where the next battle will be won.
Quoting ConteXeed @contexeed: x.com/i/article/2042…
Shangmin Guo reposted
ConteXeed @contexeed
The next real moat in AI is the harness that learns. Most agents can do tasks. Very few become better collaborators through use. The missing layer is a harness that turns repeated human correction into lasting system behaviour. We call this Harness Learning.
Quoting ConteXeed @contexeed: x.com/i/article/2042…
Shangmin Guo @ShangminGuo
Agent-native companies won’t first kill jobs. They’ll kill meetings. Companies today run on humans syncing context with other humans. AI-native companies will run on humans creating context with agents, and agents consuming it directly. #AIAgents #FutureOfWork #OrgDesign #AI
[attached image]
Shangmin Guo @ShangminGuo
@dwarkesh_sp @ilyasut Great interview! To answer the question about public proposals of self-play (01:30:26): we actually published a self-play post-training algorithm in early 2024. It covers how LLMs can improve iteratively through self-play. Link: arxiv.org/abs/2402.04792
Dwarkesh Patel @dwarkesh_sp
The @ilyasut episode
0:00:00 – Explaining model jaggedness
0:09:39 – Emotions and value functions
0:18:49 – What are we scaling?
0:25:13 – Why humans generalize better than models
0:35:45 – Straight-shotting superintelligence
0:46:47 – SSI’s model will learn from deployment
0:55:07 – Alignment
1:18:13 – “We are squarely an age of research company”
1:29:23 – Self-play and multi-agent
1:32:42 – Research taste
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!
Shangmin Guo @ShangminGuo
No need to worship benchmarks. People remember features, not charts. A spark that captures imagination drives attention and conversion far more than a 3-point bump, e.g. Nano Banana generating 3D figurines. #AI #ProductThinking #Growth
Lucas Beyer (bl16) @giffmana
I think this project could be one of those "why have we ever done this differently?!" kind of moments. Instead of doing code training by just predicting the next token in the source file, interleave that with interpreter states, which also have to be predicted! The devil's in the details to get this working, of course, but in hindsight this seems like the right way to force the model to actually understand code.
Quoting Gabriel Synnaeve @syhw: 4/ Here is an example of the Code World Model tracing the execution of the piece of code counting the "r"s in "strawberry". Think of it like a neural `pdb` that you can set to any initial frame state, and that reasoning can query as a tool in token space.
AI at Meta @AIatMeta
New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work.
➡️ Read the technical report: ai.meta.com/research/publi…
➡️ Download the open weights: huggingface.co/facebook/cwm
➡️ Download the code: github.com/facebookresear…
Shangmin Guo @ShangminGuo
How do you compare neural architectures for language modelling? Use efficiency: FLOPs spent per unit of perplexity drop. Score = FLOPs / ΔPPL (lower is better). #ML #NLP
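The metric above is a one-liner to compute. A sketch, with made-up placeholder numbers for the two architectures being compared:

```python
def efficiency_score(flops: float, ppl_before: float, ppl_after: float) -> float:
    """FLOPs spent per unit of perplexity drop; lower is better."""
    delta_ppl = ppl_before - ppl_after
    if delta_ppl <= 0:
        return float("inf")  # no improvement: infinitely inefficient
    return flops / delta_ppl

# Illustrative comparison (numbers are invented, not measurements):
# architecture A spends 1e18 FLOPs to drop perplexity from 20 to 15;
# architecture B spends 4e17 FLOPs to drop it from 20 to 17.
score_a = efficiency_score(1e18, 20.0, 15.0)  # 1e18 / 5 FLOPs per PPL point
score_b = efficiency_score(4e17, 20.0, 17.0)  # 4e17 / 3 FLOPs per PPL point
```

With these numbers B scores lower than A, so B bought its perplexity drop more cheaply, even though A's absolute drop was larger.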
Shangmin Guo @ShangminGuo
Back in 2023, we explored using economics games to evaluate LLMs’ intelligence and reasoning. Feel free to check out our paper here: arxiv.org/abs/2401.01735. Excited to see the community embracing games as a way to measure these capabilities!
Quoting Google DeepMind @GoogleDeepMind: We have a long history of using games to measure progress in AI. 🎮 That’s why we’re helping unveil the @Kaggle Game Arena: an open-source platform where models go head-to-head in complex games to help us gauge their capabilities. 🧵
Jon Richens @jonathanrichens
Are world models necessary to achieve human-level agents, or is there a model-free shortcut? Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
[attached image]