Shangmin Guo

116 posts

@ShangminGuo

Co-founder @ ConteXeed AI | Compounding agents into aligned&reliable collaborators | ex-Google DeepMind and Cohere | PhD, Edinburgh

Edinburgh, Scotland · Joined August 2018
192 Following · 256 Followers

Pinned Tweet
Shangmin Guo @ShangminGuo
Most agent systems disappoint not because they are not smart enough but because they keep making humans teach the same lesson twice. A boundary explained five times should become a default, not a recurring reminder. That missing mechanism is Harness Learning. We wrote up how we think this will define the next generation of collaboration-aligned agent systems.
Quoting ConteXeed @contexeed: x.com/i/article/2042…
Shangmin Guo @ShangminGuo
You’re describing the problem we’ve been building against. Once you treat hierarchy as an information-routing layer, the real question becomes: what is the new coordination substrate for humans, agents, and projects working together? Our answer is Harness Bus with explicit addressing, pub-sub style flow, and bridges across hierarchies. Attaching a sketch because the architecture matters here.
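The post above names three properties of the bus: explicit addressing, pub-sub flow, and bridges across hierarchies. A minimal sketch of how those pieces could fit together, assuming a `HarnessBus` class and `bridge` helper that are purely illustrative (not the actual ConteXeed implementation):

```python
from collections import defaultdict
from typing import Callable

class HarnessBus:
    """Hypothetical sketch: humans, agents, and projects are all addressable
    endpoints on a topic-based bus."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Callable):
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, sender: str, payload: dict):
        # Explicit addressing: every message carries its sender and topic.
        message = {"topic": topic, "sender": sender, "payload": payload}
        for handler in self.subscribers[topic]:
            handler(message)

def bridge(source: HarnessBus, target: HarnessBus, topic: str):
    """A bridge re-publishes a topic from one hierarchy's bus onto another's."""
    source.subscribe(
        topic,
        lambda msg: target.publish(topic, msg["sender"], msg["payload"]),
    )

# Usage: an agent on team A's bus raises a review request that
# a subscriber on team B's bus receives via the bridge.
team_a, team_b = HarnessBus(), HarnessBus()
bridge(team_a, team_b, "review.request")
received = []
team_b.subscribe("review.request", received.append)
team_a.publish("review.request", sender="agent:planner", payload={"pr": 42})
```

The design choice the sketch illustrates: because delivery is routed by topic rather than by org chart, the hierarchy becomes an information-routing layer, and bridges are where two hierarchies agree to share a channel.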
[attached image]
Shangmin Guo @ShangminGuo
People still haven’t realised this: the agent harness is no longer just a static layer around the model. It is starting to behave like a living organism. As August Schleicher wrote of languages: “they rose, and developed themselves... they grew old, and died out.” The same is now becoming true of harnesses. Some will adapt. Some will ossify. Some will quietly die.
Quoting ConteXeed @contexeed: x.com/i/article/2042…
Shangmin Guo @ShangminGuo
We’re seeing the same failure mode: if you have to ask for something twice, the system didn’t learn. This signal matters because repeat correction shows where human feedback never becomes default behaviour. The miss is no longer about one bad run. It reveals that the system has no way to inherit the feedback. Our view is that the next step is a harness that can learn from human feedback and evolve the system’s default policy: what it prioritises, when it asks for review, what it escalates, and how it behaves by default. We’ve been calling this harness learning: x.com/contexeed/stat…
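The mechanism the post describes — repeated human correction becoming default behaviour — can be pictured with a toy sketch. The class name, the promotion threshold, and the example rule below are all illustrative assumptions, not the system the thread describes:

```python
from collections import Counter

class LearningHarness:
    """Toy sketch: corrections a human has to repeat get promoted
    into the harness's default policy."""
    def __init__(self, promote_after: int = 2):
        self.corrections = Counter()   # rule -> times a human repeated it
        self.defaults = set()          # rules promoted into default policy
        self.promote_after = promote_after

    def record_correction(self, rule: str):
        self.corrections[rule] += 1
        # "If you have to ask for something twice, the system didn't learn":
        # once a correction recurs, it stops being a reminder and becomes policy.
        if self.corrections[rule] >= self.promote_after:
            self.defaults.add(rule)

    def applies_by_default(self, rule: str) -> bool:
        return rule in self.defaults

# Usage: the first correction is feedback; the second makes it a default.
harness = LearningHarness()
harness.record_correction("ask for review before deploying")
assert not harness.applies_by_default("ask for review before deploying")
harness.record_correction("ask for review before deploying")
assert harness.applies_by_default("ask for review before deploying")
```

The point of the sketch is the state transition: feedback that only shapes one run is a reminder, while feedback that changes what `applies_by_default` returns has been inherited by the system.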
Shangmin Guo @ShangminGuo
@contexeed A stronger model inside a brittle harness is still a brittle system.
Shangmin Guo reposted
ConteXeed @contexeed
The agent market is measuring the wrong thing. It keeps asking whether a model can complete a task. The real question is whether a system can stop wasting human coordination. That is where the next battle will be won.
Quoting ConteXeed @contexeed: x.com/i/article/2042…
Shangmin Guo reposted
ConteXeed @contexeed
The next real moat in AI is the harness that learns. Most agents can do tasks. Very few become better collaborators through use. The missing layer is a harness that turns repeated human correction into lasting system behaviour. We call this Harness Learning.
Quoting ConteXeed @contexeed: x.com/i/article/2042…
Shangmin Guo @ShangminGuo
Agent-native companies won’t first kill jobs. They’ll kill meetings. Companies today run on humans syncing context with other humans. AI-native companies will run on humans creating context with agents, and agents consuming it directly. #AIAgents #FutureOfWork #OrgDesign #AI
[attached image]
Shangmin Guo @ShangminGuo
@dwarkesh_sp @ilyasut Great interview! To answer the question about public proposals of self-play (01:30:26): we actually published a self-play post-training algorithm in early 2024. It covers how LLMs can improve iteratively through self-play. Link: arxiv.org/abs/2402.04792
Dwarkesh Patel @dwarkesh_sp
The @ilyasut episode
0:00:00 – Explaining model jaggedness
0:09:39 – Emotions and value functions
0:18:49 – What are we scaling?
0:25:13 – Why humans generalize better than models
0:35:45 – Straight-shotting superintelligence
0:46:47 – SSI’s model will learn from deployment
0:55:07 – Alignment
1:18:13 – “We are squarely an age of research company”
1:29:23 – Self-play and multi-agent
1:32:42 – Research taste
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!
Shangmin Guo @ShangminGuo
No need to worship benchmarks. People remember features, not charts. A spark that captures imagination drives attention and conversion far more than a 3-point bump, e.g. Nano Banana generating 3D figurines. #AI #ProductThinking #Growth
Lucas Beyer (bl16) @giffmana
I think this project could be one of those "why have we ever done this differently?!" kind of moments. Instead of doing code training by just predicting the next token in the source file, interleave that with interpreter states, which also have to be predicted! The devil's in the details to get this working, of course, but in hindsight this seems like the right way to force the model to actually understand code.
Quoting Gabriel Synnaeve @syhw: 4/ Here is an example of the Code World Model tracing the execution of the piece of code counting the "r"s in "strawberry". Think of it like a neural `pdb` that you can set to any initial frame state, and that reasoning can query as a tool in token space.
AI at Meta @AIatMeta
New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code. We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work.
➡️ Read the technical report: ai.meta.com/research/publi…
➡️ Download the open weights: huggingface.co/facebook/cwm
➡️ Download the code: github.com/facebookresear…
Shangmin Guo @ShangminGuo
How do you compare neural architectures for language modelling? Use efficiency: FLOPs spent per unit of perplexity drop. Score = FLOPs / ΔPPL (lower is better). #ML #NLP
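The metric above is a one-liner to compute. A sketch, with made-up placeholder numbers for the two architectures being compared:

```python
def efficiency_score(flops: float, ppl_before: float, ppl_after: float) -> float:
    """FLOPs spent per unit of perplexity drop; lower is better."""
    delta_ppl = ppl_before - ppl_after
    if delta_ppl <= 0:
        return float("inf")  # no improvement: infinitely inefficient
    return flops / delta_ppl

# Illustrative comparison (numbers are invented, not measurements):
# architecture A spends 1e18 FLOPs to drop perplexity from 20 to 15;
# architecture B spends 4e17 FLOPs to drop it from 20 to 17.
score_a = efficiency_score(1e18, 20.0, 15.0)  # 1e18 / 5 FLOPs per PPL point
score_b = efficiency_score(4e17, 20.0, 17.0)  # 4e17 / 3 FLOPs per PPL point
```

With these numbers B scores lower than A, so B bought its perplexity drop more cheaply, even though A's absolute drop was larger.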
Shangmin Guo @ShangminGuo
Back in 2023, we explored using economics games to evaluate LLMs’ intelligence and reasoning. Feel free to check out our paper here: arxiv.org/abs/2401.01735. Excited to see the community embracing games as a way to measure these capabilities!
Quoting Google DeepMind @GoogleDeepMind: We have a long history of using games to measure progress in AI. 🎮 That’s why we’re helping unveil the @Kaggle Game Arena: an open-source platform where models go head-to-head in complex games to help us gauge their capabilities. 🧵
Jon Richens @jonathanrichens
Are world models necessary to achieve human-level agents, or is there a model-free shortcut? Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
[attached image]