alex zhang (@a1zhang)
Fundamentally, what really is the difference between an RLM and S={context folding, Codex, Claude Code, Terminus, agents, etc.}?
This is the last and most important RLM post I'll make for a while, to finally answer all the "this is trivially obvious" replies from Hacker News, Reddit, X, etc. I know there's a lot of noise rn, but this is the one thread I'd rly ask you not to skip!
For a while I didn't have a super clear answer to this. And no, it's NOT that:
1. CC (Claude Code) sub-agents are user-defined, while in RLMs the LM defines the sub-agent. This is a minor difference that I suspect Anthropic will phase out at some point.
2. Coding scaffolds use a file system while the original paper uses a Python REPL. In fact, an FS is a REPL.
3. RLMs offload context into a variable. CC / Codex implicitly do this by saving to files, and yes, I know that people have been doing this for a long time.
But after some long convos with @lateinteraction, @zli11010, and @ChenSun92, I think I can articulate it a lot better: all of these things are important, but they miss what actually matters.
RLMs enable **symbolic recursion**: the RLM can spawn recursive calls embedded in symbolic logic. In simple terms, the recursive LM call lives inside the REPL. For CC / Codex, by contrast, the sub-agent call is spawned directly by the main model as a tool call. This is a subtle difference, but an extremely significant one.
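A minimal Python sketch of what "the recursive LM call lives inside the REPL" can look like, assuming a REPL whose namespace exposes a sub-call function. `llm_query`, `repl_namespace`, and `model_code` are hypothetical names for illustration, and the sub-call is a deterministic stub rather than a real model:

```python
# Sketch of symbolic recursion: the sub-LM call is an ordinary function in the
# REPL namespace, so model-written code can call it from loops or conditionals.
# ASSUMPTION: `llm_query` is a hypothetical stand-in for a real sub-model call;
# it is a deterministic stub here so the example runs offline.

def llm_query(prompt: str) -> str:
    """Hypothetical sub-LM call (stubbed): says whether a function is defined."""
    return "yes" if "def " in prompt else "no"

# REPL state for the root model: the context is a variable, and the sub-call
# is just another name it can use.
repl_namespace = {
    "context": ["def foo(): pass", "x = 1", "def bar(): return 2"],
    "llm_query": llm_query,
}

# Code the root model might emit as one REPL step. The recursive call sits
# inside a list comprehension, i.e. inside symbolic logic.
model_code = """
answers = [llm_query(chunk) for chunk in context]
hits = [i for i, a in enumerate(answers) if a == "yes"]
"""

# Run the model's code against the REPL namespace (used as its globals).
exec(model_code, repl_namespace)
print(repl_namespace["hits"])  # [0, 2]
```

Because `llm_query` is just a name in the namespace, the root model can put it anywhere code can go; the loop does not depend on the model remembering to emit one tool call per item.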
Consider the following example. Say I want my agent to ingest 1M files and find a function I'm looking for.
1. The CC model will (hopefully) sequentially launch 1M JSON-like sub-agent tool calls, one per file, from its main context, get the answer for each (maybe saving it to a file), and return.
2. The RLM will write a for-loop / parallel map that makes one sub-agent call per file to open it, saves each result to a var / file, and grabs the answer.
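A sketch of option 2, scaled down to 10,000 in-memory "files" so it runs quickly. `sub_agent` is a hypothetical stub for an LM sub-call, and the thread pool is just one way to express the parallel map, not necessarily what an RLM would write:

```python
# Sketch of the RLM-style approach: one piece of model-written code fans out
# a sub-call per file.
# ASSUMPTIONS: `sub_agent` is a hypothetical stub standing in for an LM
# sub-call; the "files" are generated in memory instead of read from disk.
from concurrent.futures import ThreadPoolExecutor

TARGET = "def find_me"

def sub_agent(path: str, source: str) -> bool:
    """Hypothetical sub-agent: 'reads' one file and answers a yes/no question."""
    return TARGET in source

# 10,000 synthetic files standing in for the 1M in the example.
files = {f"src/mod_{i}.py": f"def helper_{i}(): pass\n" for i in range(10_000)}
files["src/mod_4242.py"] += "def find_me(): return 42\n"

# Parallel map over every file; pool.map keeps results in input order.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(lambda item: (item[0], sub_agent(*item)), files.items()))

# Only this short list needs to be printed back to the root model.
matches = [path for path, found in results if found]
print(matches)  # ['src/mod_4242.py']
```

Under option 1, each of those 10,000 sub-calls would instead be a separate tool call that the root model has to choose to emit.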
The problem here is that we rely on Opus 4.5 to perform a programmatic action (launch 1M sub-agent calls) without any guarantee that it'll actually do it. Maybe it's good enough to do this, but consider a nastier task where we want to launch sub-calls only for files satisfying some weird property P (this is quite common when, say, looking at databases or complicated monorepos). In fact, all tool calls are launched this way, which is fundamentally limiting (we already write code to perform operations programmatically, e.g. search if XYZ)!
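The property-P case can be sketched the same way. Here `is_interesting` is a made-up example of P, and `sub_call` is a hypothetical stub that records every invocation, so we can check which files actually got a sub-call:

```python
# Sketch: launch sub-calls only for files that satisfy some property P.
# ASSUMPTIONS: `is_interesting` is a made-up example of P, and `sub_call`
# is a hypothetical stub for an LM sub-call that logs each invocation.

calls = []

def sub_call(path: str, source: str) -> str:
    """Hypothetical LM sub-call (stubbed): summarizes a file in one line."""
    calls.append(path)
    return f"{path}: {len(source.splitlines())} lines"

def is_interesting(path: str, source: str) -> bool:
    """Example property P: Python files under db/ that mention 'migrate'."""
    return path.startswith("db/") and path.endswith(".py") and "migrate" in source

repo = {
    "db/0001_init.py": "def migrate():\n    create_table('users')\n",
    "db/README.md": "how to migrate\n",
    "db/utils.py": "def connect(): ...\n",
    "api/routes.py": "def migrate_user(): ...\n",
}

# P is checked by ordinary code, and the sub-call fires only where it holds,
# so the filter is applied exactly instead of relying on the model's diligence.
summaries = [sub_call(p, s) for p, s in repo.items() if is_interesting(p, s)]
print(summaries)  # ['db/0001_init.py: 2 lines']
```

The filter and the sub-call live in the same expression, which is exactly what a separate sub-agent tool can't give you.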
The tl;dr here is that the REPL and the sub-calling tool being *separate* is not a good thing. It's such a subtle / simple point, but it lends itself extremely well to more robust, programmatic model reasoning through training. Beyond the fact that CC / Codex are trained specifically for coding tasks, this minor difference opens up a whole class of problems that RLMs can solve.
(Thanks to @zli11010 for this point) From a PL perspective, the way Codex / CC handle sub-agents is almost *silly*. If we think of the REPL that the RLM uses as a "language" of sorts, sub-calling should be a feature of this language. It shouldn't be separate: keeping it separate is strictly less expressive than the RLM design.
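To make the PL point concrete, here is a toy sketch where the sub-call is just a function in the language, so it composes with recursion like any other function. `base_lm`, `rlm`, and `MAX_ITEMS` are hypothetical; in a real RLM the sub-call would itself be a model with its own REPL, whereas here the split-in-half policy is hard-coded for illustration:

```python
# Toy divide-and-conquer: sub-calls compose with recursion like any function.
# ASSUMPTIONS: `base_lm` is a hypothetical stub for a leaf LM call that can
# only read MAX_ITEMS items at once; MAX_ITEMS is an arbitrary made-up limit.

MAX_ITEMS = 4

def base_lm(query: str, items: list[str]) -> list[str]:
    """Hypothetical leaf LM call (stubbed): returns items mentioning `query`."""
    assert len(items) <= MAX_ITEMS, "leaf call exceeded its context limit"
    return [item for item in items if query in item]

def rlm(query: str, items: list[str]) -> list[str]:
    """Answer directly if the input fits, otherwise split in half and recurse."""
    if len(items) <= MAX_ITEMS:
        return base_lm(query, items)
    mid = len(items) // 2  # both halves are non-empty and strictly smaller
    return rlm(query, items[:mid]) + rlm(query, items[mid:])

notes = [f"note {i}: nothing here" for i in range(20)] + ["note 20: RLMs recurse in code"]
print(rlm("RLM", notes))  # ['note 20: RLMs recurse in code']
```

With sub-calling exposed only as a tool, the same recursion would have to be unrolled by the root model one tool call at a time.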
Hope this clears things up! Happy to answer more questions, and I plan on updating the paper to articulate this better and make things clearer :)