
Adam Gold
700 posts

Adam Gold
@AdamGolds
building long running sandboxes @ https://t.co/NQ9yehAYVd (ex-CEO @ Kypso, acquired)



Docker in Docker is something almost no sandbox provider supports. For RL workloads specifically, being able to spin up a Docker Compose or a K3S cluster inside a sandbox unlocks an enormous range of workflows that simply don't work anywhere else. That alone has been a meaningful wedge into the research + RL customer segment.

The sandboxing debate is becoming more widespread, but I fear it is focusing on the wrong things. Given too many people are ranking startup times, I decided to write this rebuttal. From Peter’s tweet, his postulation is the correct one: "How would we build software if tokens don’t matter?". In this future, sandboxing will not be about shaving milliseconds off boot. Timings will still matter, but there will be many more layers to this cake. Most critically, sandboxing will be about giving agents a high-fidelity operating environment. I think the winners will have to get these three things right: 1. Full-capability sandboxes Most sandboxes people are hyping are not actually capable enough for serious agentic work. A useful coding agent needs to clone arbitrary repos, install dependencies, run tests, spin up services, debug failures, and iterate. For example, many production repos depend on Docker. If your sandbox cannot run Docker, it is probably not ready for the workflows agents are about to own. 2. Trusted credential handling The moment agents do real work, they need access to real systems. GitHub, Linear, Slack, GMail, cloud accounts, internal dashboards, deployment tools. You cannot just hand the model your secrets and hope for the best. The right abstraction is some version of a trusted proxy: the agent can request actions, but secrets stay controlled, scoped, audited, and revocable. 3. Agent-to-agent communication Today, most sandboxed agents are isolated workers. I do not think that will last for long. The future probably looks more like fleets of specialized agents, each with different tools, permissions, memories, and objectives. Some review code. Some reproduce bugs. Some run benchmarks. Some test security. Some interact with external apps. Those agents will need to talk to each other, delegate, verify, and coordinate. So the sandbox stops being a box, and it starts to become more of a network substrate. That is why I think this market is still wide open. Nobody really knows what the agentic software stack will look like yet. But I am absolutely confident that the winning sandbox provider will not be the one that starts fastest. It will be the one that safely enables agents to do real work.

Whenever I investigate a bug, I let codex recreate the exact state in an emphemeral crabbox, verify the bug, fix it, verify the fix. No messy state because local system might be polluted, and no slowdown because I run 10 sessions in parallel. crabbox.sh



@QuinnyPig It'll be ▲. Would love your feedback. This is our primary focus!







Scion is a new multi-agent orchestration tool that orchestrates agents (Claude Code, Gemini CLI, Codex, and others) as isolated, concurrent processes. Each agent gets its own container, git worktree, and credentials — so they can work on different parts of your project without stepping on each other. github.com/GoogleCloudPla…












