Souvik Banerjee
18 posts




The sandboxing debate is becoming more widespread, but I fear it is focusing on the wrong things. Given too many people are ranking startup times, I decided to write this rebuttal. From Peter’s tweet, his postulation is the correct one: "How would we build software if tokens don’t matter?". In this future, sandboxing will not be about shaving milliseconds off boot. Timings will still matter, but there will be many more layers to this cake. Most critically, sandboxing will be about giving agents a high-fidelity operating environment. I think the winners will have to get these three things right: 1. Full-capability sandboxes Most sandboxes people are hyping are not actually capable enough for serious agentic work. A useful coding agent needs to clone arbitrary repos, install dependencies, run tests, spin up services, debug failures, and iterate. For example, many production repos depend on Docker. If your sandbox cannot run Docker, it is probably not ready for the workflows agents are about to own. 2. Trusted credential handling The moment agents do real work, they need access to real systems. GitHub, Linear, Slack, GMail, cloud accounts, internal dashboards, deployment tools. You cannot just hand the model your secrets and hope for the best. The right abstraction is some version of a trusted proxy: the agent can request actions, but secrets stay controlled, scoped, audited, and revocable. 3. Agent-to-agent communication Today, most sandboxed agents are isolated workers. I do not think that will last for long. The future probably looks more like fleets of specialized agents, each with different tools, permissions, memories, and objectives. Some review code. Some reproduce bugs. Some run benchmarks. Some test security. Some interact with external apps. Those agents will need to talk to each other, delegate, verify, and coordinate. So the sandbox stops being a box, and it starts to become more of a network substrate. That is why I think this market is still wide open. Nobody really knows what the agentic software stack will look like yet. But I am absolutely confident that the winning sandbox provider will not be the one that starts fastest. It will be the one that safely enables agents to do real work.

People freaking out over my AI spend. What nobody sees: Part of what excites me so much about working on OpenClaw is that I'm trying to answer the question: How would we build software in the future if tokens don't matter? We constant run ~100 codex in the cloud, reviewing every PR, every issue. If a fix on main lands, @clawsweeper will eventually find that 6 month old issue and close it with an exact reference. We run codex on every commit to review for security issues (as it's far too easy to miss). We run codex to de-duplicate issues and find clusters and send reports for the most pressing issues. We have agents that can recreate complex setups, spin up ephemeral crabbox.sh machines, log into e.g. Telegram, make a video and post before/after fix on the PR. There's codex that watch new issues and - if it fits our documented vision well, automatically create a PR of it. (that then another codex reviews) We have codex running that scans comments for spam and blocks people. We have codex instances running that verify performance benchmarks and report regressions into Discord. We have agents that listen on our meetings and proactively start work, e.g. create PRs when we discuss new features while we discuss them. We build clawpatch.ai to split all our projects into functional units to review and find bugs and regresssions. We do the same split for security with Vercel's deepsec and Codex Security to find regressions and vulnerabilities. All that automation allows us to run this project extremely lean.


Did I just get hardware GIC on HVF to restore from a KVM snapshot or is Claude hallucinating? 🤔





to every rust vibe sloppers: Arc is NOT CHEAP it is a giant overhead tell this to your clanker I am tired of the PRs that are wrapping everything in Arc


















