Ben Werner

575 posts

@benwerner

Building the most powerful VMs @freestyle_dev, previously @Clerk

San Francisco · Joined April 2012
829 Following · 277 Followers
Ben Werner @benwerner
Mostly auth. You can spin up 10 VMs forked from a development environment that has everything set up, like an authed Claude Code, an authed GitHub CLI, and an authed gcloud CLI. Since it’s a VM, you can run long-running background agents, experiments, or 10 different rewrites, stuff like that.
Ronan Berder @hunvreus
Can't see myself moving to managed agents. It's like Replit vs a local IDE: it's marginally more convenient, but not by 10x. Not even 2x. On my local, I can give agents access to any data, code or CLI I wish to. I don't even do it, I ask them to give themselves access. That was @openclaw's main innovation IMHO: sidestep the integration/access conversation by running things locally with very few guardrails.
claire vo 🖤 @clairevo

Been testing Claude Managed Agents + ChatGPT agents a bit, and even for tasks of moderate complexity, I much prefer the turn/response style "chat" interface + tools to the "spin up a computer" experience of an Agent. Latency is too high and it doesn't feel like the juice is worth the squeeze.

Darren Shepherd @ibuildthecloud
What do you all think, agent in sandbox or agent out of sandbox? Seems like agents want to be out of the sandbox. I don't like it, but I get it. But I don't like it.
Ben Werner @benwerner
Git worktrees are so clearly not the solution. What are we doing?
Ben Werner @benwerner
@amitpr nix is great, but you often need a bit more, like auth, editor/ssh entrypoints, and db branches. We're launching rigkit.dev, should be interesting.
Amit @amitpr
Two months ago I wrote about Nix being the only sane way to manage a Linux system (or, thousands?). Feeling both more vindicated (Mythos + Shai-hulud) and doubtful (Nix can be painful!) today.
Damian Barabonkov @damian_b
The sandboxing debate is becoming more widespread, but I fear it is focusing on the wrong things. Given too many people are ranking startup times, I decided to write this rebuttal.

From Peter’s tweet, his postulation is the correct one: "How would we build software if tokens don’t matter?". In this future, sandboxing will not be about shaving milliseconds off boot. Timings will still matter, but there will be many more layers to this cake. Most critically, sandboxing will be about giving agents a high-fidelity operating environment. I think the winners will have to get these three things right:

1. Full-capability sandboxes
Most sandboxes people are hyping are not actually capable enough for serious agentic work. A useful coding agent needs to clone arbitrary repos, install dependencies, run tests, spin up services, debug failures, and iterate. For example, many production repos depend on Docker. If your sandbox cannot run Docker, it is probably not ready for the workflows agents are about to own.

2. Trusted credential handling
The moment agents do real work, they need access to real systems. GitHub, Linear, Slack, GMail, cloud accounts, internal dashboards, deployment tools. You cannot just hand the model your secrets and hope for the best. The right abstraction is some version of a trusted proxy: the agent can request actions, but secrets stay controlled, scoped, audited, and revocable.

3. Agent-to-agent communication
Today, most sandboxed agents are isolated workers. I do not think that will last for long. The future probably looks more like fleets of specialized agents, each with different tools, permissions, memories, and objectives. Some review code. Some reproduce bugs. Some run benchmarks. Some test security. Some interact with external apps. Those agents will need to talk to each other, delegate, verify, and coordinate. So the sandbox stops being a box, and it starts to become more of a network substrate.

That is why I think this market is still wide open. Nobody really knows what the agentic software stack will look like yet. But I am absolutely confident that the winning sandbox provider will not be the one that starts fastest. It will be the one that safely enables agents to do real work.
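To make the "trusted proxy" idea from point 2 above a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `CredentialProxy` and all of its method names are hypothetical and do not come from any product mentioned in this thread. The agent asks for a named action, the proxy checks the agent's scope, logs the attempt, and runs a proxy-side handler that is the only code to touch the secret.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical names throughout: this is an illustrative sketch,
# not the API of any real sandbox or secrets product.
Handler = Callable[[str, str], str]  # (secret, payload) -> result


class CredentialProxy:
    """Holds secrets on behalf of agents; agents only request actions."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}                    # service -> secret
        self._handlers: Dict[Tuple[str, str], Handler] = {}   # (service, action) -> handler
        self._grants: Dict[str, Set[Tuple[str, str]]] = {}    # agent -> allowed (service, action)
        self.audit_log: List[Tuple[str, str, str, str]] = []  # (agent, service, action, outcome)

    def store_secret(self, service: str, secret: str) -> None:
        self._secrets[service] = secret

    def register_action(self, service: str, action: str, handler: Handler) -> None:
        # Handlers run inside the proxy, so the secret never reaches the agent.
        self._handlers[(service, action)] = handler

    def grant(self, agent: str, service: str, action: str) -> None:
        # Scoped: one grant covers one action on one service.
        self._grants.setdefault(agent, set()).add((service, action))

    def revoke(self, agent: str) -> None:
        # Revocable: drop every grant held by this agent.
        self._grants.pop(agent, None)

    def request(self, agent: str, service: str, action: str, payload: str) -> str:
        key = (service, action)
        allowed = key in self._grants.get(agent, set())
        # Audited: every attempt is recorded, allowed or not.
        self.audit_log.append((agent, service, action, "allowed" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{agent} may not perform {action} on {service}")
        return self._handlers[key](self._secrets[service], payload)


# Usage with made-up service, agent, and token values.
proxy = CredentialProxy()
proxy.store_secret("github", "example-token")
proxy.register_action("github", "open_pr", lambda secret, title: f"opened PR: {title}")
proxy.grant("review-agent", "github", "open_pr")
result = proxy.request("review-agent", "github", "open_pr", "fix typo")
```

The design choice worth noting is that the agent only ever passes an action name and a payload. Whether a real system would look like this is an open question; the sketch only shows that "controlled, scoped, audited, and revocable" can each map to one small mechanism.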
Peter Steinberger 🦞 @steipete

People freaking out over my AI spend. What nobody sees: Part of what excites me so much about working on OpenClaw is that I'm trying to answer the question: How would we build software in the future if tokens don't matter?

We constantly run ~100 codex in the cloud, reviewing every PR, every issue. If a fix on main lands, @clawsweeper will eventually find that 6-month-old issue and close it with an exact reference.

We run codex on every commit to review for security issues (as it's far too easy to miss them). We run codex to de-duplicate issues, find clusters, and send reports for the most pressing issues.

We have agents that can recreate complex setups, spin up ephemeral crabbox.sh machines, log into e.g. Telegram, make a video, and post the before/after fix on the PR.

There's codex that watches new issues and, if one fits our documented vision well, automatically creates a PR for it (which another codex then reviews). We have codex running that scans comments for spam and blocks people. We have codex instances running that verify performance benchmarks and report regressions into Discord.

We have agents that listen in on our meetings and proactively start work, e.g. create PRs for new features while we discuss them.

We build clawpatch.ai to split all our projects into functional units to review and find bugs and regressions. We do the same split for security with Vercel's deepsec and Codex Security to find regressions and vulnerabilities.

All that automation allows us to run this project extremely lean.

Ben Werner @benwerner
yknow back in the day using shadcn and tailwind was only for the cool kids
Darren Shepherd @ibuildthecloud
What is "set conditions"? Everyone runs CI in GitHub Actions or something similar, so it's fairly easy to reproduce your execution environment. But if you're happy with a worktree, why would a worktree in a container be harder? It's just that now you can do more stuff. And yes, it's the only way I can manage a crap ton of things in parallel. Worktrees are just chaos.
Darren Shepherd @ibuildthecloud
I hate worktrees so much. What is wrong with you all.
Ben Werner @benwerner
if claude is dumb but has taste, and gpt 5.5 is smart but with no taste, what is grok?
Ben Werner @benwerner
ok guys why does claude keep using "load-bearing"
Rhys @RhysSullivan
@kr0der effect + tanstack start + cloudflare + drizzle has been nice
Anthony Kroeger @kr0der
what tech stack do you guys use for side projects? i usually use nextjs/tailwind/trpc/prisma/vercel but i’m trying out tanstack start/drizzle/cloudflare
Darren Shepherd @ibuildthecloud
We really do just need a GitHub for agents. Not for humans. Humans need not apply. That would actually fix GitHub for humans.
Ben Werner @benwerner
OH in SF: I love cats, they're like plants that move
Ben Werner @benwerner
codex using for (;;) instead of while (true) to escape its RL is proof we won't escape humanity's inevitable annihilation