Ben Werner

575 posts

@benwerner

Building the most powerful VMs @freestyle_dev, previously @Clerk

San Francisco · Joined April 2012

829 Following · 277 Followers

Ben Werner@benwerner·3d

Mostly auth. You can spin up 10 VMs forked from a development environment that has everything set up, like an authed Claude Code, an authed GitHub CLI, and an authed gcloud CLI. Since it's a VM, you can run long-running background agents, experiments, or 10 different rewrites, stuff like that.

Ronan Berder@hunvreus·5d

@benwerner What pain point does this address?

Ronan Berder@hunvreus·6d

Can't see myself moving to managed agents. It's like Replit vs a local IDE: it's marginally more convenient, but not by 10x. Not even 2x. On my local, I can give agents access to any data, code or CLI I wish to. I don't even do it, I ask them to give themselves access. That was @openclaw's main innovation IMHO: sidestep the integration/access conversation by running things locally with very few guardrails.

claire vo 🖤@clairevo

Been testing Claude Managed Agents + ChatGPT agents a bit, and even for tasks of moderate complexity, I much prefer the turn/response style "chat" interface + tools to the "spin up a computer" experience of an Agent. Latency is too high and it doesn't feel like the juice is worth the squeeze.

Ben Werner@benwerner·3d

@ibuildthecloud I recently flipped to agent in sandbox. Curious to see what you think of rigkit.dev

Darren Shepherd@ibuildthecloud·3d

What do you all think, agent in sandbox or agent out of sandbox? Seems like agents want to be out of the sandbox. I don't like it, but I get it. But I don't like it.

Ben Werner@benwerner·4d

Git worktrees are so clearly not the solution what are we doing

Ben Werner@benwerner·6d

@amitpr nix is great - but you often need a bit more, like auth, editor/ssh entrypoints, and db branches. We're launching rigkit.dev - should be interesting

Amit@amitpr·6d

Two months ago I wrote about Nix being the only sane way to manage a Linux system (or, thousands?). Feeling both more vindicated (Mythos + Shai-hulud) and doubtful (Nix can be painful!) today.

Ben Werner@benwerner·17 May

@damian_b Yea, there is a reason employees get $2000 MacBooks. We are figuring all of this out at @freestyle_dev. As for auth, we think we need something like rig.freestyle.sh to solve that weird problem

Damian Barabonkov@damian_b·16 May

The sandboxing debate is becoming more widespread, but I fear it is focusing on the wrong things. Given too many people are ranking startup times, I decided to write this rebuttal.

From Peter’s tweet, his postulation is the correct one: "How would we build software if tokens don’t matter?". In this future, sandboxing will not be about shaving milliseconds off boot. Timings will still matter, but there will be many more layers to this cake. Most critically, sandboxing will be about giving agents a high-fidelity operating environment. I think the winners will have to get these three things right:

1. Full-capability sandboxes

Most sandboxes people are hyping are not actually capable enough for serious agentic work. A useful coding agent needs to clone arbitrary repos, install dependencies, run tests, spin up services, debug failures, and iterate. For example, many production repos depend on Docker. If your sandbox cannot run Docker, it is probably not ready for the workflows agents are about to own.

2. Trusted credential handling

The moment agents do real work, they need access to real systems. GitHub, Linear, Slack, GMail, cloud accounts, internal dashboards, deployment tools. You cannot just hand the model your secrets and hope for the best. The right abstraction is some version of a trusted proxy: the agent can request actions, but secrets stay controlled, scoped, audited, and revocable.

3. Agent-to-agent communication

Today, most sandboxed agents are isolated workers. I do not think that will last for long. The future probably looks more like fleets of specialized agents, each with different tools, permissions, memories, and objectives. Some review code. Some reproduce bugs. Some run benchmarks. Some test security. Some interact with external apps. Those agents will need to talk to each other, delegate, verify, and coordinate. So the sandbox stops being a box, and it starts to become more of a network substrate.

That is why I think this market is still wide open. Nobody really knows what the agentic software stack will look like yet. But I am absolutely confident that the winning sandbox provider will not be the one that starts fastest. It will be the one that safely enables agents to do real work.

Peter Steinberger 🦞@steipete

People freaking out over my AI spend. What nobody sees: Part of what excites me so much about working on OpenClaw is that I'm trying to answer the question: How would we build software in the future if tokens don't matter? We constantly run ~100 codex in the cloud, reviewing every PR, every issue. If a fix on main lands, @clawsweeper will eventually find that 6 month old issue and close it with an exact reference. We run codex on every commit to review for security issues (as it's far too easy to miss). We run codex to de-duplicate issues and find clusters and send reports for the most pressing issues. We have agents that can recreate complex setups, spin up ephemeral crabbox.sh machines, log into e.g. Telegram, make a video and post before/after fix on the PR. There's codex that watches new issues and, if one fits our documented vision well, automatically creates a PR for it (that then another codex reviews). We have codex running that scans comments for spam and blocks people. We have codex instances running that verify performance benchmarks and report regressions into Discord. We have agents that listen in on our meetings and proactively start work, e.g. create PRs when we discuss new features while we discuss them. We build clawpatch.ai to split all our projects into functional units to review and find bugs and regressions. We do the same split for security with Vercel's deepsec and Codex Security to find regressions and vulnerabilities. All that automation allows us to run this project extremely lean.

Ben Werner@benwerner·16 May

yknow back in the day using shadcn and tailwind was only for the cool kids

Ben Werner@benwerner·15 May

@ibuildthecloud @ifeanyi_we We’re launching rig.freestyle.sh that fixes everyone’s problems here

Darren Shepherd@ibuildthecloud·15 May

What is "set conditions"? Everyone runs CI in GitHub Actions or something similar, so it's fairly easy to reproduce your execution environment. But if you're happy with a worktree, why would a worktree in a container be harder? It's just that now you can do more stuff. And yes, it's the only way I can manage a crap ton of things in parallel. Worktrees are just chaos.

Darren Shepherd@ibuildthecloud·15 May

I hate worktrees so much. What is wrong with you all.

Ben Werner@benwerner·14 May

if claude is dumb but has taste, and gpt 5.5 is smart but with no taste, what is grok?

Ben Werner@benwerner·13 May

Giving a like to the bait articles which use photoshop over ai for the craft

DiscussingFish@DiscussingFish

Antony Starr has seemingly been spotted in-costume as Homelander on the set of 'MAN OF TOMORROW.' In theaters on July 9, 2027. (Source: @JustJared)

Ben Werner@benwerner·13 May

ok guys why does claude keep using "load-bearing"

Ben Werner@benwerner·10 May

@RhysSullivan @dillon_mulroy @kr0der Goated stack

Rhys@RhysSullivan·10 May

@kr0der effect + tanstack start + cloudflare + drizzle has been nice

Anthony Kroeger@kr0der·10 May

what tech stack do you guys use for side projects? i usually use nextjs/tailwind/trpc/prisma/vercel but i’m trying out tanstack start/drizzle/cloudflare

Ben Werner@benwerner·5 May

wake up babe, Claude shipped more dots

PostHog@posthog

Introducing PostHog Code, the product editor that: - Understands your product - Identifies usage patterns - Triages bugs and errors for you - Creates PRs to fix them - Continuously monitors and improves your product Join the waitlist: posthog.com/code

Ben Werner@benwerner·5 May

@ibuildthecloud We have just the solution: freestyle.sh/products/git

Darren Shepherd@ibuildthecloud·5 May

We really do just need a GitHub for agents. Not for humans. Humans need not apply. That would actually fix GitHub for humans.

Ben Werner@benwerner·5 May

Shocker, people are skeptical when someone named “Mr. Wonderful” tries to move in next door

More Perfect Union@MorePerfectUS

Kevin O'Leary's massive data center was approved by a county commission in Utah last night. At 40,000 acres, it would be 2.5x the size of Manhattan. The commission approved the proposal despite opposition from hundreds of locals.

Ben Werner@benwerner·5 May

Why is this article framed like this? Is it implying that the random VC who threw in cash on an early round deserves it more than the guy who made the thing?

Forbes@Forbes

Greg Brockman Testifies Stake In OpenAI Worth Nearly $30 Billion—Despite Investing Nothing go.forbes.com/RN1F1T
