Adam Gold

700 posts

@AdamGolds

building long running sandboxes @ https://t.co/NQ9yehAYVd (ex-CEO @ Kypso, acquired)

Joined April 2009

276 Following · 284 Followers

Adam Gold@AdamGolds·30m

Big update for islo.dev: we now support role assumption for AWS and GCP. That means agents can connect to your cloud providers without relying on long-lived static credentials. The days of giving API keys to your agents and worrying they might leak them are over. This is especially useful for organizations running across multiple AWS accounts or GCP projects, where credential management can quickly become messy.
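
As a rough illustration of what role assumption means on the AWS side, here is a minimal Python sketch built around the STS `AssumeRole` call. It is not islo.dev's implementation, which the post does not describe. The function name, the `TemporaryCredentials` type, the role ARN and the session name are all hypothetical; the client is passed in so the sketch can be exercised without an AWS account.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class TemporaryCredentials:
    """Short-lived credentials returned by STS (hypothetical wrapper type)."""
    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime

    def is_expired(self, now: datetime) -> bool:
        # Expired once the current time reaches the expiration timestamp.
        return now >= self.expiration


def assume_role_for_agent(
    sts_client: Any,
    role_arn: str,
    session_name: str,
    duration_seconds: int = 900,  # 900 seconds is the STS minimum
) -> TemporaryCredentials:
    """Exchange the caller's identity for temporary credentials scoped to a role.

    `sts_client` is expected to behave like `boto3.client("sts")`.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    return TemporaryCredentials(
        access_key_id=creds["AccessKeyId"],
        secret_access_key=creds["SecretAccessKey"],
        session_token=creds["SessionToken"],
        expiration=creds["Expiration"],
    )
```

With boto3 installed, a caller would pass `boto3.client("sts")` plus a role ARN such as the placeholder `arn:aws:iam::123456789012:role/agent-sandbox`. The agent only ever sees credentials that expire on their own, so there is no static key to leak.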

Adam Gold@AdamGolds·3h

@mathemagic1an Wrong way to look at it - docker in docker is definitely not the solution. Enterprises need to spin up much more complex envs than can be done with docker in docker

Jay Hack@mathemagic1an·15h

Underrated blocker to getting async code agents working for enterprise codebases. Many have onerous env requirements not supported by any viable sandbox provider.

Ivan Burazin@ivanburazin

Docker in Docker is something almost no sandbox provider supports. For RL workloads specifically, being able to spin up a Docker Compose or a K3S cluster inside a sandbox unlocks an enormous range of workflows that simply don't work anywhere else. That alone has been a meaningful wedge into the research + RL customer segment.

Adam Gold@AdamGolds·14h

You have to build sandboxes with a security mindset. We do all of those at islo.dev. More on that soon.

Damian Barabonkov@damian_b

The sandboxing debate is becoming more widespread, but I fear it is focusing on the wrong things. Given too many people are ranking startup times, I decided to write this rebuttal.

From Peter’s tweet, his postulation is the correct one: "How would we build software if tokens don’t matter?". In this future, sandboxing will not be about shaving milliseconds off boot. Timings will still matter, but there will be many more layers to this cake. Most critically, sandboxing will be about giving agents a high-fidelity operating environment. I think the winners will have to get these three things right:

1. Full-capability sandboxes
Most sandboxes people are hyping are not actually capable enough for serious agentic work. A useful coding agent needs to clone arbitrary repos, install dependencies, run tests, spin up services, debug failures, and iterate. For example, many production repos depend on Docker. If your sandbox cannot run Docker, it is probably not ready for the workflows agents are about to own.

2. Trusted credential handling
The moment agents do real work, they need access to real systems. GitHub, Linear, Slack, GMail, cloud accounts, internal dashboards, deployment tools. You cannot just hand the model your secrets and hope for the best. The right abstraction is some version of a trusted proxy: the agent can request actions, but secrets stay controlled, scoped, audited, and revocable.

3. Agent-to-agent communication
Today, most sandboxed agents are isolated workers. I do not think that will last for long. The future probably looks more like fleets of specialized agents, each with different tools, permissions, memories, and objectives. Some review code. Some reproduce bugs. Some run benchmarks. Some test security. Some interact with external apps. Those agents will need to talk to each other, delegate, verify, and coordinate. So the sandbox stops being a box, and it starts to become more of a network substrate.

That is why I think this market is still wide open. Nobody really knows what the agentic software stack will look like yet. But I am absolutely confident that the winning sandbox provider will not be the one that starts fastest. It will be the one that safely enables agents to do real work.
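
The "trusted proxy" idea in point 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any provider's design: the `TrustedProxy` class, the `service:verb` action naming, and the stubbed `_call_upstream` method are all hypothetical. It shows the four properties the post lists: secrets stay inside the proxy (controlled), each agent has an explicit set of allowed actions (scoped), every request is logged (audited), and a grant can be withdrawn (revocable).

```python
from __future__ import annotations

from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when an agent has no valid grant for the requested action."""


@dataclass
class TrustedProxy:
    # service name -> secret; never handed to agents
    secrets: dict[str, str]
    # agent id -> set of actions such as "github:comment"
    grants: dict[str, set[str]] = field(default_factory=dict)
    # (agent id, action, outcome) for every request
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def grant(self, agent_id: str, actions: set[str]) -> None:
        self.grants[agent_id] = set(actions)

    def revoke(self, agent_id: str) -> None:
        self.grants.pop(agent_id, None)

    def perform(self, agent_id: str, action: str, payload: str) -> str:
        """Run an action on the agent's behalf without exposing the secret."""
        if action not in self.grants.get(agent_id, set()):
            self.audit_log.append((agent_id, action, "denied"))
            raise AccessDenied(f"{agent_id} may not perform {action}")
        service = action.split(":", 1)[0]
        secret = self.secrets[service]
        self.audit_log.append((agent_id, action, "allowed"))
        return self._call_upstream(action, payload, secret)

    @staticmethod
    def _call_upstream(action: str, payload: str, secret: str) -> str:
        # Stand-in for a real authenticated request. The secret would be used
        # here and is deliberately left out of the value returned to the agent.
        _ = secret
        return f"{action} ok: {payload}"
```

An agent granted `{"github:comment"}` can call `proxy.perform("reviewer", "github:comment", "LGTM")`, while a request for `github:merge`, or any request after `proxy.revoke("reviewer")`, raises `AccessDenied` and still lands in the audit log.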

Adam Gold@AdamGolds·17h

@acadictive Let's do it!

Ehsan@acadictive·17h

@AdamGolds love to follow that journey ... let's connect.

Adam Gold@AdamGolds·9 May

Notice a bug -> crabbox on islo.dev -> merge. Good times.

Peter Steinberger 🦞@steipete

Whenever I investigate a bug, I let codex recreate the exact state in an ephemeral crabbox, verify the bug, fix it, verify the fix. No messy state because local system might be polluted, and no slowdown because I run 10 sessions in parallel. crabbox.sh

Adam Gold@AdamGolds·18h

@jjackyliang Maybe @HeyGarrison

Adam Gold@AdamGolds·18h

@jjackyliang Is anyone here objective?

jacky@jjackyliang·1d

what's the best platform to host vms for agents? ideally with spin down when not used

Adam Gold@AdamGolds·21h

Rust

David Uchenna@callmidavid

Uber uses Go. Google uses Go. Twitch uses Go. Dropbox uses Go. SoundCloud uses Go. PayPal uses Go. TikTok uses Go. Netflix uses Go. What’s stopping you from learning Go?

Adam Gold@AdamGolds·21h

@callmidavid Rust

Adam Gold@AdamGolds·22h

@BenjDicken EC2? So you expect the agent to set up their own environment, and you're going to pay when the agent doesn't work?

Ben Dicken@BenjDicken·1d

The essential engineering cheatsheet of 2026:

agent → while loop
subagent → nested while loop
agent harness → the rest of the code
cloud agent → all the above, on EC2
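
The first line of the cheatsheet, "agent → while loop", can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the `run_agent` name, the convention that the model replies either `final: <answer>` or `<tool> <argument>`, and the stand-in model passed as a plain function. A real harness would call an LLM API and parse structured tool calls.

```python
from typing import Callable


def run_agent(
    model: Callable[[list[str]], str],
    tools: dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 10,
) -> str:
    """Loop: ask the model, run the tool it names, feed the result back."""
    history = [f"task: {task}"]
    steps = 0
    while steps < max_steps:
        steps += 1
        reply = model(history)
        if reply.startswith("final:"):
            # The model declared it is done; return its answer.
            return reply[len("final:"):].strip()
        # Otherwise treat the reply as "<tool name> <argument>".
        name, _, arg = reply.partition(" ")
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool: {name}"
        history.append(f"{reply} -> {observation}")
    return "stopped: step limit reached"
```

A subagent in this picture is just a tool whose body calls `run_agent` again, which is the "nested while loop" of the second line.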

Adam Gold@AdamGolds·1d

@QuinnyPig @vercel @Cloudflare @awscloud We already built 3, 4, 5 at islo.dev. Working on 6, 7 and 10

Corey Quinn@QuinnyPig·1d

Been thinking about what an "agent-native cloud" actually needs to look like. Mentioned this, and @Vercel's CEO replied that it'll be them. Cool! Here's the spec they (or @Cloudflare, or some startup not yet invented) actually have to hit. It won't be @awscloud. Thread...

Guillermo Rauch@rauchg

@QuinnyPig It'll be ▲. Would love your feedback. This is our primary focus!

Adam Gold@AdamGolds·3d

@kapilansh_twt We don't share env variables.. we share sandboxes with real environments inside

kapilansh@kapilansh_twt·4d

how do teams actually share .env variables securely

because the options I see are
- Slack DM (terrible)
- email (worse)
- shared Notion doc (somehow even worse)
- 1Password or similar
- something I'm missing

Adam Gold@AdamGolds·3d

Not worth it

Rotem Tamir@rotemtam

The next venture.

Dear Twitter, after a decade in devtools/infra and an in-depth look at the opportunities in the field, I decided I don't feel like it, and my next venture is going to be in a completely different field.

I've set out on a new venture to build AI-native accounting firms. In Israel 🇮🇱

(For those who have been following for a while, this is the moment you scratch your head and say, what?!)

For all sorts of reasons (I'll elaborate in upcoming posts), I think this is a huge and also very interesting opportunity. I'm already working closely with two firms and shaping the product and business vision. It feels like I'm touching something very big.

What I'm looking for:
- CPAs who want to reboot their firm as AI native
- A partner from the field (a CPA or entrepreneur who lives and breathes AI)

If that's you, or you know people I should talk to, please connect me. If not, be a friend and retweet, thanks 🙏 🙏🙏

Adam Gold reposted

Jackson Stokes@jackson_stokes·4d

We partnered with @mercor_ai to test a simple idea: What if knowledge-work agents were just… coding agents? Result: +25% performance, 2x faster, cheaper, and new SOTA on APEX-Agents. @josancamon19

Adam Gold@AdamGolds·7 May

@gm_mertd have you tried out islo.dev?

Mert Deveci@gm_mertd·6 May

Still surprised there have not been any serious attempts to launch similar things to exe.dev or sprites in sandboxes

Adam Gold reposted

Alex Shaw@alexgshaw·6 May

TB2.1

terminalbench@terminalbench

We're releasing Terminal-Bench 2.1 to patch 28 of the 89 tasks in Terminal-Bench 2.0

TB2.1 includes
• recalibrated limits
• fixed solutions
• realigned verifiers

Per-task breakdowns in 🧵

We'll continue to support TB2 and TB2.1 leaderboards (new submission process 🔜)

Adam Gold@AdamGolds·6 May

@jyangballin @KLieret it's still running on islo.dev since yesterday...

John Yang@jyangballin·5 May

How much of SQLite, FFmpeg, PHP compiler can LMs code from scratch? Given just an executable and no starter code or internet access. Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end. 🧵

Adam Gold@AdamGolds·6 May

Very soon they are going to understand containers are not enough to run software. You need real computers. Try to run a whole cluster on a container...

Gemini CLI@geminicli

Scion is a new multi-agent orchestration tool that orchestrates agents (Claude Code, Gemini CLI, Codex, and others) as isolated, concurrent processes. Each agent gets its own container, git worktree, and credentials — so they can work on different parts of your project without stepping on each other. github.com/GoogleCloudPla…

Gemini CLI@geminicli·5 May

Always-on Agentic Life Cycle 🤖🔄 Learn how to orchestrate multiple Gemini CLI agents as team members with different roles and personas using Scion🌱 Watch the session from Cloud Next 👇 youtube.com/watch?v=ZxFDpm…

Adam Gold@AdamGolds·5 May

@adithya_s_k Awesome! I'm planning on publishing agent-benchmarks.com soon, similar domain to RL

Adithya S K@adithya_s_k·5 May

Excited to release the Ultimate guide to RL environments! Definitions of RL environments differ wildly in the LLM era, so we spent the last month building several RL environments across 6 different frameworks, domains and complexities to map out which are easiest to build with and which can be scaled to 1000s.
