@dzhng Runloop humbly throws its hat into the ring 🪖 Happy to get you set up with some credits. If you're running benchmarks or need OCI compliance we have the best ergonomics
Don't want to name names but holy shit some of the sandbox providers out there are so ridiculously unreliable, currently dealing with the 3rd multi-hr downtime this month.
Is it really that hard to start a bunch of microVMs? Or is every infra provider just coming apart at the seams?
@dominikkoch Late but opportune plug for runloop.ai devboxes! We'll give you credits & also eng time to help you onboard & build quickly. In the arena 😎
Prompts and vibes won't stop your agent from deleting your database.
If you work in a highly regulated environment, compliance isn't a choice. Learn how to deploy trustless agents in Runloop, including full Deploy to VPC for the strongest security posture.
Learn more: runloop.ai/blog/securing-…
Auditors haven't figured out AI agents yet. But they will.
When they start asking who authorized this action, what the agent had access to, and where the audit trail is, you need answers ready fast. And "it's just AI" won't cut it as an excuse.
Security engineers tend to front-run compliance. That's the play here: get your logging, isolation, and access controls in place now so you have good answers before the questions arrive.
Right now the industry is in a "log everything" phase. That's table stakes. The next question is: can you prove what went wrong and whose fault it was when something breaks?
Full conversation: our CEO Jonathan Wall on Techstrong TV — techstrong.tv/videos/latest-… #AIAgents #AISecurity #Compliance #AuditTrail #GRC #DevSecOps #AIEngineering #Runloop
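The kind of record that answers those auditor questions can be sketched in a few lines. This is a toy illustration; the field names are ours, not a Runloop or GRC schema:

```python
import json
import time

def audit_event(actor, action, resource, authorized_by, allowed_scopes):
    """Record one agent action as a structured audit entry.
    Field names here are illustrative, not any standard schema."""
    return {
        "ts": time.time(),
        "actor": actor,                    # which agent acted
        "action": action,                  # what it did
        "resource": resource,              # what it touched
        "authorized_by": authorized_by,    # who approved the action
        "allowed_scopes": allowed_scopes,  # what it was permitted to access
    }

log = []
log.append(audit_event("agent-42", "db.read", "orders", "alice", ["db.read"]))

# Serialize as JSON lines so entries can be shipped to any log store
lines = "\n".join(json.dumps(e) for e in log)
```

The point is less the format than the discipline: every action carries its authorizer and its scope, so "who authorized this?" is a query, not an investigation.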
Today we're launching @mastra remote sandboxes
Give your agent a secure, isolated environment to run untrusted user code... like a vibe coding agent...
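As a toy illustration of the problem isolation solves (nothing here is Mastra's or Runloop's API — a subprocess with a timeout is the shape of the problem, not real sandboxing):

```python
import subprocess
import sys

def run_untrusted(code, timeout=5):
    """Toy only: run code in a child process with a time limit.
    A real sandbox adds kernel-level isolation on top of this."""
    try:
        out = subprocess.run([sys.executable, "-c", code],
                             capture_output=True, text=True, timeout=timeout)
        return out.stdout.strip()
    except subprocess.TimeoutExpired:
        return "timed out"

fast = run_untrusted("print(2 + 2)")
stuck = run_untrusted("while True: pass", timeout=1)  # runaway code gets cut off
```

The timeout catches runaway loops, but nothing here stops filesystem or network access — which is exactly what a remote sandbox is for.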
Hot take: AI benchmarks are mostly theater.
Everyone cherry-picks results, runs them locally on different hardware, and calls it a "fair comparison."
We just shipped something that kills that excuse.
One command. Same environment. Claude vs GPT-4o. AIME math benchmark. 60 parallel trials. Zero setup.
npm install -g @runloop/rl-cli
rli benchmark-job run \
--agent "claude-code:claude-haiku-4-5" \
--agent "codex:gpt-4o" \
--benchmark "AIME" \
--n-concurrent-trials 60
The results don't lie when the infrastructure is identical. Are your benchmarks actually measuring the model? Try it out at runloop.ai/blog/cloud-orc…
We’re launching cloud-orchestrated Benchmark Jobs on Runloop.ai.
Benchmarks that used to take days can now run in minutes.
Instead of writing orchestration scripts, managing environments, and babysitting runs, you can execute compatible benchmarks across thousands of isolated sandboxes in parallel while Runloop handles the infrastructure.
Benchmark authors care about measurement.
Runloop handles execution.
Learn More:
runloop.ai/blog/cloud-orc…
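The fan-out pattern is simple to sketch. This toy version uses threads in place of isolated sandboxes, and `run_trial` is a stand-in, not Runloop's API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_trial(trial_id):
    """Placeholder for one isolated benchmark trial. In a real setup each
    trial would run in its own sandbox, not a thread."""
    return {"trial": trial_id, "passed": trial_id % 2 == 0}  # fake outcome

# Fan out many trials concurrently, then aggregate the results
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_trial, range(60)))

pass_rate = sum(r["passed"] for r in results) / len(results)
```

Swap threads for sandboxes and the orchestration problem becomes provisioning, scheduling, and teardown at scale — the part Runloop handles.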
Running an agent that uses MCP is inherently unsafe. Until now.
Protect your API keys and stop MCP-context bloat before it starts with MCP Hub: the first L7 proxy to protect your secrets and consolidate MCP endpoints.
Learn more: runloop.ai/blog/use-mcp-f…
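One half of that story, endpoint consolidation, boils down to an allowlist check at the proxy. Hosts and names below are hypothetical, not MCP Hub's implementation:

```python
from urllib.parse import urlparse

# Hypothetical policy: the proxy only forwards to approved MCP endpoints,
# so the agent never holds upstream credentials or raw endpoint URLs.
APPROVED_HOSTS = {"mcp.example.com", "tools.example.com"}

def is_forwardable(url):
    """Return True if a proxy with this policy would forward the call."""
    return urlparse(url).hostname in APPROVED_HOSTS

ok = is_forwardable("https://mcp.example.com/tools/list")
blocked = is_forwardable("https://evil.example.net/exfiltrate")
```

Because the check runs at L7, the agent's runtime never needs to know which upstream it's actually talking to.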
AIOps is growing in importance by the day, now mission-critical for enterprise agents in production 🛡️
In this new post, @ITBrew highlights why AIOps matters more than ever.
As AI agents move into production systems and operational risk scales fast, isolated execution environments are becoming foundational to safe deployment.
Agents in production are constantly evolving the risk profile, and the underlying infrastructure has to evolve with it.
Read on to find out why AIOps is something you should know in 2026 👇
itbrew.com/stories/2026/0… #AIInfrastructure #AIAgents
Don't let your agent leak your API keys!
Introducing Agent Gateway: an L7 proxy that protects your authentication keys at the infrastructure layer. API keys never enter the runtime environment, so your agent can't leak them. Instead, agents receive one-time tokens and can be restricted to approved endpoints with Network Policies.
Zero-trust agents are now possible.
Read more:
runloop.ai/blog/protect-a…
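The one-time-token idea can be sketched in plain Python. Names and flow here are illustrative, not Runloop's actual gateway:

```python
import secrets

# The real API key lives only on the proxy side; the agent only ever
# sees a single-use token. Everything here is a toy stand-in.
REAL_KEY = "sk-live-demo"
_issued = {}

def issue_token():
    """Mint a one-time token the agent can hold instead of the real key."""
    token = secrets.token_urlsafe(16)
    _issued[token] = REAL_KEY
    return token

def forward_request(token):
    """Proxy side: swap the token for the real key, then burn it."""
    key = _issued.pop(token, None)
    if key is None:
        return "rejected"
    return "forwarded-with-real-key"

t = issue_token()
first = forward_request(t)   # token is consumed here
replay = forward_request(t)  # replaying the same token fails
```

Even if the agent logs or leaks its token, a replayed or expired token gets the proxy's door slammed, and the real key never touched the runtime.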
@AnthropicAI - We ran into this at Runloop as well - the filter-js-from-html and gpt2-codegolf Terminal-Bench-2 tasks underspecify their required RAM, and fail even the no-agent 'oracle' solutions if this is enforced. These happen to work when testing locally with Docker, since the default Docker config allows arbitrary swap space (no hard memory limit).
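For anyone who wants to reproduce a hard memory cap locally (rather than Docker's default soft limit with swap), a `setrlimit` sketch for Linux — the limits are illustrative, not the tasks' actual specs:

```python
import subprocess
import sys
import textwrap

# The child process caps its own address space, then tries to
# over-allocate. Under a hard cap, the allocation fails deterministically
# instead of silently spilling into swap.
child = textwrap.dedent("""
    import resource
    limit = 1 * 1024**3  # 1 GiB hard cap on address space (Linux)
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    try:
        buf = bytearray(2 * 1024**3)  # a 2 GiB allocation must fail
        print("allocated")
    except MemoryError:
        print("MemoryError")
""")
out = subprocess.run([sys.executable, "-c", child],
                     capture_output=True, text=True)
result = out.stdout.strip()
```

Running benchmark tasks under a cap like this surfaces the underspecified-RAM failures before they show up as mystery flakes in CI.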
We have also encountered flakiness and benchmark behavior regressions over time due to package version issues. E.g., the mteb-leaderboard oracle expects the Pillow Python package to be installed. Pillow was likely brought in as a dependency of some other library, and a more recent version no longer includes it.
If others run into these issues, you can patch in this PR to get more reliable behavior on these Terminal-Bench-2 tasks:
github.com/laude-institut…
New on the Engineering Blog: Quantifying infrastructure noise in agentic coding evals.
Infrastructure configuration can swing agentic coding benchmarks by several percentage points—sometimes more than the leaderboard gap between top models.
Read more: anthropic.com/engineering/in…
What to expect from Tunnels v2 for Runloop devboxes:
→ Instant availability — no waiting on DNS
→ Built-in bearer token authentication
→ Seamless port switching across services on the same devbox
→ Better performance and reliability
Enable a tunnel at devbox creation or on a running devbox and it's ready immediately.
Full writeup on how it works and how to get started at: runloop.ai/blog/tunnels-v…
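Bearer-token auth on a tunnel behaves like any HTTP endpoint that checks the `Authorization` header. This self-contained sketch uses a local server as a stand-in for a tunnel URL; the token and behavior are illustrative, not Tunnels v2 internals:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = "demo-token"  # stand-in for a tunnel's bearer token

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reject requests missing the expected Authorization header,
        # mimicking a tunnel that enforces bearer-token auth at the edge.
        if self.headers.get("Authorization") != f"Bearer {TOKEN}":
            self.send_response(401)
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Authenticated request succeeds
req = urllib.request.Request(url, headers={"Authorization": f"Bearer {TOKEN}"})
body = urllib.request.urlopen(req).read()

# Unauthenticated request is rejected
try:
    urllib.request.urlopen(url)
    unauth_status = 200
except urllib.error.HTTPError as e:
    unauth_status = e.code

server.shutdown()
```

From the client's point of view, a tunneled service with bearer auth is just this: same URL, same verbs, plus one header.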
We're releasing a new version of our popular tunnels feature. Set up a tunnel to instantly open a port and communicate with any devbox. Now with support for bearer token authentication, network policy & port config.
Learn more & get started:
runloop.ai/blog/tunnels-v…
The shift from "CLI integration" to "dedicated package" is worth noting.
It signals that sandboxed execution environments are becoming a core dependency for agent frameworks, not an afterthought.
Python: pip install langchain-runloop
JS: available in the DeepAgents sandbox providers
Python docs: docs.langchain.com/oss/python/int…
JS docs: docs.langchain.com/oss/javascript… #DeepAgents #DevTools
@LangChainAI just moved the Runloop integration into a dedicated package inside the DeepAgents SDK.🧩
Our devboxes are now a first-class sandbox backend.
Spin up isolated environments, execute code, tear down cleanly: all from within the framework.
Docs: docs.langchain.com/oss/python/int…