
AI Agent Sandbox: The Runtime Isolation Model That Contains Blast Radius When the Prompt Turns Hostile

An AI agent that executes tool calls, writes files, runs shell commands, and reaches network endpoints from inside the same process as the calling application inherits the caller's ambient authority. A sandbox around the agent runtime confines that authority so a successful prompt injection cannot escalate beyond what the sandbox permits. This post covers the sandbox properties, the process versus container versus VM trade-offs, and the audit signal the sandbox produces at each boundary crossing.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Tags: Problem-Aware · ai-agent-security · agentic-ai · sandbox · runtime-isolation · blast-radius · defense-in-depth
An AI agent running inside the calling application's process, on the calling application's filesystem, with the calling application's network access, inherits every capability the calling application has. When a prompt injection succeeds against the agent (Microsoft's May 7, 2026 disclosure covered the pattern), the escalation runs against whatever the calling application can reach: filesystem, credential stores, adjacent services, egress paths. A sandbox around the agent runtime confines the escalation to what the sandbox permits, which is what the operator explicitly grants rather than what the ambient environment happens to allow. I want to walk through the sandbox properties, the deployment trade-offs across process, container, and VM isolation, and the audit signal each boundary crossing produces.

The ambient authority the agent runtime inherits is the attack surface a sandbox subtracts.

Sandbox properties

Five properties define an AI agent sandbox useful for blast radius containment.

Filesystem confinement. The sandbox restricts filesystem access to a defined directory or set of paths. Attempts to read or write outside the confined area are denied at the sandbox boundary. Confinement is stricter than the calling application's own filesystem permissions because the sandbox draws the boundary at the point of the agent runtime, not at the process's uid.
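
The confinement decision itself can be sketched in a few lines. This is a minimal application-level illustration, not a kernel-enforced boundary (a real sandbox would rely on mount namespaces, chroot-style roots, or similar); the workspace path and the `is_confined` helper are hypothetical names chosen for the example.

```python
from pathlib import Path

def is_confined(root: Path, requested: str) -> bool:
    """Return True only if `requested` resolves to a location inside `root`."""
    root_resolved = root.resolve()
    # Resolve symlinks and ".." segments before comparing, so a path such as
    # "notes/../../etc/passwd" cannot slip past a naive prefix check.
    # Joining with an absolute path replaces the root, which is then rejected.
    candidate = (root_resolved / requested).resolve()
    return candidate == root_resolved or root_resolved in candidate.parents

# Hypothetical confined directory for one agent session.
workspace = Path("/srv/agent/workspace")

print(is_confined(workspace, "notes/plan.txt"))    # inside the confined area
print(is_confined(workspace, "../../etc/passwd"))  # traversal attempt
print(is_confined(workspace, "/etc/passwd"))       # absolute path outside
```

The point of resolving before comparing is that the check is made against where the path actually lands, not against how the agent spelled it.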

Network egress confinement. The sandbox restricts outbound network access to a defined allowlist of endpoints. The agent's tool calls to allowed endpoints (the LLM provider, a specific internal API, an allowlisted external service) pass through. Attempts to reach any other endpoint are denied. The ai agent egress control piece covers the egress side in depth.
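
As a rough sketch of the allowlist decision, assuming the allowlist is expressed as exact host and port pairs: the hostnames below are placeholders, and `egress_allowed` is a hypothetical helper. In practice the enforcement point would be a proxy or network policy outside the agent's control rather than code the agent can bypass.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: exact (host, port) pairs the operator has granted.
ALLOWED_ENDPOINTS = {
    ("api.llm-provider.example", 443),
    ("orders.internal.example", 8443),
}

def egress_allowed(url: str) -> bool:
    """Allow only HTTPS requests to an exact (host, port) on the allowlist."""
    try:
        parts = urlsplit(url)
        port = parts.port if parts.port is not None else 443
    except ValueError:
        # Malformed URL or port: deny rather than guess.
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Exact match, so "api.llm-provider.example.evil.example" is not allowed.
    return (host, port) in ALLOWED_ENDPOINTS

print(egress_allowed("https://api.llm-provider.example/v1/chat"))
print(egress_allowed("https://api.llm-provider.example.evil.example/"))
```

Exact matching is a deliberate choice here: suffix or substring matching is where lookalike hostnames tend to get through.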

Credential isolation. The sandbox does not inherit the calling application's credentials. Credentials the agent needs (a database service account, an API key for an internal service, a signed identity claim) enter the sandbox explicitly through a controlled interface. When the sandbox terminates, the credentials it held are unreachable.
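
One narrow slice of this, environment variables, can be illustrated with the standard library. The sketch below assumes a Unix-like host; `run_agent_step`, `HOST_DB_PASSWORD`, and `AGENT_API_TOKEN` are hypothetical names. It covers only the environment, not credential files, cloud metadata endpoints, or other ambient sources that a real sandbox also has to cut off.

```python
import os
import subprocess
import sys

def run_agent_step(code: str, granted: dict[str, str]) -> str:
    """Run `code` in a child process that sees only explicitly granted variables."""
    # Passing env= replaces the child's environment entirely, so nothing in
    # os.environ is inherited unless it is listed here.
    env = {"PATH": os.defpath, **granted}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, timeout=30, check=True,
    )
    return result.stdout.strip()

# Stand-in for a credential that lives in the calling application's environment.
os.environ["HOST_DB_PASSWORD"] = "host-secret"

probe = (
    "import os; "
    "print(os.environ.get('HOST_DB_PASSWORD'), os.environ.get('AGENT_API_TOKEN'))"
)
print(run_agent_step(probe, {"AGENT_API_TOKEN": "scoped-token"}))
```

The child can read the token it was handed and cannot see the host's variable; when the child exits, its copy of the environment goes with it.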

Resource bounds. CPU, memory, and wall-clock limits confine the agent's resource consumption. A runaway agent, whether from prompt injection or from a benign infinite loop, exhausts its bounds and terminates rather than exhausting the host.
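
A minimal sketch of the three bounds using POSIX rlimits and a subprocess timeout, assuming a Linux host. The `run_bounded` helper and its default limits are illustrative; production sandboxes more commonly enforce these through cgroups or the VM configuration.

```python
import resource
import subprocess
import sys

def run_bounded(code: str, cpu_seconds: int = 1,
                memory_bytes: int = 1024 * 1024 * 1024,
                wall_seconds: float = 5.0) -> str:
    """Run `code` in a child process under CPU, memory, and wall-clock limits."""
    def apply_limits() -> None:
        # Runs in the child just before exec, so the limits never touch the host.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            preexec_fn=apply_limits, capture_output=True, timeout=wall_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires.
        return "wall-clock limit"
    if proc.returncode == 0:
        return "ok"
    return f"terminated (exit {proc.returncode})"

print(run_bounded("print('hello')"))
print(run_bounded("while True: pass"))                            # burns CPU
print(run_bounded("import time; time.sleep(10)", wall_seconds=1)) # idles past the deadline
```

The CPU limit and the wall-clock limit catch different failures: a busy loop exhausts CPU time, while an agent blocked on I/O consumes almost none and is only caught by the wall clock.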

Ephemeral state. The sandbox instance is per-session or per-request. When the session terminates, the sandbox and any state it held (temporary files, in-memory caches, cached retrieval hits) are destroyed. Long-running state that has to survive between sessions lives outside the sandbox in an explicitly-shared store.
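
The per-session lifecycle can be sketched with a scratch directory that exists only for the duration of the session. `sandbox_session` is a hypothetical helper; a real sandbox would tear down the whole process, container, or VM, not just a directory.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def sandbox_session(session_id: str):
    """Yield a scratch directory that is deleted when the session ends."""
    # TemporaryDirectory removes the directory and its contents on exit,
    # including when the body raises.
    with tempfile.TemporaryDirectory(prefix=f"agent-{session_id}-") as scratch:
        yield Path(scratch)

with sandbox_session("s1") as scratch:
    (scratch / "retrieval-cache.json").write_text("{}")
    leftover = scratch  # keep a reference to check after the session

print(leftover.exists())  # False
```

Anything that has to outlive the session would be written to the explicitly shared store instead of the scratch directory.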

Process, container, VM

Three implementation levels produce different trade-offs.

Process-level (seccomp, unshare, gVisor). The sandbox uses OS-level primitives to restrict the agent process's syscall surface, namespace access, and capability set. Startup latency is millisecond-scale. Boundary strength depends on the underlying kernel's isolation guarantees. gVisor adds a user-space kernel that shrinks the trusted kernel surface at moderate performance cost.

Container-level (Docker, containerd, Kata). The sandbox runs the agent inside a container with a defined image, network, and mount set. Startup latency is hundreds of milliseconds. Boundary strength is stronger than process-level (namespace isolation is enforced by the runtime, not the process itself) but shares the host kernel. Kata Containers adds a VM-level boundary while retaining the container interface.

VM-level (Firecracker, QEMU, WebAssembly runtimes). The sandbox runs the agent inside a lightweight VM or WASM sandbox with a dedicated kernel (or no kernel, in the WASM case). Startup latency ranges from tens of milliseconds (Firecracker drawn from a pre-warmed pool) to hundreds of milliseconds or more (cold starts and heavier VMs). Boundary strength is the strongest of the three, at the cost of higher overhead per session.

The choice depends on session concurrency, latency budget, and the sensitivity of what the agent handles. Deployments running thousands of concurrent agent sessions with sub-second latency budgets typically pick process or container level. Deployments running fewer, higher-stakes sessions pick VM level.

Where the sandbox sits relative to the AI request boundary

The sandbox contains what the agent can do inside the sandbox. The AI request boundary controls what the agent can call outside the sandbox. The two are complementary.

Prompt injection that succeeds against the model produces a set of tool calls. The tool calls that stay inside the sandbox (filesystem reads/writes to the confined area, in-sandbox scripting) are contained by the sandbox. The tool calls that leave the sandbox (HTTP requests to external services, calls to the LLM provider for further inference, calls to internal APIs) pass through the AI request boundary. The boundary evaluates the four authorization predicates (identity, model, data classification, policy) on each outbound call.
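
To make the four predicates concrete, here is one way the per-call evaluation might look. Everything in it is an assumption for illustration: the article names the predicates but not their semantics, so the policy tables, the classification ordering, and the `authorize` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutboundCall:
    identity: str             # session identity claim attached to the call
    model: str                # model the agent is running on
    data_classification: str  # label on the payload, e.g. "public"
    destination: str          # host the call is trying to reach

# Hypothetical policy tables an operator would maintain.
KNOWN_IDENTITIES = {"agent-session-42"}
APPROVED_MODELS = {"model-a"}
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "restricted": 2}
# Highest classification each destination is allowed to receive.
DESTINATION_CEILING = {
    "api.partner.example": "public",
    "orders.internal.example": "internal",
}

def authorize(call: OutboundCall) -> tuple[bool, str]:
    """Evaluate the predicates in order; return (allowed, failing predicate)."""
    if call.identity not in KNOWN_IDENTITIES:
        return False, "identity"
    if call.model not in APPROVED_MODELS:
        return False, "model"
    if call.data_classification not in CLASSIFICATION_RANK:
        return False, "data classification"
    ceiling = DESTINATION_CEILING.get(call.destination)
    if ceiling is None or (
        CLASSIFICATION_RANK[call.data_classification] > CLASSIFICATION_RANK[ceiling]
    ):
        return False, "policy"
    return True, "allowed"

call = OutboundCall("agent-session-42", "model-a", "internal", "api.partner.example")
print(authorize(call))  # (False, 'policy')
```

Returning the failing predicate alongside the decision gives the audit record something specific to log for each denied call.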

The ai request authorization model piece covers the boundary. The ai agent tool scoping piece covers the tool authorization that runs at the boundary.

Audit signal at each boundary

Each sandbox boundary crossing produces an audit signal. The signals feed both operational observability and incident review.

  • Filesystem access: every read and write outside the confined area, whether denied or (for read-only monitoring paths) permitted.
  • Network egress: every outbound connection attempt with destination and outcome.
  • Credential access: every retrieval from the credential interface.
  • Resource limits: usage approaching the CPU, memory, or wall-clock limit, with a distinct signal for termination.
  • Session lifecycle: sandbox creation, session identity claim, and sandbox destruction.
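
The five signal types above could be carried in a single event shape, sketched below. The field names and the `AuditEvent` class are assumptions for illustration, not the format defined in the audit log spec referenced in this post.

```python
import json
import time
from dataclasses import asdict, dataclass, field

# One event type per boundary listed above (names are illustrative).
EVENT_TYPES = {
    "filesystem_access",
    "network_egress",
    "credential_access",
    "resource_limit",
    "session_lifecycle",
}

@dataclass
class AuditEvent:
    session_id: str   # ties the event to one sandbox session
    event_type: str   # one of EVENT_TYPES
    outcome: str      # e.g. "denied", "permitted", "terminated"
    detail: dict      # boundary-specific fields (path, destination, limit)
    timestamp: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")

    def to_json(self) -> str:
        # Sorted keys keep the serialized record stable for diffing and hashing.
        return json.dumps(asdict(self), sort_keys=True)

event = AuditEvent(
    session_id="s1",
    event_type="network_egress",
    outcome="denied",
    detail={"destination": "evil.example:443"},
)
print(event.to_json())
```

Carrying the session identifier on every event is what lets an incident review line up filesystem, network, and credential activity for a single session.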

The ai agent observability piece covers the OpenTelemetry pattern for these signals. The ai audit logs format spec covers the durable audit record fields.

Regulatory framing

Under EU AI Act Article 14, high-risk AI systems have to be designed and developed so natural persons can effectively oversee them. Sandboxing is one of the technical measures that supports oversight by ensuring the AI system operates within defined runtime bounds.

Under NIST AI RMF's MANAGE function, runtime containment of AI system behavior is one of the controls the function anticipates. The audit signals from the sandbox feed the MANAGE evidence chain.

DeepInspect

This is exactly what DeepInspect does at the AI request boundary. DeepInspect does not run the agent sandbox itself; the sandbox is a runtime concern (Firecracker, Kata, gVisor, or WASM). DeepInspect enforces the outbound boundary. Every HTTP request from inside the sandbox to an external service passes through the DeepInspect gateway, which evaluates the four authorization predicates. The audit record connects the sandbox session identity to the network calls that left the sandbox.

The pattern gives operations teams two layers to reason about independently. The sandbox contains what the agent can do internally. The gateway controls what the agent can reach externally. Compromise of one does not, by itself, compromise the other.

Book a technical deep dive at deepinspect.ai.

Frequently asked questions

Do we need a sandbox if we have an AI request boundary?

For agents that only call remote APIs (nothing local), the AI request boundary handles the containment. For agents that execute local code (running scripts, writing files, calling shell tools), the sandbox is the layer that contains the local operations the boundary never sees.

Which sandbox technology should we pick?

For process-level, gVisor with a defined seccomp policy. For container-level, Kata Containers with a minimal image. For VM-level, Firecracker with per-session snapshots. The ai agent runtime protection piece covers the current landscape.

What is the latency cost of running each agent session in a sandbox?

Process-level: single-digit milliseconds startup. Container-level: 100-500ms. VM-level with Firecracker: 20-100ms with a pre-warmed pool, 500-2000ms cold start. The pre-warmed pool pattern brings VM-level startup to container-level or better in typical operation.
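
The pre-warmed pool pattern can be sketched independently of any particular VM technology. In this illustration the `boot` callable stands in for whatever actually starts a sandbox, and `WarmPool` is a hypothetical class, not an API from Firecracker or any other runtime.

```python
from collections import deque
from itertools import count
from typing import Callable

class WarmPool:
    """Keep `size` sandboxes booted ahead of demand; cold-start only when empty."""

    def __init__(self, boot: Callable[[], str], size: int) -> None:
        self._boot = boot
        self._size = size
        self._idle = deque(boot() for _ in range(size))  # pay boot cost up front
        self.cold_starts = 0

    def acquire(self) -> str:
        if self._idle:
            return self._idle.popleft()  # warm path: no boot on the request
        self.cold_starts += 1
        return self._boot()              # cold path: boot inline

    def refill(self) -> None:
        # Intended to run off the request path (e.g. from a background worker).
        while len(self._idle) < self._size:
            self._idle.append(self._boot())

ids = count(1)
pool = WarmPool(boot=lambda: f"vm-{next(ids)}", size=2)
print([pool.acquire() for _ in range(3)], pool.cold_starts)
```

Sandboxes are handed out once and never returned to the pool, which matches the ephemeral, per-session model: the pool is replenished with fresh instances rather than recycled ones.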

How does sandboxing interact with LangChain and AutoGen?

The frameworks run inside the sandbox. The framework's tool implementations either stay inside the sandbox (a local Python interpreter tool, a filesystem read) or make outbound calls (HTTP tools, DB tools) that pass through the AI request boundary. Neither framework has to know about the sandbox.

What about MCP servers?

Model Context Protocol servers can run inside the sandbox as local processes or outside the sandbox with the agent reaching them through the request boundary. Local MCP servers give the agent lower-latency access at the cost of a larger sandbox trust surface. Remote MCP servers put the boundary check on every call. The MCP server authentication piece covers the identity binding.

Can we skip the sandbox for read-only agents?

It depends on the read scope. A read-only agent that queries a database is still a network egress path, and the network boundary still applies. A read-only agent that only reads files from a fixed, confined directory has a smaller attack surface and might tolerate a lighter sandbox. The threat model has to justify the choice.