Why not use the framework's built-in tool access controls?

Frameworks vary in what they offer. Some have per-agent tool restriction. None have per-session, per-identity, per-context scoping backed by an external policy engine. And framework-level enforcement runs inside the framework's own process, so when the framework is compromised, the enforcement is compromised with it. That is the RCE scenario Microsoft's May 2026 disclosure documented.

How do we handle tools that call other tools?

The tool call chain has to propagate the identity claim. When tool A calls tool B, the request to tool B carries the same identity claim as the request to tool A, plus a lineage attribute that records the invocation chain. The gateway evaluates policy at each step. The AI agent action lineage piece covers the lineage record.
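A minimal sketch of that propagation, assuming the identity claim and the lineage travel as request headers. The CallContext type and the header names (X-Identity-Claim, X-Tool-Lineage) are hypothetical and chosen for illustration; a production deployment would more likely carry the claim in a signed token.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CallContext:
    """Identity claim plus the chain of tools invoked so far (hypothetical shape)."""
    identity: str
    lineage: Tuple[str, ...] = ()

    def child(self, tool: str) -> "CallContext":
        # Same identity claim, lineage extended by the tool being invoked.
        return CallContext(self.identity, self.lineage + (tool,))

    def to_headers(self) -> Dict[str, str]:
        # Header names are assumptions, not a standard.
        return {
            "X-Identity-Claim": self.identity,
            "X-Tool-Lineage": ">".join(self.lineage),
        }


session = CallContext(identity="user:alice")
call_a = session.child("tool_a")  # the agent invokes tool A
call_b = call_a.child("tool_b")   # tool A invokes tool B
print(call_b.to_headers())
```

The context is immutable on purpose: each hop derives a new context rather than editing the caller's, so a downstream tool cannot rewrite the lineage it inherited.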

What about MCP servers?

Model Context Protocol servers are tool implementations by another name. The same scoping applies: identity claim on the request, policy evaluation at the gateway boundary, audit record on every call. The MCP server authentication piece covers the identity binding pattern for MCP specifically.

Can we scope tools by prompt content?

Yes, at the transform layer. A tool call that the agent framework selects, and that the identity is authorized for, can still be transformed based on the payload content. For example, an issue_refund tool call where the amount exceeds the identity's threshold gets transformed into a create_approval_request call. The LLM response content filter piece covers the transformation patterns.
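The refund example could be sketched as follows. The tool names come from the example above; the function name, the threshold parameter, and the argument shape of create_approval_request are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


def transform_refund(call: ToolCall, refund_threshold: float) -> ToolCall:
    """Rewrite an over-threshold issue_refund into create_approval_request."""
    tool, args = call
    if tool != "issue_refund":
        return call  # other tools pass through untouched
    amount = args.get("amount")
    if isinstance(amount, (int, float)) and amount <= refund_threshold:
        return call  # within the identity's threshold
    # Over the threshold, or amount missing or malformed: fail closed by
    # routing to approval. The argument shape here is hypothetical.
    return (
        "create_approval_request",
        {"requested_tool": tool, "requested_arguments": dict(args)},
    )


print(transform_refund(("issue_refund", {"order_id": "A1", "amount": 500}), 100.0))
```

A missing or non-numeric amount is routed to approval rather than passed through, so a malformed payload cannot bypass the threshold.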

Does tool scoping slow the agent down?

It adds one policy evaluation per tool call: sub-millisecond when the policy engine runs in-process, single-digit milliseconds with a policy sidecar. The agent framework's own overhead per step is 10-100ms, so the policy evaluation is a small fraction of that.

How do we test tool scoping policies?

Unit tests over the policy artifacts with known inputs (identity + tool + arguments), integration tests that run the agent framework against a test policy, and canary deployments that ship new policy versions to a subset of traffic. The policy-as-code piece covers the test pipeline pattern.
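The unit-test layer could look like the sketch below: a table of known inputs run against the policy. The POLICY dictionary and the is_allowed function are stand-ins invented for this example; in practice the test would load the real policy artifact and call the real policy engine.

```python
import unittest

# Hypothetical policy artifact: role -> tool -> maximum amount (None = no limit).
POLICY = {
    "support_tier_1": {
        "lookup_order": None,
        "issue_refund": 100,
        "create_ticket": None,
    },
}


def is_allowed(role: str, tool: str, arguments: dict) -> bool:
    tools = POLICY.get(role, {})
    if tool not in tools:
        return False  # unknown role or unlisted tool: deny
    limit = tools[tool]
    # A missing amount is treated as infinite so that it fails closed.
    return limit is None or arguments.get("amount", float("inf")) <= limit


# Known inputs: (role, tool, arguments, expected decision).
CASES = [
    ("support_tier_1", "lookup_order", {"order_id": "A1"}, True),
    ("support_tier_1", "issue_refund", {"amount": 100}, True),
    ("support_tier_1", "issue_refund", {"amount": 101}, False),
    ("support_tier_1", "run_shell", {"cmd": "id"}, False),
    ("unknown_role", "lookup_order", {"order_id": "A1"}, False),
]


class PolicyTest(unittest.TestCase):
    def test_known_inputs(self):
        for role, tool, args, expected in CASES:
            with self.subTest(role=role, tool=tool, args=args):
                self.assertEqual(is_allowed(role, tool, args), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PolicyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the cases in a table means a policy change shows up in review as a diff to both the artifact and its expected decisions.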

AI Agent Tool Scoping: The Blast Radius Control That Agent Frameworks Do Not Enforce

An AI agent framework (LangChain, LlamaIndex, AutoGen, Semantic Kernel, the OpenAI Agents SDK) lets the developer register a set of tools the model can call. When the framework evaluates a step in the agent loop, the model selects a tool from that set based on the prompt, the context, and the framework's routing logic. What the framework does not enforce out of the box is which tools the calling identity is authorized to use, and in which context. A prompt injection attack that succeeds against an agent in production is limited only by the intersection of the registered tool set and the authorization the calling application forwarded to the tool call. Tool scoping is the control that shrinks that intersection. I want to walk through the gap the frameworks leave, the scoping model that closes it, and the audit evidence tool scoping produces.

The registered tool set is the framework's answer. The scoped tool set is the security answer.

The gap frameworks leave

Every agent framework's tool registry gives the model access to every registered tool by default. Framework-level access controls (when they exist) live at the level of the whole agent instance, not the level of the individual session.

The failure surfaces in three scenarios. First, the coding agent registered with read_file, write_file, run_shell, and git_push receives a prompt injected via a poisoned pull request. The injection persuades the agent to run_shell a command that exfiltrates credentials. The framework registered the tool. Nothing in the framework asks whether the current user's session should have access to run_shell on this repository.

Second, the customer support agent registered with lookup_order, issue_refund, and create_ticket handles a session for a customer whose complaint contains a payload that mimics a refund request. The agent issues a refund the customer never asked for. The framework registered the tool. Nothing asks whether the specific customer session should be authorized to invoke issue_refund above a specific dollar threshold without human review.

Third, the internal HR agent registered with read_employee_record, write_employee_record, and send_email handles a manager asking about a direct report. The manager's session includes an injected instruction to email the report's compensation history to an external address. The framework registered the tools. Nothing asks whether the manager's session should be authorized to email PII externally.

The Microsoft prompt-to-shell disclosure from May 7, 2026, documented this pattern as a mainstream RCE attack surface. The reframe is that agentic AI is not just a data leak risk; it is a code execution risk shaped by the registered tool set.

Scoping model

The scoping model that closes the gap runs at two layers.

Layer one: identity-to-tool authorization matrix. For each identity in the enterprise, the matrix defines which tools that identity is authorized to invoke in which contexts. The matrix runs alongside the identity provider's role definitions. A tier-one support identity can call lookup_order on any customer, issue_refund up to $100 without approval and any amount with approval, and create_ticket freely. A tier-two support identity has different thresholds. The matrix expresses the thresholds as policy.

Layer two: per-request context evaluation. At the point of the tool call, the policy engine evaluates the request against the matrix. The identity claim on the request, the tool being invoked, the arguments being passed, and the runtime context (the session's count of prior tool calls, elapsed time, prior denials) all feed the evaluation. The output is allow, deny, or transform (with modified arguments).
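The two layers could be sketched as a lookup table plus an evaluation function. Everything here is illustrative: the type names, the session limits (MAX_PRIOR_DENIALS, MAX_TOOL_CALLS), and the requires_approval argument are assumptions, and the tier-one thresholds mirror the example above rather than any real policy.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Rule:
    approval_above: Optional[float] = None  # None: no amount threshold


@dataclass(frozen=True)
class RuntimeContext:
    prior_tool_calls: int = 0
    prior_denials: int = 0


@dataclass(frozen=True)
class Decision:
    outcome: str  # "allow", "deny", or "transform"
    arguments: Dict[str, Any]
    reason: str


# Layer one: hypothetical matrix, role -> tool -> rule.
MATRIX: Dict[str, Dict[str, Rule]] = {
    "support_tier_1": {
        "lookup_order": Rule(),
        "issue_refund": Rule(approval_above=100.0),
        "create_ticket": Rule(),
    },
}

MAX_PRIOR_DENIALS = 3  # assumed session limits
MAX_TOOL_CALLS = 50


def evaluate(role: str, tool: str, arguments: Dict[str, Any],
             ctx: RuntimeContext) -> Decision:
    """Layer two: evaluate one tool call against the matrix and the context."""
    rule = MATRIX.get(role, {}).get(tool)
    if rule is None:
        return Decision("deny", arguments, "tool not authorized for role")
    if ctx.prior_denials >= MAX_PRIOR_DENIALS:
        return Decision("deny", arguments, "too many prior denials in session")
    if ctx.prior_tool_calls >= MAX_TOOL_CALLS:
        return Decision("deny", arguments, "session tool-call budget exhausted")
    if rule.approval_above is not None:
        amount = arguments.get("amount")
        if not isinstance(amount, (int, float)):
            return Decision("deny", arguments, "missing or non-numeric amount")
        if amount > rule.approval_above:
            modified = {**arguments, "requires_approval": True}
            return Decision("transform", modified, "amount above threshold")
    return Decision("allow", arguments, "within matrix")


print(evaluate("support_tier_1", "issue_refund", {"amount": 500}, RuntimeContext()))
```

The default is deny: a role or tool absent from the matrix never reaches the argument checks.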

The AI request authorization model piece covers the authorization semantics. The policy-as-code piece covers how the matrix lands as reviewed artifacts.

Where scoping lives in the request path

Scoping enforcement lives at the AI request boundary, not inside the agent framework. Two reasons.

First, the framework is not the security boundary. The framework runs inside the calling application's process (or a sidecar container the calling application manages). Compromising the framework compromises whatever access controls the framework claims to enforce. The AI agent framework RCE piece covers the incident evidence.

Second, framework-level scoping cannot share policy across frameworks. An enterprise running LangChain in one workload, AutoGen in another, and Semantic Kernel in a third has to reimplement scoping three times if the enforcement lives in the framework. A single policy engine at the AI request boundary evaluates all three the same way.

The gateway sits between the agent framework's tool call layer and the underlying HTTP API the tool implementation actually invokes. When the agent framework says "call issue_refund with amount $500", the gateway intercepts the resulting HTTP call to the refund service, evaluates the policy, and permits or denies. The framework's tool implementation does not need to change.
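A rough sketch of that interception step, with the network removed so it runs standalone. The route table, the X-Identity-Claim header, and the policy and forward callables are all hypothetical; a real gateway would verify the identity claim cryptographically and proxy the request to the upstream service.

```python
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]
PolicyFn = Callable[[str, str, Dict[str, Any]], bool]
ForwardFn = Callable[[str, str, Dict[str, Any]], Response]

# Hypothetical route table mapping upstream endpoints back to tool names.
ROUTES = {
    ("POST", "/refunds"): "issue_refund",
    ("GET", "/orders"): "lookup_order",
}


def gateway(method: str, path: str, headers: Dict[str, str],
            body: Dict[str, Any], policy: PolicyFn, forward: ForwardFn) -> Response:
    """Evaluate policy on an outbound tool HTTP call, then forward or block it."""
    identity = headers.get("X-Identity-Claim")  # header name is an assumption
    if identity is None:
        return 401, {"error": "missing identity claim"}
    tool = ROUTES.get((method, path))
    if tool is None:
        return 403, {"error": "endpoint not mapped to a tool"}
    if not policy(identity, tool, body):
        return 403, {"error": f"{tool} denied for {identity}"}
    return forward(method, path, body)


# Stand-ins for the policy engine and the upstream refund service.
def demo_policy(identity: str, tool: str, body: Dict[str, Any]) -> bool:
    return tool != "issue_refund" or body.get("amount", float("inf")) <= 100


def demo_forward(method: str, path: str, body: Dict[str, Any]) -> Response:
    return 200, {"forwarded": True}


print(gateway("POST", "/refunds", {"X-Identity-Claim": "user:alice"},
              {"amount": 500}, demo_policy, demo_forward))
```

The tool implementation inside the framework only sees an HTTP status code, which is why it does not need to change.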

Audit evidence

Tool scoping produces two audit signals that generic per-request policy enforcement does not produce on its own.

The denied-tool signal. When the policy denies a tool call, the audit record includes the tool name, the arguments the framework attempted to pass, the identity that would have invoked it, and the policy version that produced the denial. Denied tool calls are frequently the earliest signal of prompt injection attacks or misrouted user intent, so the denied-tool stream feeds incident detection.

The escalation signal. When the policy transforms a tool call (approving a refund but requiring human review, permitting an email but redacting the attachment), the transformation signals that policy is operating at the edge of the identity's authorization. Escalation events are where decisions to tighten or loosen policy get grounded in real production behavior.
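Both signals could share one record shape, distinguished by the outcome field. This is a sketch: the fields follow the ones named above (tool name, arguments, identity, policy version), while the timestamp, the JSON-lines encoding, and the function name are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    identity: str
    tool: str
    arguments: Dict[str, Any]  # what the framework attempted to pass
    policy_version: str
    outcome: str               # "deny" or "transform"


def audit_line(identity: str, tool: str, arguments: Dict[str, Any],
               policy_version: str, outcome: str) -> str:
    """Serialize one audit event as a single JSON line."""
    record = AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        identity=identity,
        tool=tool,
        arguments=arguments,
        policy_version=policy_version,
        outcome=outcome,
    )
    return json.dumps(asdict(record), sort_keys=True)


print(audit_line("user:alice", "run_shell", {"cmd": "env"}, "v42", "deny"))
```

One line per event keeps the denied-tool stream easy to filter on outcome before it reaches incident detection.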

The AI agent lateral movement piece covers how these signals feed detection of multi-step attacks.

Regulatory framing

The EU AI Act's Article 14 (human oversight of high-risk AI) requires that the system be designed and developed in such a way that natural persons can effectively oversee it. Tool scoping is one technical mechanism that supports that requirement: the system enforces the authorization boundaries that human oversight depends on.

The NIST AI RMF's MANAGE function includes control over the AI system's runtime behavior. Tool scoping is a runtime control, and the policy artifacts plus audit record are the evidence MANAGE looks for. The NIST AI RMF piece covers the function-by-function mapping.

DeepInspect

This is exactly what DeepInspect does. DeepInspect sits inline between the agent framework's tool call layer and the HTTP APIs the tool implementations invoke. For every tool call, the gateway binds the calling identity, evaluates the identity-to-tool authorization matrix against the current context, and permits, denies, or transforms the call. The policy artifacts live in git as reviewed code. The audit record includes tool name, arguments, identity, policy version, and outcome.

The scoping model works across LangChain, LlamaIndex, AutoGen, Semantic Kernel, the OpenAI Agents SDK, and custom agent frameworks, because the enforcement point is HTTP, not framework-specific.

Book a technical deep dive at deepinspect.ai.