AI Gateway Policies for Tool Use: Authorizing Function Calls at the Request Boundary
Tool use turns an LLM call into a sequence of function invocations against the application backend, the file system, third-party APIs, and other tools the model is allowed to call. Each function call has its own authorization scope and its own audit shape. An AI gateway that enforces policy on the model request alone leaves the tool invocations unauthorized. This piece walks through the architecture for authorizing tool use at the request boundary, the per-tool policy shape, and the audit record that captures the full tool-use trace.

Tool use is the default pattern for production LLM applications that do more than text generation. The OpenAI Chat Completions API, the Anthropic Messages API, and most major frameworks support function calling, where the model emits a structured request to invoke a tool, the application runs the tool, and the result feeds back into the model's context. The tool catalog can include database queries, file system reads, HTTP calls to internal services, third-party APIs, and other models. A single LLM request can produce three, five, or fifteen tool invocations in a row.
The model is calling tools on behalf of an authenticated user. Each tool has its own authorization scope. The gateway has to see and enforce policy on every step.
I want to walk through what tool use looks like at the API layer, where a request-only enforcement model fails, and how to authorize tool invocations at the AI request boundary with a per-tool policy and an audit record that captures the full sequence.
How tool use works at the API layer
The application includes a tools field in the request, describing the functions the model is allowed to call.
The model responds with a tool use block requesting an invocation.
The application runs lookup_customer, captures the result, and continues the conversation with a tool_result block. The cycle repeats until the model produces a final message. A single user request can produce ten round trips like this.
Where request-only enforcement fails
A gateway that inspects the inbound LLM request and the outbound model response, and stops there, sees the model's tool use blocks but does not see what the application does with them. The model says "call send_email with this body." The application runs send_email. The gateway has no record of the invocation, no policy check on whether the user is authorized to send email to that recipient, and no audit trail tying the email to the LLM request that produced it.
The model can request tool invocations the user is not authorized to make
The tool catalog describes what the model is allowed to ask for. The user's authorization scope describes what the user is allowed to do. A model that asks for a tool the user is not authorized to invoke produces a request that the application either has to deny or to execute against the user's authority. Without a per-tool policy at the gateway, the decision happens inside the application code, and the audit trail depends on the application logging it.
Tool arguments carry sensitive data
A tool invocation like send_email(to=customer@example.com, body=...) passes the body as a parameter. The body may contain PII, proprietary content, or content the model was not supposed to generate for that recipient. Inspecting the inbound LLM request misses this; the sensitive content shows up in the model's tool use block. Inspecting the outbound response without a per-tool policy applies a generic redaction that does not match the tool's argument schema.
The tool result feeds back into the model context
The tool_result block that the application returns to the model carries data fetched from the backend. That data is now in the model's context for the rest of the conversation. A user with restricted access can drive the model to call a tool that returns data the user could not have queried directly, and the data lands in the model's context window where the next response can echo it back. The gateway has to inspect tool results before they re-enter the model context.
Chained tool calls hide the data flow
Production deployments chain tool calls. A model calls lookup_customer, then lookup_order_history, then summarize_account, then send_email. Each step has its own authorization. Each step produces data that flows to the next step. A gateway that records the LLM request and the LLM response and nothing else has lost the chain.
The architecture for per-tool authorization
A streaming-aware, tool-aware gateway runs four inspection points instead of two.
Inspection point one: the inbound LLM request
The gateway authenticates the request, attaches identity context, and applies policy on the prompt content. The tools catalog is inspected against the user's authorized tool list. Tools the user is not entitled to invoke are stripped from the request or the request is denied, depending on the policy mode.
Inspection point two: the model's tool use block
When the model emits a tool_use block, the gateway inspects it before the application can act. The tool name is checked against the user's authorized tool list for the specific session. The tool arguments are inspected against the tool's argument schema and against the user's authority on the argument values. A send_email(to=...) invocation can be checked against the user's permitted recipient list. The decision is permit, modify, or deny.
Inspection point three: the tool result
When the application returns a tool_result block, the gateway inspects the data before it re-enters the model's context. PII detection runs against the result body. The data classification is applied. Sensitive content that the user was not entitled to receive is redacted at this boundary.
Inspection point four: the final model response
The final response is inspected the same way a non-tool-using response is, against the prompt-level classification and the response classification policies.
The per-tool policy shape
The policy that holds at the tool layer attaches to the tool name and to the user's role.
The policy is a deployment artifact, versioned, and the version goes into every audit record the policy generated. Changes to the policy are change-controlled the same way an IAM policy is.
The audit record for a tool-use sequence
The per-decision audit record carries the LLM request, the model responses, the tool invocations, and the tool results as a single linked structure.
The record reconstructs the full chain. A regulator or an internal investigator can see the LLM request, the tools the model asked for, the decisions the policy made on each one, the data classification on each result, and the final outcome.
DeepInspect
This is the tool-use enforcement pattern DeepInspect was built around. DeepInspect sits at the AI request boundary, inspects every LLM request and every tool invocation, enforces per-tool policy bound to the user's identity and role, and produces a per-decision audit record that captures the full sequence as a linked structure.
For deployments running agentic AI patterns under EU AI Act high-risk classification, the tool-use audit trail is what the regulator inspects under Article 12. The per-tool decisions, the argument-level checks, and the result-level redactions are part of the evidence trail. Application-level audit logs that record "the agent ran" do not produce this evidence.
If your AI deployment uses function calling or tool use and your enforcement model stops at the LLM request, the tool invocations are flowing through unauthorized. Book a demo today.
Frequently asked questions
- Does this architecture apply to agentic AI frameworks like LangChain or LlamaIndex?
The architecture applies to any deployment where the LLM emits structured tool invocations and the application executes them. LangChain agents, LlamaIndex agents, the OpenAI Assistants API, and the Anthropic Messages API with tools all fit the same pattern. The framework that orchestrates the agent loop runs on top of the HTTP layer where the gateway sits. The gateway sees the function calls the model requests and the results the application returns, regardless of the orchestration framework.
- How does this affect latency?
The per-tool policy evaluation adds policy decision time to each tool invocation. The evaluation runs in single-digit milliseconds for pattern-based and table-based policy checks. The overhead is small relative to the tool invocation time, which is typically tens or hundreds of milliseconds for a database query and seconds for an external API call. The gateway runs the policy decision in parallel with any non-blocking pre-fetch work, so the marginal latency in production deployments is below five milliseconds per tool invocation.
- What happens when the model invents a tool argument that the user is not authorized to use?
The argument constraint policy fires at inspection point two. A
send_email(to=ceo@company.com)invocation from a support tier-1 user whose permitted recipients do not include the CEO produces a deny decision before the application executes the tool. The model receives a tool_result describing the policy denial, and the model adjusts its next turn accordingly. The application never executes the unauthorized tool. The audit record captures the attempt.- How do you handle tools that call other LLMs?
A tool that itself calls an LLM is a recursive case. The outer LLM emits a tool_use block, the application executes the tool by calling another LLM, and the inner LLM call goes through the gateway the same way as any other LLM call. The audit trail nests the inner call inside the outer one through the request_id chain. The policy applies at each layer. A deployment that uses multi-model orchestration with policy enforcement at every layer produces an audit trail that reconstructs the full call graph.
- Does the tool-use record carry enough information for EU AI Act Article 12?
The Article 12 record is required to be detailed enough to reconstruct the AI system's decisions. A tool-use sequence is part of the decision. A record that captures the LLM request, the tool invocations the model asked for, the policy decisions, the tool results with classification, and the final outcome satisfies the reconstruction requirement. The identity context, the policy version, and the tamper-evident signature complete the record shape Article 12 expects.