
AI response tool-call validation: the five checks that run before a tool call reaches the executor

When an LLM response contains a tool call, the tool call sits between the model output and a side effect in a real system. Untouched tool calls execute whatever the model produced, including hallucinated tools, malformed arguments, and unauthorized parameters. Production deployments run five checks at the gateway before the tool call reaches the executor: schema validation, tool-allowlist check, argument authorization, idempotency-key attachment, and audit-record production. This piece walks through each check, the failure modes it catches, and how the checks compose across the OpenAI, Anthropic, and Bedrock tool-call formats.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Platform & Architecture · tool-calling · ai-agent · ai-gateway · agent-security · llm-response
An LLM tool call is a bridge between model output and system state. The model generates a JSON blob that describes a function name and arguments; the executor calls the function; something in a real system changes. Deployments that pass the tool call directly from the model to the executor accept whatever the model produced. Hallucinated tool names, malformed arguments, unauthorized parameters, and tool calls the caller was not authorized to trigger all reach the executor unchecked.

The five checks below run at the gateway between the model response and the executor. They apply regardless of whether the deployment uses OpenAI's tool_calls, Anthropic's tool_use blocks, or Bedrock's provider-specific format. The normalization step is a precondition; see the llm-multi-model-routing piece for the format handling.
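As a rough illustration of that normalization precondition, the sketch below maps three response shapes onto one provider-neutral record. The `ToolCall` dataclass and the function names are hypothetical. The OpenAI and Anthropic shapes follow their public chat and messages response formats, and the Bedrock function assumes the Converse API's `toolUse` blocks, which may not be the format a given deployment uses.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolCall:
    """Provider-neutral tool call (hypothetical internal shape)."""
    call_id: str
    name: str
    arguments: dict[str, Any]


def from_openai(message: dict) -> list[ToolCall]:
    # Chat Completions carries the arguments as a JSON-encoded string.
    calls = []
    for tc in message.get("tool_calls") or []:
        fn = tc["function"]
        calls.append(ToolCall(tc["id"], fn["name"], json.loads(fn["arguments"])))
    return calls


def from_anthropic(message: dict) -> list[ToolCall]:
    # Messages API returns content blocks; tool calls have type "tool_use".
    return [
        ToolCall(b["id"], b["name"], b["input"])
        for b in message.get("content", [])
        if b.get("type") == "tool_use"
    ]


def from_bedrock_converse(message: dict) -> list[ToolCall]:
    # Assumption: Converse API shape, where a content block may hold "toolUse".
    return [
        ToolCall(b["toolUse"]["toolUseId"], b["toolUse"]["name"], b["toolUse"]["input"])
        for b in message.get("content", [])
        if "toolUse" in b
    ]
```

Note that `json.loads` raises on malformed argument strings; a gateway would likely treat that as a schema-validation failure instead of letting the exception propagate.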

I want to walk through each check, the failure modes it catches, and how the checks compose.

Check one: schema validation

The tool call must match a declared schema. The schema names the tool, the expected arguments, the required fields, and the argument types.

What this check catches

Hallucinated tools. The model produces a tool_calls entry with the name create_purchase_order_bulk when the declared toolset only exposes create_purchase_order. The schema-validation check rejects the hallucinated tool before it reaches the executor.

Malformed arguments. The model produces valid JSON but with a field named customer_id when the schema calls for customerId. The check rejects the call and returns an error the model can retry against.

Missing required fields. The model produces a call to send_email without a subject field. The check rejects the call.
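The three failure modes above can be sketched with a small hand-rolled validator. This is a minimal illustration using only the standard library, not a full JSON Schema implementation; the `TOOL_SCHEMAS` table, its field names, and `validate_schema` are assumptions made up for the example.

```python
from typing import Any

# Hypothetical declared toolset: required and optional fields with expected types.
TOOL_SCHEMAS: dict[str, dict[str, dict[str, type]]] = {
    "create_purchase_order": {
        "required": {"customerId": str, "quantity": int},
        "optional": {"note": str},
    },
    "send_email": {
        "required": {"to": str, "subject": str, "body": str},
        "optional": {},
    },
}


def validate_schema(name: str, arguments: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the call passes."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        # Hallucinated tool: the name is not in the declared toolset.
        return [f"unknown tool: {name}"]
    errors = []
    allowed = {**schema["required"], **schema["optional"]}
    for field in schema["required"]:
        if field not in arguments:
            errors.append(f"missing required field: {field}")
    for field, value in arguments.items():
        if field not in allowed:
            errors.append(f"unexpected field: {field}")
        elif not isinstance(value, allowed[field]):
            errors.append(f"wrong type for {field}: expected {allowed[field].__name__}")
    return errors
```

A call with `customer_id` instead of `customerId` produces two violations (one missing field, one unexpected field), which gives the model enough detail to correct the name on retry.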

Where it enforces

At the gateway, immediately after the response arrives from the upstream and before any downstream execution. The check runs on the normalized tool-call representation.

The failure semantics

A schema-validation failure returns an error to the model in the next iteration of the tool-use loop, giving the model a chance to correct the call. Retries for the same call are bounded by a cap; once the cap is reached, the request fails cleanly and produces an audit record indicating that the schema-validation loop was exhausted.
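One way to picture the bounded retry loop is the sketch below. The cap of three attempts, the `propose`/`validate`/`audit` callables, and the record fields are all assumptions for illustration; the piece does not specify a cap value or a record layout.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumed cap; choose per deployment


def run_with_retry_cap(
    propose: Callable[[list[str]], dict],
    validate: Callable[[dict], list[str]],
    audit: Callable[[dict], None],
) -> Optional[dict]:
    """Ask the model for a tool call until it validates or the cap is hit."""
    feedback: list[str] = []
    for _ in range(MAX_ATTEMPTS):
        call = propose(feedback)    # the model sees the previous errors
        feedback = validate(call)   # empty list means the call passed
        if not feedback:
            return call
    # Cap reached: fail cleanly and leave evidence of the exhausted loop.
    audit({
        "outcome": "schema_validation_exhausted",
        "attempts": MAX_ATTEMPTS,
        "last_errors": feedback,
    })
    return None
```

Here `propose` stands in for a model round trip and returns the next candidate tool call; returning `None` signals the caller to fail the request.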

Check two: tool-allowlist

The tool the model called must sit on the allowlist for the current caller, role, and workload.

What this check catches

Scope escalation. A caller authorized only for read-only tools calls a mutation tool. A workload authorized only for the ticketing tools calls a payment tool. A model that discovered a tool from context leakage calls a tool that was never intended to be visible to this caller.

Where it enforces

At the gateway, after schema validation. The allowlist keys on the resolved identity, role, and workload from the gateway's earlier resolution. The permit decision from the policy engine carries the authorized tool set, and the check compares the model's tool call against the set.

The failure semantics

An allowlist failure returns an error to the model with a message indicating the tool is not authorized for this caller. The model usually adjusts the plan and picks a different tool. Persistent unauthorized-tool calls after the retry cap produce an audit record with the deny outcome.
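A minimal sketch of the allowlist comparison follows, assuming the policy engine's permit decision can be represented as a small record carrying the authorized tool set. `PermitDecision` and `check_allowlist` are hypothetical names, and the tool names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PermitDecision:
    """Hypothetical output of the policy engine for one resolved caller."""
    identity: str
    role: str
    workload: str
    allowed_tools: frozenset[str]


def check_allowlist(permit: PermitDecision, tool_name: str) -> Optional[str]:
    """Return an error message for the model, or None if the tool is permitted."""
    if tool_name in permit.allowed_tools:
        return None
    # The message names the rejected tool but does not list what is allowed.
    return f"tool '{tool_name}' is not authorized for this caller"


permit = PermitDecision(
    identity="alice",
    role="support-agent",
    workload="ticketing",
    allowed_tools=frozenset({"get_ticket", "add_comment"}),
)
```

With this permit, `check_allowlist(permit, "issue_refund")` returns an error string that the gateway can hand back to the model in the next loop iteration.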

Check three: argument authorization

The argument values in the tool call must fall inside the caller's authorization scope.

What this check catches

Parameter abuse. A caller authorized for their own customer records calls get_customer with a customer ID belonging to another tenant. A caller authorized to update prices for their own product line calls update_price with a product ID outside that line. A caller authorized to send emails only within their organization calls send_email with an external recipient.

Where it enforces

At the gateway, after the allowlist check. Argument authorization requires the check to understand the tool's semantics: which arguments carry authorization implications, what the caller's scope is, and how to map from the argument value to the scope check. The mapping is deployer-configured per tool.

The failure semantics

Argument authorization failures return an error to the model with a message that names the disallowed parameter without leaking the correct value. The model retries with a corrected argument (usually one within scope). Repeated failures produce an audit record and terminate the tool-use loop.
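The deployer-configured mapping can be sketched as a per-tool table of predicates, each tying one argument to the caller's scope. Everything here is an assumption for illustration: the `ARG_RULES` table, the scope keys `customer_ids` and `email_domain`, and the error wording.

```python
from typing import Any, Callable, Optional

# Hypothetical caller scope resolved earlier by the gateway.
Scope = dict[str, Any]

# Deployer-configured mapping: tool -> {argument name -> predicate(value, scope)}.
ARG_RULES: dict[str, dict[str, Callable[[Any, Scope], bool]]] = {
    "get_customer": {
        "customer_id": lambda value, scope: value in scope["customer_ids"],
    },
    "send_email": {
        "to": lambda value, scope: value.rsplit("@", 1)[-1] == scope["email_domain"],
    },
}


def authorize_arguments(tool: str, arguments: dict[str, Any], scope: Scope) -> Optional[str]:
    """Return an error naming the first disallowed parameter, or None."""
    for arg_name, in_scope in ARG_RULES.get(tool, {}).items():
        if arg_name in arguments and not in_scope(arguments[arg_name], scope):
            # Name the parameter only; never echo in-scope values back.
            return f"argument '{arg_name}' is outside the caller's authorization scope"
    return None
```

Tools with no entry in the table pass this check untouched, so a deployment would likely want a review step that flags newly declared tools without rules.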

Check four: idempotency-key attachment

The tool call is decorated with an idempotency key derived deterministically from the request ID, the tool name, and a hash of the arguments. Downstream tool servers use the key to reject duplicate executions.

What this check catches

Duplicate side effects under router-level retry. If the router retries the LLM request after a partial failure, the retried request produces a new tool call. Without an idempotency key, the tool server executes both calls. With an idempotency key attached at the gateway, the tool server rejects the duplicate.

Where it enforces

At the gateway, after argument authorization. The key derives deterministically from the request ID, the tool name, and the argument hash, so retries produce the same key. The key attaches to the tool call as an extra parameter or a request header depending on the tool server's contract.
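A deterministic derivation along these lines might look like the sketch below. SHA-256 and the canonical-JSON step are choices made for the example, not something the piece prescribes, and `idempotency_key` is a hypothetical helper name.

```python
import hashlib
import json
from typing import Any


def idempotency_key(request_id: str, tool_name: str, arguments: dict[str, Any]) -> str:
    """Derive a stable key from the request ID, tool name, and argument hash."""
    # Canonical JSON so key order and whitespace do not change the hash.
    canonical_args = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    arg_hash = hashlib.sha256(canonical_args.encode("utf-8")).hexdigest()
    # Newline-separated so field boundaries stay unambiguous.
    material = "\n".join([request_id, tool_name, arg_hash])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the key depends on the argument hash, a retry only maps to the same key when the retried tool call carries the same arguments.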

The failure semantics

The tool server rejects duplicates by returning a specific error code. The gateway recognizes the code and returns the original tool result to the caller (or the model, if this is inside a tool-use loop). The user sees one execution regardless of how many retries the router performed.
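The duplicate-handling flow can be sketched with an in-memory stand-in for the tool server. The classes, the `DuplicateExecution` exception standing in for the server's error code, and the gateway-side result cache are all assumptions; the piece does not say how the gateway retrieves the original result.

```python
class DuplicateExecution(Exception):
    """Stand-in for the tool server's duplicate error code (hypothetical)."""


class ToolServer:
    def __init__(self):
        self.seen = set()
        self.executions = 0

    def execute(self, key, arguments):
        if key in self.seen:
            raise DuplicateExecution(key)
        self.seen.add(key)
        self.executions += 1  # the real side effect would happen here
        return {"status": "created", "order": arguments["order"]}


class Gateway:
    def __init__(self, server):
        self.server = server
        self.results = {}  # idempotency key -> first result (assumed cache)

    def call(self, key, arguments):
        try:
            result = self.server.execute(key, arguments)
        except DuplicateExecution:
            # Replay the original result instead of surfacing the rejection.
            return self.results[key]
        self.results[key] = result
        return result
```

Calling `Gateway.call` twice with the same key returns the same result while the server executes once. A production version would need a durable result store, since this sketch raises `KeyError` if the first result was never cached.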

Check five: audit-record production

The tool call, the argument values (redacted where sensitive), the outcome of the previous four checks, and the executor's response attach to the per-decision audit record.

What this check catches

Missing tool-call evidence. Deployments that record only the LLM completion in the audit record cannot answer the question "which tools did the model call for this specific request." The audit record with tool-call fields answers the question directly.

Provenance for downstream side effects. When a tool call produced a side effect (an order, an email, a database update), the audit record ties the side effect back to the model response, the model version, the policy version, and the natural-person identity behind the request.

Where it enforces

At the gateway, on the response path. The audit record commits before the response returns to the caller.

The failure semantics

Audit-record write failures fail the request. Losing the evidence for a tool call the caller was about to consume is a compliance violation the deployer cannot recover from downstream.
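The commit-before-return rule and the fail-closed behavior can be sketched as follows. The record fields mirror the list given for this check, while `SENSITIVE_FIELDS`, `AuditWriteError`, and `commit_and_return` are hypothetical names, and key-based redaction is only one possible redaction approach.

```python
from typing import Any, Callable

SENSITIVE_FIELDS = {"body", "card_number"}  # hypothetical redaction list


class AuditWriteError(Exception):
    """Raised when the audit record cannot be committed."""


def redact(arguments: dict[str, Any]) -> dict[str, Any]:
    return {k: ("[REDACTED]" if k in SENSITIVE_FIELDS else v) for k, v in arguments.items()}


def commit_and_return(
    tool: str,
    arguments: dict[str, Any],
    check_outcomes: dict[str, str],
    executor_response: Any,
    write_audit: Callable[[dict], None],
) -> Any:
    record = {
        "tool": tool,
        "arguments": redact(arguments),
        "checks": check_outcomes,
        "executor_response": executor_response,
    }
    try:
        write_audit(record)  # must succeed before the caller sees anything
    except Exception as exc:
        # Fail closed: no audit record means no response.
        raise AuditWriteError("audit write failed; response withheld") from exc
    return executor_response
```

The response is returned only after `write_audit` completes, so a caller never consumes a tool result that lacks a matching record.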

Beyond the five checks

Tool-calling workloads add secondary requirements that some deployments need: rate limits per tool (an agent should not call send_email a thousand times), semantic checks on argument values (a price argument that violates business rules), and cross-call correlation checks (a sequence of tool calls that individually pass but collectively indicate compromise). The five checks above are the base. The extensions attach downstream of them.
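Of these extensions, the per-tool rate limit is the simplest to sketch. The sliding-window limiter below is one possible approach, with the clock passed in explicitly so the behavior is easy to test; the class name, the limits, and the 60-second window are assumptions.

```python
from collections import defaultdict, deque


class PerToolRateLimiter:
    """Sliding-window limiter keyed on (caller, tool); a sketch, not tuned."""

    def __init__(self, limits, window_seconds=60.0):
        self.limits = limits            # tool name -> max calls per window
        self.window = window_seconds
        self.calls = defaultdict(deque)

    def allow(self, caller, tool, now):
        limit = self.limits.get(tool)
        if limit is None:
            return True                 # no limit configured for this tool
        timestamps = self.calls[(caller, tool)]
        # Drop calls that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= limit:
            return False
        timestamps.append(now)
        return True
```

In a gateway, `now` would typically come from `time.monotonic()`, and a denied call would return an error to the model like the other checks do.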

DeepInspect

This is the architecture DeepInspect was built to provide. DeepInspect runs the five checks at the gateway between the model response and the tool executor. Every tool call is normalized, schema-validated, allowlist-checked, argument-authorized, and decorated with an idempotency key before it reaches the executor. Failures return errors to the model for the retry loop.

Every check produces fields on the per-decision audit record. The record contains identity, role, policy version, classification, tool name, tool arguments (redacted where sensitive), check outcomes, idempotency key, and executor response. When a regulator, an auditor, or an incident responder asks which tools the model called for a specific request, the audit record answers with the full sequence.

Book a demo today.

Frequently asked questions

What about MCP servers?

The Model Context Protocol (MCP) is the emerging standard for exposing tools to LLM callers. The five checks apply to MCP tool calls the same way they apply to native provider tool calls. The gateway sits between the model and the MCP server; the schema comes from the MCP server's tool list; the allowlist and argument authorization apply per caller.
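As a sketch of how the schema could be sourced from an MCP server, the function below builds a lookup table from a `tools/list` result. The `tools`, `name`, and `inputSchema` fields follow the MCP specification's tool definition; `registry_from_mcp` and the registry layout are hypothetical, and the sample payload is made up.

```python
def registry_from_mcp(tools_list_result: dict) -> dict:
    """Map each MCP tool name to its required and known argument names."""
    registry = {}
    for tool in tools_list_result.get("tools", []):
        schema = tool.get("inputSchema", {})  # a JSON Schema object
        registry[tool["name"]] = {
            "required": set(schema.get("required", [])),
            "properties": set(schema.get("properties", {})),
        }
    return registry


sample_result = {
    "tools": [
        {
            "name": "get_customer",
            "description": "Fetch one customer record",
            "inputSchema": {
                "type": "object",
                "properties": {"customer_id": {"type": "string"}},
                "required": ["customer_id"],
            },
        }
    ]
}
```

The gateway could refresh this registry whenever the server's tool list changes, then run the same schema-validation check against it that it runs for native provider tools.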

How does argument authorization work when the argument is free-form text?

Free-form arguments (a search query, a message body) do not receive parameter-level authorization the way structured arguments do. Deployments authorize the tool itself and rely on the tool server's downstream authorization for the argument's content. For sensitive tools (send-email, publish-post), an additional response-side content classifier can inspect the argument before execution.

Can the checks add too much latency?

Schema validation and allowlist checks are microsecond operations against in-memory structures. Argument authorization can be more expensive if it involves lookups; deployments cache the authorization results per caller. In production tests, the five-check pipeline adds under 5 ms to the tool-call path.

What about tools that trigger long-running side effects?

Long-running tools (a batch job, a workflow) usually return an operation ID synchronously and complete asynchronously. The audit record captures the operation ID at gateway commit; the completion event correlates back to the audit record through the ID. The five checks apply at initiation; the completion event does not go back through them.

Can I skip idempotency for read-only tools?

Yes, provided the read-only tool is truly side-effect-free (no logging, no metering, no cache-invalidation). Most tools that look read-only still produce side effects (an audit trail, a rate-limit counter, an analytics event). The safe default is to attach idempotency keys to every tool call and let the tool server decide whether the key matters for its use case.

How does this compose with OWASP LLM Top 10 controls?

Three entries in the OWASP Top 10 for LLM Applications (2025 edition) touch the tool-call path: LLM01 (prompt injection), LLM06 (excessive agency), and LLM09 (misinformation). Schema validation catches malformed tool calls; the allowlist and argument checks limit agency; the audit record produces the evidence for post-incident review. See the owasp-top-10-for-llm-applications piece for the mapping.