How is this different from a service mesh that already enforces mTLS between workloads?

The service mesh enforces transport-layer identity between workloads. It does not read the AI request body or evaluate per-request AI policy. AI request boundary verification requires a layer that reads the request body, classifies the content, evaluates the policy version of record, and writes the per-decision audit record. The two compose: the mesh covers workload-to-workload identity, and the inspection layer covers AI request semantics.

Does zero-trust for AI require us to abandon the API key model?

No. The API key model survives on the application-to-inspection-layer leg, provided the application also sends the propagated identity claims in its request headers for the inspection layer to read. A static key alone does not identify the user. The leg between the inspection layer and the upstream model uses the model provider's authentication of choice (API key, IAM role, signed request). The change is that the model provider sees the inspection layer as the caller, and the inspection layer sees the application's propagated identity. Per-request user identification happens on the inspection-layer side.

How does the layer handle multi-model deployments where the same caller can route to different model providers?

The route identifier in the policy bundle binds the caller's request to a specific model endpoint. A caller authorized for "summarize customer support tickets" routes to a specific model (for example, an Anthropic Claude deployment) and a caller authorized for "draft marketing copy" routes to a different endpoint (for example, an OpenAI deployment). The inspection layer reads the route identifier, evaluates the policy bundle bound to the route, and forwards to the matched upstream. The audit record carries the route identifier so the security team can review per-route decision patterns.
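The route binding described above can be sketched as a lookup table. This is a minimal illustration, not the product's configuration format: the route names, the placeholder upstream URLs, the policy version strings, and the `resolve_upstream` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Route:
    route_id: str
    upstream_url: str    # placeholder endpoints, not real provider URLs
    policy_version: str  # hypothetical version label for the bound bundle


# Hypothetical routing table: one entry per authorized use case.
ROUTES: Dict[str, Route] = {
    "support-summarize": Route("support-summarize", "https://upstream-a.example/v1", "v1"),
    "marketing-draft": Route("marketing-draft", "https://upstream-b.example/v1", "v3"),
}


def resolve_upstream(route_id: str) -> Route:
    """Return the route bound to route_id, or fail closed on an unknown route."""
    try:
        return ROUTES[route_id]
    except KeyError:
        raise LookupError(f"unknown route: {route_id}") from None
```

Failing closed on an unknown route keeps a caller from reaching an upstream that no policy bundle covers.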

What does the audit record series look like across a 24-hour window?

A high-traffic deployment can emit millions of records across a 24-hour window. The record store is partitioned by time and joinable on the natural-person, agent, session, and route fields. A query for a specific user across the window returns that user's request series ordered by time, with each record carrying the policy version and decision outcome. Query latency depends on the record store's index strategy; typical regulator-style queries run in seconds.
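As a rough sketch of such a per-user query, the example below uses an in-memory SQLite table. The table name, column names, sample rows, and the `user_series` helper are assumptions for illustration; a production record store would be partitioned and would likely use a different engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE audit (
        ts TEXT, natural_person TEXT, agent TEXT, session TEXT,
        route TEXT, policy_version TEXT, decision TEXT)"""
)
# Composite index so a per-user, time-bounded query avoids a full scan.
conn.execute("CREATE INDEX idx_person_ts ON audit (natural_person, ts)")

# Made-up sample rows; timestamps are ISO 8601 UTC strings, which sort lexicographically.
rows = [
    ("2024-05-01T09:00:00Z", "u-1", None, "s-1", "support-summarize", "v1", "allow"),
    ("2024-05-01T08:00:00Z", "u-1", None, "s-1", "support-summarize", "v1", "block"),
    ("2024-05-01T10:00:00Z", "u-2", None, "s-2", "marketing-draft", "v3", "allow"),
    ("2024-05-02T09:00:00Z", "u-1", None, "s-3", "support-summarize", "v1", "allow"),
]
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def user_series(conn, person, start, end):
    """Return one user's records in [start, end), ordered by time."""
    cur = conn.execute(
        "SELECT ts, route, policy_version, decision FROM audit "
        "WHERE natural_person = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (person, start, end),
    )
    return cur.fetchall()
```

Calling `user_series(conn, "u-1", "2024-05-01T00:00:00Z", "2024-05-02T00:00:00Z")` returns the two records for that user inside the window, earliest first.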

Zero Trust Applied to AI Systems: The Per-Request Identity, Policy, and Audit Boundary

The zero-trust architecture pattern replaces the perimeter assumption with per-request verification. The architecture asks four questions on every request: is the caller who they say they are, is the caller's device in the posture the policy requires, does the caller's identity carry the authorization the request needs, and does the request match the policy that is in effect for the route. The architecture writes the answers to an audit store the security team and the auditor can read. The pattern has been operationalized in the network access space (BeyondCorp, ZTNA), the workload identity space (SPIFFE, mTLS service meshes), and the data access space (per-query policy at the query plane). The AI request space is the next surface the architecture has to cover.

I want to walk through four things: the four zero-trust principles and how each one maps to a concrete decision the AI request path has to make on every call, the new surfaces AI traffic introduces, the deployment topology that fits a zero-trust posture, and the audit record format the regulator reads.

Principle 1: Verify the caller on every request

The classical zero-trust principle "never trust, always verify" applies to the AI request the same way it applies to a sensitive HTTP API call. The caller's identity has to be verified on every request, not once per session.

The AI request boundary verification reads four claims. The natural-person identifier carries the user the request acts on behalf of, sourced from the SSO. The agent identifier carries the autonomous identity if the caller is an agent rather than a human. The session identifier carries the chat session or the application session the request belongs to. The route identifier carries the AI route the application is calling against (which model, which deployment, which policy bundle).

A request that lacks any required claim at the AI request boundary cannot satisfy the audit requirement. The natural-person, session, and route identifiers are always required; the agent identifier is required whenever the caller is an agent. An application that calls the AI endpoint with a static API key alone meets neither the identity verification principle nor the audit record requirement. The zero-trust posture asks the inspection layer to refuse such a request at the boundary.
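A minimal sketch of the refusal rule follows. It assumes the claims arrive as HTTP headers, with the agent claim optional for human callers. The header names, the `IdentityClaims` type, and `extract_claims` are hypothetical, and the sketch only checks presence; a real boundary would also verify that the claims are signed by the SSO or identity provider.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical header names; the article does not specify a wire format.
HEADER_PERSON = "x-ai-natural-person"
HEADER_AGENT = "x-ai-agent"
HEADER_SESSION = "x-ai-session"
HEADER_ROUTE = "x-ai-route"


@dataclass(frozen=True)
class IdentityClaims:
    natural_person: str
    session: str
    route: str
    agent: Optional[str] = None  # present only when the caller is an agent


class MissingClaimError(Exception):
    """Raised when a required claim is absent, so the boundary refuses the request."""


def extract_claims(headers: Mapping[str, str]) -> IdentityClaims:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    required = (HEADER_PERSON, HEADER_SESSION, HEADER_ROUTE)
    missing = [h for h in required if not lowered.get(h)]
    if missing:
        raise MissingClaimError(f"missing required claims: {', '.join(missing)}")
    return IdentityClaims(
        natural_person=lowered[HEADER_PERSON],
        session=lowered[HEADER_SESSION],
        route=lowered[HEADER_ROUTE],
        agent=lowered.get(HEADER_AGENT) or None,
    )
```

A request carrying only an `Authorization` header with a static key raises `MissingClaimError` here, which corresponds to the refusal at the boundary.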

Principle 2: Enforce the least privilege the policy permits

The classical least-privilege principle asks the system to grant the caller only the access the task requires. Applied to the AI request, the principle translates to per-route policy bundles that describe what the caller is allowed to ask, what the caller is allowed to receive, and what classification the response is allowed to carry.

The per-route policy reads the identity claims, the request body classification, and the policy bundle bound to the route. The policy commits one of four decisions: allow (the request reaches the model unmodified), modify (the request is rewritten to fit the policy, often through redaction of disallowed fields), redact (the response is rewritten before reaching the caller, often through masking of disallowed fields), or block (the request fails at the boundary with a structured error).
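The four outcomes can be sketched as an enum plus two small decision functions, one per leg. The `PolicyBundle` fields and the label-based rules are assumptions chosen to keep the example short; the article does not define a bundle schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet, FrozenSet


class Decision(Enum):
    ALLOW = "allow"
    MODIFY = "modify"
    REDACT = "redact"
    BLOCK = "block"


@dataclass(frozen=True)
class PolicyBundle:
    # All fields are hypothetical illustrations of what a bundle might hold.
    route: str
    version: str
    allowed_callers: FrozenSet[str]
    blocked_request_labels: FrozenSet[str]      # labels that force a block
    strippable_request_labels: FrozenSet[str]   # labels removable by rewriting the request
    redacted_response_labels: FrozenSet[str]    # labels masked before the caller sees them


def decide_request(bundle: PolicyBundle, caller: str, route: str,
                   labels: AbstractSet[str]) -> Decision:
    """Request leg: block on a route or caller mismatch, then apply label rules."""
    if route != bundle.route or caller not in bundle.allowed_callers:
        return Decision.BLOCK
    if labels & bundle.blocked_request_labels:
        return Decision.BLOCK
    if labels & bundle.strippable_request_labels:
        return Decision.MODIFY
    return Decision.ALLOW


def decide_response(bundle: PolicyBundle, labels: AbstractSet[str]) -> Decision:
    """Response leg: redact when the response carries a label the caller may not see."""
    if labels & bundle.redacted_response_labels:
        return Decision.REDACT
    return Decision.ALLOW
```

The labels are assumed to come from a content classifier that runs before the policy is evaluated.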

The least-privilege posture rejects the deployment pattern where every application talks to every model through a single shared API key. The pattern equates to flat-network access in the network zero-trust world. The replacement is per-route policy bundles bound to per-route identities, with the inspection layer enforcing the binding at request time.

Principle 3: Assume the request boundary is hostile

The classical assume-breach principle asks the architecture to behave as if the perimeter is already compromised. Applied to the AI request, the principle has two operational consequences.

The first consequence is the prompt-injection assumption. The request body the application sends to the model can carry instructions the application did not author. The request body has to be classified at the boundary and treated as untrusted until the classification rules out the injection patterns. The classification is a per-request decision the inspection layer commits.

The second consequence is the model-output assumption. The response the model returns can carry content that exfiltrates the data the request was permitted to read. The response body has to be classified at the boundary and trimmed to the classification the caller's policy allows. A response that contains data above the caller's classification is redacted at the boundary before the response reaches the application.
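To make the two consequences concrete, here is a deliberately naive sketch: a pattern check on the request body and a masking pass on the response body. The regular expressions, the email-only redaction rule, and both function names are assumptions for illustration; real classifiers are considerably more involved than keyword patterns.

```python
import re

# Illustrative patterns only; they will miss most real injection attempts.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]

# Simplified email matcher standing in for a data classifier.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def looks_like_injection(request_body: str) -> bool:
    """Request leg: flag bodies that match a known injection pattern."""
    return any(p.search(request_body) for p in INJECTION_PATTERNS)


def redact_response(response_body: str, caller_may_see_email: bool) -> str:
    """Response leg: mask email addresses unless the caller's policy allows them."""
    if caller_may_see_email:
        return response_body
    return EMAIL.sub("[REDACTED]", response_body)
```

The same shape applies to any classification: the request leg decides whether the body can be trusted, and the response leg trims the body to what the caller's policy allows.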

The assume-hostile posture applies to both directions of the AI request: the request body the application sends, and the response body the model returns. The inspection layer reads both at the TLS termination and commits the decisions on both legs.

Principle 4: Verify and audit every decision

The classical principle asks the architecture to write per-decision records to an audit store the security team and the auditor can read. Applied to the AI request, the principle maps to the EU AI Act Article 12 record-keeping obligation, the NIST AI RMF MANAGE 1.3 subcategory (the NIST framework is voluntary guidance, not a legal obligation), and the ISO/IEC 42001 record-keeping requirements.

The audit record format the AI request path commits carries seven fields per request. The record carries the natural-person identifier, the agent identifier where applicable, the session and route identifiers, the policy version that evaluated the request, the decision outcome, the upstream model and version, and the integrity metadata that proves the record was not altered after the fact. The record store applies hash chaining across records so the record series carries tamper-evident properties.
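Hash chaining across records can be sketched as follows. This is one common construction (each record's hash covers the previous record's hash plus its own fields), not necessarily the one any particular product uses. The field names, the all-zero genesis value, and the choice of SHA-256 over canonical JSON are assumptions.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed starting value for the chain


def _digest(prev_hash: str, fields: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal fields hash equally.
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(chain: List[dict], fields: Dict[str, str]) -> dict:
    """Append a record whose hash covers its fields and the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {
        "fields": dict(fields),
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, fields),
    }
    chain.append(record)
    return record


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every hash in order; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["fields"]):
            return False
        prev_hash = record["hash"]
    return True
```

Changing a field in an earlier record after the fact makes `verify_chain` return `False`, which is the tamper-evident property. Detecting truncation of the tail would additionally require anchoring the latest hash somewhere outside the store.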

The audit posture is what distinguishes zero-trust from the perimeter security pattern. The perimeter pattern relies on the gateway being trusted to do the right thing. The zero-trust pattern asks the gateway to write down each decision and the policy it applied so a reviewer can verify the gateway did the right thing per request.

The deployment topology that fits a zero-trust posture

The topology places an inspection layer at the AI request boundary between the calling application (or agent) and the upstream model endpoint. The layer terminates the AI provider TLS, reads the request, applies the four principles, commits the decision and its audit record, and forwards the request to the model unless the decision is block. The response runs back through the same layer for the classification and redaction decisions on the response body, and a second audit record commits on the response leg.
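The request and response legs can be sketched as a single stateless handler. Every name here is hypothetical: `decide`, `call_model`, and `redact` are injected callables standing in for the policy engine, the upstream client, and the response classifier, and `audit` is a plain list standing in for the audit store. TLS termination is out of scope for the sketch.

```python
from typing import Callable, Dict, List, Tuple


def handle_request(
    request_body: str,
    claims: Dict[str, str],
    decide: Callable[[Dict[str, str], str], str],
    call_model: Callable[[str], str],
    redact: Callable[[Dict[str, str], str], str],
    audit: List[dict],
) -> Tuple[int, str]:
    """Run one request through both legs, writing one audit record per leg."""
    # Request leg: decide, record the decision, and stop here on a block.
    decision = decide(claims, request_body)
    audit.append({"leg": "request", "decision": decision, **claims})
    if decision == "block":
        return 403, "blocked by policy"

    # Response leg: call the upstream, redact, and record the outcome.
    response = call_model(request_body)
    redacted = redact(claims, response)
    outcome = "redact" if redacted != response else "allow"
    audit.append({"leg": "response", "decision": outcome, **claims})
    return 200, redacted
```

The handler holds no state between calls; everything durable goes to the audit sink, so the layer can be scaled horizontally without coordination beyond the store itself.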

The inspection layer is stateless. Each request is evaluated against the propagated identity claims, the policy bundle bound to the route, and the request and response body content. The audit store is the durable side of the layer. The application keeps its own application-level logs. The two record series compose: the application records cover application state, the inspection records cover the zero-trust evidence.

DeepInspect

DeepInspect is the zero-trust inspection layer for the AI request boundary. The product terminates the AI provider TLS, reads the request and response, verifies the propagated identity claims, evaluates the policy bundle per route, applies allow, modify, redact, or block decisions on each leg, and commits per-decision audit records to a tamper-evident store with hash chaining across records.

The product runs as a stateless proxy. The deployer's existing SSO propagates through. The policy bundles per route describe what each AI route is allowed to read, what the response is allowed to carry, and what classifications the record series captures. The record series is the first-party evidence the deployer owes the regulator and the auditor.

If your security team is extending the zero-trust architecture into the AI request path, let's talk today.