How does per-route policy interact with rate limits?

Per-route rate limits sit alongside per-route policy. The same route pattern that selects a policy rule can drive a per-route, per-role, per-user rate budget. The audit record captures the rate-limit outcome the same way it captures the policy outcome.
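As a minimal sketch of a budget keyed by route, role, and user, assuming a simple fixed-window counter: the class name, the window scheme, and the field names below are illustrative assumptions, not a documented DeepInspect interface.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FixedWindowBudget:
    """Hypothetical fixed-window rate budget keyed by (route, role, user)."""
    limit: int
    window_seconds: float
    _counts: dict = field(default_factory=dict)

    def allow(self, route: str, role: str, user: str,
              now: Optional[float] = None) -> bool:
        # Bucket the timestamp into a window index so each window has its own count.
        now = time.monotonic() if now is None else now
        window = int(now // self.window_seconds)
        key = (route, role, user, window)
        used = self._counts.get(key, 0)
        if used >= self.limit:
            return False
        self._counts[key] = used + 1
        return True  # old windows are never evicted in this sketch
```

The boolean returned by `allow` is the rate-limit outcome that would be written to the audit record next to the policy outcome.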

What about routes the policy author has not yet covered?

When the deployer runs in fail-closed mode, the inspection point denies requests to unknown routes. The mode is configurable, and most regulated deployments run fail-closed by policy. Unknown routes also trigger an alert so the policy author can add the missing route or the deployer can investigate why the application called an unexpected endpoint.
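The unknown-route behavior can be sketched roughly as follows. The function name, the `fail_closed` flag, and the `on_unknown` alert hook are assumptions made for illustration only.

```python
from typing import Callable, Iterable, Optional


def route_disposition(path: str,
                      known_routes: Iterable[str],
                      fail_closed: bool = True,
                      on_unknown: Optional[Callable[[str], None]] = None) -> str:
    """Return 'evaluate' for covered routes, otherwise the configured default."""
    if path in set(known_routes):
        return "evaluate"  # hand off to the normal rule evaluation
    if on_unknown is not None:
        on_unknown(path)   # alert fires in both fail-closed and fail-open modes
    return "deny" if fail_closed else "allow"
```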

Can the route pattern match on request fields beyond the URL?

Yes. The pattern syntax supports matching on the URL path, the query parameters, request headers, and selected request-body fields. A common pattern uses the URL path to bind the rule to the LLM endpoint and the body shape to bind the rule to the model identifier inside the request.
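The actual pattern syntax is not shown here, so the following is only a sketch of the idea, assuming requests arrive as plain dictionaries. `RoutePattern` and its field names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RoutePattern:
    """Hypothetical pattern over path, query, headers, and selected body fields."""
    path_regex: str
    query: dict = field(default_factory=dict)        # exact-match query params
    headers: dict = field(default_factory=dict)      # exact-match header values
    body_fields: dict = field(default_factory=dict)  # body field name -> regex

    def matches(self, request: dict) -> bool:
        if re.fullmatch(self.path_regex, request.get("path", "")) is None:
            return False
        query = request.get("query", {})
        if any(query.get(k) != v for k, v in self.query.items()):
            return False
        # Header names are compared case-insensitively.
        headers = {k.lower(): v for k, v in request.get("headers", {}).items()}
        if any(headers.get(k.lower()) != v for k, v in self.headers.items()):
            return False
        body = request.get("body", {})
        return all(
            re.fullmatch(rx, str(body.get(name, ""))) is not None
            for name, rx in self.body_fields.items()
        )
```

A pattern such as `RoutePattern(r"/v1/chat/completions", body_fields={"model": r"gpt-4.*"})` binds the rule to the endpoint through the path and to the model identifier through the body.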

How does the policy artifact get distributed to the inspection points?

The policy artifact is a versioned configuration the deployer ships to every inspection point. Updates propagate through the same change-management process the deployer uses for other security configuration. The inspection point caches the active policy version and stamps it on every decision record. A policy update changes the active version for subsequent decisions; earlier decision records retain the version they were evaluated under.
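The version-stamping behavior can be illustrated with a small sketch. The class and field names are assumptions; the point is only that each record is immutable and carries the version active at decision time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    route: str
    outcome: str
    policy_version: str  # version the decision was evaluated under


class InspectionPoint:
    """Hypothetical inspection point that caches the active policy version."""

    def __init__(self, policy_version: str) -> None:
        self.active_version = policy_version

    def load_policy(self, policy_version: str) -> None:
        # Only the active version changes; existing records are untouched.
        self.active_version = policy_version

    def record(self, route: str, outcome: str) -> DecisionRecord:
        return DecisionRecord(route, outcome, self.active_version)
```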

How does the route pattern interact with NIST AI agent identity Pillar 2?

NIST Pillar 2 is delegated authority: each request is evaluated for a given role under a given policy. The route pattern is one of the inputs to the Pillar 2 decision. The route narrows the rule set; the role refines it; the prompt classification finalizes it. The composition is the implementation of Pillar 2 at the inspection point.

Per-Route AI Policies: Attaching Policy to the URL Path, Not the Application

Per-route AI policies attach the policy decision to the API route the request is calling, not to the application that initiated it. A chat-completion endpoint, an embeddings endpoint, a file-upload endpoint, a batch endpoint, an audio endpoint, and the agent action surfaces each carry different risk profiles, different data shapes, and different blast radii. A single policy applied uniformly across all of them is either too permissive on the high-risk surfaces or too restrictive on the low-risk ones. The architecture that satisfies the 2026 regulatory expectations attaches rules to the route, evaluates them at the inspection point, and composes them with per-role and classification rules.

I want to walk through what per-route policy looks like in practice, how route patterns express AI-specific constraints, and how the architecture composes per-route with per-role and prompt-level classification at the inspection point.

Why route-level granularity matters

Different LLM API routes carry different risk profiles. The chat-completion route carries the prompt that the user pasted. The embeddings route carries the document the application is vectorizing. The file-upload route carries a binary that the model will operate on. The batch route submits hundreds or thousands of requests at once. The audio transcription route accepts recorded speech. The agent action route triggers actions in the real world.

A blanket "deny PII to LLM" policy at the application layer treats all six routes as the same surface. The embeddings route receives the document that the chat-completion route would have rejected. The batch route bypasses the per-request review the chat-completion route enforces. The audio route carries voice data the document-level classifier was not built for. The agent action route ships the policy decision to the model and accepts whatever the model decides to do.

Route-level granularity solves the asymmetry. Each route gets a rule set tuned for the data shape it accepts and the action it produces.

How route patterns express AI-specific constraints

A per-route policy is a set of rules whose "when" clause matches the route path and the route-specific request shape. The pattern matches on the path template, the model identifier where applicable, and any route-specific request fields.

Each rule encodes a constraint the route-level shape makes explicit. The chat-completions rule trusts the role to receive redacted PII. The embeddings rule denies the same class of data because the embedding persists in a vector store and is hard to revoke. The file route blocks ID scans entirely. The batch route blocks regulated data because the per-item evidence trail diverges from the synchronous path. The audio route enforces residency. The agent route enforces actor role.
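One way to picture the six constraints above is a rule table like the following sketch. The rule IDs, route names, label names, the `eu` region, and the `operator` role are all illustrative assumptions rather than a real policy.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    rule_id: str
    route: str
    when: Callable[[dict], bool]  # predicate over the route-specific request shape
    action: str


def has_label(label: str) -> Callable[[dict], bool]:
    return lambda req: label in req.get("labels", ())


RULES = [
    Rule("chat-redact-pii", "chat-completions", has_label("pii"), "redact"),
    Rule("embed-deny-pii", "embeddings", has_label("pii"), "deny"),
    Rule("file-deny-id-scan", "files", has_label("id_scan"), "deny"),
    Rule("batch-deny-regulated", "batch", has_label("regulated"), "deny"),
    # Residency: deny audio that would be processed outside the assumed region.
    Rule("audio-residency", "audio", lambda req: req.get("region") != "eu", "deny"),
    # Actor role: deny agent actions from roles outside the assumed allowlist.
    Rule("agent-actor-role", "agent-action",
         lambda req: req.get("role") not in {"operator"}, "deny"),
]


def fired_rules(route: str, request: dict) -> list:
    """Return (rule_id, action) for every rule on this route whose predicate holds."""
    return [(r.rule_id, r.action) for r in RULES
            if r.route == route and r.when(request)]
```

The same `pii` label produces a redaction on the chat-completions route and a denial on the embeddings route, which is the asymmetry the route-level rules are meant to express.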

Composing per-route with per-role and classification rules

The per-route layer is the outer match. The per-role and classification layers run inside it.

The route matches first because the route determines which other rules apply. A role-level rule that permits PII for a customer-support agent only makes sense on the chat-completions route, not the embeddings route. The route narrows the rule set before the role and classification layers fire.

The composition runs at the inspection point. Each rule is evaluated against the request, the decisions are aggregated, and the deny-overrides combinator returns the final outcome. The audit record carries the route, the role, the classification labels, the policy version, and the outcome.
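The layering described above can be sketched as three successive filters followed by aggregation. This is an assumption-laden illustration: `LayeredRule`, the sample rules, and the choice to deny when no rule fires are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayeredRule:
    rule_id: str
    route: str
    role: Optional[str]   # None means the rule applies to any role
    label: Optional[str]  # None means the rule applies to any classification
    decision: str         # "permit" or "deny"


SAMPLE_RULES = [
    LayeredRule("chat-support-pii", "chat-completions", "support-agent", "pii", "permit"),
    LayeredRule("chat-secrets", "chat-completions", None, "secret", "deny"),
    LayeredRule("embed-pii", "embeddings", None, "pii", "deny"),
]


def evaluate(rules, route, role, labels, policy_version):
    in_route = [r for r in rules if r.route == route]           # route is the outer match
    in_role = [r for r in in_route if r.role in (None, role)]   # role refines the set
    fired = [r for r in in_role if r.label is None or r.label in labels]
    decisions = [r.decision for r in fired]
    # Deny-overrides; this sketch also denies when nothing fired (fail-closed assumption).
    outcome = "deny" if ("deny" in decisions or not decisions) else "permit"
    return {
        "route": route,
        "role": role,
        "labels": sorted(labels),
        "policy_version": policy_version,
        "rule_ids": [r.rule_id for r in fired],
        "outcome": outcome,
    }
```

The returned dictionary mirrors the audit fields listed above: route, role, classification labels, policy version, and outcome.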

Why deny-overrides is the safe default combinator

Multi-rule policies need a combinator that resolves conflicts when two rules return different decisions. Deny-overrides means any rule returning a deny ends the evaluation with a deny.

The alternative combinators (permit-overrides, first-applicable, only-one-applicable) all have specific use cases. In an AI usage context, deny-overrides is the conservative default: when rules conflict, the system fails closed, and the logged decision supports the record-keeping that EU AI Act Article 12 requires. A permit-overrides combinator, by contrast, allows a permissive role rule to override a route-level restriction the auditor would have wanted to enforce.
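The difference between the two combinators is easiest to see on a conflicting pair of decisions. A minimal sketch, assuming each rule returns the string "permit" or "deny" and that rules which do not apply contribute nothing:

```python
from typing import Iterable


def deny_overrides(decisions: Iterable[str]) -> str:
    """Any deny wins; permit only if at least one rule permits and none deny."""
    seen = list(decisions)
    if "deny" in seen:
        return "deny"
    if "permit" in seen:
        return "permit"
    return "not_applicable"


def permit_overrides(decisions: Iterable[str]) -> str:
    """Any permit wins, even when another rule returned a deny."""
    seen = list(decisions)
    if "permit" in seen:
        return "permit"
    if "deny" in seen:
        return "deny"
    return "not_applicable"


# A permissive role rule and a restrictive route rule disagree:
conflict = ["permit", "deny"]
```

On `conflict`, `deny_overrides` keeps the route-level restriction in force while `permit_overrides` lets the role rule bypass it.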

The deployer should default to deny-overrides and document any deliberate exception. The policy version, including the combinator choice, gets stamped on every decision record.

How the route pattern handles model-agnostic deployments

The route is a property of the upstream LLM endpoint, not of DeepInspect. OpenAI uses /v1/chat/completions. Anthropic uses /v1/messages. Azure OpenAI uses /openai/deployments/{deployment}/chat/completions?api-version=.... Bedrock uses /model/{model-id}/converse. The route pattern in the policy matches on the provider-specific upstream path, and the inspection point normalizes the request into a model-agnostic representation before classification.

A model-agnostic policy lets the deployer write one chat-completion rule that covers OpenAI, Anthropic, Azure OpenAI, and Bedrock simultaneously. The route pattern is a list of equivalent paths across providers. The deployer adds a new path when adopting a new provider; the rules stay the same.
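The "list of equivalent paths" idea can be sketched with one regular expression per provider path named above. The logical route name `chat-completions` and the function name are assumptions.

```python
import re
from typing import Optional

# One logical route, expressed as the equivalent provider paths.
CHAT_COMPLETION_PATHS = [
    r"/v1/chat/completions",                         # OpenAI
    r"/v1/messages",                                 # Anthropic
    r"/openai/deployments/[^/]+/chat/completions",   # Azure OpenAI
    r"/model/[^/]+/converse",                        # Bedrock
]


def logical_route(url_path: str) -> Optional[str]:
    """Map a provider-specific path to the logical route name, or None if unknown."""
    path_only = url_path.split("?", 1)[0]  # ignore the query string (e.g. api-version)
    for pattern in CHAT_COMPLETION_PATHS:
        if re.fullmatch(pattern, path_only):
            return "chat-completions"
    return None
```

Adopting a new provider means appending one pattern to the list; rules written against the logical route name do not change.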

How per-route policy feeds the audit record

The route name is one of the canonical fields on the per-decision audit record. The auditor or the regulator filters the record set by route to see, for example, all batch submissions that included regulated data classification labels. The per-route view of the audit set is one of the most useful query shapes during a post-incident review.

The record also captures the rule IDs that fired on the request. The auditor can trace from the decision back to the rule that produced it and back to the route pattern that activated the rule. The traceability is the property an external review reads as evidence the policy was applied as written.
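As a rough sketch of the two query shapes described here, assuming audit records are dictionaries with `route`, `labels`, and `rule_ids` fields (field names are illustrative):

```python
def filter_records(records, route, label):
    """Records on a given route that carried a given classification label."""
    return [r for r in records if r["route"] == route and label in r["labels"]]


def trace(record, rules_by_id):
    """Walk from a decision to each rule that fired and its route pattern."""
    return [(rule_id, rules_by_id[rule_id]["route_pattern"])
            for rule_id in record["rule_ids"]]
```

A post-incident query for batch submissions with regulated data would then be `filter_records(records, "batch", "regulated")`, and `trace` links each result back to the rule and route pattern that produced the outcome.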

DeepInspect

This is the architecture DeepInspect was built to provide. DeepInspect sits at the AI request boundary as a stateless proxy, evaluates per-route, per-role, and prompt-level classification rules in one inspection step, and writes a signed per-decision audit record that captures the route, role, classification, policy version, and outcome.

The deployer authors policy once and the same inspection point applies it across OpenAI, Anthropic, Azure OpenAI, Bedrock, Vertex, and self-hosted endpoints. The route pattern is the unifying primitive: one rule covers the equivalent paths across providers, and the audit record reads the same regardless of which provider served the request.

If you are designing a per-route AI policy model for your AI traffic, book a demo today.