
Per-Route AI Policies: How To Implement Endpoint-Specific Enforcement in Front of LLM APIs

Per-route AI policies attach a different enforcement rule to each LLM endpoint behind the inspection layer. A request to the customer-support route runs under one policy. A request to the developer-tooling route runs under another. The implementation lets a single inspection layer serve every team without the lowest common denominator policy that an organization-wide rule produces. This piece walks through the data model, the matching algorithm, the policy state that has to be present at decision time, and the operational characteristics that hold up at production scale across OpenAI, Anthropic, Azure OpenAI, and Bedrock endpoints.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Platform & Architecture · ai-gateway · per-route-policies · inline-enforcement · ai-architecture · identity-aware · audit

Per-route AI policies attach a separate policy bundle to each LLM endpoint that traffic flows through. A request from the customer-support assistant evaluates against the support policy. A request from the marketing copywriter evaluates against the marketing policy. A request from the developer-tooling integration evaluates against the developer policy. The inspection layer matches the route on a structured field at the start of the request and binds the matching policy to the decision before the request reaches the model. The result is one inspection layer that produces different enforcement outcomes for different teams while keeping a single audit pipeline.

I want to walk through how per-route policies are implemented at the request layer, what the policy data model has to carry, where the matching algorithm has to live, and the operational properties that have to hold under load.

What a route is in this architecture

A route in this context is a stable identifier for the workflow that produced the AI request. The identifier is not the model endpoint. The same OpenAI Chat Completions endpoint serves multiple internal workflows, and the inspection layer cannot infer the workflow from the model name. The route identifier is carried by the calling application, attached to the request as a structured field, and signed or otherwise authenticated so the inspection layer can trust it.

Three structures appear in production deployments. The first is a header field on the inbound request, such as X-DeepInspect-Route: support-assistant. The second is a subdomain or a path prefix that the application addresses, such as https://support.gw.acme.com/v1/chat. The third is a per-application API key issued by the gateway itself, where the key carries the route identity. Each one has the same effect at the matching layer: a stable identifier reaches the inspection layer with the request, and the inspection layer uses it as the policy bundle key.
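A minimal sketch of how an inspection layer might resolve the route from these three carriers. It assumes the header carrier is authenticated with an HMAC sent in a companion `X-DeepInspect-Route-Signature` header. That header name, the signing key, the API-key map, and the subdomain map are all hypothetical placeholders. A production deployment would load them from its own configuration and might authenticate the header with mTLS or a signed token instead.

```python
import hashlib
import hmac
from typing import Mapping, Optional

# Hypothetical shared secret the calling application uses to sign its route header.
ROUTE_SIGNING_KEY = b"example-signing-key"

# Hypothetical gateway-issued API keys, each bound to exactly one route.
API_KEY_ROUTES = {"gk_support_123": "support-assistant"}

# Hypothetical subdomain-to-route mapping under the gateway's domain.
GATEWAY_DOMAIN = "gw.acme.com"
SUBDOMAIN_ROUTES = {"support": "support-assistant"}


def sign_route(route: str) -> str:
    """HMAC-SHA256 over the route identifier, hex encoded."""
    return hmac.new(ROUTE_SIGNING_KEY, route.encode(), hashlib.sha256).hexdigest()


def resolve_route(headers: Mapping[str, str], host: str) -> Optional[str]:
    """Return an authenticated route identifier, or None when none can be trusted."""
    h = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive

    # 1. Header carrier: trust the route only if its signature verifies.
    route = h.get("x-deepinspect-route")
    if route is not None:
        sig = h.get("x-deepinspect-route-signature", "")
        if hmac.compare_digest(sig.encode(), sign_route(route).encode()):
            return route
        return None  # forged or unsigned header: treat as no trusted route

    # 2. API-key carrier: the gateway-issued key itself names the route.
    auth = h.get("authorization", "")
    if auth.startswith("Bearer "):
        key_route = API_KEY_ROUTES.get(auth[len("Bearer "):])
        if key_route is not None:
            return key_route

    # 3. Subdomain carrier, e.g. support.gw.acme.com -> support-assistant.
    host = host.lower().split(":", 1)[0]  # drop any port
    suffix = "." + GATEWAY_DOMAIN
    if host.endswith(suffix):
        return SUBDOMAIN_ROUTES.get(host[: -len(suffix)])
    return None
```

A header that is present but fails verification returns `None` rather than falling back to another carrier. This keeps a forged header from being quietly rescued by a weaker signal.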

The policy data model

Each route binds to a policy bundle that the inspection layer evaluates at request time. The bundle has six fields that recur across production deployments.

The first is identity scope: which roles, groups, or tenants are permitted to call this route at all. A request from a user outside the scope fails with a 403 before any other policy fires.

The second is data classification rules: which prompt-content classes are permitted, which are redacted, and which block the request. The customer-support route may redact PII and pass through everything else. The developer-tooling route may pass through internal code identifiers but block source code that is licensed under restricted terms.

The third is model authorization: which model endpoints this route is permitted to call. A route bound to a regulated workflow may only call gpt-4-turbo in the EU region. A route bound to a developer workflow may call gpt-4o or claude-sonnet-4 freely.

The fourth is rate and cost limits: per-tenant and per-route caps that the inspection layer enforces deterministically. The developer route may have a higher token budget than the marketing route, and the inspection layer applies the right cap based on the route.

The fifth is response handling: whether responses are inspected, whether function calls are evaluated for downstream policy implications, and what to do when the model returns a refusal pattern that the policy wants to convert into a block.

The sixth is audit metadata: tags, ticket identifiers, project codes, and any other fields the audit consumer wants stamped on the per-decision record. The bundle stores the schema for these fields so the inspection layer can reject requests that do not carry them.
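One way to express the six-field bundle as a data structure, sketched in Python. The field names, the `ContentAction` enum, and the tokens-per-minute unit for the rate limit are illustrative assumptions, not a fixed schema. The point is that everything the decision needs lives in one immutable, versioned object.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Mapping


class ContentAction(Enum):
    PASS = "pass"
    REDACT = "redact"
    BLOCK = "block"


@dataclass(frozen=True)
class PolicyBundle:
    route: str
    version: int
    allowed_roles: FrozenSet[str]                        # 1. identity scope
    classification_rules: Mapping[str, ContentAction]    # 2. content class -> action
    allowed_models: FrozenSet[str]                       # 3. model authorization
    max_tokens_per_minute: int                           # 4. rate and cost limit (assumed unit)
    inspect_responses: bool                              # 5. response handling
    required_audit_fields: FrozenSet[str] = frozenset()  # 6. audit metadata schema

    def in_scope(self, role: str) -> bool:
        return role in self.allowed_roles

    def authorizes_model(self, model: str) -> bool:
        return model in self.allowed_models

    def missing_audit_fields(self, metadata: Mapping[str, str]) -> FrozenSet[str]:
        # Fields the bundle's schema requires but the request did not carry.
        return self.required_audit_fields.difference(metadata)
```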

The matching algorithm

The matching algorithm runs at the start of the request and binds exactly one policy bundle to the decision. Three properties determine whether the implementation holds up under regulator review.

The first is determinism. Two requests with the same route identifier and the same identity context have to bind to the same policy bundle. Implementations that compute the policy bundle from a heuristic, a fuzzy match, or a most-recently-edited rule fail this property. The match has to be a structural lookup against an indexed policy table.

The second is failure mode. A request whose route identifier is unknown to the inspection layer cannot fall through to a permissive default. The deployment failure mode of last resort is deny. A new application that ships without a registered route does not silently route to the inspection layer's catchall policy. It fails closed and the inspection layer logs the unknown route for operator review.

The third is policy version pinning. The audit record stamps the policy version that evaluated the decision, not just the policy name. When an operator edits the support-assistant policy in production, every subsequent request runs under the new version, and the audit record makes the version explicit. The auditor that pulls a sample decision a year later can reproduce the exact policy state.
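All three properties fit in a few lines. This sketch assumes a table keyed by route identifier and tenant. `BoundPolicy`, `UnknownRoute`, and `match_policy` are hypothetical names.

```python
import logging
from dataclasses import dataclass
from typing import Mapping, Tuple

log = logging.getLogger("policy-match")


@dataclass(frozen=True)
class BoundPolicy:
    name: str
    version: int  # stamped on every audit record this binding produces


class UnknownRoute(Exception):
    """No bundle is registered for this (route, tenant) key."""


def match_policy(table: Mapping[Tuple[str, str], BoundPolicy], route: str, tenant: str) -> BoundPolicy:
    # Determinism: an exact-key lookup, with no fuzzy match and no "most recently edited" rule.
    bound = table.get((route, tenant))
    if bound is None:
        # Fail closed: there is no catch-all default; log the unknown route for review.
        log.warning("unknown route %r for tenant %r: denying", route, tenant)
        raise UnknownRoute(route)
    # Version pinning: the caller copies bound.version into the decision record.
    return bound
```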

Where the implementation lives in the request path

The inspection layer for per-route policies sits between the calling application and the LLM endpoint. The architectural choice that matters is whether the policy lookup happens on the request path (a sub-50 ms decision before the model API call) or after the model returns (post hoc). Per-route policies that fire post hoc fail the traceability requirements of EU AI Act Article 12 and Fannie Mae LL-2026-04 because the decision is not made inline with the action.

The matching step has to complete before the request reaches the model. Most production deployments place the inspection layer as an HTTP proxy in front of the LLM endpoint. The calling application replaces https://api.openai.com/v1/chat/completions with the inspection layer's URL, and the inspection layer forwards the request to OpenAI after the policy match and the policy evaluation pass. The same architecture works in front of Anthropic, Azure OpenAI, Bedrock, and self-hosted endpoints because the inspection layer speaks HTTP.
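The ordering is the part that matters: match and evaluate first, forward second. Here is a minimal sketch with the upstream call injected as a function, so the example runs without network access. `Decision`, `model_authorization`, and `handle` are hypothetical names. The evaluator shown checks only model authorization. A real bundle would also run identity scope, classification, and rate limits before forwarding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# The upstream endpoint the application previously called directly.
UPSTREAM_URL = "https://api.openai.com/v1/chat/completions"


@dataclass(frozen=True)
class Decision:
    allowed: bool
    status: int
    reason: str = ""


Evaluator = Callable[[str, dict], Decision]
Forwarder = Callable[[str, dict], dict]


def model_authorization(allowed: Dict[str, Set[str]]) -> Evaluator:
    """Build an evaluator that permits only the models registered for each route."""
    def evaluate(route: str, request: dict) -> Decision:
        model = request.get("model")
        if model in allowed.get(route, set()):
            return Decision(True, 200)
        return Decision(False, 403, f"model {model!r} not authorized for route {route!r}")
    return evaluate


def handle(route: str, request: dict, evaluate: Evaluator, forward: Forwarder) -> dict:
    decision = evaluate(route, request)  # policy decision happens before any model call
    if not decision.allowed:
        return {"status": decision.status, "error": decision.reason}  # model never sees it
    return forward(UPSTREAM_URL, request)
```

Swapping `UPSTREAM_URL` for an Anthropic, Azure OpenAI, or Bedrock endpoint changes nothing in `handle`. The decision logic does not depend on which HTTP API sits behind it.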

Operational properties under load

A per-route policy implementation runs against the same load envelope as the LLM endpoints it fronts. Three operational characteristics hold up in production.

The first is policy lookup latency under 5 ms. The lookup is an indexed read against the policy table keyed by the route identifier and the tenant. Implementations that hit a database round trip on every request fail this property at scale. The lookup table sits in memory and reloads on policy change.
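Here is a minimal in-memory table with reload on change. It assumes CPython, where rebinding an attribute is atomic, so a reader sees either the old index or the new one and never a half-built mix. `PolicyTable` is a hypothetical name, and the bundle values are plain dicts for brevity.

```python
import threading
from typing import Dict, Mapping, Optional, Tuple

Key = Tuple[str, str]  # (route identifier, tenant)


class PolicyTable:
    """In-memory policy index: lookups never leave the process."""

    def __init__(self, bundles: Mapping[Key, dict]) -> None:
        self._bundles: Dict[Key, dict] = dict(bundles)
        self._reload_lock = threading.Lock()

    def lookup(self, route: str, tenant: str) -> Optional[dict]:
        # One hash lookup against whichever index is current; no database round trip.
        return self._bundles.get((route, tenant))

    def reload(self, bundles: Mapping[Key, dict]) -> None:
        fresh = dict(bundles)       # build the new index without touching the live one
        with self._reload_lock:     # serialize concurrent reloads
            self._bundles = fresh   # single reference swap; in-flight lookups keep the old dict
```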

The second is policy evaluation latency that stays inside the LLM's natural variance envelope. End-to-end inspection-layer overhead measures under 50 ms in production deployments, while LLM inference takes 500 ms to 5 seconds. Fifty milliseconds is 10 percent of a 500 ms completion and 1 percent of a 5-second one, so the overhead is small relative to the model's response time.

The third is the audit commit. The inspection layer commits the decision record before it forwards the request to the model, and it commits the response record before it forwards the response to the application. A crash after the model returns but before the application receives the response still leaves a record of the model call. A crash after the policy decision but before the request reaches the model leaves a record of the block or modification decision. The application never sees a response without a committed record.
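The commit ordering can be sketched with an append-only JSON-lines file that calls `os.fsync` before returning, so each record is on disk before the next step runs. The file-based log, the record fields, and `proxy_call` are illustrative assumptions. A production system would more likely commit to a replicated log or database while keeping the same write-before-forward ordering.

```python
import json
import os
from typing import Callable


class AuditLog:
    """Append-only JSON-lines log; append() returns only after the record is fsynced."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "a", encoding="utf-8")

    def append(self, record: dict) -> None:
        self._f.write(json.dumps(record, sort_keys=True) + "\n")
        self._f.flush()             # move Python's buffer into the OS
        os.fsync(self._f.fileno())  # ask the OS to persist it to disk

    def close(self) -> None:
        self._f.close()


def proxy_call(audit: AuditLog, route: str, policy_version: int,
               request: dict, call_model: Callable[[dict], dict]) -> dict:
    # Assumes the request already passed policy evaluation.
    # Commit point 1: the decision record is durable before the model sees the request.
    audit.append({"phase": "decision", "route": route,
                  "policy_version": policy_version, "outcome": "pass"})
    response = call_model(request)
    # Commit point 2: the model call is recorded before the application sees the response.
    audit.append({"phase": "response", "route": route,
                  "policy_version": policy_version, "model": request.get("model")})
    return response
```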

Regulatory framing

EU AI Act Article 12 expects records that bind a specific decision to a specific identity, policy state, and outcome. Per-route policies produce these records by construction: every decision stamps the route identifier, the policy version, the identity scope match, the data classification outcome, and the model authorization outcome. Article 26 deployer obligations are satisfied by the per-route audit record. Fannie Mae LL-2026-04, NIST AI agent identity and authorization, HIPAA 45 CFR 164.312, and the DORA operational resilience requirements all consume the same record format.

Organizations that run a single organization-wide AI policy through their inspection layer satisfy the audit requirement at the lowest common denominator. The record exists for every request, and that satisfies the existence test. It fails the traceability test the moment the auditor asks why the support team's PII handling is the same as the legal team's privileged-information handling. Per-route policies produce a defensible answer.

DeepInspect

This is the gap DeepInspect closes. DeepInspect sits inline as an HTTP proxy between calling applications and any LLM endpoint, evaluates a per-route policy bundle for every request, and commits a per-decision audit record before forwarding to the model. The route identifier reaches the policy table on every call. The policy bundle determines identity scope, data classification, model authorization, rate limits, and response handling. The audit record stamps the route, the policy version, the identity context, the data classification outcome, and the decision in a record that a regulator and an enterprise auditor accept as evidence.

The architecture lets one inspection layer serve every team in the organization without the lowest common denominator policy that a single global rule produces. Customer-support workflows run under customer-support rules. Developer tooling runs under developer rules. Regulated workflows run under the policy the regulator expects. The audit pipeline is one pipeline, the policy table is one table, and the operational toil scales sub-linearly with the number of routes.

If you are running a single global AI policy because per-route enforcement felt too operationally heavy, let's talk.

Frequently asked questions

How does per-route policy enforcement compare to a single organization-wide AI policy?

A single organization-wide policy satisfies the existence requirement and fails the traceability requirement. Every team's request evaluates against the same rule, which forces the policy to the lowest common denominator that every team can live with. The legal team's privileged-information handling, the support team's PII handling, the developer tooling's source code handling, and the marketing team's brand-asset handling all run under one rule. Per-route enforcement attaches a separate policy bundle to each workflow. The legal route blocks privileged information from leaving the perimeter, the developer route blocks source code licensed under restricted terms, and the support route redacts PII. Each policy is tuned to its workflow without compromising the others.

What happens when a calling application does not send a route identifier?

The inspection layer fails closed on unknown routes. A request that arrives without a valid route identifier registered in the policy table is rejected with a 403, the rejection is logged, and the operator gets a notification that an unregistered application has called the inspection layer. The deployment workflow for a new application includes a route registration step where the application owner declares the route identifier and the policy bundle it binds to. A registered route can be either active or disabled; an active route allows traffic, a disabled route fails closed without alarming. The fail-closed default protects against new shadow integrations that ship without going through the policy review.
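The admission check described above is small. Here is a sketch that assumes an in-memory registry and an injected `notify` callback for the operator alert. `RouteState` and `admit` are hypothetical names.

```python
import logging
from enum import Enum
from typing import Callable, Mapping, Optional, Tuple

log = logging.getLogger("route-registry")


class RouteState(Enum):
    ACTIVE = "active"
    DISABLED = "disabled"


def admit(registry: Mapping[str, RouteState], route: Optional[str],
          notify: Callable[[Optional[str]], None]) -> Tuple[int, str]:
    state = registry.get(route) if route is not None else None
    if state is None:
        # Missing or unregistered route: fail closed, log it, and alert an operator.
        log.warning("unregistered route %r rejected", route)
        notify(route)
        return 403, "unregistered route"
    if state is RouteState.DISABLED:
        return 403, "route disabled"  # fails closed without raising an alert
    return 200, "ok"
```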

How does the audit record represent a per-route decision?

The per-decision audit record carries the route identifier, the policy version that evaluated the decision, the identity context (user, tenant, role), the data classification outcome, the model and version called, the decision outcome (pass, block, modified), the timestamp, and a cryptographic integrity signature. An auditor pulling a sample record reconstructs the exact policy state at decision time from the policy version field, the exact identity context from the identity field, and the exact decision outcome from the outcome field. Article 12 of the EU AI Act, Fannie Mae LL-2026-04, NIST AI RMF, HIPAA 45 CFR 164.312, and DORA operational resilience all accept the same record shape.
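Here is a sketch of the record and its integrity check. It uses HMAC-SHA256 over a canonical JSON encoding as a stand-in for the cryptographic signature. A deployment might prefer an asymmetric signature so that verifiers never hold the signing key. The field names, the key constant, and the helper functions are illustrative assumptions.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

AUDIT_KEY = b"example-audit-key"  # hypothetical; a real key would live in a KMS or HSM


@dataclass(frozen=True)
class AuditRecord:
    route: str
    policy_version: int
    user: str
    tenant: str
    role: str
    classification_outcome: str
    model: str
    model_version: str
    outcome: str    # "pass" | "block" | "modified"
    timestamp: str  # ISO 8601, UTC


def canonical(record: AuditRecord) -> bytes:
    # Sorted keys and fixed separators give one byte encoding per record.
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":")).encode()


def seal(record: AuditRecord) -> dict:
    body = asdict(record)
    body["integrity"] = hmac.new(AUDIT_KEY, canonical(record), hashlib.sha256).hexdigest()
    return body


def verify(sealed: dict) -> bool:
    claimed = sealed.get("integrity", "")
    fields = {k: v for k, v in sealed.items() if k != "integrity"}
    try:
        record = AuditRecord(**fields)
    except TypeError:  # missing or unexpected fields
        return False
    expected = hmac.new(AUDIT_KEY, canonical(record), hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed.encode(), expected.encode())
```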

Can per-route policies coexist with global organization-wide policies?

Yes, in a layered evaluation. The inspection layer evaluates the global policy first as a baseline (for example, never transmit social security numbers to any model) and then evaluates the route-specific policy on top (for example, the support route also redacts the customer's order identifier from prompts). A request that fails the global policy is rejected before the route policy fires. A request that passes the global policy continues to the route policy for further evaluation. The audit record carries both the global policy version and the route policy version so the auditor can reconstruct the full evaluation chain.
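The layered evaluation can be sketched as an ordered list of policy layers, with the global layer first. The SSN pattern, the `ORD-` order-identifier format, and all function names are hypothetical, and real classifiers are usually more than a regular expression. Every bound layer's version is stamped on the result, including a route layer that a global block pre-empted, so the record shows the full chain that was in force.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A rule takes a prompt and returns (possibly rewritten prompt, block reason or None).
Rule = Callable[[str], Tuple[str, Optional[str]]]

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
ORDER_ID = re.compile(r"\bORD-\d+\b")  # hypothetical order-identifier format


def block_ssn(prompt: str) -> Tuple[str, Optional[str]]:
    return prompt, ("social security number in prompt" if SSN.search(prompt) else None)


def redact_order_id(prompt: str) -> Tuple[str, Optional[str]]:
    return ORDER_ID.sub("[ORDER_ID]", prompt), None


@dataclass(frozen=True)
class Layer:
    name: str
    version: int
    rules: Tuple[Rule, ...]


def evaluate_layers(prompt: str, layers: List[Layer]) -> dict:
    original = prompt
    # Stamp every bound layer's version, including layers a global block pre-empts.
    versions = {layer.name: layer.version for layer in layers}
    for layer in layers:  # global layer first, route layer after it
        for rule in layer.rules:
            prompt, reason = rule(prompt)
            if reason is not None:  # stop here: later layers never fire
                return {"outcome": "block", "blocked_by": layer.name,
                        "reason": reason, "versions": versions}
    outcome = "modified" if prompt != original else "pass"
    return {"outcome": outcome, "prompt": prompt, "versions": versions}
```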

How do operators change per-route policies safely in production?

The policy table is version-controlled. Operators edit policies through a change workflow that produces a new policy version, runs the new version against a shadow traffic sample, and promotes the version to production once the shadow run shows no regression. The inspection layer reloads policies on a signal without dropping in-flight requests, and every audit record after the reload stamps the new version. A bad policy version is rolled back by promoting the previous version, which the inspection layer reloads the same way. The policy version field makes regression analysis straightforward: every decision made under the bad version is identifiable by its version stamp.
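Here is a sketch of the publish, shadow, promote, and rollback cycle for one route. It makes two explicit assumptions. First, policy versions are modeled as functions from prompt to outcome. Second, a regression means any sampled prompt that the active version blocks but the candidate would not. `PolicyVersions` and its methods are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Evaluator = Callable[[str], str]  # prompt -> "pass" | "block" | "modified"


class PolicyVersions:
    """Hypothetical version store for one route's policy."""

    def __init__(self) -> None:
        self._versions: Dict[int, Evaluator] = {}
        self._promoted: List[int] = []  # promotion history; the last entry is active

    @property
    def active(self) -> Optional[int]:
        return self._promoted[-1] if self._promoted else None

    def publish(self, evaluator: Evaluator) -> int:
        version = len(self._versions) + 1  # versions are never edited once published
        self._versions[version] = evaluator
        return version

    def shadow_regressions(self, candidate: int, sample: List[str]) -> List[str]:
        # Assumed regression rule: the active version blocks it, the candidate would not.
        if self.active is None:
            return []
        current, new = self._versions[self.active], self._versions[candidate]
        return [p for p in sample if current(p) == "block" and new(p) != "block"]

    def promote(self, candidate: int, sample: List[str]) -> bool:
        if self.shadow_regressions(candidate, sample):
            return False  # keep the active version; the candidate stays unpromoted
        self._promoted.append(candidate)
        return True

    def rollback(self) -> Optional[int]:
        # Return to the previously promoted version, if there is one.
        if len(self._promoted) > 1:
            self._promoted.pop()
        return self.active
```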