
Identity-Aware AI Gateway: Why Per-User, Per-Role Policy Has to Live at the Request Boundary

An identity-aware AI gateway attaches the enterprise IdP identity to each AI request, evaluates per-user and per-role policy at the request boundary, and commits the audit record with identity context bound at decision time. The architecture differs from generic gateways that operate on application credentials only. The EU AI Act Article 19 identity-of-natural-persons requirement, the NIST agent identity framework, and the post-authentication gap each push the gateway to attach identity at the request rather than the session.

By Parminder Singh · Founder & CEO, DeepInspect Inc.

An identity-aware AI gateway attaches the enterprise IdP identity to each AI request, evaluates per-user and per-role policy at the request boundary, and commits the audit record with identity context bound at the moment of the decision. The architecture differs from a generic AI gateway that operates on application credentials only. The difference is the level of identity granularity the gateway sees: an application identifier versus the natural person or agent acting through the application. EU AI Act Article 19 explicitly requires the identity of natural persons involved in result verification. The NIST AI agent identity and authorization framework, whose comment period closed April 2, 2026, treats agent identity as a first-class principal. The post-authentication gap (between authenticated session and authorized action) closes only when identity attaches at the request rather than the session.

I want to walk through what identity-aware means at the gateway level, how the identity actually attaches to the request, where the policy decision lives, and the audit record the architecture produces.

What identity-aware means at the gateway level

The conventional API gateway sees the calling application as the principal. The application authenticates with an API key, an OAuth token, or a service credential. The gateway records the application identifier and applies policies scoped to the application. The decision boundary is "is this application allowed to call this API."

The identity-aware AI gateway sees the natural person, the agent, and the role behind the application. The application supplies the identity context as a request-level attribute: a verified JWT carrying the IdP claim, a header carrying the SSO session identifier, or a service-mesh identity propagated through the request chain. The gateway records the natural person or agent and the role, and applies policies scoped to that identity. The decision boundary is "is this person, acting in this role, allowed to send this prompt to this model with this classification."

Identity attaches at three levels

The first level is the human user: the natural person whose action initiated the workflow. The IdP claim names the person.

The second level is the role: the authorization context the person carries in the enterprise (engineering, finance, support, executive, legal). The role is the policy variable that decides which models, which data categories, and which routes the person can use.

The third level is the agent: where an AI agent is acting on behalf of the user, the agent's own identity sits alongside the user identity. The delegation from the user to the agent is the additional policy variable.
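
The three levels can be pictured as one identity context that travels with each request. The following is a minimal sketch, not a description of any particular product: the class names, field names, and example values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str                # the agent's own identity (hypothetical field)
    delegated_scopes: frozenset  # actions the user delegated to the agent


@dataclass(frozen=True)
class IdentityContext:
    user_id: str                           # level 1: natural person named by the IdP claim
    role: str                              # level 2: authorization context, e.g. "finance"
    agent: Optional[AgentIdentity] = None  # level 3: set only when an agent acts for the user

    def principal(self) -> str:
        """Human-readable principal string for logs."""
        if self.agent is None:
            return f"{self.user_id} ({self.role})"
        return f"{self.agent.agent_id} on behalf of {self.user_id} ({self.role})"


# A direct user request and an agent-mediated request for the same person.
direct = IdentityContext(user_id="alice@example.com", role="finance")
delegated = IdentityContext(
    user_id="alice@example.com",
    role="finance",
    agent=AgentIdentity("expense-bot", frozenset({"summarize"})),
)
```

Keeping the agent as an optional field alongside the user, rather than replacing the user, preserves the natural person in every record even when an agent made the call.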

How identity actually attaches to the request

Three mechanisms work in production.

JWT propagation from the application

The application validates the user's session through the IdP and produces a JWT that names the user, the role, and (where applicable) the agent. The application passes the JWT as a header on the AI request. The gateway validates the JWT signature against the IdP's public key, extracts the claims, and uses them for the policy decision.

The mechanism requires the application to participate. The application owns the JWT issuance and the propagation. The trust model is that the application correctly identifies the user before producing the JWT.

Service mesh identity propagation

In a service-mesh deployment, the calling service's identity propagates through the mesh's identity primitives (SPIFFE, Istio mTLS identities, Envoy filters). The mesh identity names the service, not the user. The user identity must still be supplied as application-level context, which the mesh carries as a header.

The mechanism works well in microservice architectures where the AI request originates from a service that already participates in the mesh. The mechanism does not extend cleanly to browser-based or desktop-application AI requests.
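
A short sketch of how the two identities combine in this mode: the service identity comes from the mesh, and the user identity comes from a header the calling service sets. The header name `x-end-user-id`, the `MeshPrincipal` type, and the example SPIFFE ID are assumptions for illustration. The sketch also assumes the mesh has already authenticated the caller over mTLS, which is the only reason the header can be trusted.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class MeshPrincipal:
    trust_domain: str  # authority part of the SPIFFE ID, e.g. "example.org"
    workload: str      # path part of the SPIFFE ID, identifying the service
    user_id: str       # application-level context carried as a header


def principal_from_mesh(spiffe_id: str, headers: dict) -> MeshPrincipal:
    """Combine the mesh-verified service identity with the user header."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    # Hypothetical header name; the mesh itself identifies only the service.
    user_id = headers.get("x-end-user-id", "").strip()
    if not user_id:
        raise ValueError("user identity header missing")
    return MeshPrincipal(parsed.netloc, parsed.path, user_id)
```

Rejecting the request when the user header is absent, rather than silently falling back to the service identity, keeps the natural person in the decision.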

SSO-aware proxy mode

The gateway acts as an SSO-aware proxy that intercepts the user's session and validates the IdP session at request time. The mode works for browser-based AI requests where the user signs in through the IdP and the proxy sees the session cookie. The mode also supports desktop applications that participate in the same IdP flow.

The mechanism gives the gateway direct visibility into the user identity without requiring the application to participate. The trade-off is that the gateway must be on the user's network path, which constrains the deployment topology.

Where the policy decision lives

The policy decision lives at the gateway, evaluated against the identity claims, the prompt classification, the role, and the route.

Per-user policy

Specific users may have specific authorities or specific restrictions. A user designated as a privacy reviewer may have authority to see PII in summarization tasks that other users do not. A user under an HR investigation may have temporary restrictions on AI tool usage. The per-user policy is the exception layer.

Per-role policy

The bulk of policy decisions operate at the role level. Engineering may use Copilot Business. Sales may use the sanctioned LLM through the CRM integration. Finance may use the sanctioned LLM with financial data classification. The per-role policy is the default authorization scope.

Per-route policy

The route is the (model, endpoint, task) tuple. A given role may be allowed to call GPT-4 for general summarization but not for source-code reasoning. A different role may be allowed to call Claude for legal review but not for customer messaging. Per-route policy breaks the gateway's enforcement matrix down by model, endpoint, and task, so that access to a model does not imply access to every task on that model.

Per-classification policy

The prompt classification is the data variable. A prompt containing customer PII may be permitted for some roles and blocked for others. The classification is evaluated at request time, before the model receives the prompt.

The four policy dimensions (per-user, per-role, per-route, per-classification) combine at the gateway. The decision is contemporaneous: the gateway evaluates the matrix at the moment of the request and commits the decision to the audit record.
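
How the four dimensions combine can be sketched as a single deny-by-default function. The policy tables, user names, and reason strings below are invented for illustration, and the precedence (per-user restrictions first, per-user grants widening the role's classifications) is one plausible reading of the exception layer rather than a prescribed order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_id: str
    role: str
    route: tuple         # (model, endpoint, task)
    classification: str  # e.g. "public", "internal", "pii"


# Hypothetical policy document.
ROLE_ROUTES = {
    "engineering": {("gpt-4", "chat", "summarization")},
    "legal": {("claude", "chat", "legal-review")},
}
ROLE_CLASSIFICATIONS = {
    "engineering": {"public", "internal"},
    "legal": {"public", "internal", "pii"},
}
USER_GRANTS = {"priya@example.com": {"pii"}}  # per-user exceptions that widen access
USER_BLOCKS = {"sam@example.com"}             # per-user exceptions that suspend access


def evaluate(req: Request) -> tuple:
    """Return (decision, reason). Deny unless every dimension allows."""
    if req.user_id in USER_BLOCKS:
        return ("deny", "user-restricted")
    if req.route not in ROLE_ROUTES.get(req.role, set()):
        return ("deny", "route-not-allowed-for-role")
    allowed = ROLE_CLASSIFICATIONS.get(req.role, set()) | USER_GRANTS.get(req.user_id, set())
    if req.classification not in allowed:
        return ("deny", "classification-not-allowed")
    return ("allow", "ok")
```

Returning the reason together with the decision means the audit record can state which dimension produced a denial.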

Compliance angle

The EU AI Act Article 19 identity requirement is the explicit hook for identity-aware enforcement. The Article 26 deployer obligation runs parallel: the deployer must ensure input data is relevant and must monitor system operation. The NIST AI agent identity and authorization framework codifies agent identity as the third-pillar requirement. DORA Article 19 applies the same record requirements to financial entities. The Fannie Mae LL-2026-04 governance framework, which takes effect August 6, 2026, applies them to mortgage origination.

DeepInspect

This is exactly what DeepInspect does. DeepInspect runs as an identity-aware AI gateway that sits at the AI request boundary as an external enforcement layer, operating as a stateless proxy between authenticated users or agents and any LLM endpoint. The gateway accepts identity context through JWT propagation, service-mesh identity, or SSO-aware proxy mode. Every HTTP request is evaluated against per-user, per-role, per-route, per-classification policy. The per-decision audit record is committed by the proxy, independent of the application and independent of the LLM provider, before the model response returns.

The record contains a verified identity for the natural person, the role and authorization context, the agent identity where applicable, the data classification applied to the prompt, the model and version called, the policy version that governed the decision, the decision outcome, and a cryptographic signature that prevents post-hoc modification.
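
To make the tamper-evidence idea concrete, here is a generic sketch of a signed decision record. It is not DeepInspect's implementation. It assumes an HMAC-SHA256 tag computed with a gateway-held key over a canonical JSON encoding; a production system might use an asymmetric signature instead so that verifiers do not hold the signing key. The field names and values are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte encoding to sign.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with an HMAC-SHA256 tag attached."""
    tag = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "signature": tag}


def verify_record(signed: dict, key: bytes) -> bool:
    """Recompute the tag over everything except the signature field."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("signature", ""))


# Hypothetical record mirroring the fields listed above.
record = {
    "user_id": "alice@example.com",
    "role": "finance",
    "agent_id": None,
    "classification": "pii",
    "model": "example-model",
    "model_version": "2026-01",
    "policy_version": "v42",
    "decision": "deny",
}
signed = sign_record(record, b"gateway-held-key")
```

Changing any field after signing, for example flipping `decision` from `deny` to `allow`, makes `verify_record` return `False`.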

Book a technical deep dive at deepinspect.ai.

Frequently asked questions

How does the identity-aware gateway interact with the enterprise IdP?

The gateway consumes the IdP's identity claims through one of three propagation mechanisms: JWT validation, service-mesh identity, or SSO-aware proxy mode. The gateway does not duplicate the IdP. The IdP remains the source of truth for identity and authentication. The gateway is the enforcement layer that attaches the IdP-issued identity to each AI request.

Can the gateway handle agent identity separately from user identity?

Yes, and it should. An AI agent acting on behalf of a user carries the agent's own identity alongside the user identity. The policy at the gateway evaluates both: the agent must be authorized to take the action, and the action must be within the delegation the user granted the agent. The per-decision record captures both identities and the delegation context.
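
The two checks in that answer can be sketched as follows. The capability table, the `Delegation` type, and the reason strings are assumptions for illustration, not part of any named framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delegation:
    user_id: str
    agent_id: str
    actions: frozenset  # actions this user granted to this agent


# Hypothetical table of what each agent is authorized to do at all.
AGENT_CAPABILITIES = {"expense-bot": frozenset({"summarize", "classify"})}


def check_agent_request(
    agent_id: str, user_id: str, action: str, delegation: Optional[Delegation]
) -> tuple:
    """Allow only if the agent is authorized AND the user delegated the action."""
    if action not in AGENT_CAPABILITIES.get(agent_id, frozenset()):
        return ("deny", "agent-not-authorized")
    if (
        delegation is None
        or delegation.user_id != user_id
        or delegation.agent_id != agent_id
    ):
        return ("deny", "no-delegation")
    if action not in delegation.actions:
        return ("deny", "outside-delegation")
    return ("allow", "ok")
```

For example, an agent that is capable of both `summarize` and `classify` is still denied `classify` when the user delegated only `summarize`.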

What if the application doesn't propagate identity?

Without identity propagation, the gateway falls back to the application's service credential as the only identifier. The Article 19 identity-of-natural-persons requirement is not satisfied. The deployment path for identity-aware enforcement requires one of three mechanisms: application-side JWT propagation, service-mesh identity, or SSO-aware proxy mode. The choice depends on the application architecture.

Does this work for multi-tenant SaaS applications?

Yes. A multi-tenant SaaS application that calls AI on behalf of its own customers carries the customer's identity through the application to the AI call. The identity-aware gateway evaluates per-tenant policy in addition to per-user policy. The tenant becomes an additional policy dimension, scoped to the customer's contract with the SaaS provider.

How does the gateway scale with policy complexity?

Policy evaluation is structured as a directed evaluation against a typed policy document, not as a free-form rule engine. The complexity scales with the number of (user, role, route, classification) combinations. In production deployments, policy evaluation for a single request typically completes in under a millisecond. In internal testing, overall gateway overhead stays under 50 ms across policy sets of practical size.