What if my calling application does not propagate identity?

The identity claim has to originate from an authoritative source somewhere in the request path. The pattern is to require the calling application to attach a signed JWT or an OIDC-issued token to each request. The token asserts the natural-person or agent identity and is verified at the AI request boundary. Applications that authenticate only with a static API key have to be updated to attach a per-request identity claim before the AI boundary can enforce identity-aware policy.

Can we get by with three of the four predicates?

For a demo, yes. For production with regulatory scope, no. The regulator's stress test lands on whichever predicate is missing: no identity, no reconstructability; no model, no scope-limiting; no classification, no data-flow record; no policy, no evidence of enforcement.

How do we handle emergency access?

Emergency access is a policy path, not a bypass. The policy artifact defines the emergency identity's expanded authorization, requires the emergency to be logged as a distinct policy match, and specifies the alerts that fire on emergency use. The audit record for an emergency-access decision shows that the emergency policy fired, which supports the post-incident review.
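Here is a minimal sketch of emergency access as an ordinary policy path. It assumes a single hypothetical break-glass identity (`breakglass-oncall`) and made-up role grants. The shape is what matters. The emergency identity gets a wider grant but still goes through the same decision function, records a distinct policy match, and always raises an alert.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical policy data: ordinary role grants plus one break-glass identity.
ROLE_GRANTS: Dict[str, FrozenSet[str]] = {
    "analyst": frozenset({"internal"}),
    "clinician": frozenset({"internal", "phi"}),
}
EMERGENCY_IDENTITY = "breakglass-oncall"
EMERGENCY_GRANTS = frozenset({"internal", "phi", "pii"})


@dataclass(frozen=True)
class Decision:
    allow: bool
    matched_policy: str  # written to the audit record as its own policy match
    alert: bool          # True means the on-call channel must be notified


def decide(identity: str, role: str, data_class: str) -> Decision:
    if identity == EMERGENCY_IDENTITY:
        # Expanded grant, distinct policy match, mandatory alert: a path, not a bypass.
        return Decision(data_class in EMERGENCY_GRANTS, "emergency_access", alert=True)
    allowed = data_class in ROLE_GRANTS.get(role, frozenset())
    return Decision(allowed, "standard_access", alert=False)
```

An emergency request for a class outside `EMERGENCY_GRANTS` is still denied, and it still alerts.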

What is the failure mode when the identity provider is down?

It depends on the deployment's fail-open versus fail-closed setting for identity errors. Fail-closed denies requests when identity verification fails, which is the safer default. Fail-open permits requests with an "identity-unverified" flag, which suits deployments where availability outweighs identity risk. The AI gateway fail-closed piece covers the pattern.
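A minimal sketch of the setting, assuming a design choice not stated above: only an identity-provider outage takes the configurable path. A token that was checked and rejected is denied in either mode. `IdentityProviderUnavailable`, `FailureMode`, and `check_identity` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class IdentityProviderUnavailable(Exception):
    """Hypothetical error raised when the identity provider cannot be reached."""


class FailureMode(Enum):
    FAIL_CLOSED = "fail_closed"
    FAIL_OPEN = "fail_open"


@dataclass(frozen=True)
class IdentityResult:
    allow: bool
    subject: Optional[str]
    identity_unverified: bool  # carried into the audit record


def check_identity(token: str, verify: Callable[[str], str], mode: FailureMode) -> IdentityResult:
    try:
        return IdentityResult(True, verify(token), identity_unverified=False)
    except IdentityProviderUnavailable:
        # Outage: the deployment setting decides; either way the flag is recorded.
        return IdentityResult(mode is FailureMode.FAIL_OPEN, None, identity_unverified=True)
    except ValueError:
        # The token was checked and rejected: deny regardless of mode.
        return IdentityResult(False, None, identity_unverified=True)
```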

How does the gateway learn the model attribute?

The client-side SDK attaches the target model as a request attribute the gateway reads. When the gateway itself does the model routing, the gateway records the actual model that was routed to. Both cases produce the correct audit attribute; the routing gateway is authoritative on which model actually served the request.
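The precedence rule can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `ModelAttribute` fields and the `auto` alias in the usage are hypothetical. It keeps both the requested and the served model so the audit record shows when routing changed the target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelAttribute:
    requested: Optional[str]  # what the client SDK attached, if anything
    served: str               # the model that actually handled the request
    source: str               # "gateway_routing" or "client"


def resolve_model_attribute(requested: Optional[str], routed_to: Optional[str]) -> ModelAttribute:
    if routed_to is not None:
        # The routing gateway is authoritative on which model served the request.
        return ModelAttribute(requested, routed_to, "gateway_routing")
    if requested is None:
        # No model attribute from either side: a missing predicate input.
        raise ValueError("request carries no model attribute")
    return ModelAttribute(requested, requested, "client")
```

For example, `resolve_model_attribute("auto", "claude-3-5-sonnet-20241022")` records `auto` as requested and the Claude model as served.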

Does this work with streaming responses?

Yes. The four predicates are evaluated at request initiation. Streaming responses pass through the response classifier as they stream, with buffering at the classifier boundary. The AI gateway streaming responses piece covers the streaming-specific patterns.
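Here is a minimal sketch of what buffering at the classifier boundary can look like. It assumes a single regex detector (a US SSN-shaped pattern, illustrative only) and redaction rather than blocking. The stream holds back the last `max_len - 1` characters, one fewer than the longest possible match. That way, a match split across two chunks is complete in the buffer before any part of it is released. `redact_stream` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# Illustrative detector only: every string this pattern matches is 11 characters long.
US_SSN = re.compile(r"\d{3}-\d{2}-\d{4}")
US_SSN_MAX_LEN = 11


def redact_stream(chunks: Iterable[str],
                  pattern: "re.Pattern[str]" = US_SSN,
                  max_len: int = US_SSN_MAX_LEN,
                  replacement: str = "[REDACTED]") -> Iterator[str]:
    """Redact matches in a streamed response, including matches split across chunks."""
    hold = max_len - 1  # any unfinished match fits in this many trailing characters
    buffer = ""
    for chunk in chunks:
        # Re-scan the held tail together with the new chunk.
        buffer = pattern.sub(replacement, buffer + chunk)
        cut = max(0, len(buffer) - hold)
        if cut:
            yield buffer[:cut]  # no unfinished match can start in this prefix
        buffer = buffer[cut:]
    if buffer:
        yield buffer  # end of stream: the tail has already been scanned
```

The held-back tail is the latency cost of streaming classification: the client sees each character at most `max_len - 1` characters late.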

AI Request Authorization Model: The Four Predicates Every Production AI Call Has to Answer

Every production AI request has to answer four authorization predicates before the model call happens: who is asking, which model is being called, which data classification the payload carries, and which organizational policy applies. Deployments that evaluate only one or two of the four leave audit gaps, and the regulator, the SIEM operator, or the incident-response team surfaces those gaps the first time an incident touches production data. The EU AI Act, HIPAA, SOC 2, and ISO 27001 each converge on the four-predicate model, for different reasons. I want to walk through the four predicates, the input attributes each requires, the policy engine pattern that composes them, and the audit record that comes out the other side.

The four predicates are AND-composed. Missing one is not an approximation. Missing one is an audit gap.

Predicate one: identity

The identity predicate answers "who is asking." The input attribute is a verified identity claim on the request, propagated from the enterprise identity provider through the calling application to the AI request boundary. The claim identifies a natural person or an agent acting on behalf of a natural person, with the delegation chain intact.

The failure mode is the shared service account. When the calling application authenticates to the LLM provider with a single API key that stands for the whole application, the identity predicate collapses to "the calling application" for every request. The AI request boundary cannot tell the CFO's requests from a summer intern's. The AI agent identity pillar covers the identity binding pattern.

The identity claim has to include enough attributes to feed the other predicates: role, department, tenant, region, delegation lineage. A bare user ID satisfies the "who" question but leaves the policy engine unable to evaluate role-based restrictions.
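A minimal sketch of rejecting a bare user ID at the boundary. The claim names (`role`, `department`, `tenant`, `region`) are assumptions. Delegation lineage is read from the nested `act` (actor) claim defined for OAuth 2.0 Token Exchange (RFC 8693), where each nested `act` names a prior actor. `parse_identity_claim` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

REQUIRED = ("sub", "role", "department", "tenant", "region")  # hypothetical claim names


@dataclass(frozen=True)
class IdentityClaim:
    subject: str
    role: str
    department: str
    tenant: str
    region: str
    actors: Tuple[str, ...]  # current actor first, then prior actors


def parse_identity_claim(claims: Dict[str, Any]) -> IdentityClaim:
    missing = [name for name in REQUIRED if not claims.get(name)]
    if missing:
        # A bare user ID answers "who" but leaves the policy engine without inputs.
        raise ValueError("identity claim missing: " + ", ".join(missing))
    actors = []
    act = claims.get("act")
    while isinstance(act, dict):
        if not act.get("sub"):
            raise ValueError("actor claim without sub")
        actors.append(act["sub"])
        act = act.get("act")  # nested "act" = the actor before this one
    return IdentityClaim(claims["sub"], claims["role"], claims["department"],
                         claims["tenant"], claims["region"], tuple(actors))
```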

Predicate two: model

The model predicate answers "which model is being called." The input attribute is the target model identifier: gpt-4o, claude-3-5-sonnet-20241022, bedrock:anthropic.claude-3-5-sonnet, gemini-1.5-pro-002. The predicate matters because organizational policies differ across models: some models are approved for regulated data, others only for internal workflows, and others for consumer-facing use with output filtering enabled.

The failure mode is the LLM proxy that abstracts away the model identifier. When the calling application says "give me a completion" and the proxy routes to whichever model is cheapest, fastest, or least busy, the policy engine cannot evaluate model-specific restrictions. The AI gateway architecture piece covers the abstraction pattern; the security implication is that model routing decisions have to be policy-aware.
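Policy-aware routing can be as simple as filtering by policy before optimizing. This is a minimal sketch, assuming cost is the routing objective and approvals are expressed as per-model sets of permitted data classes. `route`, the model IDs, and the prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class Candidate:
    model_id: str
    cost_per_1k_tokens: float  # hypothetical prices, for illustration only


def route(candidates: Iterable[Candidate],
          approved_classes: Dict[str, FrozenSet[str]],
          payload_classes: FrozenSet[str]) -> str:
    # Policy filter first: keep only models approved for every class in the payload.
    permitted = [c for c in candidates
                 if payload_classes <= approved_classes.get(c.model_id, frozenset())]
    if not permitted:
        raise PermissionError("no approved model for: " + ", ".join(sorted(payload_classes)))
    # Optimize (here: cost) only among the permitted models.
    return min(permitted, key=lambda c: c.cost_per_1k_tokens).model_id
```

The order is the point: the cheapest model wins only among models the policy already permits, so the routing decision never widens what the policy allows.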

The model attribute has to include the provider, the specific model version, and the endpoint. Model versions matter because a wildcard approval on "gpt-4" matches 2024, 2025, and 2026 releases with different safety characteristics.
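A minimal sketch of approving exact (provider, version, endpoint) triples instead of wildcards. The approval list and endpoint hostnames are illustrative assumptions. `fnmatch` appears only to show what a wildcard approval would have let through.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class ModelRef:
    provider: str
    model_version: str
    endpoint: str


# Hypothetical approval list: exact triples, no wildcards.
APPROVED = {
    ModelRef("anthropic", "claude-3-5-sonnet-20241022", "api.anthropic.com"),
    ModelRef("google", "gemini-1.5-pro-002", "generativelanguage.googleapis.com"),
}


def is_approved(ref: ModelRef) -> bool:
    # Exact match: a new version stays unapproved until someone reviews and adds it.
    return ref in APPROVED


def wildcard_would_approve(ref: ModelRef, pattern: str) -> bool:
    # For contrast only: the looser rule this section argues against.
    return fnmatch(ref.model_version, pattern)
```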

Predicate three: data classification

The data classification predicate answers "which data class is in the payload." The input attributes come from classifiers running at the boundary: PII detection, PHI detection, PCI detection, source-code-with-secrets detection, competitor-mention detection, and organization-specific classifications the deployer's data catalog defines.

The failure mode is trusting the calling application to tag its own requests. When the application assigns the classification, an application bug or a prompt-injection attack that mislabels the request produces a false-clean payload. A classifier at the boundary, running independently of the application, catches what the application misses. The LLM DLP pillar covers the classifier layer.
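A deliberately small sketch of a boundary classifier. It assumes three regex detectors (email, US SSN, card numbers screened with the Luhn checksum) standing in for the production PII, PHI, and PCI classifiers. The class labels are hypothetical. The application-declared tags can add classes but never remove what the boundary found.

```python
import re
from typing import FrozenSet, Set

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_CANDIDATE = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to cut false positives among card-number candidates."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify(payload: str) -> FrozenSet[str]:
    found: Set[str] = set()
    if EMAIL.search(payload) or US_SSN.search(payload):
        found.add("pii")
    if any(luhn_valid(m.group()) for m in CARD_CANDIDATE.finditer(payload)):
        found.add("pci")
    return frozenset(found)


def effective_classes(payload: str, app_declared: FrozenSet[str]) -> FrozenSet[str]:
    # Union: the application may add labels but cannot erase what the boundary detected.
    return classify(payload) | app_declared
```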

The classification output feeds both the policy engine (which classifications are permitted for this identity and this model) and the audit log (what class was in the payload, whether or not the request was permitted).

Predicate four: organizational policy

The policy predicate answers "what does the organization say about this combination." The inputs are the policy artifacts, expressed as reviewed code, and the runtime context: session count, elapsed time, prior denials, and environment tag (production, staging, development).
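Here is how the runtime-context inputs might feed a decision. This is a minimal sketch in which every threshold and rule is a hypothetical placeholder for what the reviewed policy artifact would define, including the rule that regulated data is allowed only in production.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical thresholds; real values belong in the versioned policy artifact.
MAX_SESSION_REQUESTS = 500
MAX_SESSION_SECONDS = 8 * 3600
MAX_PRIOR_DENIALS = 3
REGULATED = frozenset({"phi", "pci"})


@dataclass(frozen=True)
class RuntimeContext:
    session_request_count: int
    session_elapsed_s: float
    prior_denials: int
    environment: str  # "production", "staging", or "development"


def context_permits(ctx: RuntimeContext, payload_classes: FrozenSet[str]) -> Tuple[bool, str]:
    # Each rule returns a reason string so the audit record can say why.
    if ctx.prior_denials >= MAX_PRIOR_DENIALS:
        return False, "too many prior denials in this session"
    if ctx.session_request_count >= MAX_SESSION_REQUESTS:
        return False, "session request limit reached"
    if ctx.session_elapsed_s >= MAX_SESSION_SECONDS:
        return False, "session too old; re-authenticate"
    if ctx.environment != "production" and payload_classes & REGULATED:
        return False, "regulated data outside production"
    return True, "ok"
```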

The failure mode is policy that lives in a gateway UI as a set of dropdowns and toggles. The UI gets a first policy live quickly, at the cost of change history, a review trail, and a rollback plan. The policy-as-code piece covers the pattern that provides the review and rollback properties.

The policy artifact has to be versioned. Every audit record includes the policy version that applied, which means the incident review question "which policy was in effect when this decision landed" has a git SHA as the answer.

Composing the four predicates

The policy engine at the AI request boundary receives the four inputs and evaluates them as an AND-composed decision, expressed as a Rego policy.
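The AND-composition itself is small. Below is a minimal Python sketch of it, mirroring what a Rego policy would express. The role names, model approvals, and the "regulated data only in production" rule are hypothetical placeholders for what a reviewed policy artifact would contain. Recording which predicates failed, not just the verdict, gives the audit record something to explain.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# Hypothetical policy data standing in for the reviewed policy artifact.
MODEL_CLASSES: Dict[str, FrozenSet[str]] = {
    "claude-3-5-sonnet-20241022": frozenset({"internal", "pii", "phi"}),
    "gpt-4o": frozenset({"internal"}),
}
ROLE_CLASSES: Dict[str, FrozenSet[str]] = {
    "clinician": frozenset({"internal", "phi"}),
    "analyst": frozenset({"internal", "pii"}),
}


@dataclass(frozen=True)
class Inputs:
    identity_verified: bool
    role: str
    model: str
    data_classes: FrozenSet[str]
    environment: str


@dataclass(frozen=True)
class Outcome:
    allow: bool
    failed: Tuple[str, ...]


def evaluate(req: Inputs) -> Outcome:
    permitted = ROLE_CLASSES.get(req.role, frozenset()) & MODEL_CLASSES.get(req.model, frozenset())
    checks = {
        # 1. identity: a verified claim with a known role
        "identity": req.identity_verified and req.role in ROLE_CLASSES,
        # 2. model: the target model is on the approved list
        "model": req.model in MODEL_CLASSES,
        # 3. classification: every detected class is allowed for this role AND this model
        "classification": req.data_classes <= permitted,
        # 4. organizational policy: regulated data only in production (hypothetical rule)
        "policy": req.environment == "production" or not (req.data_classes & {"phi", "pii"}),
    }
    failed = tuple(name for name, ok in checks.items() if not ok)
    # AND-composition: any single failed predicate denies the request.
    return Outcome(allow=not failed, failed=failed)
```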

The decision runs in single-digit milliseconds when the policy engine is in-process. The latency budget for the whole boundary is under 50 ms at p95; end to end, the model-side latency and the network dominate.

The audit record

The audit record for every decision includes the four predicate inputs, the outcome, and the policy version that applied.
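Here is a minimal sketch of such a record. The field names and the `policy_version` placeholder are hypothetical. The sketch also shows one way to make an append-only store tamper-evident: hash-chaining each line to the previous one. That technique is an assumption here, not something the text prescribes. The in-memory list stands in for a real append-only store.

```python
import hashlib
import json
from typing import Dict, List, Sequence

GENESIS = "0" * 64


def build_audit_record(request_id: str, timestamp: str, subject: str, role: str,
                       actors: Sequence[str], model: str, data_classes: Sequence[str],
                       allow: bool, failed: Sequence[str], policy_version: str) -> Dict:
    # One record per decision: the four predicate inputs, the outcome, the policy version.
    return {
        "request_id": request_id,
        "timestamp": timestamp,
        "identity": {"sub": subject, "role": role, "actors": list(actors)},
        "model": model,
        "data_classes": sorted(data_classes),
        "policy_version": policy_version,  # e.g. the git SHA of the policy artifact
        "decision": {"allow": allow, "failed_predicates": list(failed)},
    }


class HashChainedLog:
    """In-memory stand-in for an append-only store; each line commits to the previous one."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.head = GENESIS

    def append(self, record: Dict) -> str:
        line = json.dumps(dict(record, prev_hash=self.head), sort_keys=True, separators=(",", ":"))
        self.lines.append(line)
        self.head = hashlib.sha256(line.encode("utf-8")).hexdigest()
        return self.head

    def verify(self, expected_head: str) -> bool:
        prev = GENESIS
        for line in self.lines:
            if json.loads(line).get("prev_hash") != prev:
                return False
            prev = hashlib.sha256(line.encode("utf-8")).hexdigest()
        # Checking against a head stored elsewhere also catches edits to the last line.
        return prev == expected_head
```

Storing the head hash somewhere the log writer cannot modify is what turns "append-only" from a convention into something an auditor can check.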

The record is what auditors accept for SOC 2 CC7.2, ISO 27001 Annex A 8.15, EU AI Act Article 12, and HIPAA 164.312(b). The AI audit logs format spec covers the field-level detail.

DeepInspect

This is exactly what DeepInspect does. DeepInspect sits at the AI request boundary and evaluates the four predicates on every request. Identity comes from the enterprise identity provider through a verified claim. Model comes from the request target. Data classification comes from classifiers running at the boundary independent of the calling application. Policy artifacts live in git as reviewed code. The audit record includes all four inputs and the outcome, lands in an append-only store, and references the policy version by git SHA.

The four-predicate model composes across LangChain, LlamaIndex, AutoGen, Semantic Kernel, the OpenAI Agents SDK, custom agent frameworks, and direct API calls, because the enforcement point is HTTP and the predicates are model-agnostic.

Book a technical deep dive at deepinspect.ai.