AI Governance Policy: What a Policy Has to Specify to Be Enforceable

Most AI governance policies are written for the auditor but cannot be evaluated at the request layer. A policy that lacks classification rules, identity definitions, and enforcement decision points is prose, not control. This article walks through what the policy has to specify to be enforceable.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Compliance & Regulation · ai-governance, ai-compliance, policy-enforcement, eu-ai-act, compliance, audit

Most enterprise AI governance policies are written by lawyers for auditors. They describe principles, scopes, and roles in language that satisfies a procurement review. They omit the operational definitions that the request layer needs to evaluate a decision. The result is a policy that the General Counsel can defend in writing and the application cannot enforce at runtime. Under EU AI Act Article 26, the deployer is obligated to use the high-risk AI system in accordance with the instructions for use and to implement appropriate human oversight. Implementation, in practice, requires a policy the system can read.

I want to walk through what an AI governance policy actually has to specify to be enforceable, and how the policy connects to the per-decision audit record that proves enforcement.

What an enforceable policy specifies

A policy that operational systems can evaluate has six concrete sections beyond the principles. The principles section sets direction. The six operational sections set rules.

Identity and role definitions

The policy defines the populations of users, agents, and service accounts that interact with AI systems. Each population gets a name (clinical-staff, mortgage-underwriter, customer-support, data-engineering, automation-agent), a list of authentication sources, and a mapping to the identity provider's group memberships. The policy specifies which populations may call which models, which roles within each population have elevated authorization, and which roles trigger human-in-the-loop review. Without this section, the request layer has no contract to evaluate.
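A minimal sketch of how such population definitions might look once they are machine-readable. The group names (grp-clinicians, grp-ai-agents), the auth source labels, and the role names are hypothetical placeholders, and the rule that a caller must match exactly one population is an assumption of this sketch, not something the policy text prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Population:
    name: str
    auth_sources: tuple[str, ...]
    idp_groups: frozenset          # identity-provider groups that map to this population
    elevated_roles: frozenset = frozenset()
    review_roles: frozenset = frozenset()  # roles that trigger human-in-the-loop review


# Hypothetical populations; group and role names are illustrative only.
POPULATIONS = (
    Population(
        name="clinical-staff",
        auth_sources=("corporate-sso",),
        idp_groups=frozenset({"grp-clinicians", "grp-nursing"}),
        elevated_roles=frozenset({"attending-physician"}),
    ),
    Population(
        name="automation-agent",
        auth_sources=("workload-identity",),
        idp_groups=frozenset({"grp-ai-agents"}),
        review_roles=frozenset({"claims-agent"}),
    ),
)


def resolve_population(groups: set) -> Optional[Population]:
    """Return the population matching the caller's IdP groups.

    Returns None when no population matches or when more than one matches,
    so an ambiguous caller is never silently assigned a population.
    """
    matches = [p for p in POPULATIONS if p.idp_groups & groups]
    return matches[0] if len(matches) == 1 else None
```

With these definitions, resolve_population({"grp-nursing"}) yields the clinical-staff population, while a caller whose groups match nothing, or match two populations at once, resolves to None and can be rejected by the request layer.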

Data classification rules

The policy defines the data classes that may or may not appear in prompts and responses. Public, internal, confidential, restricted, regulated-PII, regulated-PHI, regulated-NPI, and trade-secret form a typical set of classes. The policy specifies the operational definition of each class, the regular-expression or model-based detector that classifies the content, and the action the system takes when each class is detected. Classification without an attached detector is principle, not rule.
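To illustrate what "class plus detector plus action" can mean in practice, here is a small sketch using regular-expression detectors. The two patterns are deliberately simplistic assumptions (a US SSN-shaped number standing in for regulated-PII, a banner phrase standing in for internal); a real detector would be far more careful and might be model-based.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataClassRule:
    name: str             # data class named in the policy
    detector: re.Pattern  # the detector attached to the class
    action: str           # action taken when the class is detected


# Illustrative rules only; the patterns are assumptions, not production detectors.
RULES = (
    DataClassRule("regulated-PII", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "block"),
    DataClassRule("internal", re.compile(r"\binternal use only\b", re.I), "permit-with-logging"),
)


def classify(prompt: str) -> list:
    """Return every rule whose detector matches the prompt."""
    return [rule for rule in RULES if rule.detector.search(prompt)]
```

Each class carries its own detector and action in one record, so a class cannot be referenced by the policy without something at request time that can recognize it.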

Model and route authorization

The policy specifies which models each population is permitted to call, which API routes are in scope, and what model-specific constraints apply. The clinical-staff population may call the in-house Llama deployment but not the public OpenAI API. The mortgage-underwriter population may call the in-house model for redacted prompts but is blocked from the public API for any prompt that contains regulated-NPI. The per-route, per-role specification is what the enforcement point evaluates at runtime.
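The two examples above could be encoded as a per-population, per-route table along these lines. The route identifiers (in-house-llama, public-openai) and the RouteRule shape are hypothetical, and treating any unlisted pair as unauthorized is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRule:
    allowed: bool
    blocked_classes: frozenset = frozenset()  # data classes that must not reach this route
    requires_redaction: bool = False          # route accepts redacted prompts only


# Keys are (population, route); route names are illustrative.
ROUTE_RULES = {
    ("clinical-staff", "in-house-llama"): RouteRule(allowed=True),
    ("clinical-staff", "public-openai"): RouteRule(allowed=False),
    ("mortgage-underwriter", "in-house-llama"): RouteRule(allowed=True, requires_redaction=True),
    ("mortgage-underwriter", "public-openai"): RouteRule(
        allowed=True, blocked_classes=frozenset({"regulated-NPI"})
    ),
}


def is_authorized(population: str, route: str, detected_classes, redacted: bool = False) -> bool:
    """Evaluate one request against the route table; unlisted pairs are denied."""
    rule = ROUTE_RULES.get((population, route))
    if rule is None or not rule.allowed:
        return False
    if rule.requires_redaction and not redacted:
        return False
    return not (rule.blocked_classes & set(detected_classes))
```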

Decision-time actions

For every combination of identity, role, data class, and route, the policy specifies the action. Permit. Redact and permit. Permit with logging only. Block. Block and escalate. Block and notify. The action set is finite. The mapping from (identity, role, data, route) to action is the enforceable surface of the policy. A policy without an action set is a recommendation, not a control.
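One way to picture the enforceable surface is a finite enum plus a lookup table. The action names below mirror the list above; the table entries, the role names, and the choice to fall back to Block for any unlisted combination are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    PERMIT = "permit"
    REDACT_AND_PERMIT = "redact-and-permit"
    PERMIT_WITH_LOGGING = "permit-with-logging"
    BLOCK = "block"
    BLOCK_AND_ESCALATE = "block-and-escalate"
    BLOCK_AND_NOTIFY = "block-and-notify"


# Keys are (identity population, role, data class, route); entries are illustrative.
DECISIONS = {
    ("clinical-staff", "nurse", "regulated-PHI", "in-house-llama"): Action.PERMIT_WITH_LOGGING,
    ("clinical-staff", "nurse", "regulated-PHI", "public-openai"): Action.BLOCK_AND_ESCALATE,
    ("mortgage-underwriter", "underwriter", "regulated-NPI", "in-house-llama"): Action.REDACT_AND_PERMIT,
}


def decide(identity: str, role: str, data_class: str, route: str) -> Action:
    """Look up the action for one combination.

    Combinations the policy does not list fall back to BLOCK. Default-deny is
    an assumption of this sketch, not a requirement stated by the policy.
    """
    return DECISIONS.get((identity, role, data_class, route), Action.BLOCK)
```

Because the action set is an enum, a typo in a rule fails when the policy is loaded instead of turning into an unhandled case at request time.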

Audit record schema

The policy specifies what the system records for each decision. Identity verified, role evaluated, data classification, policy version, decision outcome, timestamp, and integrity mechanism are the minimum fields. The policy specifies the retention period, the access controls on the audit records themselves, and the disclosure procedure. Without this section, the controls produce inconsistent evidence that the auditor cannot sample.
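A rough sketch of a record carrying the minimum fields listed above. The field names are my own, and HMAC-SHA256 over a canonical JSON encoding is only one possible integrity mechanism; the policy could equally name signatures or a hash chain. Key management is out of scope here.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def _canonical(body: dict) -> bytes:
    # Stable encoding so the same fields always produce the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_audit_record(identity, role, data_classes, policy_version, outcome, key: bytes) -> dict:
    """Build one per-decision record and attach an HMAC over its fields."""
    record = {
        "identity": identity,
        "role": role,
        "data_classification": sorted(data_classes),
        "policy_version": policy_version,
        "decision_outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    record["integrity"] = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return record


def verify_audit_record(record: dict, key: bytes) -> bool:
    """Recompute the HMAC over every field except the integrity value itself."""
    body = {k: v for k, v in record.items() if k != "integrity"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("integrity", ""))
```

An auditor sampling records can call verify_audit_record on each one; any field edited after the fact causes verification to fail.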

Exception handling

The policy specifies how exceptions are requested, approved, and time-bounded. Every exception is an event that requires a record of its own. The exception record names the requester, the approver, the scope (which population, which data class, which route, for how long), and the compensating control. An exception without a time bound is a quiet expansion of the permitted action set. The policy has to forbid permanent exceptions.
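The exception record can enforce its own time bound at construction. In this sketch the 90-day ceiling (MAX_EXCEPTION_DURATION) is a hypothetical value chosen for illustration; the policy would set its own limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_EXCEPTION_DURATION = timedelta(days=90)  # hypothetical ceiling, set by the policy


@dataclass(frozen=True)
class PolicyException:
    requester: str
    approver: str
    population: str
    data_class: str
    route: str
    compensating_control: str
    starts_at: datetime
    expires_at: datetime

    def __post_init__(self):
        # Reject exceptions that are unbounded, inverted, or longer than the ceiling.
        if self.expires_at <= self.starts_at:
            raise ValueError("exception must expire after it starts")
        if self.expires_at - self.starts_at > MAX_EXCEPTION_DURATION:
            raise ValueError("exception exceeds the maximum permitted duration")
        if not self.compensating_control.strip():
            raise ValueError("exception requires a compensating control")

    def is_active(self, now: datetime) -> bool:
        return self.starts_at <= now < self.expires_at
```

Because expires_at is a required field with an upper bound, a permanent exception cannot be represented at all, which is a stronger guarantee than a rule that merely discourages one.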

Where the policy fails in practice

I see three predictable failure modes when an organization tries to turn a written policy into runtime enforcement.

The policy specifies the action but not the detector

The policy says "PII must not be sent to third-party LLMs." The system has no operational definition of PII, no classification engine that runs at request time, and no enforcement decision point. The policy is correct on paper. The controls cannot evaluate it. This is the most common failure. The fix is to attach a named detector to every data class the policy references.
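This gap is easy to check mechanically before a policy is deployed. The sketch below assumes a policy loaded as a dictionary with a "rules" list and a "detectors" mapping; that layout is hypothetical and stands in for whatever format the organization actually uses.

```python
def missing_detectors(policy: dict) -> set:
    """Return data classes that a rule references but no detector is attached to."""
    referenced = {rule["data_class"] for rule in policy["rules"]}
    detectors = policy.get("detectors", {})
    return {name for name in referenced if not detectors.get(name)}


# Illustrative policy: the PII rule exists, but nothing can detect PII.
example_policy = {
    "rules": [
        {"data_class": "regulated-PII", "route": "third-party-llm", "action": "block"},
        {"data_class": "internal", "route": "third-party-llm", "action": "permit-with-logging"},
    ],
    "detectors": {
        "internal": "regex:internal-banner",
    },
}
```

Run as a pre-deployment check, missing_detectors(example_policy) reports regulated-PII, which is exactly the class the written policy cares about most and the system cannot see.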

The policy assumes identity context that the application does not propagate

The policy says "elevated roles may bypass the standard redaction rule with appropriate logging." The application calls the model using a static service credential. The request layer cannot tell whether the caller is an elevated role or a standard user. The policy is unenforceable until the application attaches the identity context to every model request. NIST's Pillar 1 framing names this gap; I walked through the breakdown in the AI agent security post.
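A sketch of what the request layer needs in order to evaluate that rule. The header names (X-AI-Subject, X-AI-Population, X-AI-Role) and the elevated role are hypothetical, and failing closed when the context is absent is an assumption of this sketch. In a real deployment the identity context would travel in a verified token, not in plain headers the caller can set freely.

```python
from dataclasses import dataclass
from typing import Optional

ELEVATED_ROLES = {"privacy-officer"}  # hypothetical elevated role


@dataclass(frozen=True)
class IdentityContext:
    subject: str
    population: str
    role: str


def extract_identity(headers: dict) -> Optional[IdentityContext]:
    """Read the identity context the application attached, or None if absent."""
    try:
        return IdentityContext(
            subject=headers["X-AI-Subject"],
            population=headers["X-AI-Population"],
            role=headers["X-AI-Role"],
        )
    except KeyError:
        return None


def redaction_decision(headers: dict) -> str:
    """Apply the elevated-role bypass only when the caller's role is known."""
    identity = extract_identity(headers)
    if identity is None:
        return "block"  # static service credential only: nothing to evaluate
    if identity.role in ELEVATED_ROLES:
        return "permit-with-logging"
    return "redact-and-permit"
```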

The policy and the audit record are written by different teams

The policy is drafted by Legal and Compliance. The audit record is designed by Platform Engineering. The schemas drift. The policy references concepts (intent, risk-score, business-justification) that the record does not capture. The record captures fields (model-name, token-count, response-latency) that the policy does not reference. The auditor asks for a record that maps to the policy and gets a record that maps to the inference engine instead.
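The drift can be surfaced with a simple comparison of the two field sets. The field lists below are assembled from the examples in this section and are illustrative, not a recommended schema.

```python
# Concepts the policy references (illustrative).
POLICY_FIELDS = {
    "identity", "role", "data_classification", "policy_version",
    "decision_outcome", "intent", "risk_score", "business_justification",
}

# Fields the platform team's record actually captures (illustrative).
RECORD_FIELDS = {
    "identity", "role", "decision_outcome",
    "model_name", "token_count", "response_latency",
}


def schema_drift(policy_fields: set, record_fields: set) -> dict:
    """Report fields the policy needs but the record lacks, and the reverse."""
    return {
        "missing_from_record": sorted(policy_fields - record_fields),
        "unreferenced_by_policy": sorted(record_fields - policy_fields),
    }
```

Running this comparison in the same pipeline that publishes a new policy version gives both teams a shared, failing check instead of a surprise during the audit.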

What the policy ties into

The policy is not a standalone document. It is the input to three downstream systems: the enforcement point, the audit log, and the disclosure procedure. The policy version that governed each decision is a field in the audit record. The audit record schema is a section of the policy. The disclosure procedure references the audit record. When the three are not synchronized, the auditor sees a gap between what the policy claims and what the system enforces.

The EU AI Act Article 12 record-keeping requirement makes the synchronization a regulatory obligation. The records must reconstruct the decision and the operational context. The operational context is the policy state at the moment of the decision. A policy that changes weekly and an audit record that does not capture policy version cannot satisfy the requirement.
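One way to make the policy version unambiguous is to derive it from the policy content itself. This sketch hashes a canonical JSON encoding of the policy; the 12-character truncation and the dictionary layout are assumptions, and a version-control commit hash would serve the same purpose.

```python
import hashlib
import json


def policy_version(policy: dict) -> str:
    """Derive a short content-based version identifier for a policy.

    The same rules always give the same identifier regardless of key order,
    and any change to the rules gives a different one.
    """
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Stamping this identifier into every audit record lets a reviewer retrieve the exact rule set in force at the moment of a decision, even when the policy changes weekly.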

DeepInspect

This is the integration DeepInspect provides. DeepInspect sits at the AI request boundary as a stateless proxy. The policy is defined declaratively, in YAML or in the Studio, and version-controlled. Every request is evaluated against the current policy using identity context supplied by the application and data classification produced by the proxy's classifier. Every decision produces an audit record that includes the policy version that governed the evaluation, the role evaluated, the data classification detected, the action taken, and the integrity signature.

For an enterprise governance program, the proxy is the runtime that turns the written policy into per-request enforcement. The policy is no longer prose. It is the rule set the system evaluated, recorded, and is prepared to disclose.

Frequently asked questions

Do we need a separate policy for each AI use case?

A single top-level AI governance policy is preferable, with use-case-specific addenda that name the populations, data classes, and routes that apply. The top-level policy sets the principles, the classification taxonomy, the action set, the audit schema, and the exception procedure. The addenda specify which combinations apply to which deployments. The federation pattern keeps the policy consistent across the institution while allowing business units to attach use-case context. A separate policy per use case fragments the controls and produces inconsistent evidence.

What is the right cadence for reviewing the AI governance policy?

The policy needs a scheduled review at least quarterly and an event-driven review whenever a new model, a new use case, or a new regulation enters scope. The quarterly review is the baseline. The event-driven review catches the cases that the quarterly cadence misses, including a new model provider, a new high-risk classification, a new enforcement action by a regulator, or a material change to a connected regulation (EU AI Act amendments, Fannie Mae lender letters, NIST framework updates).

How granular should the data classification taxonomy be?

The minimum taxonomy is five classes: public, internal, confidential, restricted, and regulated. Most regulated enterprises split the regulated class into PII, PHI, NPI, and trade-secret to match their existing data governance vocabulary. Going beyond five classes makes the policy harder to evaluate at runtime because the detector has to distinguish more categories at the prompt level. The taxonomy should match the existing data governance program, not invent a new one. The AI policy reuses the classes the organization already classifies documents under.

Should the policy mention specific model providers by name?

The policy should mention model providers by name when the deployment uses specific providers and the policy applies differently to each. The clinical-staff population may call the in-house deployment freely and is blocked from public providers. The mortgage-underwriter population may call OpenAI under specific redaction rules and Anthropic under the same rules. Naming the providers in the policy creates an enforceable contract. Generic policies that say "third-party LLMs" without naming providers fail when a new provider arrives and no one updates the rule set.

How does the policy handle agents and automation accounts?

Agents and automation accounts are treated as a distinct identity population in the policy. The policy specifies which agents may act on whose behalf, what scoped authorization each agent receives, and what the action lineage looks like when an agent makes an AI request. NIST's three-pillar framing applies directly. Pillar 1 is the agent's verified identity. Pillar 2 is the scoped authorization the agent received from a human or another agent. Pillar 3 is the action lineage that records the chain. The policy specifies the authorization scopes available to each agent population.
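As a rough illustration of scoped authorization and action lineage, the sketch below models a delegation chain. The Delegation shape, the scope strings, and the rule that scopes can only narrow along the chain are assumptions of this example, not something the NIST framing or the policy text prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    grantor: str          # human or agent granting the authorization
    agent: str            # agent receiving it
    scopes: frozenset     # scopes granted on this link


def effective_scopes(chain: list) -> frozenset:
    """Intersect scopes along the chain; a broken chain yields no scopes."""
    if not chain:
        return frozenset()
    scopes = chain[0].scopes
    for previous, link in zip(chain, chain[1:]):
        if link.grantor != previous.agent:
            return frozenset()  # lineage does not connect, so nothing is authorized
        scopes = scopes & link.scopes
    return scopes


def lineage(chain: list) -> list:
    """Return the ordered chain of actors, starting with the original grantor."""
    if not chain:
        return []
    return [chain[0].grantor] + [link.agent for link in chain]
```

For a chain where alice grants an intake agent two scopes and the intake agent passes a partly overlapping set to a summary agent, the summary agent ends up with only the scopes common to both links, and the lineage list is what the audit record would store for that request.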