AI Model Governance: Controls That Operate on the Request Path

AI model governance fails when it sits at the model registry layer alone. Model cards and versioning catalog the asset. Per-request enforcement governs how the model is actually used. This article walks through the runtime layer most model governance programs leave out.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Compliance & Regulation · ai-governance · ai-compliance · audit · eu-ai-act · architecture · compliance

AI model governance programs tend to live in the model registry. Model cards document inputs and outputs. Version tags track deployment history. Approvals flow through a workflow engine before a model reaches production. Every one of those controls is useful at the lifecycle layer. None of them governs how the model is actually used at runtime. When a regulator asks which model decision touched a specific customer record, the model registry answers "the system was using v3.1 of the underwriting model." It cannot answer who initiated the request, what data was in the prompt, what policy applied, or whether the action was permitted under the policy in effect at that moment.

I want to walk through the layers AI model governance actually has to cover, where most programs stop, and the runtime evidence the regulators are asking for.

The three layers of model governance

Model governance has three distinct layers. Most programs operate one or two of them. Regulatory frameworks now require all three.

Lifecycle governance

Lifecycle governance covers the model from training to deprecation. Inputs documented. Training data lineage tracked. Evaluation metrics reported. Model cards published. Version tags applied. Deployment approvals routed. This is the layer the existing MLOps toolchain handles well. Almost every regulated enterprise has lifecycle governance in production.

Inventory governance

Inventory governance covers which models exist in the institution, where they run, who owns each one, what data classifications they handle, and what use cases they support. Fannie Mae LL-2026-04 makes inventory a named pillar of the mortgage AI governance mandate. The inventory is the artifact the CRO uses to map regulatory exposure. Most enterprises have a partial inventory at the start of a governance program and discover the rest during the first audit.

Runtime governance

Runtime governance covers the policy that applies to each model request, the enforcement of that policy before the model produces a response, and the per-decision audit record that proves what happened. This is the layer most model governance programs leave out. The model registry tells you which version was deployed. The runtime layer tells you what the deployed version actually did at 10:42:11 on May 20 for user 1093 with prompt content classified as restricted-NPI.

What the regulators ask for

Article 12 of the EU AI Act requires high-risk AI systems to allow "automatic recording of events (logs) over the lifetime of the system", sufficient to identify risk-creating situations and to reconstruct what the system did. Article 12(3) sets minimum contents for one class of high-risk system, remote biometric identification: period of use, input data, reference databases checked, identity of the natural persons involved in verifying the results. Article 19 requires providers to retain the automatically generated logs. These records describe what the model did, not what the model is. The model registry holds the metadata. The runtime layer holds the records.

The Fannie Mae mandate uses different vocabulary for the same requirement. The lender must produce audit trails for AI-assisted decisions on demand. The disclosure-on-demand test maps to the same per-decision record the EU AI Act asks for. I walked through this convergence in the LL-2026-04 breakdown.

Where lifecycle and inventory programs stop short

I see three failure modes in model governance programs that operate at the lifecycle and inventory layers alone.

The model card describes the model, not the deployment

Model cards specify training data, evaluation metrics, intended use cases, and known limitations. They are the cataloging primitive. They do not specify the policy that applies when this model runs in production, the populations of users permitted to call it, or the data classifications it may receive. The model card is necessary at the artifact level and insufficient at the runtime level.

Version tags identify the model, not the request

The deployment pipeline applies v3.1 to the production environment. The request layer calls the deployed endpoint. The audit log records that the endpoint was called. It does not record which population called it, which role authorized the call, what classification the prompt fell under, or what policy version governed the decision. Version-tagged deployments answer half of the regulator's question and leave the more important half open.

Approvals govern the model, not the use

Deployment approvals route through risk, security, and compliance before a model goes live. The approval governs whether the model is permitted to operate at all. It does not govern whether a specific prompt from a specific user, against a specific data class, on a specific route, is permitted. The approval is a gate. The runtime is a stream.

What runtime model governance requires

Runtime model governance produces, for every model request, a per-decision audit record containing identity, role, data classification, policy version, decision outcome, and a tamper-evident integrity mechanism. The record is independent of the application that made the request. It is committed before the model response returns to the application. It persists regardless of the application's runtime state.
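One way to make such a record tamper-evident is a hash chain, in which each record carries the hash of the one before it. The sketch below is a minimal illustration under that assumption, not a description of any particular product. The names `DecisionRecord`, `AuditLog`, and `GENESIS_HASH` are hypothetical, and the log lives in memory, where a real deployment would need durable storage.

```python
import dataclasses
import hashlib
import json
from dataclasses import dataclass

GENESIS_HASH = "0" * 64  # stands in for "no previous record"


@dataclass(frozen=True)
class DecisionRecord:
    # Illustrative field names mirroring the fields listed in the text.
    identity: str
    role: str
    data_classification: str
    policy_version: str
    outcome: str
    timestamp: str
    prev_hash: str  # hash of the record committed immediately before this one


def record_hash(record):
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(dataclasses.asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """In-memory, append-only log in which each record commits to its predecessor."""

    def __init__(self):
        self._records = []

    def head(self):
        """Hash of the latest record; store it outside the log to anchor the chain."""
        return record_hash(self._records[-1]) if self._records else GENESIS_HASH

    def append(self, **fields):
        record = DecisionRecord(prev_hash=self.head(), **fields)
        self._records.append(record)
        return record

    def verify(self, expected_head):
        """Recompute the chain and compare it with an externally held head hash."""
        running = GENESIS_HASH
        for record in self._records:
            if record.prev_hash != running:
                return False
            running = record_hash(record)
        return running == expected_head
```

Editing any earlier record changes its hash, which breaks the link held by the next record, so `verify` fails. The head hash has to be kept somewhere the log's writer cannot modify, otherwise the whole chain could be rewritten consistently.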

The architectural pattern that produces this primitive is an external enforcement proxy that sits between the application and any LLM. The proxy classifies the prompt, then evaluates the policy against that classification and the identity context the application supplies. The proxy commits the record before the application sees the response. The proxy is model-agnostic, which is what lets the runtime layer cover the in-house Llama deployment, the OpenAI endpoint, the Bedrock-hosted Claude, and the on-prem Mistral under a single governance regime.
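The request path through such a proxy can be sketched in a few lines. Everything here is an assumption made for illustration: the `POLICY` table, the `classify` stand-in, the `ModelRequest` shape, and the use of a plain list as the audit log. The point is the ordering: evaluate, call the model only if permitted, commit the record, then return.

```python
from dataclasses import dataclass

POLICY_VERSION = "example-policy-v1"  # hypothetical version label

# Hypothetical policy: (route, role) -> data classes that role may send on that route.
POLICY = {
    ("/underwriting", "underwriter"): {"public", "restricted-NPI"},
    ("/support-chat", "agent"): {"public"},
}


@dataclass(frozen=True)
class ModelRequest:
    user_id: str
    role: str
    route: str
    prompt: str


def classify(prompt):
    """Stand-in classifier; real detection would be far more thorough."""
    return "restricted-NPI" if "ssn" in prompt.lower() else "public"


def handle(request, call_model, audit_log):
    """Evaluate policy, call the model only if permitted, commit the record, then return."""
    data_class = classify(request.prompt)
    permitted = POLICY.get((request.route, request.role), set())  # unknown pairs are denied
    allowed = data_class in permitted
    response = call_model(request.prompt) if allowed else None
    # The record is appended before the caller receives any response.
    audit_log.append({
        "identity": request.user_id,
        "role": request.role,
        "route": request.route,
        "data_classification": data_class,
        "policy_version": POLICY_VERSION,
        "outcome": "allow" if allowed else "deny",
    })
    return response
```

Because `call_model` is passed in as a plain callable, the same `handle` function works in front of any backend, which is the model-agnostic property the pattern relies on. A route and role pair missing from the table is denied by default.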

DeepInspect

This is the runtime layer DeepInspect provides. DeepInspect sits at the AI request boundary as a stateless proxy between the application and any LLM. Every request is evaluated against per-route and per-role policies using the identity context the application supplies. PII, PHI, and other regulated classes are detected at the prompt level and redacted or blocked based on the policy. Every decision produces a per-decision audit record containing identity, role, policy version, data sensitivity, decision outcome, and timestamp. The record is signed and committed before the application receives the model's response.
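To make the redact-or-block step and the signed record concrete, here is a generic sketch using only the Python standard library. It is not DeepInspect's implementation or API. The two regular expressions are deliberately simplistic stand-ins for real PII detection, and HMAC-SHA256 with a shared key stands in for whatever signature scheme a production system would use. The function names are hypothetical.

```python
import hashlib
import hmac
import json
import re

# Toy detectors; real PII/PHI detection needs far more than two patterns.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def apply_policy(prompt, action):
    """Return (prompt_to_forward, outcome, detected_classes) for action 'redact' or 'block'."""
    found = sorted(name for name, pattern in PATTERNS.items() if pattern.search(prompt))
    if not found:
        return prompt, "allow", found
    if action == "block":
        return None, "block", found
    redacted = prompt
    for name in found:
        redacted = PATTERNS[name].sub(f"[{name.upper()}]", redacted)
    return redacted, "redact", found


def sign_record(record, key):
    """HMAC-SHA256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record, signature, key):
    """Constant-time check that the record still matches its signature."""
    return hmac.compare_digest(sign_record(record, key), signature)
```

Under these assumptions, `apply_policy("SSN 123-45-6789", "redact")` forwards `"SSN [SSN]"` to the model, while the same prompt under `"block"` forwards nothing. Changing any field of a record after signing makes `verify_record` return `False`.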

For an enterprise model governance program, the proxy is the runtime evidence layer that the lifecycle and inventory layers do not produce on their own. The model registry continues to track the artifact. The proxy tracks what the artifact did.

Frequently asked questions

How does AI model governance differ from traditional model risk management?

Traditional model risk management, codified in Federal Reserve SR 11-7 for banks, focused on quantitative models in credit, market risk, and treasury. The framework specified model development, validation, governance, and use controls. AI model governance extends MRM in three ways. The model class is broader: LLMs, embeddings, and generative models are now in scope. The use modality is interactive: prompts and responses, not batch scoring. The evidence requirement is per-decision: the regulator wants the audit trail of specific requests, not the annual validation report alone. MRM controls remain in force. AI governance adds the runtime evidence layer on top.

Does model governance cover both internally trained and third-party models?

Both. The deployer is accountable under Article 26 of the EU AI Act regardless of who trained the model. The Fannie Mae mandate explicitly extends accountability to AI mistakes by subcontractors and vendors. Internally trained models give the enterprise more control over training data and evaluation. Third-party models shift training-time evidence to the provider but leave the runtime evidence requirement on the deployer. The proxy-based runtime governance pattern works identically across both: the enforcement and audit layer sits in front of the model, not inside it.

What is the role of the model card in regulated deployments?

The model card is the artifact-level documentation the regulator expects at the procurement and onboarding stage. It states training data sources, evaluation metrics, intended uses, known biases, and version history. The model card supports the lifecycle and inventory layers of governance. It does not satisfy the per-decision evidence requirement. The proxy-based runtime layer produces the per-decision records that the model card alone cannot. The two artifacts live side by side: the model card for what the model is, the proxy log for what the model did.

How do you govern model fine-tuning and adapter usage?

Fine-tuned models and adapter-based deployments are treated as distinct model versions for governance purposes. Each fine-tune has its own model card, its own inventory entry, and its own policy attachment at the runtime layer. The runtime proxy evaluates the request against the policy associated with the specific deployed endpoint, which includes whether that fine-tune is permitted for this population, this route, and this data class. Adapter usage is logged in the per-decision record so the auditor can trace which adapter applied at the moment of the decision.
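A minimal sketch of what "distinct model versions with their own policy attachment" could look like as data. The `Deployment` type, the endpoint names, the model and adapter identifiers, and the policy labels are all made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Deployment:
    endpoint: str
    base_model: str
    adapter: Optional[str]  # None when the base model is served without an adapter
    policy_version: str
    permitted_roles: FrozenSet[str]
    permitted_data_classes: FrozenSet[str]


# Hypothetical inventory: the base model and its fine-tune are separate entries.
DEPLOYMENTS = {
    d.endpoint: d
    for d in (
        Deployment("underwriting-base", "example-llm-7b", None, "policy-v3",
                   frozenset({"underwriter", "analyst"}), frozenset({"public"})),
        Deployment("underwriting-ft", "example-llm-7b", "underwriting-lora-v2", "policy-v4",
                   frozenset({"underwriter"}), frozenset({"public", "restricted-NPI"})),
    )
}


def decide(endpoint, role, data_class):
    """Return a per-decision record that names the adapter in effect."""
    deployment = DEPLOYMENTS.get(endpoint)
    if deployment is None:
        return {"endpoint": endpoint, "outcome": "deny", "reason": "unknown endpoint"}
    allowed = (role in deployment.permitted_roles
               and data_class in deployment.permitted_data_classes)
    return {
        "endpoint": endpoint,
        "base_model": deployment.base_model,
        "adapter": deployment.adapter,
        "policy_version": deployment.policy_version,
        "role": role,
        "data_classification": data_class,
        "outcome": "allow" if allowed else "deny",
    }
```

In this toy inventory the same base model appears twice, and only the fine-tuned endpoint is permitted to receive `restricted-NPI` prompts. Every record carries the `adapter` field, including `None` for the base deployment, so the trace is unambiguous either way.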

What does model deprecation look like under per-decision governance?

Model deprecation under per-decision governance is a routing change at the proxy plus a policy update. The deprecated model is removed from the permitted route list for new requests. The proxy logs and rejects requests that target the deprecated endpoint. Existing per-decision records for the deprecated model are retained according to the audit retention schedule, because the regulator can still ask about decisions the deprecated model made during its production lifetime. The model registry marks the version as deprecated. The runtime layer enforces the change.
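The deprecation flow described above reduces to a small amount of state at the proxy. The sketch below assumes a hypothetical `RouteTable` class and a plain list as the audit log. Endpoint names are illustrative.

```python
class RouteTable:
    """Tracks which endpoints may receive new requests."""

    def __init__(self, permitted):
        self._permitted = set(permitted)

    def deprecate(self, endpoint):
        """Remove the endpoint from the permitted list; past records are untouched."""
        self._permitted.discard(endpoint)

    def admit(self, endpoint, user_id, audit_log):
        """Log every attempt, and admit it only if the endpoint is still permitted."""
        allowed = endpoint in self._permitted
        audit_log.append({
            "identity": user_id,
            "endpoint": endpoint,
            "outcome": "allow" if allowed else "reject",
        })
        return allowed
```

Calling `deprecate` does not touch the audit log, so records written while the endpoint was live stay available for later questions. Requests that arrive after deprecation still produce a record, with a `reject` outcome.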