AI Governance Software: What to Look For Beyond the Policy Builder
AI governance software splits into policy-building, inventory, and runtime enforcement. Most products in the category cover policy and inventory and leave runtime evidence to whatever the engineering team builds. Article walks through the architectural layers and what to ask vendors before signing.

The AI governance software category emerged around three workflows. Drafting a policy. Maintaining an inventory of AI systems. Running risk and impact assessments before a model goes to production. Most of the products in the category cover those three workflows well. The category gap, which the regulators are now closing, is runtime evidence: the per-decision record that proves what the policy actually enforced at the moment a user pasted a tax return into a model. Procurement evaluations that focus on the policy builder and the inventory dashboard miss this layer entirely, and the resulting buy is a governance program that documents intentions without producing the records a regulator asks for.
I want to walk through how the AI governance software category actually breaks down, what each layer covers and does not cover, and the questions that separate a vendor that sells a workflow from a vendor that produces evidence.
How the category breaks down
The vendors in this category divide into four architectural layers. Most products cover one or two. Few cover all four.
Policy authoring and lifecycle
The policy authoring layer is where Legal, Compliance, and Risk teams draft, version, review, and approve the AI usage policy. Workflow tools that route the document through stakeholders, capture comments, and produce a version history live here. The output is a policy document. Sometimes the output is a structured policy in YAML or JSON that downstream systems can consume.
AI inventory
The inventory layer catalogs every AI system, model, and use case the institution operates. Vendor, model version, training data sources, intended use case, data classifications handled, and ownership are the standard fields. The inventory feeds the regulatory exposure map. Fannie Mae LL-2026-04 names inventory as a pillar of the mortgage AI governance mandate.
Risk and impact assessment
The risk and impact assessment layer runs structured evaluations on each new AI deployment before it reaches production. The artifacts are model cards, data protection impact assessments (DPIAs), bias evaluations, and use-case fitness reviews. The output is a deployment recommendation that the model risk committee approves or rejects.
Runtime policy enforcement and audit
The runtime layer applies the approved policy to every model request, takes the action the policy specifies, and produces the per-decision audit record. This is the layer that turns the policy document into evidence. This is the layer most products in the category do not cover. The product surface that vendors typically expose is a dashboard that summarizes activity. The dashboard is downstream of the runtime layer. The runtime layer itself is where the enforcement and the evidence are produced.
What each layer covers and does not cover
A buyer evaluating AI governance software needs to know which layer each product covers and where the gap sits relative to the existing stack.
Policy authoring covers intentions, not enforcement
A workflow tool that captures a policy in a document does not enforce the policy at the request layer. The policy authoring product is necessary at the lifecycle level. It is not the control. The buying mistake is to treat policy authoring as the control and miss the runtime layer where the policy is actually evaluated.
Inventory covers what exists, not what happened
The inventory tells the CRO that the underwriting model v3.1 is in production and handles regulated-NPI. The inventory does not tell the auditor that on May 20 at 10:42:11, user 1093 sent prompt X to the model and received response Y. The inventory is the catalog. The audit record is the event log.
Risk assessment covers the pre-deployment gate, not the production behavior
A DPIA documents the privacy impact of the system before it goes live. The DPIA does not capture what the system did once it went live. The risk assessment is the procurement-gate artifact. The runtime evidence is what the regulator asks for after the system is in production.
Runtime enforcement covers what happened, not what should have happened
The runtime layer records and enforces. It does not draft the policy. It does not maintain the inventory. It does not run the DPIA. The runtime layer is the evidence engine. Policy authoring, inventory, and risk assessment feed it. The output of the runtime layer feeds back into reporting, exceptions handling, and re-assessment.
Questions to ask vendors
The questions that separate workflow software from evidence-producing software, when evaluating AI governance products.
Does the product enforce the policy at the request layer, or does it document the policy and leave enforcement to the application? If the answer is the second, the product is upstream of the evidence layer. The buyer still needs a runtime control.
Does the product produce a per-decision audit record that the application cannot suppress or modify? Application-controlled logs fail the self-attestation test for regulatory audits. I walked through the failure modes in the vendor liability post. The runtime layer has to produce records the application cannot tamper with.
Does the product capture identity context at the request layer, or rely on the application to log identity? The application typically calls the model with a service credential. Without identity propagation to the request layer, the audit record fails Article 19 of the EU AI Act, which requires the identity of natural persons involved.
Is the product model-agnostic? A runtime layer that only works in front of one model provider's API leaves the rest of the institution's AI traffic ungoverned. The runtime layer needs to sit in front of OpenAI, Anthropic, Bedrock, Azure OpenAI, Vertex, and self-hosted endpoints under a single policy.
What is the enforcement overhead? End-to-end enforcement overhead under 50 ms in production tests is the standard from DeepInspect's internal testing. LLM inference runs 500 ms to 5 seconds. The proxy overhead has to be invisible against the model latency for inline enforcement to be operationally acceptable.
DeepInspect
This is the runtime layer DeepInspect provides. DeepInspect sits at the AI request boundary as a stateless proxy between the application and any LLM. Every request is evaluated against per-route and per-role policies using the identity context the application supplies. The proxy classifies the prompt at runtime, applies the policy, and commits a per-decision audit record before the response returns to the application. The record is signed and tamper-evident.
For institutions that already operate a policy authoring tool and an inventory tool, DeepInspect is the runtime layer that turns the policy into enforcement and the inventory into per-decision evidence. The proxy is model-agnostic, which lets a single governance regime cover OpenAI, Anthropic, Bedrock, Azure OpenAI, Vertex, and self-hosted deployments simultaneously.
Frequently asked questions
- Can a single AI governance product cover all four layers?
A few products attempt to cover policy authoring, inventory, risk assessment, and runtime enforcement under one roof. The risk is that the runtime layer, which is operationally the most demanding, gets shallow coverage because the vendor's product origin was in the workflow layers. The pattern I see working in regulated enterprises is to operate the workflow layers in the GRC platform the institution already uses, and to add a runtime enforcement and audit layer as a separate component in the AI request path. The runtime layer feeds events back into the GRC platform for reporting.
- How does AI governance software interact with the existing GRC platform?
The AI governance software sits next to the GRC platform, not on top of it. The GRC platform holds the master inventory, the policy versions, the risk assessments, and the exception register. The AI governance runtime layer holds the per-decision records and the enforcement state. The two integrate through events: the runtime layer emits structured events for policy violations, exceptions, and high-risk decisions, which the GRC platform ingests for reporting and case management.
- Is AI governance software a replacement for traditional model risk management?
No. Traditional model risk management, codified in SR 11-7 for banks, governs the model lifecycle: development, validation, governance, use. AI governance software extends MRM by adding inventory at the AI-system level (broader than the quantitative-model class MRM covered) and per-decision evidence at the runtime layer (a new artifact MRM did not require). The MRM controls remain in force. AI governance software is the layer that lets the institution discharge the obligations that MRM did not anticipate.
- What does AI governance software typically cost relative to the value it produces?
AI governance software pricing varies by layer and by institution size. Policy authoring and inventory tools tend to price per user and per workflow. Runtime enforcement tools tend to price per AI request or per active model endpoint. The economic case for the runtime layer is the regulatory penalty exposure it covers: the EU AI Act penalty tier for high-risk non-compliance reaches €15 million or 3% of global annual turnover, whichever is higher. The runtime layer's cost is small relative to the penalty exposure for an institution that handles regulated AI use cases.
- How long does an AI governance software deployment typically take?
Policy authoring and inventory tools deploy in weeks because the integration is light and the data is curated by the GRC team. Risk assessment tools deploy alongside the existing pre-deployment review process. The runtime enforcement layer deployment depends on the AI request path architecture. In a centralized request path with a single proxy point, the runtime layer can be in production in two to four weeks. In a distributed request path with many model endpoints called directly from many applications, the deployment can take a quarter. The runtime layer's deployment cost is what it costs to consolidate the AI request path through a single enforcement point.