Does generative AI governance apply to internal RAG pipelines?

Yes. Internal RAG pipelines call LLM endpoints, often with prompt content that carries retrieved documents from internal stores. The inspection layer at the LLM request boundary sees the assembled prompt and runs classification against the document content the RAG pulled. The (data, identity, model) policy applies the same way to RAG traffic as to direct user prompts.
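
As a rough illustration of why the assembled prompt is what gets classified, the sketch below builds a RAG prompt and runs a toy classifier over it. The function names, the prompt template, and the two regex patterns are assumptions made for this example; a production classifier would be far more capable than a pair of regular expressions.

```python
import re

# Hypothetical detection patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def assemble_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Build the prompt the LLM endpoint will actually receive."""
    context = "\n\n".join(retrieved_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


def classify(prompt: str) -> set[str]:
    """Return the names of every pattern found anywhere in the prompt."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(prompt)}


question = "Summarize the onboarding notes."
chunks = ["Onboarding notes: new hire SSN 123-45-6789 is on file."]

labels_on_question = classify(question)
labels_on_prompt = classify(assemble_prompt(question, chunks))
```

The user's question alone carries no sensitive label here; the label only appears once the retrieved document is part of the prompt, which is the content the inspection layer sees.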

How does the governance program handle agentic AI workflows?

Agentic AI workflows generate prompts programmatically as the agent executes its task. The inspection layer treats the agent's prompt the same way as a user's prompt: classification on the content, policy evaluation against the agent's service identity plus the originating user identity, and decision against the (data, identity, model) triple. The record carries both identity fields.

Does the program need to cover every model in use?

The program covers the models in scope for the policy. Programs typically start with the models used inside high-risk use cases (the EU AI Act Article 12 record-keeping deadline drives this) and extend coverage to the broader model surface over the following quarters. The inspection layer is model-agnostic, so extending coverage is a configuration step rather than a re-architecture.

How does the program handle escalation to human review?

The policy can route specific categories to a human reviewer queue. The reviewer receives the request context, the classification, and the policy version, and makes the decision. The system records the reviewer's identity and decision in the same record series as the automated decisions. This supports the Article 14 human oversight obligation and becomes part of the record that auditors sample.

What metrics does the governance program track?

Programs typically track the volume of decisions by category (permit, redact, block, escalate), the false-positive rate on the classifier, the distribution of classification labels, the percentage of requests with identity binding, the policy version coverage across decisions, and the integrity signature verification rate. The metrics feed the NIST AI RMF MEASURE function and the EU AI Act Article 17 quality management system.
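
A minimal sketch of how several of these metrics could be computed from a batch of decision records follows. The `DecisionRecord` fields and the `summarize` function are hypothetical names chosen for this example, not a documented schema. The classifier false-positive rate is left out because it needs ground-truth labels from a review sample, which the records alone do not contain.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    decision: str                  # "permit", "redact", "block", or "escalate"
    label: str                     # classifier label (example values only)
    user_id: Optional[str]         # None when no identity was bound
    policy_version: Optional[str]  # None when the version was not recorded
    signature_ok: bool             # outcome of the integrity check


def summarize(records: list[DecisionRecord]) -> dict:
    """Aggregate per-decision records into program-level metrics."""
    total = len(records)
    if total == 0:
        return {}

    def pct(count: int) -> float:
        return 100.0 * count / total

    return {
        "decisions": dict(Counter(r.decision for r in records)),
        "labels": dict(Counter(r.label for r in records)),
        "identity_binding_pct": pct(sum(r.user_id is not None for r in records)),
        "policy_version_pct": pct(sum(r.policy_version is not None for r in records)),
        "signature_ok_pct": pct(sum(r.signature_ok for r in records)),
    }


records = [
    DecisionRecord("permit", "public", "alice", "v1", True),
    DecisionRecord("block", "pii", "bob", "v1", True),
    DecisionRecord("permit", "public", None, "v1", True),
    DecisionRecord("escalate", "pii", "carol", None, False),
]
summary = summarize(records)
```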

Generative AI Governance: The Inspection-Layer Decisions That Sit Between Policy and Production

Generative AI governance is the discipline of binding organizational policy to per-request enforcement on the production traffic that flows between users or agents and any LLM. The binding sits at the inspection layer. The policy decides which categories of data and which user identities can interact with which models. The enforcement layer applies the policy to each request. The record series produces the per-decision evidence the auditor samples.

Most generative AI governance programs land on three immediate operational questions. Which prompts can flow to which models? Which identities can submit prompts in which categories? What record does the program retain for each decision? These questions sit naturally at the HTTP request boundary between authenticated users or agents and the LLM endpoint, because that boundary is where both the prompt content and the verified identity are available at the same moment.

Categories the governance program has to decide on

The categories that drive generative AI governance decisions fall on three axes: data, identity, and model.

The policy is a function over these three axes: the (data, identity, model) triple decides whether the request is permitted, redacted, or blocked. The record carries all three axes plus the decision and the policy version.
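
The policy function described above can be sketched as a lookup keyed on the (data, identity, model) triple. Everything concrete here is an assumption for illustration: the label names, the group names, the model names, the policy version string, and the default-deny behavior for triples with no rule.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PERMIT = "permit"
    REDACT = "redact"
    BLOCK = "block"


@dataclass(frozen=True)
class Request:
    data_label: str      # output of the classifier
    identity_group: str  # group resolved from the verified identity
    model: str           # target LLM


POLICY_VERSION = "example-v1"  # hypothetical version identifier

# Hypothetical rules: (data, identity, model) -> decision.
RULES = {
    ("public", "employee", "external-llm"): Decision.PERMIT,
    ("pii", "employee", "external-llm"): Decision.REDACT,
    ("pii", "hr-analyst", "internal-llm"): Decision.PERMIT,
}


def evaluate(req: Request) -> dict:
    """Evaluate the triple and return the record fields for this decision."""
    key = (req.data_label, req.identity_group, req.model)
    # Assumption: any triple without an explicit rule is blocked.
    decision = RULES.get(key, Decision.BLOCK)
    return {
        "data": req.data_label,
        "identity": req.identity_group,
        "model": req.model,
        "decision": decision.value,
        "policy_version": POLICY_VERSION,
    }


record = evaluate(Request("pii", "employee", "external-llm"))
```

The returned dictionary mirrors the record shape described above: all three axes, the decision, and the policy version that produced it.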

Where the enforcement sits

The enforcement runs at the HTTP request boundary. The proxy terminates TLS at the inspection layer, authenticates the caller against the corporate IdP, runs the classifier against the prompt content, evaluates policy against the (data, identity, model) triple, and commits the audit record. The placement is the only one that holds the prompt and the verified identity together at the moment of the decision.

A network-side placement sees the destination but not the prompt body. An application-side placement sees the prompt but not the verified identity unless the application is wired through. A model-provider-side placement sees the prompt and the API caller but not the natural-person identity behind the application session. The proxy placement removes the integration dependency from each application team and produces a canonical record series across the program.

The record the audit references

The per-decision record carries the fields that EU AI Act Article 12 and Article 19 record-keeping expects, plus the additional fields the program tracks for operational analytics.

The series is tamper-evident because each record's signature chains against the previous record. Deletion or modification is detectable on a routine integrity check. The series is queryable by any field for audit sampling, operational analytics, or compliance reporting.
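
One way to get this chaining property is sketched below, using an HMAC over each record's content concatenated with the previous record's signature. This is an illustrative construction, not a description of any specific product's format: the key handling, the JSON canonicalization, and the field names are assumptions, and a real deployment might prefer asymmetric signatures so that verifiers do not hold the signing key.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous signature" for the first record


def _sign(key: bytes, prev_sig: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous signature plus canonical JSON of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, prev_sig.encode("ascii") + body, hashlib.sha256).hexdigest()


def append(chain: list, key: bytes, payload: dict) -> None:
    """Append a record whose signature covers the previous record's signature."""
    prev_sig = chain[-1]["signature"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev": prev_sig,
        "signature": _sign(key, prev_sig, payload),
    })


def verify(chain: list, key: bytes) -> bool:
    """Walk the chain and recompute every signature; False on any mismatch."""
    prev_sig = GENESIS
    for entry in chain:
        expected = _sign(key, prev_sig, entry["payload"])
        if entry["prev"] != prev_sig:
            return False
        if not hmac.compare_digest(entry["signature"], expected):
            return False
        prev_sig = entry["signature"]
    return True


key = b"demo-key-not-for-production"
chain: list = []
for decision in ("permit", "block", "redact"):
    append(chain, key, {"decision": decision, "policy_version": "example-v1"})
```

Editing a payload or removing a record from the middle makes `verify` return False. Removing records from the end of the chain is not detectable by this check alone; catching that would need the latest signature to be anchored somewhere outside the series.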

Mapping to EU AI Act and NIST AI RMF

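EU AI Act Article 12 requires high-risk AI systems to allow automatic recording of events (logs) sufficient to ensure traceability. For remote biometric identification systems, Article 12(3) sets the minimum log content: the period of each use, the reference database checked, the input data that led to a match, and the identification of the natural persons involved in verifying the results. Article 19 requires providers to keep the automatically generated logs under their control for at least six months, unless other applicable law provides otherwise. The August 2, 2026 deadline applies to high-risk AI systems including credit scoring, employment screening, education access, and biometric identification.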

NIST AI RMF covers generative AI governance through its four functions. GOVERN covers the policy layer. MAP covers the inventory and risk categorization of generative AI use cases. MEASURE covers the metrics the inspection layer produces (decision rates, classification distributions, escalation volumes). MANAGE covers the feedback loop between the records and the policy refinement.

The Generative AI Profile in NIST AI 600-1 adds specific guidance for generative AI risks including confabulation, harmful content, data privacy, and CBRN information. The profile maps to the same four functions and references the same evidence base as the broader RMF.

What the enforcement layer decides on each request

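The enforcement decision for a single request flows through several evaluations: authentication of the caller, classification of the prompt content, policy evaluation against the (data, identity, model) triple, and the resulting decision to permit, redact, block, or escalate.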

The escalate path is the integration point for Article 14 human oversight obligations. The policy can route specific request categories to a reviewer who decides whether the request proceeds. The reviewer's decision is recorded on the same series.
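
The per-request flow with an escalate path might look like the sketch below. The token table stands in for IdP verification, the keyword check stands in for a real classifier, and the `reviewer` callback stands in for a reviewer queue; all of these names and rules are hypothetical and only serve to show the order of the steps and what ends up in the record.

```python
from typing import Callable, Optional

POLICY_VERSION = "example-v1"       # hypothetical version identifier
TOKENS = {"token-alice": "alice"}   # stand-in for verification against an IdP


def authenticate(token: str) -> Optional[str]:
    return TOKENS.get(token)


def classify(prompt: str) -> str:
    # Toy classifier: a keyword match instead of a real model.
    return "pii" if "ssn" in prompt.lower() else "public"


def decide(label: str, model: str) -> str:
    # Hypothetical policy: PII to the internal model needs a human decision.
    if label == "public":
        return "permit"
    if model == "internal-llm":
        return "escalate"
    return "block"


def handle(token: str, prompt: str, model: str,
           reviewer: Callable[[str, str, str], tuple],
           log: list) -> dict:
    """Authenticate, classify, decide, optionally escalate, then record."""
    user = authenticate(token)
    label = classify(prompt) if user is not None else None
    decision = decide(label, model) if user is not None else "block"
    escalated = decision == "escalate"
    reviewer_id = None
    if escalated:
        # The reviewer returns (reviewer identity, final decision).
        reviewer_id, decision = reviewer(user, label, model)
    record = {
        "user": user, "label": label, "model": model,
        "decision": decision, "escalated": escalated,
        "reviewer": reviewer_id, "policy_version": POLICY_VERSION,
    }
    log.append(record)  # the reviewer's decision lands in the same series
    return record


def demo_reviewer(user: str, label: str, model: str) -> tuple:
    return ("reviewer-1", "permit")


log: list = []
r1 = handle("token-alice", "What is the holiday schedule?", "external-llm", demo_reviewer, log)
r2 = handle("token-alice", "Look up the SSN on file", "internal-llm", demo_reviewer, log)
r3 = handle("bad-token", "hello", "external-llm", demo_reviewer, log)
```

Keeping the `escalated` flag and the reviewer identity on the same record as the automated fields is what lets an auditor see, for a single request, both that the policy routed it to a human and who made the final call.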

Where most programs are landing in mid-2026

The programs I talk to that fall under Annex III for the August 2 deadline are landing on three workstreams. The first is the AI inventory: knowing what generative AI use cases are in production, which ones fall inside Annex III, and which models they reach. The second is the enforcement cutover: placing the inspection layer at the request boundary for the identified use cases. The third is the record series accumulation: turning on the per-decision record before the deadline so the audit evidence base has months of history rather than days.

Programs that complete the enforcement cutover early have a stronger audit position because the record series has been running long enough to demonstrate the policy is binding on real traffic. Programs that delay the cutover until the deadline carry one day of evidence on day one of the obligation.

DeepInspect

DeepInspect is the inspection layer for generative AI governance. The proxy sits inline between authenticated users or agents and any LLM, terminates TLS at the inspection layer, authenticates against the corporate IdP, classifies the prompt content, evaluates policy against the (data, identity, model) triple, and commits a per-decision audit record before the response returns. The records carry the fields EU AI Act Article 12, Article 19, and NIST AI RMF MEASURE reference.

For organizations binding policy to production generative AI traffic, the proxy placement supplies both the enforcement and the record on the same surface. The placement closes the gap between aspirational policy and operational evidence.

If you are facing the August deadline, let's talk.