
AI Data Governance: Classifying What Enters and Leaves the Prompt

AI data governance fails when the classification engine runs on documents and not on prompts. The data lake is sorted; the AI request path is not. This article walks through the prompt-level classification, lineage, and disclosure architecture that answers the new questions regulators are asking about model inputs.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Compliance & Regulation · ai-governance, ai-compliance, compliance, eu-ai-act, shadow-ai, ai-security
Enterprise data governance has spent two decades classifying documents. PII tags on customer records. Confidentiality labels on strategy decks. Retention rules on transaction histories. The data lake is sorted. The AI request path is not. When an underwriter pastes a borrower's tax return into an LLM, the document-level classification stays in the document repository. The prompt that crosses the wire to the model carries the content without the classification tag, without the access controls that govern the document, and without a record that links the request back to the original data class. The legacy DLP that watches the network sees encrypted HTTPS to api.openai.com and cannot inspect the prompt content. The legacy data lineage tool sees a database read and cannot follow the bytes into the model.

I want to walk through what AI data governance has to cover that document-level governance does not, where the existing tools fall short, and the prompt-level architecture that closes the gap.

What AI data governance has to cover

AI data governance extends the existing data governance program in four dimensions.

Prompt-level classification

Documents are classified once, at rest, and the tag travels with the document through the storage system. Prompts are assembled at runtime. A prompt to a model may include text the user typed, retrieved context from a vector store, system instructions from the application, and metadata stitched together by the request layer. The classification has to operate on the assembled prompt at the moment the request crosses the AI request boundary. Document classification does not transfer; the prompt is a new artifact.
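As a minimal sketch of what classifying the assembled prompt could look like, the Python below joins the prompt parts and runs detectors over the result. The `PromptPart` type, the origin labels, and the two regex detectors are illustrative assumptions, not a production detector set.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors; a real deployment would use far more robust ones.
DETECTORS = {
    "PII.ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PII.email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

@dataclass
class PromptPart:
    origin: str  # "system", "user", or "retrieval" (illustrative labels)
    text: str

def assemble(parts):
    """Join the parts the way a request layer might before sending."""
    return "\n\n".join(p.text for p in parts)

def classify(text):
    """Return the sorted detector labels that match the assembled text."""
    return sorted(label for label, rx in DETECTORS.items() if rx.search(text))

parts = [
    PromptPart("system", "You are an underwriting assistant."),
    PromptPart("retrieval", "Borrower SSN: 123-45-6789"),
    PromptPart("user", "Summarize the borrower's income."),
]
labels = classify(assemble(parts))
print(labels)  # ['PII.ssn']
```

The user's own text is clean here; the label comes from the retrieved context, which is why the classifier runs on the assembled artifact and not on any single part.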

Lineage from prompt to source

Auditors ask which data sources contributed to a model decision. The retrieval-augmented generation pattern pulls from vector stores, document repositories, and live databases. The lineage record has to capture which sources were queried, which results were retrieved, which results were included in the prompt, and which were truncated. Without lineage, the auditor cannot reconstruct what the model saw.
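One way to picture the lineage record is the sketch below. The field names and the greedy token-budget rule are assumptions made for illustration, and "truncated" is simplified to mean "retrieved but dropped for lack of budget".

```python
from dataclasses import dataclass, field

@dataclass
class RetrievedChunk:
    source: str    # e.g. a vector store or repository name
    chunk_id: str
    tokens: int

@dataclass
class LineageRecord:
    queried_sources: list = field(default_factory=list)
    retrieved: list = field(default_factory=list)
    included: list = field(default_factory=list)
    truncated: list = field(default_factory=list)

def build_lineage(queried_sources, chunks, token_budget):
    """Greedily include chunks in rank order; record the rest as truncated."""
    record = LineageRecord(queried_sources=list(queried_sources))
    used = 0
    for chunk in chunks:
        record.retrieved.append(chunk.chunk_id)
        if used + chunk.tokens <= token_budget:
            record.included.append(chunk.chunk_id)
            used += chunk.tokens
        else:
            record.truncated.append(chunk.chunk_id)
    return record

chunks = [
    RetrievedChunk("loan-docs", "c1", 300),
    RetrievedChunk("loan-docs", "c2", 500),
    RetrievedChunk("crm-db", "c3", 100),
]
rec = build_lineage(["loan-docs", "crm-db"], chunks, token_budget=450)
print(rec.included, rec.truncated)  # ['c1', 'c3'] ['c2']
```

With this record the auditor can see that c2 was retrieved but never reached the model.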

Response classification

The model produces text. The text may include PII, PHI, or regulated commercial information either because the user asked for it, because the retrieval surfaced it, or because the model produced it through inference. The response is data leaving the AI request boundary. It needs classification at the same level as the prompt. Outbound classification is what catches the case where the prompt was clean but the response contains regulated data.
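A rough sketch of outbound classification on the return path follows. The single SSN detector and the redaction token are placeholders chosen for illustration.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # illustrative detector only

def screen_response(text):
    """Classify a model response and redact matches before it is returned."""
    redacted, count = SSN.subn("[REDACTED-SSN]", text)
    labels = ["PII.ssn"] if count else []
    return redacted, labels

print(screen_response("The SSN on file is 123-45-6789."))
# ('The SSN on file is [REDACTED-SSN].', ['PII.ssn'])
```

The prompt that produced this response could have been entirely clean, which is the case inbound classification alone would miss.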

Cross-border data flow recording

The model endpoint may be hosted in a different jurisdiction than the data subject. EU customer data flowing to a US-hosted model endpoint is a cross-border transfer under GDPR, even when the personal data in the prompt is pseudonymized; only genuinely anonymized data falls outside GDPR. The per-decision record needs to capture the model endpoint location so the DPO can document the transfer and satisfy the cross-border obligations.
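Capturing the endpoint location per decision could look something like this sketch. The hostnames, the region map, and the `TransferRecord` fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from endpoint host to hosting region.
ENDPOINT_REGIONS = {
    "llm.eu.example.com": "EU",
    "llm.us.example.com": "US",
}

@dataclass(frozen=True)
class TransferRecord:
    endpoint: str
    endpoint_region: str
    subject_region: str
    cross_border: bool

def record_transfer(endpoint, subject_region):
    """Record where the request went relative to where the data subject is."""
    region = ENDPOINT_REGIONS.get(endpoint, "UNKNOWN")
    # An unknown region is flagged as cross-border so it surfaces for review.
    return TransferRecord(endpoint, region, subject_region, region != subject_region)

print(record_transfer("llm.us.example.com", "EU").cross_border)  # True
```

Flagging unknown endpoints as cross-border is a conservative default chosen for the sketch; whether a given transfer is lawful remains a legal determination outside the code.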

Where the existing tools fall short

Legacy data governance fails for AI traffic in three architectural ways.

Document classification does not propagate to the request layer

The HRIS document is tagged "regulated-PII." The user opens the document, copies the salary table, and pastes it into a prompt. The classification stayed in the HRIS. The prompt to the model is unlabeled text. The data governance tool cannot tell that the prompt content originated as regulated-PII. The runtime classifier has to look at the content, not the source tag.

Network DLP is blind to prompt content

When an engineer or underwriter sends a prompt to a third-party LLM, the data travels as an HTTPS POST to the provider's API. Network DLP sees encrypted web traffic. The prompt content is invisible unless TLS inspection is configured specifically for AI provider domains and the API payload is parsed. Most enterprise DLP deployments are not configured for AI providers, and Cloud Radix research reports that 86% of IT leaders are completely blind to these AI interactions.
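To make "the API payload is parsed" concrete, here is a small sketch that pulls prompt text out of a decrypted request body. It assumes a chat-style JSON body with a `messages` array whose `content` is either a string or a list of typed parts; other providers use different shapes and would need their own parser.

```python
import json

def extract_prompt_text(body: bytes) -> list:
    """Pull the text of each message out of a chat-style JSON request body."""
    payload = json.loads(body)
    texts = []
    for message in payload.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            texts.append(content)
        elif isinstance(content, list):
            # Multi-part content: keep only the text parts.
            texts.extend(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
    return texts
```

Only after this step does a content detector have anything to inspect; before it, the DLP sees an opaque byte stream.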

Vendor SaaS embeds model calls invisibly

A material share of enterprise AI usage flows through vendor SaaS that embeds model calls under the hood. The quality-control vendor uses ML to flag loan defects. The customer-service platform uses an LLM to summarize tickets. The pricing engine scores risk through a model. The deployer's data governance program has no visibility into what data the vendor's model received. The vendor's shared responsibility model leaves the data exposure with the deployer.

What prompt-level governance produces

Prompt-level governance produces a per-request record that links the assembled prompt content to its data classification, its lineage from source systems, the action taken on the prompt (permit, redact, block), the model endpoint that received the request, and the response that the endpoint produced (with its own classification). The record is the artifact the DPO uses to answer cross-border questions, the CISO uses to answer breach-scope questions, the CRO uses to answer regulatory exposure questions, and the General Counsel uses to disclose under regulator request.

The architectural pattern lines up with the record-keeping requirement in Article 12 of the EU AI Act: an external enforcement proxy sits at the AI request boundary, classifies the prompt at runtime, evaluates the policy against identity and classification, takes the action the policy specifies, and commits the record before the response returns to the application.
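The ordering of those steps can be sketched in a few lines. Everything here is a simplified assumption: the single detector, the role-based policy in `decide`, the in-memory `AUDIT_LOG`, and the `call_model` callable standing in for the real endpoint.

```python
import re
from dataclasses import dataclass

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
AUDIT_LOG = []  # stand-in for a durable, append-only store

@dataclass
class Decision:
    user: str
    labels: list
    action: str
    endpoint: str

def decide(user_role, labels):
    """Hypothetical policy: underwriters get SSNs redacted, others are blocked."""
    if "PII.ssn" not in labels:
        return "permit"
    return "redact" if user_role == "underwriter" else "block"

def handle(user, user_role, prompt, endpoint, call_model):
    labels = ["PII.ssn"] if SSN.search(prompt) else []
    action = decide(user_role, labels)
    response = None
    if action != "block":
        outbound = SSN.sub("[REDACTED-SSN]", prompt) if action == "redact" else prompt
        response = call_model(outbound)
    # Commit the record before the response returns to the application.
    AUDIT_LOG.append(Decision(user, labels, action, endpoint))
    return response
```

The point of the sketch is the ordering: the record is appended on every path, including the blocked one, before `handle` returns.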

DeepInspect

This is the architecture DeepInspect provides for AI data governance. The proxy sits between the application and any LLM. Every prompt is classified at the request layer using detectors for PII, PHI, NPI, and other regulated classes. Every response is classified on the return path. Every decision produces a per-decision audit record containing the data classification, the source lineage, the model endpoint, the action taken, and the identity context. The record is signed and committed before the application receives the response.
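To illustrate what a signed record means in practice, the sketch below signs a canonical JSON form of a record with HMAC-SHA256 from the Python standard library. This is an assumption for illustration only; it does not describe DeepInspect's actual signing scheme, and a deployment might well prefer asymmetric signatures so that verifiers never hold the signing key.

```python
import hashlib
import hmac
import json

def sign_record(record: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding so key order cannot change the signature."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()

def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and presented signatures."""
    return hmac.compare_digest(sign_record(record, key), signature)

key = b"demo-key"  # placeholder; real keys belong in a key management system
record = {"action": "redact", "endpoint": "llm.example.com", "labels": ["PII.ssn"]}
signature = sign_record(record, key)
print(verify_record(record, signature, key))                         # True
print(verify_record({**record, "action": "permit"}, signature, key))  # False
```

Any later edit to the record, such as changing the action from redact to permit, makes verification fail.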

For the data governance program, the proxy turns prompt content into a classified, traceable, recordable artifact at the moment of the request. The data lake's classification continues to govern documents at rest. The proxy governs data in motion to AI systems.

Frequently asked questions

How does AI data governance interact with GDPR?

The DPO's GDPR mandate applies in full to data that crosses the AI request boundary. Personal data in a prompt is a GDPR processing event. The lawful basis has to be established. The cross-border transfer has to be documented if the model endpoint is hosted outside the EU. The data subject's rights apply, including the right to know what processing has occurred and the right to erasure under specific conditions. The per-decision record produced at the AI request boundary is the operational evidence the DPO needs to discharge those obligations. For high-risk AI systems, Article 12 of the EU AI Act adds an automatic event-logging requirement on top of GDPR's records of processing activities.

What is the right granularity for data classification in prompts?

Five classes is the minimum: public, internal, confidential, restricted, and regulated. Most regulated enterprises split the regulated class into PII, PHI, NPI, and trade-secret. The classification has to be evaluated on the assembled prompt at the request layer, not on the source document. The action set attached to each class is the operational rule the proxy enforces: permit, redact, permit-with-warning, block, block-and-escalate. Going beyond five top-level classes makes the runtime classifier harder to maintain without materially changing the action set.
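A sketch of the class-to-action rule follows. The five classes and five actions come from the answer above, but the pairing between them and the default for an unlabeled prompt are illustrative policy choices, not a recommendation.

```python
# Ordered from least to most sensitive.
CLASS_ORDER = ["public", "internal", "confidential", "restricted", "regulated"]

# Hypothetical pairing; each organization sets its own.
ACTIONS = {
    "public": "permit",
    "internal": "permit-with-warning",
    "confidential": "redact",
    "restricted": "block",
    "regulated": "block-and-escalate",
}

def action_for(labels):
    """Apply the action of the most sensitive class found in the prompt."""
    # Assumption: a prompt with no labels is treated as "internal".
    top = max(labels, key=CLASS_ORDER.index, default="internal")
    return ACTIONS[top]

print(action_for(["internal", "regulated"]))  # block-and-escalate
```

Taking the most sensitive class means one regulated fragment in an otherwise public prompt still drives the decision.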

How does retrieval-augmented generation affect data governance?

Retrieval-augmented generation expands the data governance surface because the prompt is assembled from multiple sources at runtime. The user query may be unclassified. The retrieved context from the vector store may be regulated. The system prompt may include sensitive operational instructions. The proxy classifies the assembled prompt rather than the individual fragments, because the model sees the assembled artifact and the auditor asks about what the model saw. The lineage record captures which sources contributed which fragments, which lets the auditor trace the regulated content back to its source on demand.
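Tracing a detector hit in the assembled prompt back to the fragment that contributed it can be done with character spans, as in this sketch. The source names and the single SSN detector are hypothetical.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def assemble_with_spans(fragments, sep="\n\n"):
    """Join (source, text) fragments and remember each fragment's character span."""
    texts, spans, pos = [], [], 0
    for source, text in fragments:
        spans.append((pos, pos + len(text), source))
        texts.append(text)
        pos += len(text) + len(sep)
    return sep.join(texts), spans

def trace_matches(prompt, spans):
    """Map each detector hit in the assembled prompt back to its source fragment."""
    hits = []
    for m in SSN.finditer(prompt):
        sources = [src for start, end, src in spans
                   if start < m.end() and m.start() < end]
        hits.append((m.group(), sources))
    return hits

fragments = [
    ("system-prompt", "You are a helpful assistant."),
    ("vector-store:loans", "SSN 123-45-6789 on file."),
    ("user", "Summarize."),
]
prompt, spans = assemble_with_spans(fragments)
print(trace_matches(prompt, spans))  # [('123-45-6789', ['vector-store:loans'])]
```

Classification still runs on the whole prompt; the spans only answer the follow-up question of where the regulated content came from.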

How do you govern data flowing through vendor SaaS that uses AI?

Vendor SaaS that embeds AI calls under the hood requires contractual flow-down of the data governance obligation. The procurement contract has to require the vendor to produce per-decision records of AI usage on the deployer's data and to disclose those records on request. The vendor's SOC 2 attestation does not satisfy the operational obligation; the deployer needs the per-decision records. In the absence of vendor-side records, the deployer's runtime proxy can intercept the SaaS-to-model traffic if the SaaS is deployed inside the corporate network, but the more common pattern is to require the vendor to expose its own AI request audit endpoint to the deployer.

Does prompt-level classification slow down model requests?

The classification adds processing on the request path. In internal DeepInspect production tests, end-to-end enforcement overhead measured under 50 ms. LLM inference takes 500 ms to 5 seconds depending on model and prompt length, so the overhead is at most about 10% of the fastest inference time and about 1% of the slowest. The runtime classifier is designed to fail fast on common cases and apply heavier model-based detectors only when the prompt content warrants it. The math favors inline classification because the model latency dominates the request budget.
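The fail-fast idea can be sketched as a two-tier classifier. The cheap pre-filter pattern is an assumption, and `heavy_detect` is a regex stand-in for a slower model-based detector; neither reflects the actual DeepInspect detectors.

```python
import re

# Cheap pre-filter: only prompts with digit runs or an at-sign go further.
CHEAP_HINT = re.compile(r"\d{3}|@")

def heavy_detect(text):
    """Stand-in for a slower, model-based detector (hypothetical)."""
    labels = []
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):
        labels.append("PII.ssn")
    if re.search(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b", text):
        labels.append("PII.email")
    return labels

def classify_tiered(text):
    """Return (labels, path); most prompts should exit on the fast path."""
    if not CHEAP_HINT.search(text):
        return [], "fast-path"
    return heavy_detect(text), "heavy-path"

print(classify_tiered("Summarize the meeting notes"))  # ([], 'fast-path')
print(classify_tiered("SSN 123-45-6789"))              # (['PII.ssn'], 'heavy-path')
```

The pre-filter can send clean prompts down the heavy path, which only costs time. What it must never do is let a sensitive prompt exit on the fast path, so the hint has to be broader than every detector behind it.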