
Sensitive Data AI Detection: Classifying Prompt Content at the AI Request Boundary

Sensitive data AI detection classifies prompt content at the AI request boundary, where the prompt is reconstructed into a structured payload and a classifier surfaces the categories the policy reads. The category set includes PII (email, phone, SSN, NPI), PHI, PCI, secrets (API keys, tokens, certificates), source code, and customer identifiers. Document-level classifiers do not run cleanly against prompt context windows. The inspection-point classifier runs at request time, surfaces labels the policy uses, and stamps the labels on the per-decision audit record.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
AI Security Solutions · data-loss-prevention · ai-security · dlp · policy-enforcement · inline-enforcement

Sensitive data AI detection classifies prompt content at the AI request boundary, where the prompt is reconstructed into a structured payload and a classifier surfaces the categories the policy reads. The category set in regulated environments includes PII (email, phone, SSN, NPI), PHI, PCI, secrets (API keys, tokens, certificates), source code, and customer-identifier patterns specific to the business. Network DLP and endpoint DLP each see part of the surface and miss the rest. The architectural answer is a classifier that runs at the AI request boundary, against the reconstructed prompt context window, and stamps the labels on the per-decision audit record.

I want to walk through where document-level classifiers fail on prompt context windows, what category schema makes sense for AI prompt content, and how the classifier composes with policy at the inspection point.

Why document-level classifiers fail on prompt context

Document-level classifiers were built for files: a multi-page document, an attachment in transit, a record in a database. The classifier reads the document, applies pattern matchers and ML models, and assigns labels at the document level.

A prompt context window does not look like a document. The context contains a system prompt, a few rounds of conversation, a user message, and often a structured payload the user pasted (JSON, a customer record, a configuration file). The classifier that runs against a 10-page PDF treats this as one heterogeneous blob and produces the union of labels across all the parts. The policy that reads "the prompt contains PII" then fires on every prompt that contains an email signature, whether or not the email is the work product.

The prompt-aware classifier breaks the context into its components: system block, user turns, assistant turns, tool definitions, tool outputs, attached files. Each component gets classified independently. The labels attach to the component, not the whole context. The policy can match on "user turn contains PHI" or "system block contains secrets" with different consequences for each.
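A minimal sketch of component-level labeling, assuming the context has already been split into components. The `Component` type, the label names, and the two regexes are illustrative assumptions, not the schema of any particular product.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; a real classifier would carry many more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET_PREFIX = re.compile(r"\b(?:sk-|ghp_|xoxb-|AKIA)[A-Za-z0-9_-]{8,}")

@dataclass
class Component:
    kind: str  # hypothetical kinds: "system", "user_turn", "assistant_turn", ...
    text: str

def classify_component(component):
    """Return the set of labels found in one component only."""
    labels = set()
    if EMAIL.search(component.text):
        labels.add("pii.email")
    if SECRET_PREFIX.search(component.text):
        labels.add("secrets.api_key")
    return labels

def classify_context(components):
    """Classify each component independently; labels stay attached to their component."""
    return [(c.kind, classify_component(c)) for c in components]

context = [
    Component("system", "You are a support bot. Internal key: sk-abcdef1234567890"),
    Component("user_turn", "Summarize this thread. Thanks, jane@example.com"),
    Component("assistant_turn", "Sure, here is the summary."),
]
labeled = classify_context(context)
```

Because the result is a list of (component, labels) pairs, not one merged label set, a policy can treat a secret in the system block differently from an email address in a user turn.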

The category schema that makes sense for AI prompt content

The minimum category set covers six classes the policy reads consistently.

PII (personally identifiable information): email addresses, phone numbers, full names, postal addresses, government identifiers (SSN, passport numbers), financial identifiers (account numbers), and professional identifiers (NPI for healthcare providers). Each subcategory carries its own regulatory weight. SSN is high-risk under GLBA; NPI is the provider identifier mandated under HIPAA.

PHI (protected health information): the HIPAA-defined 18 identifiers when paired with health data. The classifier needs to detect the pairing, not just the identifier in isolation. A name alone is PII. A name plus a diagnosis is PHI.
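The pairing rule can be sketched as follows. This is a simplified illustration: it uses SSN and email regexes as the identifiers because name detection needs an entity recognizer, and the health-term list is a hypothetical stand-in for a real clinical vocabulary or model.

```python
import re

# Identifier patterns (a small illustrative subset).
IDENTIFIER_PATTERNS = {
    "pii.ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "pii.email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}
# Hypothetical health vocabulary; an assumption for this sketch.
HEALTH_TERMS = re.compile(r"\b(diagnos\w+|diabetes|prescri\w+|oncology)\b", re.IGNORECASE)

def classify_phi(text):
    """Emit identifier labels, and add 'phi' only when an identifier co-occurs with health data."""
    labels = {name for name, pat in IDENTIFIER_PATTERNS.items() if pat.search(text)}
    if labels and HEALTH_TERMS.search(text):
        labels.add("phi")
    return labels
```

An identifier alone yields only its PII label, and a health term alone yields nothing; the `phi` label appears only for the combination.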

PCI (payment card industry data): full card numbers (PANs), CVVs, and other sensitive authentication data. PCI DSS sets hard requirements on storage and transmission. Prompt content carrying PCI data triggers escalated deny rules in most policies.
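Card-number detection usually pairs a digit-run pattern with the Luhn checksum so that arbitrary long numbers do not fire the PCI label. A minimal sketch, with the candidate regex and function names chosen for illustration:

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(digits):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return digit strings that look like card numbers and pass the Luhn check."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum filters out most order numbers and timestamps that happen to be 16 digits long, which keeps the deny path from firing on non-card data.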

Secrets: API keys, access tokens, private keys, certificates, database connection strings. The classifier catches structural patterns (high-entropy strings of certain lengths) and known prefixes (sk-, ghp_, xoxb-, AKIA). The category fires on prompts that include credentials whether intentional or accidental.
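The two detection strategies named above (known prefixes and high-entropy strings) can be combined roughly as below. The 20-character minimum, the 4.0-bit entropy threshold, and the label names are assumptions for this sketch and would need tuning against real traffic.

```python
import math
import re
from collections import Counter

KNOWN_PREFIXES = ("sk-", "ghp_", "xoxb-", "AKIA")
# Candidate tokens: long runs of characters commonly used in credentials.
TOKEN = re.compile(r"[A-Za-z0-9_-]{20,}")

def shannon_entropy(s):
    """Shannon entropy in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def find_secrets(text, min_entropy=4.0):
    """Flag tokens with a known credential prefix, or with entropy above the threshold."""
    hits = []
    for m in TOKEN.finditer(text):
        token = m.group()
        if token.startswith(KNOWN_PREFIXES):
            hits.append(("secrets.known_prefix", token))
        elif shannon_entropy(token) >= min_entropy:
            hits.append(("secrets.high_entropy", token))
    return hits
```

Long natural-language words tend to score well below the threshold because their letters repeat, while randomly generated keys score close to the maximum for their alphabet.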

Source code: code blocks, configuration, infrastructure-as-code. The category is not in itself sensitive, but the policy often pairs it with destination rules: source code can go to internal models but not to external models without redaction.

Customer identifiers: deployer-specific patterns. A SaaS company's customer IDs, a bank's loan numbers, a hospital's MRN. The deployer registers a regex or a small classifier per pattern. Customer identifiers behave like PII for the deployer's data but are not on the generic PII list.
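Registering deployer-specific patterns might look like the sketch below. The `PatternRegistry` class and the two identifier formats (`MRN-` plus eight digits, `LN` plus ten digits) are hypothetical examples, not real formats from any deployer.

```python
import re

class PatternRegistry:
    """Holds deployer-registered regexes, keyed by the label they emit."""

    def __init__(self):
        self._patterns = {}

    def register(self, label, regex):
        self._patterns[label] = re.compile(regex)

    def classify(self, text):
        """Return the labels of every registered pattern that matches the text."""
        return {label for label, pat in self._patterns.items() if pat.search(text)}

registry = PatternRegistry()
# Hypothetical formats, for illustration only.
registry.register("customer_id.mrn", r"\bMRN-\d{8}\b")
registry.register("customer_id.loan", r"\bLN\d{10}\b")
```

The registered labels sit alongside the generic PII labels, so the policy can reference `customer_id.mrn` the same way it references any built-in category.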

How the classifier composes with policy

The classifier emits labels. The policy reads labels. The two are deliberately decoupled so the deployer can change the policy without retraining the classifier.

A representative interaction at the inspection point:

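One way such an interaction could look, as a minimal sketch: the classifier's output is a list of (component, labels) pairs, and the policy is a list of rules evaluated in order. The `Rule` fields, rule names, and the first-match-wins semantics are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    component: str      # component kind, or "*" for any component
    labels: frozenset   # all of these labels must be present on the component
    action: str         # hypothetical actions: "deny", "redact", "allow"

def evaluate(rules, labeled_components):
    """Return (action, rule name) for the first rule that matches any component."""
    for rule in rules:
        for kind, labels in labeled_components:
            if rule.component in ("*", kind) and rule.labels <= labels:
                return rule.action, rule.name
    return "allow", "default"

rules = [
    Rule("no-secrets-in-system", "system", frozenset({"secrets"}), "deny"),
    Rule("redact-user-phi", "user_turn", frozenset({"phi"}), "redact"),
]
# Labels as the classifier might emit them for one request.
request = [("system", {"source_code"}), ("user_turn", {"pii.email", "phi"})]
decision = evaluate(rules, request)
```

The rules are plain data, so adding or reordering them changes the decision without any change to the code that produces the labels.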

The composition lets the policy change without touching the classifier. A new rule that denies prompts with secrets and source-code together can ship as a policy update. The classifier continues to emit the same labels.

Where the classifier runs

Three deployment patterns for the classifier itself cover the typical shapes.

Inline at the gateway. The classifier is a component inside the gateway process. Inference runs on the gateway's hardware (CPU for small models, GPU when available for larger). Latency is the lowest because no network hop is involved. Throughput is bounded by the gateway's hardware budget.

Sidecar classifier. The classifier runs as a sidecar process the gateway calls over localhost. The pattern lets the deployer scale the classifier independently of the gateway. It adds a round-trip over the loopback interface, which is negligible in practice.

Centralized classifier service. The classifier is a shared service multiple gateways call. The pattern fits high-throughput environments where the classifier hardware budget is significant. Each call adds a network round-trip. The service can batch requests for higher throughput at the cost of per-request latency.

How the classifier handles streaming responses

Response-side classification has to run against streamed tokens. The naive approach (wait for the full response, then classify) defeats streaming.

The classifier runs against rolling windows of streamed tokens. Each window of N tokens is buffered, classified, and either forwarded or held. The window size is chosen to give the classifier enough context to make confident decisions while keeping the user-visible latency low. The buffering delay is roughly the time the model takes to emit one window plus the classifier's inference time, so it grows with the window size: a 100-token window at 50 tokens per second, for example, holds output back for about two seconds.
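The buffer, classify, forward-or-hold loop can be sketched as a generator. This is a simplified illustration that uses non-overlapping windows; the function name, the `classify` callback, and the "held"/"forwarded" status strings are assumptions.

```python
def stream_with_inspection(tokens, classify, window_size=100):
    """Buffer tokens into fixed-size windows and tag each window before release.

    `classify` is any callable that returns True when the window contains
    sensitive content. Yields (status, window) pairs.
    """
    buffer = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) == window_size:
            yield ("held" if classify(buffer) else "forwarded", buffer)
            buffer = []
    if buffer:  # flush the final partial window
        yield ("held" if classify(buffer) else "forwarded", buffer)

# Usage with a toy classifier and a tiny window.
tokens = ["a", "b", "c", "d", "SECRET", "e", "f"]
results = list(stream_with_inspection(tokens, lambda w: "SECRET" in w, window_size=3))
```

Non-overlapping windows can miss a match that straddles a window boundary; overlapping the windows by the length of the longest expected pattern is one way to close that gap, at the cost of classifying some tokens twice.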

A classifier match in the middle of a streamed response can trigger one of three actions: block the rest of the stream, redact the matched span, or terminate the stream. The policy rule chooses among the three. The audit record captures the streamed chunks that were modified or blocked.

How the labels feed the audit record

Every per-decision audit record carries the labels the classifier emitted on the prompt and the response. The labels are stable category names that the auditor or the regulator reads against the regulatory expectation:

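A hypothetical record shape, for illustration: every field name below is an assumption, and a production record would also carry a signature, which this sketch omits.

```python
import json
from datetime import datetime, timezone

def build_audit_record(request_id, decision, prompt_labels, response_labels):
    """Assemble a per-decision record with labels kept per prompt component."""
    return {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        # Sort labels so the serialized record is stable across runs.
        "prompt_labels": {kind: sorted(labels) for kind, labels in prompt_labels.items()},
        "response_labels": sorted(response_labels),
    }

record = build_audit_record(
    "req-0001",  # hypothetical request id
    "redact",
    {"system": set(), "user_turn": {"phi", "pii.email"}},
    {"pii.email"},
)
serialized = json.dumps(record, indent=2)
```

Keeping the prompt labels keyed by component preserves the granularity the classifier produced, so a reviewer can see which part of the context carried which category.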

An EU AI Act audit reads the user-turn labels to verify data classification was applied. A HIPAA audit reads the labels to verify PHI handling rules fired. A Fannie Mae LL-2026-04 disclosure reads the labels to verify customer-identifier handling matched the documented usage policy.

The label set is the lingua franca between the classifier and the regulator. The deployer's job is to keep the label set aligned with the regulatory expectations the deployer is subject to.

DeepInspect

This is the architecture DeepInspect was built to provide. DeepInspect sits at the AI request boundary as a stateless proxy. The inspection point normalizes each AI request into a model-agnostic representation, runs the classifier against each component of the prompt context window, evaluates policy against the emitted labels, and writes a signed per-decision audit record that carries the labels on the record.

The classifier and the policy are decoupled, which lets the deployer extend the label set without restructuring the policy. The classifier ships with the regulated-environment category set out of the box and accepts deployer-supplied patterns for customer-identifier classes specific to the business.

If your AI data classification approach is still document-level and you are moving it to prompt-level at the AI request boundary, book a demo today.

Frequently asked questions

Why not use the existing DLP classifier instead of a new one?

Existing DLP classifiers are tuned for documents and files. They run against a different input shape and produce labels at the wrong granularity for prompt content. The classifier at the AI request boundary reuses concepts (regex patterns, entity recognizers) the deployer's DLP team already understands and extends them for the prompt-context shape.

How does the classifier handle synthetic identifiers (e.g. test data)?

The classifier emits labels based on structural matching. A test SSN like 123-45-6789 looks like a real SSN to a regex matcher. The deployer registers an allow-list pattern for the test data, and the classifier surfaces the allow-list match alongside the SSN label, e.g. pii.ssn with allowlist: test-fixtures. The policy can then permit test SSNs in development environments and deny them in production.
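A minimal sketch of the allow-list annotation and the environment-dependent decision. The fixture set, the label dictionary shape, and the environment names are assumptions for illustration.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical allow-list of known test values registered by the deployer.
TEST_FIXTURES = {"123-45-6789", "000-00-0000"}

def label_ssns(text):
    """Emit one pii.ssn label per match, annotated when the value is a known fixture."""
    labels = []
    for m in SSN.finditer(text):
        label = {"label": "pii.ssn"}
        if m.group() in TEST_FIXTURES:
            label["allowlist"] = "test-fixtures"
        labels.append(label)
    return labels

def decide(labels, environment):
    """Allow allow-listed SSNs in development only; deny every other SSN."""
    for label in labels:
        allowlisted = label.get("allowlist") == "test-fixtures"
        if not (allowlisted and environment == "development"):
            return "deny"
    return "allow"
```

The classifier still reports the SSN in every case; only the policy decision changes with the environment.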

What about false positives?

False positives are managed at the policy layer, not the classifier layer. The classifier emits a confidence score with each label. The policy can require the confidence to exceed a fixed threshold before firing the deny path. Labels that fall below the threshold still appear on the audit record for later review, which gives the deployer a feedback loop for classifier tuning.
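The threshold check can be sketched as below. The 0.8 default, the label dictionary shape, and the audit field names are assumptions; the point is that low-confidence labels are recorded but do not trigger the deny path.

```python
def apply_threshold(labels, deny_threshold=0.8):
    """Split labels by confidence; deny only if at least one meets the threshold.

    Returns (decision, audit_fields). Labels below the threshold are kept
    in the audit fields so they can be reviewed later.
    """
    fired = [l for l in labels if l["confidence"] >= deny_threshold]
    below = [l for l in labels if l["confidence"] < deny_threshold]
    decision = "deny" if fired else "allow"
    return decision, {"fired": fired, "below_threshold": below}

# Usage with hypothetical classifier output.
labels = [
    {"label": "pii.ssn", "confidence": 0.95},
    {"label": "phi", "confidence": 0.40},
]
decision, audit = apply_threshold(labels)
```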

How does the classifier handle non-English prompts?

The classifier ships with multilingual support for the regex- and entity-recognizer-based categories (PII, PCI). Source-code detection does not depend on the prompt's natural language because it works at the level of file extensions and syntactic patterns. PHI requires per-locale tuning because the identifier patterns differ between regimes (HIPAA in the US, GDPR in the EU, locale-specific identifiers elsewhere).

Does the classifier itself become a regulated component?

The classifier is part of the AI inspection infrastructure. Its outputs feed audit records the regulator reads. Many deployers treat the classifier configuration (the registered patterns, the confidence thresholds, the locale settings) as part of the AI usage policy artifact and version-control it alongside the policy.