PII Detection in LLM Prompts: Classifier Choices and the Per-Request Decision

PII detection on LLM prompts has to operate at request latency, work on free-form text, and produce a deterministic classification that drives a policy decision. The classifier choices fall into three categories: regex and lookup tables, small purpose-trained models, and LLM-based classifiers. Each has a latency and coverage profile. This piece walks through the choices, where each fits, the integration into the AI request boundary, and the audit record the classification produces.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
AI Security Solutions · pii · ai-dlp · llm-dlp · classification · gdpr · data-protection
PII detection on LLM prompts has to operate at request latency, work on free-form text, and produce a deterministic classification that drives a policy decision at the AI request boundary. The detection has to handle the data shapes that show up in prompts: names embedded in narrative, account numbers in copy-pasted error messages, email addresses in support ticket text, government identifiers in HR queries, and the long tail of regulated-data shapes the organization has in its environment.

The classifier choices fall into three architectural categories: deterministic rules and lookup tables, small purpose-trained models, and LLM-based classifiers. Each has a latency and coverage profile. The integration into the AI request boundary is the same regardless of which classifier runs underneath.

I want to walk through the three classifier categories, where each fits, the integration pattern, and the audit record the classification produces.

Category 1: Deterministic rules and lookup tables

The rules layer covers structured PII shapes: email addresses, phone numbers in common formats, social security numbers, credit card numbers, IBANs, US tax identifiers, EU national identifiers in published formats, and similar structured data classes that a regular expression can match.

The latency is in the low single-digit milliseconds per prompt. The classification is deterministic: the same prompt produces the same classification result every time. The coverage is bounded by the rules the operator writes. A new structured PII class requires a new rule.

The precision varies by class. Credit card detection with Luhn validation hits high precision. SSN detection on bare nine-digit sequences hits middling precision because not every nine-digit sequence is an SSN. The rules layer typically pairs with allowlists (e.g., known test SSNs) and contextual qualifiers (presence of "SSN" or "Social Security" near the candidate match) to manage precision.

The recall is bounded by the patterns the rules cover. Free-form prompts where the PII shape is non-standard (e.g., a phone number written as "five five five dash one two three four") fall outside the rules layer.
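To make the precision controls concrete, here is a minimal sketch of a rules layer with Luhn validation for card numbers and an allowlist plus contextual qualifier for SSNs. The patterns, the `Match` type, the 40-character context window, and the allowlist entry are all illustrative assumptions, not a production rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    pii_class: str
    start: int
    end: int
    text: str


# Illustrative patterns only; a real rule set would cover more formats.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")      # 13 to 19 digits, optional separators
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")        # nine digits, optional hyphens
SSN_CONTEXT_RE = re.compile(r"\b(?:ssn|social security)\b", re.IGNORECASE)
SSN_ALLOWLIST = {"078051120"}   # hypothetical allowlist entry for a known sample value
CONTEXT_WINDOW = 40             # characters searched on each side of a candidate (assumed)


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(prompt: str) -> list[Match]:
    """Run the card and SSN rules over a prompt and return the accepted matches."""
    matches = []
    for m in CARD_RE.finditer(prompt):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):  # checksum filters out most random digit runs
            matches.append(Match("credit_card", m.start(), m.end(), m.group()))
    for m in SSN_RE.finditer(prompt):
        if m.group().replace("-", "") in SSN_ALLOWLIST:
            continue            # known test value, not real PII
        window = prompt[max(0, m.start() - CONTEXT_WINDOW): m.end() + CONTEXT_WINDOW]
        if SSN_CONTEXT_RE.search(window):   # require a nearby qualifier
            matches.append(Match("ssn", m.start(), m.end(), m.group()))
    return matches
```

With these rules, a bare nine-digit order number is ignored while the same digits next to "SSN" are flagged. That is the precision trade the contextual qualifier buys, at the cost of missing SSNs that appear without a label.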

Category 2: Small purpose-trained models

The purpose-trained classifier layer covers unstructured PII: names in narrative context, addresses, organizational affiliations, demographic indicators, health information described in clinical narrative, and the categories of PII the rules layer cannot reach with regex.

The latency for a small encoder-only classifier (a few hundred million parameters at most) running on CPU is typically ten to thirty milliseconds per prompt at typical prompt sizes. On accelerator hardware the latency can drop further. The classification produces a confidence score per category and an aggregated decision.

The precision and recall trade-off shifts with the training data. Models trained on broad PII corpora generalize across narrative shapes; models fine-tuned on organization-specific data lift performance on the long tail the organization actually sees. The deployment choice is whether to ship a generalist model or to invest in fine-tuning on the organization's own historical prompt corpus.

The classifier output drives the policy decision the same way the rules output does. The downstream policy layer does not care which classifier produced the decision; it acts on the classification result.
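As a rough illustration of how per-category confidence scores could become an aggregated decision, the sketch below applies a threshold per category. The model itself is out of scope here: the scores are passed in as a plain dictionary, and the category names, thresholds, and `ClassificationResult` shape are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.50                                   # assumed fallback threshold
THRESHOLDS = {"person_name": 0.80, "health_info": 0.60}    # hypothetical per-class tuning


@dataclass(frozen=True)
class ClassificationResult:
    source: str                  # which classifier produced the result
    detected: dict[str, float]   # categories at or above threshold, with confidence
    contains_pii: bool           # aggregated decision


def aggregate(scores: dict[str, float], source: str = "small_model") -> ClassificationResult:
    """Keep the categories whose confidence clears their threshold."""
    detected = {
        category: score
        for category, score in scores.items()
        if score >= THRESHOLDS.get(category, DEFAULT_THRESHOLD)
    }
    return ClassificationResult(source=source, detected=detected, contains_pii=bool(detected))


# Example with made-up scores standing in for model output.
result = aggregate({"person_name": 0.91, "health_info": 0.40, "address": 0.55})
```

Because the result carries only categories, confidences, and a source label, the policy layer can consume it without knowing which classifier ran.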

Category 3: LLM-based classifiers

The LLM classifier layer uses a model to classify the prompt against the data classes the operator defines. The classifier can handle unusual narrative shapes, multi-class detection in a single pass, and policy questions that mix data classification with intent ("is this prompt asking the model to do something inappropriate with the embedded PII").

The latency is the cost. A small LLM classifier running on CPU is in the hundreds of milliseconds per prompt at typical sizes. On accelerator hardware the latency is workable but the cost-per-request rises. The throughput-per-node is lower than the small-model classifier.

The LLM classifier earns its latency on policy decisions that cross-reference content in ways the smaller classifiers cannot. For most production PII detection, the small-model classifier strikes the right balance. The LLM classifier sits behind it for the cases that need richer evaluation.

The combined architecture

The pattern that holds under production load is a layered classifier. The rules layer runs first; high-confidence matches are decided immediately. The small-model classifier runs on the prompt content that the rules did not resolve. The LLM classifier runs only on the residual cases that need richer reasoning.
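The cascade can be sketched as an ordered list of layers, each of which either returns a verdict or defers to the next. The `Verdict` type, the convention that `None` means "unresolved", and the three stub layers are assumptions made for illustration; real layers would wrap the rules engine and the models.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    layer: str                    # layer that resolved the request
    pii_classes: tuple[str, ...]  # classes detected (empty means none found)


Layer = Callable[[str], Optional[Verdict]]   # None means "could not resolve, defer"


def classify_layered(prompt: str, layers: list[tuple[str, Layer]]) -> tuple[Verdict, list[str]]:
    """Run layers in order and stop at the first one that resolves the prompt."""
    visited = []
    for name, layer in layers:
        visited.append(name)
        verdict = layer(prompt)
        if verdict is not None:
            return verdict, visited
    # No layer resolved the prompt; report no detection (a policy could treat this as block).
    return Verdict("default", ()), visited


# Stub layers standing in for the real classifiers (hypothetical behavior).
def rules_layer(prompt: str) -> Optional[Verdict]:
    return Verdict("rules", ("email",)) if "@" in prompt else None


def small_model_layer(prompt: str) -> Optional[Verdict]:
    return None if "ambiguous" in prompt else Verdict("small_model", ())


def llm_layer(prompt: str) -> Optional[Verdict]:
    return Verdict("llm", ("person_name",))


LAYERS = [("rules", rules_layer), ("small_model", small_model_layer), ("llm", llm_layer)]
```

The `visited` list records which layers ran, which is the information the audit record needs later. Note that this sketch stops at the first resolving layer; a deployment that redacts rather than blocks might keep running later layers on the remaining text.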

The layering keeps the median latency near the rules-layer cost when the rules resolve most requests, and holds the typical tail at the small-model cost. The LLM classifier adds its latency only on the exception cases, so it sets the extreme tail without affecting the typical path.

The architecture matters because the production AI workload is high-volume. A classifier that adds 200 ms to every request consumes the latency budget the inline enforcement layer is designed to operate within. The layered approach holds the budget while preserving coverage on the exception cases.
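The budget argument is simple arithmetic: every request pays the rules cost, and each later layer adds its cost only for the fraction of requests that reach it. The sketch below computes the mean added latency. The per-layer costs are picked from the ranges quoted earlier in this piece, and the fractions are invented purely for illustration.

```python
def expected_latency_ms(rules_ms: float, small_ms: float, llm_ms: float,
                        frac_small: float, frac_llm: float) -> float:
    """Mean added latency when later layers run only on a fraction of requests.

    frac_small: share of all requests that reach the small-model layer.
    frac_llm:   share of all requests that reach the LLM layer.
    """
    return rules_ms + frac_small * small_ms + frac_llm * llm_ms


# Hypothetical mix: 40% of requests reach the small model, 2% reach the LLM.
layered = expected_latency_ms(2.0, 20.0, 300.0, frac_small=0.40, frac_llm=0.02)

# Same costs with every layer running on every request.
unlayered = expected_latency_ms(2.0, 20.0, 300.0, frac_small=1.0, frac_llm=1.0)
```

Under these assumed numbers the layered mean is roughly 16 ms against 322 ms when every layer always runs. The actual fractions depend on the prompt mix and have to be measured in production.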

Integration at the AI request boundary

The classifier runs inside the AI gateway or enforcement proxy that sits between the application and the LLM API. Every request passes through. The classifier produces a classification result. The policy layer evaluates the result against the policy in effect and produces a decision: allow, redact the detected PII and proceed, or block.

The integration with the calling application is the HTTP request layer. The application makes its normal call to the LLM API; the gateway intercepts; the classification runs; the policy decides; the appropriate request continues to the model or returns the blocked response. The application is not aware of the classifier directly.

The configuration interface for the operator covers the rules, the model selections, the policy bindings (which PII class triggers which decision in which workflow), and the audit retention. The operator configures the policy without touching the classifier internals.
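A minimal sketch of the policy bindings and the three-way decision, assuming the classifier hands over detected spans as `(pii_class, start, end)` tuples. The workflow names, the binding table, the most-severe-action-wins rule, and the default of allowing unbound classes are all assumptions for illustration; a stricter deployment might default to block.

```python
from enum import IntEnum


class Action(IntEnum):
    ALLOW = 0
    REDACT = 1
    BLOCK = 2


# Hypothetical bindings: (workflow, PII class) -> action.
POLICY = {
    ("support_chat", "email"): Action.REDACT,
    ("support_chat", "credit_card"): Action.BLOCK,
}

Span = tuple[str, int, int]   # (pii_class, start, end) as produced by a classifier


def decide(workflow: str, spans: list[Span]) -> Action:
    """Return the most severe action bound to any detected class (ALLOW if none)."""
    return max(
        (POLICY.get((workflow, cls), Action.ALLOW) for cls, _, _ in spans),
        default=Action.ALLOW,
    )


def redact(prompt: str, spans: list[Span]) -> str:
    """Replace each detected span with a class placeholder (assumes non-overlapping spans)."""
    out = prompt
    # Work right to left so earlier offsets stay valid after each replacement.
    for cls, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        out = out[:start] + f"[{cls.upper()}]" + out[end:]
    return out


prompt = "Contact jane@example.com about the refund"
start = prompt.index("jane@example.com")
spans = [("email", start, start + len("jane@example.com"))]
```

Here `decide("support_chat", spans)` yields `Action.REDACT`, and the gateway would forward the redacted prompt to the model in place of the original.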

The audit record

The audit record produced for each request includes the classification result alongside the rest of the per-decision record: user identity, timestamp, policy version, classification, decision, outcome. The classification component records which classifiers fired, with what confidence, on which spans of the prompt content.

The audit value is in the forensic reconstruction. When a regulator asks why a specific PII-bearing prompt was allowed (or why it was blocked), the record shows the classification that drove the decision and the policy that was in effect. The record is tamper-evident and under the deployer's control.
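One common way to make such a log tamper-evident is a hash chain, where each record's hash covers the previous record's hash. The sketch below is an assumption about how that property could be implemented, not a description of any particular product. The field names follow the per-decision record listed above, and the sample values are made up.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    """Hash a record body using a canonical JSON encoding."""
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(log: list[dict], record: dict) -> None:
    """Append a record whose hash covers its content and the previous record's hash."""
    body = {**record, "prev_hash": log[-1]["hash"] if log else GENESIS}
    log.append({**body, "hash": _digest(body)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited, removed, or reordered record breaks it."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


audit_log: list[dict] = []
append_record(audit_log, {
    "user": "u-123",                          # hypothetical identifiers and values
    "timestamp": "2025-01-01T12:00:00Z",
    "policy_version": "v7",
    "classification": [{"classifier": "rules", "class": "email",
                        "confidence": 1.0, "span": [8, 24]}],
    "decision": "redact",
    "outcome": "forwarded",
})
```

Storing the classifier name, confidence, and span in each record is what lets a later reviewer reconstruct why a given decision was made.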

For GDPR audit, the record shows the per-request handling of PII. For Article 12 of the EU AI Act, the record supports the requirement that high-risk AI systems automatically record events (logs) over their lifetime. For internal audit, the record demonstrates that the PII detection control operated as designed across the audit period.

What PII detection does not cover

PII detection on the prompt content covers the inbound data movement to the model. The outbound model response, if it produces PII the user did not provide (because the model hallucinated or because it was prompted to generate it), is a separate detection problem. The same classifier can run on the response. The architecture extends the per-decision pattern to the response side.

PII that moves through channels other than AI prompts (regular email, file storage, conversational messages) is the territory of the traditional DLP. The AI request-layer DLP covers the AI surface specifically.

PII that the model has already learned (from training data or from prior context) is outside the inline classifier's reach. The control there is data hygiene at training and context-window assembly time, not request-layer classification.

DeepInspect

The layered classifier architecture is what DeepInspect runs at the AI request boundary. DeepInspect sits as a stateless proxy between users or agents and any LLM. The rules layer handles structured PII. The small-model classifier handles unstructured PII. The LLM classifier handles the residual cases the smaller layers cannot resolve. The decision produces the per-decision audit record.

The sub-50ms enforcement budget the platform operates within is achieved by the layered architecture: the median request hits only the rules layer; the typical request adds the small-model classifier cost; only the exception cases pay the LLM classifier cost. The audit record commits before the response returns.

For GDPR readiness in EU-facing deployments, for HIPAA PHI detection (a specialized PII class with stricter handling requirements), and for EU AI Act Article 12 logging, the PII detection layer is the operational control. Book a demo today.

Frequently asked questions

How is PII detection on prompts different from PII detection on documents?

The document-level DLP classifies a whole document with metadata: source, owner, sensitivity. The prompt-level detection classifies a free-form text snippet without that metadata. The prompt may contain excerpts from multiple sources; the document-level classification of each source is not directly available at the prompt classifier. The architecture has to operate on the text alone, which makes the classifier choice different from the document DLP choice.

Can a general PII classifier cover regulated data classes like PHI and MNPI?

Partially. PHI is PII plus a layer of clinical-context identifiers (diagnosis, treatment, provider, visit identifier) that the general PII classifier may not catch. MNPI is a different kind of class altogether (specific financial information at specific moments) that is not detectable by general PII patterns. Each regulated class typically wants a class-specific layer on top of the general classifier. The architecture supports the layering; the classifier choice is per-class.

What happens to false positives in the classifier output?

False positives produce either an unnecessary block (if the policy is strict) or an unnecessary redaction (if the policy redacts on detection). The operator tunes the policy per data class and per workflow. The tunable parameters include the confidence threshold, the contextual qualifiers, and the allowlist of acceptable values. The audit record captures the classifier confidence so post-hoc tuning can use real production data.
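Post-hoc tuning from audit data can be as simple as replaying reviewed detections against candidate thresholds. The sketch below assumes a reviewer has labeled a sample of audited detections as real PII or not; the sample values are invented for illustration.

```python
def precision_recall(samples: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Precision and recall if detections below `threshold` were discarded.

    samples: (classifier confidence, reviewer says it was real PII) pairs.
    """
    tp = sum(1 for conf, real in samples if conf >= threshold and real)
    fp = sum(1 for conf, real in samples if conf >= threshold and not real)
    fn = sum(1 for conf, real in samples if conf < threshold and real)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical reviewed sample drawn from audit records.
reviewed = [(0.95, True), (0.85, True), (0.70, False), (0.65, True), (0.55, False)]
```

On this made-up sample, raising the threshold from 0.5 to 0.8 removes both false positives but drops one real detection. That is the trade the operator weighs per data class and per workflow.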

How does the classifier handle multilingual content?

The rules layer is language-agnostic for the structured shapes (credit card numbers, IBANs, well-formed national identifiers). The small-model classifier's coverage depends on training data; multilingual models cover the major languages with varying performance. For deployments with significant non-English prompt volume, the classifier choice should reflect the language mix. Fine-tuning on internal data improves the long tail.

Does the classifier need to update as new PII classes emerge?

Yes. The rules layer extends with new patterns as new structured classes appear in production. The small-model classifier benefits from periodic re-training on a refreshed corpus. The LLM classifier inherits broader updates through model refreshes. The audit record across versions captures which classifier version made each decision, so historical decisions can be re-evaluated against newer policies if the regulatory regime changes.