PII detection

PII detection is the classification step that identifies personally identifiable information inside a text payload before the payload crosses a trust boundary. Detection combines regex patterns (SSN, credit card, phone, email), named-entity recognition models (person names, addresses), and content classifiers tuned for context (a nine-digit number near "SSN" is a stronger signal than a nine-digit invoice number). In an AI gateway, PII detection runs against the decrypted prompt body and feeds the classification verdict into the policy decision.
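The regex-plus-context idea can be sketched in a few lines of Python. This is a minimal illustration, not a production detector: the patterns are deliberately loose, the `Finding` type, the `detect` function, the 30-character context window, and the confidence values are all assumptions made for the example, and the NER and classifier stages are left out.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; real detectors use stricter, validated rules.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b"),
}

# Hypothetical context hints: words near a match that strengthen the signal.
CONTEXT_HINTS = {
    "ssn": re.compile(r"\b(?:ssn|social security)\b", re.IGNORECASE),
}


@dataclass
class Finding:
    category: str
    start: int          # offset of the first matched character
    end: int            # offset one past the last matched character
    confidence: float   # assumed scores, not calibrated probabilities


def detect(text, window=30):
    """Return regex findings, scoring each by nearby context words."""
    findings = []
    for category, pattern in PATTERNS.items():
        hint = CONTEXT_HINTS.get(category)
        for match in pattern.finditer(text):
            if hint is None:
                # Categories without a context rule get a fixed score.
                confidence = 0.9
            else:
                lo = max(0, match.start() - window)
                nearby = text[lo:match.end() + window]
                # A bare nine-digit number is weak; one near "SSN" is strong.
                confidence = 0.95 if hint.search(nearby) else 0.5
            findings.append(
                Finding(category, match.start(), match.end(), confidence)
            )
    return findings
```

Calling `detect("SSN: 123-45-6789")` and `detect("Invoice 123456789 paid")` both produce an `ssn` finding, but only the first gets the higher score, which is the distinction the policy layer can threshold on.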

What PII detection has to recognize

GDPR Article 4 defines personal data broadly: any information relating to an identified or identifiable natural person. The detection model has to recognize direct identifiers (name, SSN, passport, email, IP address) and indirect identifiers (a postal code plus a birth date plus a gender, which together can re-identify the subject in most populations). The IBM Cost of a Data Breach Report measured customer PII exposure at 65% in shadow AI breaches versus 53% across all breaches. The gap reflects prompt content that escapes network DLP and reaches the LLM in cleartext.
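The direct versus indirect distinction matters in code because an indirect identifier only becomes sensitive in combination. A minimal sketch, assuming the detector has already produced a set of category labels for a payload; the label strings and the `identifier_risk` function are hypothetical names chosen for this example:

```python
# Direct identifiers named in the text; any single one identifies a person.
DIRECT = {"name", "ssn", "passport", "email", "ip_address"}

# Quasi-identifier combinations. Only the one from the text is listed;
# a real deployment would maintain a longer, reviewed list.
QUASI_COMBINATIONS = [
    frozenset({"postal_code", "birth_date", "gender"}),
]


def identifier_risk(categories):
    """Classify a set of detected categories as direct, indirect, or none."""
    if categories & DIRECT:
        return "direct"
    # A combination counts only when every member is present together.
    if any(combo <= categories for combo in QUASI_COMBINATIONS):
        return "indirect"
    return "none"
```

With these assumptions, a payload containing only a postal code and a gender returns `"none"`, while adding a birth date flips the result to `"indirect"`.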

How detection feeds the gateway policy

The classification verdict carries the PII category, the confidence score, and the field offsets. The policy decision point reads the verdict together with the verified subject and the destination route. A route policy that says "EU-resident PII may not travel to non-EU LLM endpoints" gets enforced by combining the detection verdict with the destination geography. A route policy that says "PHI may only travel to BAA-covered endpoints" gets enforced the same way. The detection itself does not block; the policy does. The audit record names both: the classification verdict and the policy version that decided.
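The separation between detection and enforcement can be sketched as a small policy function. Everything named here is an assumption for illustration: the `Verdict`, `Route`, and `Decision` types, the category labels `eu_resident_pii` and `phi`, the 0.8 confidence threshold, and the `example-v1` policy version are not taken from any particular gateway.

```python
from dataclasses import dataclass

POLICY_VERSION = "example-v1"   # hypothetical version label
MIN_CONFIDENCE = 0.8            # assumed threshold for acting on a verdict


@dataclass(frozen=True)
class Verdict:
    category: str       # e.g. "eu_resident_pii" or "phi" (assumed labels)
    confidence: float
    offsets: tuple      # (start, end) pairs into the prompt body


@dataclass(frozen=True)
class Route:
    endpoint: str
    region: str         # e.g. "EU" or "US"
    baa_covered: bool


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str
    policy_version: str
    subject: str
    verdicts: tuple     # kept so the audit record names the verdicts too


def decide(verdicts, subject, route):
    """Combine detection verdicts with the destination route."""
    allow, reason = True, "no rule matched"
    for verdict in verdicts:
        if verdict.confidence < MIN_CONFIDENCE:
            continue  # low-confidence findings do not trigger a rule here
        if verdict.category == "eu_resident_pii" and route.region != "EU":
            allow, reason = False, "EU-resident PII to non-EU endpoint"
            break
        if verdict.category == "phi" and not route.baa_covered:
            allow, reason = False, "PHI to endpoint without BAA"
            break
    return Decision(allow, reason, POLICY_VERSION, subject, tuple(verdicts))
```

The detector never blocks in this sketch; it only produces `Verdict` values. The returned `Decision` carries both the verdicts and the policy version, so it can be written to the audit log as-is.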

