PII detection

PII detection is the classification step that identifies personally identifiable information inside a text payload before the payload crosses a trust boundary. Detection combines three techniques: regex patterns (SSN, credit card, phone, email), named-entity recognition models (person names, addresses), and content classifiers tuned for context (a nine-digit number near "SSN" is a stronger signal than a nine-digit invoice number). In an AI gateway, PII detection runs against the decrypted prompt body and feeds the classification verdict into the policy decision.
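The regex-plus-context idea can be illustrated with a minimal sketch. The patterns, the Finding type, the 40-character context window, and the confidence values are all assumptions chosen for illustration; a production detector would add NER models, checksum validation, and locale-aware rules.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified patterns for illustration only.
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SSN_CONTEXT_RE = re.compile(r"\b(ssn|social security)\b", re.IGNORECASE)


@dataclass
class Finding:
    category: str      # e.g. "ssn", "email"
    start: int         # offset of the first matched character
    end: int           # offset one past the last matched character
    confidence: float  # illustrative score, not a calibrated probability


def detect(text: str, window: int = 40) -> list[Finding]:
    """Return regex findings, raising SSN confidence when context supports it."""
    findings = []
    for m in SSN_RE.finditer(text):
        # Look at the text surrounding the nine-digit match for an SSN keyword.
        nearby = text[max(0, m.start() - window): m.end() + window]
        confidence = 0.9 if SSN_CONTEXT_RE.search(nearby) else 0.4
        findings.append(Finding("ssn", m.start(), m.end(), confidence))
    for m in EMAIL_RE.finditer(text):
        findings.append(Finding("email", m.start(), m.end(), 0.95))
    return findings
```

With these assumed scores, detect("My SSN is 123-45-6789") yields a high-confidence finding, while the same nine digits in "Invoice 123456789 is overdue" yield a low-confidence one that a policy can choose to ignore.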

What PII detection has to recognize

GDPR Article 4 defines personal data broadly: any information relating to an identified or identifiable natural person. The detection model has to recognize direct identifiers (name, SSN, passport, email, IP address) and indirect identifiers (for example a postal code, a birth date, and a gender, which in combination can re-identify the subject in most populations). The IBM Cost of a Data Breach Report measured customer PII exposure at 65% in shadow AI breaches versus 53% across all breaches. The gap is the prompt content that escapes the network DLP and reaches the LLM in cleartext.
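The direct versus indirect distinction can be sketched as a check over the set of categories found in one payload. The category names and the identifier_risk function are hypothetical; the only quasi-identifier combination listed is the one named above, and a real system would maintain a longer, population-specific list.

```python
# Hypothetical category labels; a real detector defines its own taxonomy.
DIRECT_IDENTIFIERS = {"name", "ssn", "passport", "email", "ip_address"}

# Combinations that identify a person only when they appear together.
QUASI_IDENTIFIER_COMBOS = [
    frozenset({"postal_code", "birth_date", "gender"}),
]


def identifier_risk(categories: set[str]) -> str:
    """Classify a payload as carrying direct, indirect, or no identifiers."""
    if categories & DIRECT_IDENTIFIERS:
        return "direct"
    # A combo matches only if every one of its members was detected.
    if any(combo <= categories for combo in QUASI_IDENTIFIER_COMBOS):
        return "indirect"
    return "none"
```

The point of the second branch is that no single field in the combination would trip a per-field rule, so the check has to look at the payload as a whole.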

How detection feeds the gateway policy

The classification verdict carries the PII category, the confidence score, and the field offsets. The policy decision point reads the verdict together with the verified subject and the destination route. A route policy that says "EU-resident PII may not travel to non-EU LLM endpoints" gets enforced by combining the detection verdict with the destination geography. A route policy that says "PHI may only travel to BAA-covered endpoints" gets enforced the same way. The detection itself does not block; the policy does. The audit record names both: the classification verdict and the policy version that decided.
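A minimal sketch of that separation follows, assuming hypothetical Verdict, Subject, and Route types, a made-up policy version string, and an arbitrary 0.8 confidence threshold. It is not a description of any particular gateway; it only shows the verdict as an input and the policy as the component that decides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    category: str                        # assumed labels: "pii", "phi", "none"
    confidence: float
    offsets: tuple[tuple[int, int], ...]  # (start, end) spans in the prompt


@dataclass(frozen=True)
class Subject:
    subject_id: str
    eu_resident: bool


@dataclass(frozen=True)
class Route:
    endpoint: str
    region: str        # assumed values: "eu" or "non-eu"
    baa_covered: bool


POLICY_VERSION = "example-v1"  # hypothetical version identifier


def decide(verdict: Verdict, subject: Subject, route: Route,
           threshold: float = 0.8) -> dict:
    """Combine verdict, subject, and route; return a decision plus audit fields."""
    allowed, reason = True, "no rule matched"
    if verdict.confidence >= threshold:
        if verdict.category == "pii" and subject.eu_resident and route.region != "eu":
            allowed, reason = False, "EU-resident PII to non-EU endpoint"
        elif verdict.category == "phi" and not route.baa_covered:
            allowed, reason = False, "PHI to endpoint without BAA"
    # The audit record names both the verdict and the policy version that decided.
    return {
        "allowed": allowed,
        "reason": reason,
        "verdict": verdict,
        "policy_version": POLICY_VERSION,
    }
```

Note that the detector never returns a block decision in this sketch: the same verdict is allowed on one route and denied on another, which is what makes the policy version worth recording alongside it.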

Related reading

  • AI Inline Enforcement Architecture: Where the Policy Decision Sits and What It Has To Commit

    AI inline enforcement runs the policy decision in the request path, before the model API call returns to the calling application. The architecture places a deterministic policy decision point between the application identity and the model endpoint and commits a per-decision audit record before the response is forwarded. This piece walks through the architectural components, the decision-time data shape, the failure modes the implementation has to handle, and the regulatory profile that the inline placement satisfies (EU AI Act Article 12, NIST AI agent identity and authorization Pillar 2 and Pillar 3, Fannie Mae LL-2026-04, DORA Article 6).