Can the gateway catch every PII leak?

No classifier achieves 100% recall. The gateway's classifier catches the documented PII patterns and any custom patterns the deployment defines. False negatives happen and are part of the residual risk the architecture has to plan for. The defense in depth includes upstream controls (do not put PII in prompts that do not need it) and downstream controls (audit log for post-incident review).

How is LLM06 different from LLM02 (insecure output handling)?

LLM02 covers downstream code-execution risks from the model's output: an output that contains SQL the application executes, JavaScript the browser renders, or shell commands a script invokes. LLM06 covers content the output should not have contained in the first place. The two overlap when the sensitive content is also dangerous to execute (a leaked API key the application then uses).

What does cross-tenant leakage look like in practice?

Most documented cross-tenant cases involve shared retrieval indexes, shared response caches, or shared session stores. A multi-tenant RAG system that does not scope the retrieval to the tenant returns one tenant's documents to another's queries. The architectural fix is per-tenant scoping at every shared resource; the gateway provides the second line of enforcement at the response boundary.

Does the classifier slow responses down?

Usually not noticeably. Modern PII classifiers typically run in a few milliseconds, so whatever latency the gateway adds is dominated by network round-trips rather than by classification. The end-to-end overhead is small relative to the model's inference time.

Can the model itself be instructed not to leak?

Marginally. Instructions in the system prompt help against accidental leakage and do little against adversarial extraction. The instruction is worth including; it cannot be the only control.

Where does this fit in OWASP AISVS?

OWASP AISVS Chapter 6 (output handling) and Chapter 8 (data protection) cover the verification requirements for the LLM06 surface. The chapters require documented output classification, redaction policies, per-tenant scoping checks, and per-decision logs of classifier outcomes.

OWASP LLM06 Sensitive Information Disclosure: The Output-Side Controls a Gateway Enforces

OWASP LLM06 covers sensitive information disclosure. The Top 10 entry describes the failure mode in data-flow terms: the model emits information the application or the user is not authorized to receive. The disclosure paths split into three architecturally distinct cases: training-data leakage, where the model surfaces PII or proprietary data it learned during pre-training or fine-tuning; in-context leakage, where the retrieval layer or a tool response pulls sensitive data into the prompt and the model echoes it back; and cross-tenant leakage, where a multi-tenant deployment confuses one tenant's data with another's.

The output-side controls live at the gateway. The gateway sees every response before it reaches the user. The gateway can classify the response, redact identified sensitive elements, block the response entirely, or route the response through a stricter policy based on the calling identity's authorization level. The application can implement output filtering too, but application-side filtering shares the application's failure modes: an application bug bypasses the filter, an application compromise removes it, and a multi-application environment has no consistent place to enforce it.

I want to walk through the three disclosure paths, the output-side controls that work at the gateway, the redaction and routing patterns that hold under attack, and the residual application work that the gateway cannot replace.

The three disclosure paths

The disclosure paths look similar on the wire and require different controls.

Training-data leakage. The model emits content it memorized during training. The content can be PII (an email address, a phone number, a social security number that appeared in training data), proprietary text (a section of a copyrighted document, a verbatim passage from a confidential corpus), or system-level data (an API key, a configuration value that was inadvertently included in training). The leakage happens because the model's training process produced an association between certain prompt patterns and certain content.

In-context leakage. The model emits content that was supplied to it during the current session through retrieval, tool output, or system prompt content. A RAG system retrieves a document containing PII; the model summarizes the document and includes the PII in the summary. A tool call returns a database row; the model formats the row as a response and includes the sensitive columns. The leakage path is the application's own data flow, not the model's training history.

Cross-tenant leakage. The model, the retrieval layer, or the cache layer confuses one tenant's data with another's. A multi-tenant RAG system that uses a shared index without per-tenant scoping returns tenant A's documents to tenant B's queries. A response cache keyed on prompt content without tenant scoping returns tenant A's response to tenant B's request. The leakage is a platform-architecture problem.
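The cache case is easy to make concrete. The sketch below is a minimal illustration, not a production cache: `cache_key` and `TenantScopedCache` are hypothetical names, and it assumes the caller already holds a verified tenant identifier. The only point it makes is that the tenant belongs inside the key, so identical prompts from two tenants can never resolve to the same entry.

```python
import hashlib
from typing import Optional


def cache_key(tenant_id: str, prompt: str) -> str:
    """Derive a cache key from the tenant AND the prompt (hypothetical helper)."""
    if not tenant_id:
        # Fail closed: a lookup without tenant context must not hit the cache.
        raise ValueError("tenant_id is required for cache lookups")
    # Length-prefix the tenant so ("ab", "c") and ("a", "bc") cannot collide.
    material = f"{len(tenant_id)}:{tenant_id}:{prompt}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class TenantScopedCache:
    """In-memory response cache whose entries are partitioned by tenant."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def get(self, tenant_id: str, prompt: str) -> Optional[str]:
        return self._entries.get(cache_key(tenant_id, prompt))

    def put(self, tenant_id: str, prompt: str, response: str) -> None:
        self._entries[cache_key(tenant_id, prompt)] = response
```

With this keying, a response stored for tenant A is simply invisible to tenant B, even when both send the same prompt text.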

The output-side controls at the gateway

Six output-side controls do most of the LLM06 work when enforced at the gateway layer.

Per-identity PII classification on responses. The gateway runs the response through a classifier that flags PII patterns. Identified PII is redacted, the response is blocked, or the response is routed to a stricter policy depending on the calling identity's data scope. The classifier covers names, email addresses, phone numbers, government identifiers, financial account numbers, health identifiers, and any custom patterns defined by the deployment.
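As a rough illustration of the classification step, here is a minimal regex-based sketch. It is an assumption-laden stand-in: real classifiers usually combine patterns with statistical models, the three patterns shown are simplified US-centric examples, and the names `Finding`, `PII_PATTERNS`, and `classify` are hypothetical. It does show how deployment-defined custom patterns can sit alongside the built-in ones.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    kind: str   # which pattern matched, e.g. "email"
    start: int  # character offset where the match begins
    end: int    # character offset where the match ends


# Simplified illustrative patterns; a real deployment would need broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def classify(text: str, extra_patterns: Optional[dict] = None) -> list[Finding]:
    """Return every PII match in text, ordered by position."""
    patterns = {**PII_PATTERNS, **(extra_patterns or {})}
    findings = [
        Finding(kind, match.start(), match.end())
        for kind, pattern in patterns.items()
        for match in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)
```

A deployment would pass its own patterns through `extra_patterns`, for example an internal employee-ID format, and then decide per identity what to do with the findings.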

Per-identity content-class filtering. Beyond PII, the gateway can flag content classes the calling identity is not authorized to receive: legal advice, medical advice, internal pricing, source code, other customers' records. The filtering is identity-aware: a support agent identity might receive sanitized records; a customer identity receives only their own records.

Schema-bound responses for tool-mediated outputs. When the response is the output of a tool invocation, the gateway can enforce that the response shape matches the declared schema. A tool that is supposed to return a customer record's first name and last login returns exactly those fields; any additional fields the tool emits are stripped at the gateway. The control protects against tools that return more than they advertise.
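A minimal sketch of the schema check, assuming tool output arrives as a flat dictionary and the declared schema is a simple field-to-type mapping. `DECLARED_FIELDS` and `enforce_schema` are hypothetical names, and the field names mirror the first-name and last-login example above.

```python
from typing import Any

# Hypothetical declared schema for a "customer summary" tool.
DECLARED_FIELDS: dict[str, type] = {"first_name": str, "last_login": str}


def enforce_schema(
    tool_output: dict[str, Any], declared: dict[str, type]
) -> tuple[dict[str, Any], list[str]]:
    """Keep only declared, correctly typed fields; report what was stripped."""
    allowed: dict[str, Any] = {}
    stripped: list[str] = []
    for key, value in tool_output.items():
        if key in declared and isinstance(value, declared[key]):
            allowed[key] = value
        else:
            stripped.append(key)
    missing = [key for key in declared if key not in allowed]
    if missing:
        # A declared field that is absent or mistyped is treated as a tool error.
        raise ValueError(f"tool output missing declared fields: {missing}")
    return allowed, sorted(stripped)
```

The list of stripped field names is worth keeping: it belongs in the audit record as evidence that a tool returned more than it advertised.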

Per-tenant scoping enforced at the gateway. The gateway carries the calling identity's tenant context as a first-class part of the request. Responses that contain data from another tenant are blocked. The tenant check is independent of the application's tenant scoping; the gateway is a second line of enforcement.
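How the gateway learns which tenant a piece of response data belongs to depends on the deployment. The sketch below assumes the retrieval layer attaches a tenant label to every source chunk it hands to the model and that those labels travel with the response; `SourceChunk`, `foreign_sources`, and `enforce_tenant_scope` are hypothetical names used for illustration.

```python
from dataclasses import dataclass

BLOCKED_MESSAGE = "This response was blocked by policy."


@dataclass(frozen=True)
class SourceChunk:
    doc_id: str
    tenant_id: str  # label assumed to be attached by the retrieval layer


def foreign_sources(caller_tenant: str, sources: list[SourceChunk]) -> list[str]:
    """Return the ids of source documents owned by a different tenant."""
    return [s.doc_id for s in sources if s.tenant_id != caller_tenant]


def enforce_tenant_scope(
    caller_tenant: str, response_text: str, sources: list[SourceChunk]
) -> str:
    """Block the whole response if any contributing source is cross-tenant."""
    if foreign_sources(caller_tenant, sources):
        return BLOCKED_MESSAGE
    return response_text
```

The check deliberately ignores whatever scoping the application claims to have done, which is what makes it a second line of enforcement and not a repeat of the first.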

Output-length and content-shape caps. The gateway can enforce maximum response lengths, maximum number of returned records, and structural caps on how much of a retrieved document gets echoed back. The caps reduce the surface for bulk-exfiltration patterns where an attacker tries to extract a large corpus through a single response.
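The caps can be sketched as a simple violations check. The limits below are placeholder values, not recommendations, and `OutputCaps`, `longest_echo`, and `violations` are hypothetical names. The echo cap is approximated here as the longest verbatim run shared between the response and a retrieved document, computed with the standard library's `difflib.SequenceMatcher`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class OutputCaps:
    max_chars: int = 4000       # placeholder limit on response length
    max_records: int = 25       # placeholder limit on returned records
    max_echo_chars: int = 500   # placeholder limit on verbatim document echo


def longest_echo(response: str, document: str) -> int:
    """Length of the longest substring the response copies verbatim from document."""
    matcher = SequenceMatcher(None, response, document, autojunk=False)
    match = matcher.find_longest_match(0, len(response), 0, len(document))
    return match.size


def violations(
    response: str, records: list[dict], documents: list[str], caps: OutputCaps
) -> list[str]:
    """Name every cap the response exceeds; an empty list means it passes."""
    found: list[str] = []
    if len(response) > caps.max_chars:
        found.append("max_chars")
    if len(records) > caps.max_records:
        found.append("max_records")
    if any(longest_echo(response, doc) > caps.max_echo_chars for doc in documents):
        found.append("max_echo_chars")
    return found
```

The substring comparison is expensive on long inputs, so a real gateway would likely bound the document sizes it compares or use a cheaper fingerprinting scheme.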

Audit on every response with the classifier result. Every response produces a per-decision audit record that includes the response classifier outcome, the redactions applied, the policy decisions, and the calling identity. The audit record is the forensic trail for any post-incident investigation of a disclosure event.
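A per-decision record might look like the sketch below. The field names and the `build_audit_record` helper are assumptions for illustration, not a documented log format. One design choice to flag: this sketch stores a SHA-256 digest of the response so the audit log does not itself become a second copy of the sensitive content; a deployment that needs the original text for review would store it separately under tighter access control.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(
    identity: str,
    tenant_id: str,
    response_text: str,
    findings: list[str],
    redactions: list[str],
    decision: str,
) -> str:
    """Serialize one gateway decision as a JSON line (hypothetical format)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "identity": identity,
        "tenant_id": tenant_id,
        "decision": decision,               # e.g. "pass", "redact", "block"
        "classifier_findings": findings,    # what the classifier flagged
        "redactions_applied": redactions,   # what was actually scrubbed
        "response_sha256": hashlib.sha256(response_text.encode("utf-8")).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

Writing one JSON line per decision keeps the log append-only and easy to ship to storage outside the calling application.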

The redaction and routing patterns

Three patterns recur in production deployments.

Redact-and-pass. PII patterns are replaced with placeholders in the response before it reaches the user. The response is logged with both the original and the redacted form for audit. The user receives a response that still answers the question but with the sensitive fields scrubbed. The pattern works when the value of the response to the user lies in its structure, not in the specific PII.
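A minimal sketch of redact-and-pass, assuming a small set of illustrative regex patterns. `REDACTION_PATTERNS` and `redact` are hypothetical names, and the placeholder format is my own choice.

```python
import re

# Simplified illustrative patterns; real coverage would be much broader.
REDACTION_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each PII match with a typed placeholder.

    Returns the redacted text plus the list of redaction kinds applied,
    one entry per replaced match, for the audit record.
    """
    applied: list[str] = []
    for kind, pattern in REDACTION_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED_{kind.upper()}]", text)
        applied.extend([kind] * count)
    return text, applied
```

Typed placeholders such as `[REDACTED_EMAIL]` keep the sentence structure readable, which is what lets the response still answer the question.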

Block-and-route. When the classifier hits a high-severity content class, the response is blocked entirely and the user receives a generic refusal. The original response is logged for review. The pattern is used when redaction is unsafe (the response leaks information just by existing) or when the calling identity is not authorized to receive any portion of the content class.

Route-by-identity. The same prompt routes to different policy paths based on the calling identity. An identity with elevated authorization receives the full response; an identity with restricted authorization receives a redacted version. The pattern enables one application to serve multiple authorization classes without separate per-class deployments.
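Route-by-identity reduces to a policy lookup. In the sketch below the authorization levels, content classes, and `POLICY` table are all hypothetical examples. It assumes the classifier has already produced a set of detected content classes, takes the strictest action any of them requires, and fails closed on anything it does not recognize.

```python
from enum import IntEnum


class Action(IntEnum):
    PASS = 0
    REDACT = 1
    BLOCK = 2


# Hypothetical policy table: authorization level -> content class -> action.
POLICY: dict[str, dict[str, Action]] = {
    "elevated": {"pii": Action.PASS, "internal_pricing": Action.PASS},
    "restricted": {"pii": Action.REDACT, "internal_pricing": Action.BLOCK},
}


def decide(auth_level: str, detected_classes: set[str]) -> Action:
    """Pick the strictest action required by any detected content class."""
    rules = POLICY.get(auth_level)
    if rules is None:
        # Unknown authorization level: fail closed.
        return Action.BLOCK
    # A detected class with no explicit rule is also blocked.
    actions = [rules.get(cls, Action.BLOCK) for cls in detected_classes]
    return max(actions, default=Action.PASS)
```

Because the table is data and not code, one gateway deployment can serve several authorization classes by editing policy, with no change to the application.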

What sits outside the gateway boundary

The model's training history is outside the gateway boundary. The gateway cannot un-train data that was memorized. The gateway can only detect and redact the memorized content as it surfaces in responses.

The application's own data-flow design is outside the gateway boundary. If the application chooses to push sensitive context into the prompt without scoping it to the user's authorization, the gateway can detect and redact, but the architectural fix is in the application. The gateway is the safety net; the application is the primary control.

This is one of the cases where the DeepInspect HTTP-boundary rule applies cleanly. If the leakage path is the application reading data from a database and writing it to a file the user has access to, the file write never touches the gateway. The gateway sees the AI HTTP traffic, not the application's file system or database operations.

How LLM06 maps to GDPR, HIPAA, and the EU AI Act

GDPR Articles 5 and 32 require that personal data be processed in a manner that ensures appropriate security, and that controllers implement technical and organisational measures to prevent unauthorized disclosure. A model that emits PII to an unauthorized recipient breaches the integrity and confidentiality principle of Article 5(1)(f). The output-side controls at the gateway are part of the Article 32 technical measures that the controller can document and rely on.

HIPAA's Privacy Rule restricts disclosures of PHI to the minimum necessary for the disclosure's purpose. A model that surfaces PHI in a response to a user not authorized to receive it makes an impermissible disclosure under the Privacy Rule. The redact-and-pass pattern, configured to the minimum-necessary standard, is the architectural answer.

EU AI Act Article 13 (transparency and provision of information to deployers) and Article 19 (automatically generated logs) both intersect with LLM06. Article 13 requires that a high-risk AI system be transparent enough for deployers to interpret its output and use it appropriately; Article 19 requires that providers keep the logs the system automatically generates. The gateway's per-decision audit log with classifier results is an artifact that supports both requirements at the AI inference layer.

DeepInspect

This is the output-side control DeepInspect provides for the LLM06 surface. DeepInspect sits inline between authenticated users or agents and the LLMs they call, classifies every request and response against PII and content-class policies, redacts or blocks identified disclosures based on the calling identity's authorization, enforces per-tenant scoping as a second line of defense, and writes a per-decision audit record outside the calling application.

The classifier runs at the gateway, which means the response is evaluated before it reaches the user. An application bug that would have surfaced PII to an unauthorized recipient gets caught at the gateway and either redacted or blocked. The per-tenant scoping check runs at the gateway, which means a multi-tenant application bug that confused two tenants' data gets caught at the gateway. The audit record produces the evidence the GDPR controller, the HIPAA covered entity, or the EU AI Act deployer needs for post-incident review.

If you are mapping the OWASP LLM Top 10 controls against your current architecture and your LLM06 coverage depends on the application correctly filtering every response, let's talk today.