
LLM Response Redaction Patterns: How to Filter Model Output Without Breaking the Response

The prompt is the input the gateway inspects before the model sees it. The response is the output the gateway inspects before the caller sees it. Response redaction runs against free-form generated text, which is a harder inspection problem than prompt classification. This piece walks through the redaction patterns that hold up on the response side: token-boundary preservation, semantic-preserving substitution, structured-response filtering, and the audit records that prove the filter ran. The patterns apply to the LLM DLP layer of any inline gateway.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
AI Security Solutions · llm-dlp · response-redaction · ai-security · ai-dlp · ai-gateway · data-leak-prevention

Zscaler's ThreatLabz 2026 AI Threat Report, published June 17, 2026, found a 93% year-over-year jump in employees moving enterprise data into AI tools, 18,033 TB in total, and 410M+ ChatGPT DLP policy violations, up 99% year over year. Prompt-side DLP catches part of that flow. The response side is where the model's output carries data back to the caller, and response-side inspection is a harder problem than prompt inspection. I want to walk through the redaction patterns that hold up when the target is generated text, the failure modes when the filter breaks token boundaries, and the audit records that prove the filter ran.

The four response redaction failure modes

Response-side redaction has four failure modes that appear in production: token-boundary corruption, semantic drift, structured-response breakage, and false-negative regex misses.

Token-boundary corruption happens when the redaction cuts across a token the model returned. The caller receives a response with a truncated token in the middle, which either fails to parse (for structured responses) or renders as nonsense (for prose responses). The redaction was correct as a substring operation but wrong as a token operation.

Semantic drift happens when the redaction replaces content with a placeholder that changes the response's meaning. The model returned "the customer at 555-1234 requested a refund." The redaction produced "the customer at [REDACTED] requested a refund." The response is safe, but the downstream system that used the number for routing loses its routing signal.

Structured-response breakage happens when the redaction operates on the raw text of a JSON, YAML, or code response and breaks the structure. The response was valid JSON. The redaction produced text that no longer parses as JSON.

False-negative regex miss happens when the regex pattern the filter uses does not match the format the model returned. The model returned a credit card number with alternating hyphens and spaces. The regex expected only hyphens. The filter passes the response, and the caller receives the number.
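One way to narrow this gap is to match on digits with optional separators and then validate the candidate, instead of hard-coding a single separator. The sketch below is a minimal illustration, not a complete card detector: the names `CARD_CANDIDATE`, `luhn_valid`, and `redact_cards` are my own, and the Luhn check is an assumption I add to keep the looser pattern from redacting arbitrary digit runs.

```python
import re

# Hypothetical pattern: 13 to 19 digits, each optionally followed by one
# space or hyphen, so mixed separators still match.
CARD_CANDIDATE = re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")

# The brittle variant from the failure mode above: hyphens only.
STRICT_HYPHENS = re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str, placeholder: str = "[CARD]") -> str:
    """Replace separator-tolerant card candidates that pass the Luhn check."""
    def _sub(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return placeholder if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(_sub, text)
```

With the well-known test number written as "4111-1111 1111-1111", the hyphen-only pattern finds nothing, while `redact_cards` replaces the whole run. A sixteen-digit string that fails the checksum is left alone.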

The token-boundary preservation pattern

Token-boundary preservation runs the redaction against the tokenized response, not the raw text. The gateway tokenizes the response the same way the model does, identifies the tokens that carry the sensitive content, and replaces the token range with a placeholder token that preserves the boundary.

The pattern requires the gateway to use the same tokenizer the model uses. For OpenAI models, the tokenizer is available through the tiktoken library. For Anthropic models, the tokenizer is proprietary, so the gateway depends on whatever token-level access the vendor's API and SDK provide. For Bedrock models, the tokenizer depends on the underlying model family.

The replacement produces a response the caller can parse without corruption. The placeholder is a valid token in the tokenizer's vocabulary. The caller sees "the customer at XXXX requested a refund" or "the customer at [PHONE] requested a refund" depending on the enterprise's placeholder policy.

The audit record captures the token range that was replaced, the placeholder token that was inserted, and the classification that triggered the replacement.
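The boundary logic can be sketched without committing to a specific tokenizer. The example below assumes the gateway already has the response as a list of token strings from the model's tokenizer. The token split in `TOKENS` is invented for illustration and is not the output of any real tokenizer, and `redact_token_range` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def redact_token_range(
    token_texts: List[str],
    char_span: Tuple[int, int],
    placeholder: str = "[PHONE]",
) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Expand a character span to whole tokens and replace them as a unit.

    Returns the redacted text and the inclusive (first, last) token range,
    which is the range the audit record would store.
    """
    start, end = char_span
    hit, pos = [], 0
    for i, tok in enumerate(token_texts):
        tok_start, tok_end = pos, pos + len(tok)
        if tok_start < end and tok_end > start:  # token overlaps the span
            hit.append(i)
        pos = tok_end
    if not hit:
        return "".join(token_texts), None

    first, last = hit[0], hit[-1]
    head = token_texts[first]
    # Many tokenizers attach the leading space to the token; keep it so the
    # words around the placeholder do not run together.
    lead_ws = head[: len(head) - len(head.lstrip())]
    redacted = token_texts[:first] + [lead_ws + placeholder] + token_texts[last + 1:]
    return "".join(redacted), (first, last)


# Illustrative token split only (an assumption, not real tokenizer output).
TOKENS = ["the", " customer", " at", " 555", "-", "12", "34",
          " requested", " a", " refund", "."]
```

A detector that reports the span (16, 24) for "555-1234", or a sloppier span such as (17, 23) that starts and ends mid-token, produces the same result here: tokens 3 through 6 are replaced together and the surrounding text is untouched.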

The semantic-preserving substitution pattern

Semantic-preserving substitution replaces sensitive content with a token that preserves the response's routing or downstream-processing utility. The pattern applies when the response feeds another system that needs a stable identifier.

For phone numbers, the substitution replaces the number with a hash of the number bound to the enterprise's tokenization key. The downstream system that keys off phone numbers can still route by the hash. The number itself never crosses the gateway.

For email addresses, the substitution replaces the address with a per-request pseudonym that the enterprise's session store maps back to the real address. The downstream system that emails the customer receives the pseudonym and calls back through the session store.

For customer identifiers, the substitution replaces the identifier with a tenant-scoped token the enterprise's own systems recognize. The downstream system that looks up the customer receives the token and resolves the customer through the enterprise's authoritative store.

The pattern trades tokenization complexity for downstream utility. The audit record captures the substitution pair and the enterprise's tokenization key identifier, so the reviewer can verify the substitution held.
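For the phone-number case, a keyed hash is one plausible way to implement "a hash bound to the enterprise's tokenization key." The sketch below uses HMAC-SHA256 from the Python standard library. The function name, the `tok_` prefix, the 16-character truncation, and the key identifier format are all assumptions for illustration, and a real deployment would load the key from a secrets manager, not from code.

```python
import hashlib
import hmac


def pseudonymize_phone(phone: str, key: bytes, key_id: str) -> str:
    """Return a stable, keyed pseudonym for a phone number.

    The number is normalized to digits first, so "555-1234" and "555 1234"
    map to the same token and downstream routing stays consistent.
    """
    normalized = "".join(ch for ch in phone if ch.isdigit())
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # key_id is embedded so the audit record can name which key was used.
    return f"tok_{key_id}_{digest[:16]}"
```

The same number under the same key always yields the same token, which is what lets the downstream system route by it. A different key yields a different token, so one tenant's tokens do not line up with another's.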

The structured-response filtering pattern

Structured-response filtering runs against JSON, YAML, XML, or code responses the model returns. The gateway parses the structure, walks the fields, applies the redaction per field based on the field's classification, and re-serializes the structure.

The pattern requires the gateway to know the response's expected structure. The application declares the response schema when it invokes the model, or the gateway infers the structure from the model's output and the tool the agent called.

For a JSON response with a customer.phone_number field, the filter walks to the field, applies the phone-number redaction, and re-serializes. The response the caller receives is valid JSON with the field redacted. The parse on the caller's side succeeds.

For a code response with a hardcoded credential the model surfaced from training data or from context, the filter parses the code (or matches the credential pattern in a language-aware way), redacts the credential, and re-emits the code. The response compiles or runs on the caller's side without a syntax error from the redaction.
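The JSON case can be sketched as parse, walk, redact, and re-serialize. The field-to-placeholder map below is a hypothetical stand-in for the classification the gateway would get from a declared schema, and the function names are my own.

```python
import json
from typing import Any

# Hypothetical classification: field name -> placeholder to emit.
SENSITIVE_FIELDS = {"phone_number": "[PHONE]", "email": "[EMAIL]"}


def redact_fields(node: Any) -> Any:
    """Walk parsed JSON and replace the values of classified fields."""
    if isinstance(node, dict):
        return {
            key: SENSITIVE_FIELDS[key] if key in SENSITIVE_FIELDS else redact_fields(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [redact_fields(item) for item in node]
    return node  # scalars outside classified fields pass through unchanged


def filter_json_response(raw: str) -> str:
    """Parse, redact per field, and re-serialize so the output stays valid JSON."""
    return json.dumps(redact_fields(json.loads(raw)))
```

Because the redaction happens on the parsed structure and not on the raw text, the placeholder is always emitted as a properly quoted JSON string, and the caller's parse succeeds.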

The multi-pass regex plus classifier pattern

The multi-pass filter runs several passes against the response. Pass one applies deterministic regex against known patterns (credit cards, SSNs, phone numbers, email addresses). Pass two applies embedding classification against tenant-bound sensitive-content vectors. Pass three applies a small classifier model against the response.

The multi-pass design lowers the false-negative rate of a single-pass regex filter. Pass one catches the deterministic patterns fast. Pass two catches the semantically sensitive content the regex missed. Pass three catches the residual patterns the first two passes did not.

Each pass adds latency. The multi-pass filter fits deployments that accept the extra tens of milliseconds against the LLM inference baseline of 500 ms to 5 seconds. Deployments with tighter budgets run passes one and two and defer pass three to a background audit job that reviews the response after delivery.
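The orchestration of inline and deferred passes can be sketched as follows. Only the regex pass is real here. `embedding_pass` and `classifier_pass` are labeled stubs that stand in for an embedding lookup and a classifier model, and the phrase "project nightingale" is an invented tenant term used only to give the stub something to match.

```python
import re
from typing import Callable, Dict, List, Tuple

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def regex_pass(text: str) -> bool:
    """Pass one: deterministic patterns."""
    return bool(SSN.search(text) or EMAIL.search(text))


def embedding_pass(text: str) -> bool:
    """Pass two STUB: a real gateway would compare embeddings against
    tenant-bound vectors; a keyword check stands in for that here."""
    return "project nightingale" in text.lower()


def classifier_pass(text: str) -> bool:
    """Pass three STUB: placeholder for a small classifier model."""
    return False


PASSES: List[Tuple[str, Callable[[str], bool]]] = [
    ("regex", regex_pass),
    ("embedding", embedding_pass),
    ("classifier", classifier_pass),
]


def run_passes(text: str, inline_passes: int = 3) -> Tuple[Dict[str, bool], List[str]]:
    """Run the first `inline_passes` passes inline and name the rest as
    deferred, so a background audit job can pick them up after delivery."""
    verdicts = {name: fn(text) for name, fn in PASSES[:inline_passes]}
    deferred = [name for name, _ in PASSES[inline_passes:]]
    return verdicts, deferred
```

A deployment with a tight latency budget would call `run_passes(text, inline_passes=2)` and hand the deferred list to its audit job. The verdict dictionary is also the "which passes ran" portion of the audit record.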

The audit records that prove the filter ran

The audit records answer three questions the reviewer asks.

Which passes ran on the response? The record captures the filter passes that executed and the classifier verdicts.

Which content was redacted? The record captures the token ranges, the pre-redaction hash of the content, and the placeholder that replaced it. The pre-redaction hash lets the auditor prove the redaction fired without storing the raw sensitive content.

Which caller received the redacted response? The record ties the response to the caller's identity, the request identifier, and the model that produced the response.
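A record that answers all three questions might look like the sketch below. The field names and the `build_audit_record` helper are hypothetical. Using a keyed HMAC for the pre-redaction hash is my own assumption, chosen because short values such as phone numbers are easy to guess under an unkeyed hash.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def build_audit_record(
    *,
    caller: str,
    request_id: str,
    model: str,
    passes: Dict[str, bool],
    redactions: List[Tuple[Tuple[int, int], str, str]],
    key: bytes,
) -> dict:
    """Assemble one audit record.

    Each redaction is (token_range, original_text, placeholder). Only a
    keyed hash of the original text is stored, never the text itself.
    """
    entries = []
    for token_range, original, placeholder in redactions:
        digest = hmac.new(key, original.encode("utf-8"), hashlib.sha256).hexdigest()
        entries.append({
            "token_range": list(token_range),
            "pre_redaction_hash": digest,
            "placeholder": placeholder,
        })
    return {
        "caller": caller,          # which caller received the response
        "request_id": request_id,
        "model": model,
        "passes": passes,          # which passes ran, with verdicts
        "redactions": entries,     # which content was redacted
    }
```

An auditor who holds the key and a candidate value can recompute the hash and confirm that the redaction fired on that value, while the log itself never holds the raw content.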

The interaction with EU AI Act and HIPAA

EU AI Act Article 12 requires high-risk AI systems to support automatic recording of events over the lifetime of the system for traceability. The response redaction record is the artifact that proves the enterprise's response-side control fired on a specific request.

HIPAA's Security Rule requires audit controls that record and examine activity in information systems that contain PHI. A response that includes PHI redacted at the gateway boundary produces a record the auditor can review without the raw PHI leaving the enterprise. The Business Associate Agreement with the model provider limits liability further because the PHI never reached the caller.

Cloud Radix found that 57% of healthcare professionals use unauthorized AI to process PHI (SOAP notes, diagnostic plans) without a Business Associate Agreement. The response redaction pattern applies to the sanctioned deployment that runs under a BAA, where the enforcement moves from "prevent access" to "control what flows back."

DeepInspect

This is exactly what DeepInspect enforces on the response side. DeepInspect sits inline between users or agents and the LLM APIs they call. For every response, the gateway runs the multi-pass filter, applies token-boundary-preserving redaction, and records the (caller, request, passes, redactions) tuple that answers the audit questions.

The filter uses the model's own tokenizer to preserve boundaries. The semantic-preserving substitution pattern is available for enterprises that need downstream routing utility. The structured-response filter handles JSON, YAML, and code responses without breaking the format. The audit records land in a hash-chained log the auditor can query per caller or per request.


Frequently asked questions

What is the difference between prompt redaction and response redaction?

Prompt redaction runs against the input the caller sent to the gateway, which is text the caller composed or the application generated. The content is predictable and often structured. Response redaction runs against the model's output, which is free-form generated text with variable structure. The response filter has to handle both prose and structured output, which is why it is a harder problem than prompt redaction.

Why does the gateway need the model's tokenizer?

The gateway needs the tokenizer to preserve token boundaries during redaction. A redaction that cuts across a token corrupts the response and either fails to parse (for structured responses) or renders as nonsense (for prose responses). The gateway tokenizes the response the same way the model does and replaces the token range as a unit.

How does semantic-preserving substitution work?

The gateway replaces the sensitive content with a token that preserves downstream utility. For phone numbers, the substitution is a hash bound to the enterprise's tokenization key. For customer identifiers, the substitution is a tenant-scoped token the enterprise's systems recognize. The downstream system routes off the substitution, and the sensitive content itself never crosses the gateway.

How does the filter handle code responses?

The filter parses the code in a language-aware way, matches the sensitive pattern (credentials, keys, tokens), redacts the pattern, and re-emits the code. The re-emitted code compiles or runs on the caller's side without a syntax error from the redaction. The audit record captures the redaction and the language the code was in.

What is the audit-record hash and why is it important?

The audit record captures the pre-redaction hash of the sensitive content, not the content itself. The hash lets the auditor prove the redaction fired on a specific piece of content without the enterprise storing the raw sensitive content. The pattern reduces the audit-log's own sensitivity while preserving the traceability the auditor needs.

How does the pattern interact with a HIPAA BAA?

A sanctioned deployment that runs under a BAA has covered-entity obligations for PHI. The response redaction pattern moves the enforcement from "prevent access" to "control what flows back to the caller." The audit records prove the response-side control fired and land in a log the auditor can review without the raw PHI leaving the enterprise.