← Blog

Prompt Injection Mitigation Techniques: The Eight Controls That Hold Up Under Review

Prompt injection mitigation in production AI deployments splits into eight controls: prompt structure, input classifiers, retrieval-time content evaluation, identity-bound policy enforcement, output classifiers, tool call authorization, conversation-aware state checks, and per-decision audit records. This piece walks through what each control catches, what each one misses, and the architectural layer where each fires. The pattern that holds up under EU AI Act Article 12 and DORA Article 19 review.

ByParminder Singh· Founder & CEO, DeepInspect Inc.
Problem-Awareprompt-injectionllm-securityai-securityinline-enforcementpolicy-enforcementaudit
Prompt Injection Mitigation Techniques: The Eight Controls That Hold Up Under Review

Prompt injection mitigation in production AI deployments splits into eight controls that operate at different architectural layers. A defense posture that uses one or two of the controls leaves a real attack surface. A defense posture that combines all eight closes the surface and produces the per-decision audit record EU AI Act Article 12 and DORA Article 19 reviewers will accept. The architectural pattern matters more than any single technique because the attacks shift faster than catalogues can update. The control architecture is what survives.

I want to walk through the eight controls, what each one catches, what each one misses, and where each fires in the request path. The walkthrough names the control, the attack class it addresses, and the layer where it sits.

Why a single control is not enough

The model treats the application instructions, the user input, and the retrieved content as a single context window. There is no architectural boundary the model can use to distinguish trusted spans from untrusted spans. The attacks that exploit the absence of a boundary span direct injection, indirect injection, tool-output injection, multi-turn persuasion, encoded payloads, and authority impersonation. No single control addresses all of those.

The eight-control architecture is the pattern I have seen hold up across enterprise customer support agents, RAG pipelines, agentic browsers, code-assist tools, and internal copilots. The pattern survives because every control has a specific scope, and the controls compose to cover the attack surface without redundancy.

Control 1: prompt structure with role and delimiter discipline

The first control is the prompt construction practice the application team owns. The system instructions sit in a dedicated role (system, developer, or the equivalent the provider exposes). The user input goes into a clearly delimited span. The retrieved content goes into a separate delimited span with a different label. The output schema constrains the model to a typed response.

The control reduces compliance with the simpler payloads. It does not address indirect injection through retrieved content, tool-output injection in agent loops, multi-turn persuasion, or encoded payloads. It runs inside the application's process under the application's custody.

Control 2: input classifier at the request boundary

The second control is a classifier that fires before the prompt reaches the model. The classifier evaluates the user input against pattern matchers for instruction-override phrases, role-reversal framing, encoded payloads, and authority impersonation. The classifier produces a verdict: permit, redact, block.

The control catches the direct injection class. It does not address indirect injection through retrieved content or tool outputs because those enter the prompt from other paths. The classifier sits at the HTTP boundary between the application and the model. The verdict is committed to the audit record before the request continues.

Control 3: retrieval-time content evaluation

The third control is the classifier that fires when retrieved content enters the prompt. The control evaluates each chunk against pattern matchers, classifies the chunk's source trust level, and applies a stricter policy than the user input policy because the retrieved span is attacker-influenced by definition. The architecture I covered in the RAG prompt injection breakdown names the enforcement point.

The control catches the indirect injection class for RAG and document-upload paths. It does not address tool-output injection in agent loops because the tool output enters the prompt through a different path.

Control 4: identity-bound policy enforcement

The fourth control is the identity-aware policy check that fires at the request boundary. The check evaluates the request against the authenticated user's role, the data classification rules that apply to the user's permissions, and the per-route policy. A request that asks the model to access data the user is not authorized to see is denied at the boundary, regardless of whether the request was an explicit prompt or an injection-induced action.

The control closes the post-authentication gap I covered in the inference lifecycle analysis. The check fires for every request. The verdict is bound to the user's identity and committed to the audit record.

Control 5: output classifier and content policy

The fifth control evaluates the model output before the application acts on it. The classifier applies a content policy: catches leaked system prompts, prohibited data classes (PII, PHI, financial NPI), unauthorized tool calls in the response, and outputs that match the disallow list. The verdict fires before the response returns to the user or before the downstream action runs.

The control catches the output-formatting hijack class and the data-class leak attacks where the prompt induced the model to output content the application would have rejected. It does not address attacks where the harmful action already occurred in the agent loop before the response check ran. Subsequent controls address that gap.

Control 6: per-call tool authorization

The sixth control evaluates each tool call the model proposes against per-route, per-role policies. The check fires before the tool executes. The check evaluates the user's authorization to issue the tool call against the specific resource and the specific operation. Tool calls that exceed the user's authorization are denied.

The control closes the connected-tool authorization gap that opens through ChatGPT actions, Claude MCP connectors, Gemini Vertex function calling, and bespoke agent tool sets. The verdict is committed to the audit record with the tool source, the proposed parameters, and the policy that fired.

Control 7: conversation-aware state checks

The seventh control maintains state across turns in a conversation or steps in an agent loop. The check evaluates cumulative intent: a payload that spreads the override across multiple turns is detected because the state object accumulates the evidence. The check also catches the case where adversarial content entered the state from a retrieved document several turns earlier and is now influencing the current decision.

The control catches the multi-turn persuasion class and the LangGraph-style state accumulation attacks I covered in the LangChain prompt injection analysis. The conversation state is stored in the inspection layer's policy decision point, not in the application.

Control 8: per-decision audit record

The eighth control is the audit record itself. Every decision the inspection layer makes produces a record containing the identity, the role, the prompt content (with sensitive spans redacted per policy), the retrieved sources, the tool calls, the policy version, the decision outcome, and a cryptographic signature. The record is committed before the model receives the request or before the application acts on the response.

The control closes the audit-independence requirement EU AI Act Article 12 and DORA Article 19 specify. The application never has custody of the write path. The record stands as evidence in a regulatory inquiry or a customer audit. I covered the format in the AI audit logs format spec analysis.

How the eight controls compose

Controls 1 and 5 sit inside the application. Controls 2, 3, 4, 6, 7, and 8 sit in the inspection layer at the HTTP boundary. The composition produces defense in depth: the application reduces the surface at construction time, the inspection layer enforces policy at runtime, and the audit record stands as evidence outside the application's custody.

Most production deployments I have seen run Control 1 alone. Some add Controls 2 and 5. Few run Controls 3 through 8 without an inspection layer. The architectural piece that closes the gap is the layer at the HTTP boundary.

DeepInspect

This is the architecture DeepInspect was built to provide. DeepInspect implements Controls 2, 3, 4, 6, 7, and 8 in the inspection layer. The application still owns Controls 1 and 5 at the application boundary. The composition produces the deterministic policy evaluation, the identity-bound enforcement, and the per-decision audit record the regulator and the customer auditor will accept.

DeepInspect is model-agnostic, framework-agnostic, and retrieval-agnostic. The same enforcement layer protects ChatGPT, Claude, Gemini, Bedrock, Vertex, and self-hosted deployments. The policy primitives are identical because the attack surface at the HTTP boundary is identical.

If your prompt injection mitigation stops at the application-side filters, the residual surface is broad and the audit record is missing. Run the free AI Readiness Check to see where the gaps sit in your stack.

Frequently asked questions

Which control catches encoded payloads?

Control 2 (input classifier at the request boundary) catches encoded payloads. The classifier canonicalizes the prompt: decodes base64 and hex sequences, strips zero-width Unicode characters, normalizes Unicode forms, and applies pattern matchers against the canonical form. Encoding-based evasion patterns lose their cover. The verdict is deterministic and committed to the audit record.

Which control catches multi-turn persuasion?

Control 7 (conversation-aware state checks) catches multi-turn persuasion. The check maintains state across turns and evaluates cumulative intent. A payload that establishes a benign frame in turn 1, adds detail in turn 2, and requests the prohibited output in turn 3 is detected because the state accumulates the evidence and the policy fires on the cumulative pattern. Single-turn classifiers miss the attack.

Do I need all eight controls if my deployment is internal-only?

The internal-only framing does not change the architecture. Internal users have application credentials and can issue prompts that violate the organization's policy. The Meta March 18 Sev-1 illustrates the pattern. The audit record matters more in internal deployments because the regulator and the customer auditor often arrive after an internal incident has expanded into a disclosure event. Controls 4, 6, and 8 are necessary regardless of the deployment scope.

How does the inspection layer handle deployments with multiple model providers?

The inspection layer is model-agnostic. The same controls fire whether the request routes to OpenAI, Anthropic, Google Vertex, AWS Bedrock, or a self-hosted endpoint. The policy primitives apply identically. The audit record format is identical across providers. The architectural value is the consistency: the same defense posture applies regardless of which model the application picks.

What latency budget does the eight-control architecture require?

End-to-end enforcement overhead from the inspection layer measures under 50 ms in production tests from internal DeepInspect testing. LLM inference itself takes 500 ms to 5 seconds depending on the prompt and the model. The inspection layer adds a 1 to 10 percent overhead on top of the model's own response time. The architecture supports inline enforcement on every request without measurable impact on the end-user experience.