Indirect prompt injection

Indirect prompt injection is the variant of OWASP LLM01 in which the attacker payload reaches the LLM through a document the model retrieves, browses, or ingests, rather than through the user prompt directly. A poisoned webpage, a calendar invite with hidden instructions, a Jira ticket body, a Slack message in a channel a chatbot reads, a code review the LLM summarizes, an email an agent processes for a user. The model treats every token in its context window with the same weight, so the attacker's instructions compete on equal footing with the developer's instructions once retrieval completes.

How indirect injection lands inside an enterprise workflow

A retrieval-augmented generation pipeline pulls source documents into the context window before generation. A customer-support agent reads the support ticket body. A code-review agent reads the diff and the PR description. A meeting agent ingests the calendar invite and the linked agenda. Each retrieval is a write to the context window from a source the developer did not author. When that source is attacker-controlled (and "attacker-controlled" includes any internet-reachable document the agent reads), the attacker's text becomes part of the model's instructions. Refusal training reduces the success rate probabilistically; it does not bound the attack surface.

What inspection at the retrieval boundary requires

Effective defense inspects every retrieval payload, not just the user prompt. A policy decision point at the AI request boundary classifies each chunk of retrieved content for instruction-override patterns, role-reset attempts, exfiltration phrasing, and prompt-leakage probes. The policy decides whether the retrieved content can be added to the context window, whether it should be neutralized first, or whether the request should fail closed. Per-decision audit records capture which retrieval triggered the block and where the chunk came from. That evidence is what holds up later when a regulator reviewing an EU AI Act Article 12 incident asks the deployer to reconstruct what data the system processed for a specific request.

Related reading

  • Indirect Prompt Injection: How RAG and Tool-Use Pipelines Get Compromised Through Retrieved Content

    Indirect prompt injection is the attack pattern where adversarial content reaches the model through a retrieved document, a tool result, or any other source the model treats as part of its context. The attacker never interacts with the application directly. The injection succeeds when the model executes the embedded instructions on the next retrieval or the next agent loop iteration. RAG pipelines and tool-using agents are exposed by construction. This piece walks through the attack mechanics, the surface area in production deployments, why the model alone cannot defend, and the request-boundary controls that produce a defensible posture.

  • OWASP LLM01 Prompt Injection: The 2025 Update and What the Inspection Layer Enforces

    OWASP LLM01 captures both direct and indirect prompt injection in a single category in the 2025 update. The architectural reason is that the control point is the same: the request boundary. Application-side defenses fail by construction because the application cannot tell which spans of the prompt the model treats as instructions. Model-side defenses fail because refusal training is probabilistic. This piece walks through the LLM01 attack surface, the inspection-layer controls that produce a defensible posture, the audit record that survives review under EU AI Act Article 12 and DORA Article 19, and the deployment pattern that fits a production AI stack.

  • Prompt Injection in Production: Where It Happens, What It Costs, and How To Prevent It at the Request Boundary

    Prompt injection is the class of attacks where adversarial content in a prompt overrides the application instructions or extracts data the model was not authorized to reveal. The attack surface includes direct user prompts, indirect injection through retrieved documents and tool results, and chained injection through agent loops. OWASP has consistently ranked prompt injection as the top LLM vulnerability. This piece walks through the attack mechanisms in production, the failure modes of model-side defenses, the request-boundary controls that produce a defensible posture, and the audit record format that holds up after an attempt is detected.