Prompt injection
Prompt injection is an attack where untrusted input gets concatenated into the LLM context window and the model treats that input as instructions rather than as data. The attacker controls part of the input, so the attacker controls part of the output. OWASP catalogs this as LLM01 in the LLM Top 10 (2025 update). Direct injection puts the payload in the user prompt; indirect injection hides it in a document, web page, email, or tool response that the LLM later reads.
How prompt injection works
The LLM treats every token in its context window with the same weight, so an instruction the developer wrote and an instruction an attacker smuggled in compete on equal footing. When the attacker payload appears later in the context (closer to the generation point) or carries higher salience for the task, the attacker payload wins. The model has no native trusted-vs-untrusted text channel inside the context window.
Direct prompt injection arrives through the chat field, an API parameter, or a tool argument. Indirect prompt injection arrives through a document the LLM retrieves, a webpage the LLM browses, a calendar invite the agent reads, a code review the LLM summarizes, or a Slack message a chatbot ingests. The OWASP LLM01 entry treats both as variants of the same root cause.
Where the enforcement layer sees it
A policy decision point that sits between the authenticated caller and the LLM endpoint inspects every prompt payload before the LLM sees it, classifies the data inside the context window, and applies per-route rules. Model-side guardrails reduce indirect injection attempts probabilistically through training. External policy enforcement makes a deterministic pass or block decision based on identity, route, and data classification, producing a per-decision audit record either way.
Related reading
- Prompt Injection in Production: Where It Happens, What It Costs, and How To Prevent It at the Request Boundary
Prompt injection is the class of attacks where adversarial content in a prompt overrides the application instructions or extracts data the model was not authorized to reveal. The attack surface includes direct user prompts, indirect injection through retrieved documents and tool results, and chained injection through agent loops. OWASP has consistently ranked prompt injection as the top LLM vulnerability. This piece walks through the attack mechanisms in production, the failure modes of model-side defenses, the request-boundary controls that produce a defensible posture, and the audit record format that holds up after an attempt is detected.
- Indirect Prompt Injection: How RAG and Tool-Use Pipelines Get Compromised Through Retrieved Content
Indirect prompt injection is the attack pattern where adversarial content reaches the model through a retrieved document, a tool result, or any other source the model treats as part of its context. The attacker never interacts with the application directly. The injection succeeds when the model executes the embedded instructions on the next retrieval or the next agent loop iteration. RAG pipelines and tool-using agents are exposed by construction. This piece walks through the attack mechanics, the surface area in production deployments, why the model alone cannot defend, and the request-boundary controls that produce a defensible posture.
- OWASP LLM01 Prompt Injection: The 2025 Update and What the Inspection Layer Enforces
OWASP LLM01 captures both direct and indirect prompt injection in a single category in the 2025 update. The architectural reason is that the control point is the same: the request boundary. Application-side defenses fail by construction because the application cannot tell which spans of the prompt the model treats as instructions. Model-side defenses fail because refusal training is probabilistic. This piece walks through the LLM01 attack surface, the inspection-layer controls that produce a defensible posture, the audit record that survives review under EU AI Act Article 12 and DORA Article 19, and the deployment pattern that fits a production AI stack.