Is context window poisoning a form of prompt injection?

Context window poisoning is a category of indirect prompt injection. The attacker's content enters the model's context through retrievals, tool responses, or other intermediated paths rather than through direct user input. The model's failure to distinguish data from instructions is the underlying vulnerability that both share.

Can the model be trained to ignore poisoned context?

Model-level training reduces susceptibility but does not eliminate it. Each new generation of models improves on the previous generation's resistance, and each new generation of attacks finds new patterns to evade resistance. The model-level defense is part of the answer; the architectural defense at the gateway is the complement.

How does this attack relate to RAG security?

RAG is one of the primary vectors for context poisoning because the retrieval surface pulls documents that the deployer does not write. The corpus has to be treated as semi-trusted input. RAG security includes the corpus-level controls (authentication on writes, content review, source provenance) and the retrieval-time controls (the gateway redactions described above). Both are necessary.

What is cross-session memory poisoning?

Cross-session memory poisoning is the case where attacker-controlled content lands in the agent's persistent memory in one session and influences later sessions. The attack persists beyond the session boundary. The defense requires that persistent memory writes be subject to the same gateway scrutiny as in-session context, and that memory reads be redacted before reaching the model.

Should agents have access to external URLs?

External URL access is the primary path for vector 4 (external document references). The defensible posture is to allow URL access for a limited allowlist of domains, with content fetched through a gateway that scans for injection patterns before the content enters the window. Unrestricted URL access combined with no gateway scrutiny is the highest-risk configuration.

How does this apply to the EU AI Act?

When the agent is part of a high-risk AI system, the Article 12 automatic logging obligation applies to every call. The gateway's per-call audit record satisfies the logging requirement. A successful poisoning incident that produces an Article 73 reportable outcome will be investigated against the audit record. The supervisory authority's question is what the system did and why; the audit record is the answer.

AI Agent Context Window Poisoning: How a Single Bad Retrieval Steers an Entire Session

An AI agent runs in a context window. The window holds the system prompt, the user's request, the retrieved documents from any RAG step, the tool descriptions, the prior tool calls and responses, and the model's prior outputs in the session. The window is the model's working memory. Every subsequent token the model generates is conditioned on the content of the window. Context window poisoning is the attack pattern where attacker-controlled content lands in the window and steers the model's later decisions. A single bad retrieval can alter the model's behavior for the rest of the session.

I want to walk through the attack vectors that produce poisoned context, the detection signals a gateway can act on, the redaction patterns that prevent the poison from reaching the model, and the audit record that supports forensic investigation when poisoning is suspected.

The attack vectors

Context window poisoning enters the window through one of six vectors.

Vector 1: RAG corpus contamination. The retrieval-augmented generation pipeline pulls documents from a corpus. An attacker who places a document in the corpus (or modifies an existing document) can inject content that the retrieval surface returns for relevant queries. The model treats the retrieved content as context.

Vector 2: Tool response injection. A tool the agent calls returns a response that contains attacker-controlled content. The response goes into the window because the agent's reasoning depends on the tool output. The attacker controls the tool's response either by compromising the tool or by being the source of the data the tool returns.

Vector 3: User input forwarding. The agent forwards user input to the model. If the user is the attacker, or if a user is intermediated by attacker-controlled content (a forwarded email, a shared document, a Slack channel where messages can come from external sources), the input contains injection material.

Vector 4: External document references. The agent fetches documents by URL or document identifier. The fetched content lands in the window. An attacker who controls the URL contents controls the input.

Vector 5: Cross-session memory. The agent uses persistent memory from prior sessions. An attacker who influenced a prior session has planted content that surfaces in later sessions.

Vector 6: System prompt manipulation. The system prompt is constructed dynamically from templates with values pulled from configuration or from per-user state. An attacker who modifies the templates or the state values changes the system prompt the model sees.

How a poisoned context steers the session

The model conditions its decisions on the full window. A single line of poisoned content can influence the rest of the session in three ways.

The model follows embedded instructions. The poisoned content includes imperative text directing the model toward a specific action. The model treats the instruction as part of its context and acts on it.

The model adopts implied premises. The poisoned content presents false facts as background. The model's subsequent reasoning is built on the false foundation. The output is shaped by the implied premise even when the premise is not directly cited.

The model loses focus on the user's actual request. The poisoned content adds a competing task. The model splits attention between the legitimate request and the poisoned task. The legitimate request is partially served while the poisoned task is also pursued.

The three effects compound. A long session with poisoned context drifts further from the user's intent with each step.

The detection signals at the gateway

The gateway sits between the agent runtime and the model and between the agent and any tool calls that flow through HTTP. The gateway observes the content of every prompt, every retrieval, every tool response, and every model output. Four signal classes are useful for detection.

Signal 1: Injection-pattern markers in retrieved content. Documents that arrive in the context contain text with imperative verbs targeting the model, formatting that mimics system-prompt structure, or references to other instructions. The gateway scans for these patterns before the content enters the window.

Signal 2: Cross-source content correlation. The same suspicious string appears in retrieved documents from multiple sources. The pattern suggests a coordinated injection across the corpus rather than a single bad document.

Signal 3: Behavioral drift in the model's outputs. The agent's tool calls in a session diverge from the pattern expected for the user's request type. The gateway compares the in-session tool-call pattern against the expected pattern for the agent and flags anomalies.

Signal 4: Egress targeting analysis. Tool calls that send data to external destinations are evaluated against the expected destinations for the agent. A call to an unfamiliar destination after a retrieval is a signal that the retrieval may have poisoned the context.

Detection at any single signal is imperfect. The combination of signals across the four classes catches most realistic poisoning patterns.

The redaction patterns that prevent the poison from reaching the model

Detection is the first half. The second half is preventing the poisoned content from entering the window in the first place. Three redaction patterns at the gateway layer apply.

Pattern 1: Pre-context filtering of retrieval results. Before retrieved documents enter the context window, they pass through a filter that removes injection-marked sections. The filter operates at the chunk level: a chunk that contains an injection pattern is dropped, with the rest of the document preserved.

Pattern 2: Sanitization of tool responses. Tool responses are processed before reaching the model. Free-text fields are scanned for imperative verbs targeting the model and the matching text is replaced with a placeholder. The model sees that the field contained text but does not see the injection content.

Pattern 3: Egress confirmation for sensitive actions. The agent's action that calls an external destination is confirmed at the gateway against the user's original request scope. The confirmation evaluates whether the action is within the request's authorized surface. Actions outside the surface are denied. The poisoned context that induced the action is contained.

The three patterns operate together. Pre-context filtering reduces the volume of poison that reaches the window. Sanitization handles what slips through. Egress confirmation contains the actions that the poisoned model attempts to take.

The role of structured context boundaries

Models that distinguish "context" from "instructions" by structural markers (a system role versus a user role, a documents block versus an instructions block) are more resistant to context poisoning. The structural distinction is not perfect because the model can still be induced to treat document content as instructions, but the resistance is meaningfully higher.

The gateway can enforce structural discipline. Retrieved documents are placed in a documents block with explicit boundaries. Tool responses are placed in a tool-output block. User input is placed in a user-input block. The system prompt's framing emphasizes that the documents and tool outputs are data, not instructions.

This is defense in depth. The structural framing combined with the gateway redaction produces a posture that catches poisoning across multiple layers.

The audit record the gateway produces

The gateway records every retrieval, every tool call, every model interaction. The record includes the content that entered the context window, the content that the gateway redacted, the detection signals that fired, the policy decisions made, and the model outputs that resulted. The record supports two investigative use cases.

Use case 1: Suspected poisoning incident. A user reports that the agent behaved oddly. The investigator pulls the session's audit record. The record shows the retrieval results, including any that the gateway redacted. The investigator traces the chain from a poisoned retrieval through the model's subsequent decisions.

Use case 2: Corpus-level audit. The investigator examines retrievals over time to identify documents that consistently produce flagged context. The pattern indicates corpus contamination that has to be addressed at the data layer, not only at the gateway.

For regulated deployments under the EU AI Act, the audit record contributes to Article 12 logging when the agent is part of a high-risk AI system. The record supports the post-market monitoring obligations under Article 72 and the incident-reporting evidence chain under Article 73.

DeepInspect

DeepInspect is a stateless policy gateway between authenticated users or agents and any LLM. The gateway observes the full chain of an agent session: the system prompt, the user input, the retrievals, the tool calls, the tool responses, the model outputs. Policy at the gateway can scan retrieved content for injection patterns, redact tool responses, and enforce structural context boundaries. Egress is bounded by the user's original request scope.

For agent deployments that consume documents from a corpus or interact with external tools, DeepInspect provides the architectural layer that catches context poisoning. The gateway sees what the model is about to consume and acts on it before the consumption changes the session.

If you are facing the August deadline, let's talk.