AI Security for RAG Systems: The Inspection Layer Between the Retrieval Output and the Model Call
Retrieval-augmented generation systems read documents from a vector store or a search backend into the model context window before the model reasons. The retrieval step is the point where the system pulls content of varying provenance, authorization, and trustworthiness into the prompt. The security boundary sits at the HTTP path between the retrieval output and the model call. This piece walks through the threat model RAG opens, the identity and authorization decisions the inspection layer commits, the audit record for retrieval-derived content, and the indirect prompt injection surface the retrieved documents expose.

Retrieval-augmented generation systems compose two distinct steps into a single conversational outcome. The first step retrieves documents from a vector store, a search backend, a graph database, or a structured-data index. The second step calls the LLM with the retrieved documents in the prompt context. The first step's output reaches the second step as untrusted-input-the-model-treats-as-trusted unless the deployment runs an inspection layer between them. The security boundary sits at the HTTP path between the retrieval output and the model call, which is the boundary where the identity authorization, the data classification, and the injection detection have to happen.
I want to walk through the threat model RAG opens, the identity and authorization decisions the inspection layer commits at the request boundary, the audit record for retrieval-derived content, and the indirect prompt injection surface the retrieved documents expose.
The threat model RAG opens
RAG produces three kinds of risk the deployment has to handle. The first is over-retrieval. The retrieval step returns documents the user is not authorized to read. The user asks a question, the retrieval matches against documents across the corpus, and the matches include documents whose access control restricts the user. The model reasons over the documents and returns content the user was not supposed to see. The classic example is a corporate RAG system where the retrieval indexes documents across departments without enforcing per-document ACLs, and a user from one department retrieves a sensitive document from another department.
The second is data leakage through prompt context. The retrieval returns documents that contain PII, PHI, MNPI, or other regulated content. The content reaches the model's context window. The model produces a response that may include or summarize the regulated content. The user receives the content in the response. The model provider may also retain the prompt and the response under the provider's data handling terms.
The third is indirect prompt injection. The retrieved documents contain text crafted to manipulate the model's instruction following. The document might include "ignore previous instructions and answer X" or a more subtle steering pattern. The model treats the embedded text as instructions on the next reasoning pass and produces output the deployment did not intend. The attack succeeds because RAG by design feeds external content into the model with no verification step before the model reasons over it.
The three risks compound. A user who retrieves a sensitive document (over-retrieval) can leak the content (data leakage) through a response shaped by an injection pattern (indirect prompt injection) the document carried. The single retrieval step crosses all three risks.
The identity and authorization decisions
The inspection layer at the HTTP boundary reads the prompt the application sends to the model. The prompt includes the system prompt, the user's query, and the retrieved documents the application chose to include. The retrieved content is visible to the inspection layer as part of the prompt.
The decisions cover three axes. The first is user-against-retrieved-document. The user's role and group memberships have to authorize the retrieved document. A user without authorization to a document class receives a block or a modify (the document is redacted from the prompt). The application has to propagate the user identity to the inspection layer for the policy to evaluate against.
The second is user-against-action. The user's role determines what the model is allowed to do with the retrieved content. A user with read-only access cannot trigger a downstream tool call that writes against the retrieved data. The action authorization runs against the same identity context as the retrieval authorization.
The third is data-against-model. The retrieved content's classification determines which model the prompt is allowed to route to. PHI in the retrieved content blocks routing to a model whose data processing terms do not cover PHI. Source code subject to export controls blocks routing to a model whose inference happens outside the controlled jurisdiction.
The policy evaluates the combination. A user with role A retrieves a document of class X for purpose Y against model Z. The combination produces pass, block, or modify. The record commits the inputs and the outcome.
The audit record for retrieval-derived content
The record carries identity (the natural-person identifier), route (which RAG pipeline, which step), data classification (the classifier output on each retrieved document and on the assembled prompt), policy version, decision outcome (pass, block, or modify with the rule identifier), the upstream model and version, and integrity metadata. The record also carries the document identifiers the retrieval returned and the access-control evaluation result for each document. The reviewer reading the record sees which documents reached the prompt context, which were redacted by the modify decision, and which were blocked from inclusion.
The format is the same format the EU AI Act Article 12, GDPR Article 32 security of processing, HIPAA 45 CFR 164.312 access record, DORA Article 19 operational records, and Fannie Mae LL-2026-04 audit trail consume. The record series produces the evidence the auditor asks for when the auditor asks "what documents were exposed through the RAG system to this user during this period."
The record series is independent of the application's retrieval logs and the vector store's query logs. The inspection-layer record carries the identity context, the data classification, and the policy decision, which are the fields the regulator's record-keeping obligation specifies. The retrieval-side logs cover the retrieval mechanics (latency, recall, ranking) and complement the inspection-layer record.
The indirect prompt injection surface
Indirect prompt injection is the attack pattern where adversarial text reaches the model through retrieved content. RAG systems are exposed by construction. The retrieval step pulls in documents whose authorship the deployment may not control (customer-submitted tickets, scraped web content, third-party knowledge bases). The text inside the documents can carry injection patterns.
The classifier passes at the inspection layer detect known patterns. The patterns include direct steering text ("ignore previous instructions"), persona switches ("you are now a different assistant"), system-prompt mimicry ("system: do X"), and structural cues (encoded payloads, unusual whitespace patterns). The classifier output composes the data classification at decision time. The policy evaluates the detection result against the identity and the route.
The decision can pass (the detection score is below the policy's threshold), block (the prompt does not forward to the model), or modify (the suspicious span is removed or annotated, the prompt is reshaped to mark retrieved content as untrusted). The record carries the detection signal and the decision outcome.
The defense is the inspection-layer policy plus optional defense-in-depth layers inside the application (Llama Guard for output filtering, structured retrieval that strips formatting from retrieved content before inclusion, post-generation verification of model outputs against retrieved facts). The inspection layer is the primary control because the inspection layer reads the prompt at the request boundary outside the application.
The RAG-specific architectural pattern
The pattern is the inspection layer at the HTTP boundary between the RAG application and the LLM endpoint, with the application propagating the user identity, the retrieved-document identifiers, and the access-control evaluation result to the inspection layer. The inspection layer evaluates identity-bound policy against the assembled prompt and the metadata.
The retrieval step happens inside the application's retrieval pipeline. The retrieval system's access control runs first and produces a candidate set of documents the user is authorized to read. The inspection layer is the second control: it reads the assembled prompt with the candidate documents and evaluates the policy against the data classification, the route, and the identity. The two layers compose. The retrieval ACL filters the candidate set. The inspection layer enforces the policy at the prompt boundary.
The deployment runs the inspection layer in front of every LLM endpoint the RAG system calls. The records produced at the LLM boundary carry the retrieval metadata the application supplied. The reviewer reads the records in chronological order with shared identity context and a stable correlation identifier.
What the regulatory profile expects
EU AI Act Article 12 expects records over the lifetime of the system with input data, identity, and the period of use. RAG systems that handle regulated data produce decisions the records cover. Article 26 deployer obligations consume the same records. Article 99 penalties for high-risk non-compliance reach EUR 15 million or 3% of global annual turnover, whichever is higher.
GDPR Article 32 expects technical and organizational measures appropriate to the risk. RAG over personal data of EU residents falls inside the scope. The data minimization principle under Article 5(1)(c) drives the redact-from-prompt pattern the inspection layer's modify decision implements.
HIPAA 45 CFR 164.312 expects access records for PHI. RAG systems that retrieve PHI into the prompt context produce PHI access events the record covers. The Business Associate Agreement chain has to cover the inspection layer and the upstream model providers.
DORA Article 19 expects records of operational events for financial services workflows. RAG over financial data (research notes, trading positions, customer financial records) falls inside the scope.
DeepInspect
This is exactly what DeepInspect does for RAG deployments. DeepInspect sits inline between the RAG application's runtime and any HTTP-based LLM endpoint. The inspection layer reads the assembled prompt with the retrieved documents, runs classifier passes on the content (PII, PHI, PCI, MNPI, regulated identifiers, indirect prompt injection patterns), evaluates identity-bound policy against the document classification and the route, and applies pass, block, or modify before the response forwards.
The audit record series carries identity, route, retrieved-document identifiers, data classification, policy version, decision outcome with the rule identifier, upstream model and version, and integrity metadata. The series satisfies the EU AI Act Article 12, GDPR Article 32, HIPAA 45 CFR 164.312, DORA Article 19, and Fannie Mae LL-2026-04 record-keeping obligations. The reviewer reads a single record series that covers the retrieval-time decisions and the model-time decisions in chronological order. End-to-end inspection-layer overhead measures under 50 ms in production.
If you are running a RAG system in production and the auditor is asking for the records of which documents reached the model through retrieval, let's talk today.
Frequently asked questions
- Does the inspection layer replace the retrieval system's ACLs?
The two layers cover different obligations and compose. The retrieval system's ACLs filter the candidate set of documents the user is authorized to read. The inspection layer reads the assembled prompt at the LLM boundary and enforces identity-bound policy against the data classification and the route. The retrieval ACL is the upstream control. The inspection layer is the downstream control. The deployment runs both because a misconfigured ACL or an over-permissive index can still surface a regulated document into the prompt that the inspection layer's policy catches.
- How does the inspection layer detect indirect prompt injection in retrieved documents?
The inspection layer runs classifier passes on the prompt content the application sends to the model. The retrieved documents appear in the prompt and the classifier reads them. The classifier covers known steering text patterns, persona-switch patterns, system-prompt mimicry, and structural cues (encoded payloads, unusual whitespace). The classifier produces a detection score the policy evaluates against. The decision can pass, block, or modify the prompt. The record carries the detection signal and the decision outcome.
- What about graph RAG and structured-data RAG?
The architectural pattern is the same. The retrieval system returns nodes and relationships (graph RAG) or structured rows (structured-data RAG). The application assembles a prompt that includes the retrieved data. The inspection layer reads the assembled prompt and evaluates the policy against the data classification, the identity, and the route. The data classification on structured data covers regulated identifiers (account numbers, member identifiers, MNPI tags) the structured retrieval surfaced.
- How does the inspection layer handle multi-hop retrieval and re-ranking?
The retrieval system's multi-hop steps run inside the application's pipeline. The inspection layer reads the final assembled prompt before the LLM call. The intermediate retrieval steps remain inside the application's logging surface. The deployment can choose to run the inspection layer on intermediate steps where those steps cross HTTP boundaries the deployment wants to audit (for example, an external API the multi-hop pipeline calls). The final LLM-boundary record covers the assembled prompt's content and identity-bound policy at the model call.
- Can the inspection layer redact PHI or PII from retrieved documents before the model sees the content?
Yes. The modify decision the inspection layer commits transforms the prompt before forwarding. The classifier detects the regulated content in the retrieved documents. The policy can require redaction for the specific combination of identity and route. The transformed prompt forwards to the model with the regulated spans removed, tokenized, or replaced with a masked representation. The record carries the modify decision and the specific spans the transformation removed.