How is LLM02 different from LLM01 (prompt injection)?

LLM01 covers the prompt-side attack: an attacker manipulates the prompt to make the model do something the operator did not authorize. LLM02 covers the response-side attack: the model produces content that the application then trusts and passes to a sink that executes or renders it. The two categories often chain (LLM01 sets up the injection, LLM02 is the path to impact), but the controls are different.

Does output filtering solve LLM02?

Partially. Filtering removes known-bad patterns at the response boundary. It does not handle every possible sink combination, and it does not catch novel patterns that match no signature. The defense-in-depth pattern combines filtering, sink-aware encoding at the application layer, and an audit record that lets post-incident forensics reconstruct what passed through.

Where should the trust boundary actually sit in an LLM application?

The trust boundary should sit between the model output and any downstream sink that executes, renders, or transmits content. Model output should be treated as user-controlled input until it has been classified and a policy decision has been made on it. The application's existing input-validation patterns should be extended to model responses as if they were user input.

Does LLM02 apply to RAG applications?

Yes, and the surface is broader. RAG applications build the prompt from retrieved documents, which means indirect prompt injection is in scope: the retrieved-document path is an additional vector for attacker-controllable content into the model. Response handling on the RAG output side has to assume the prompt was potentially poisoned.

How does the gateway interact with WAF or RASP layers?

The gateway is upstream of the application's downstream sinks. A WAF or RASP layer is typically deployed in front of the application or inside the application runtime. The two are complementary: the gateway classifies the model response before the application sinks it; the WAF or RASP catches sink-level patterns the application missed. The audit record from the gateway is useful evidence for WAF

OWASP LLM02: Insecure Output Handling and the Trust Boundary Most Apps Get Wrong

OWASP LLM02 in the Top 10 for LLM Applications covers insecure output handling. The category name reads like a software-engineering hygiene issue. The mechanism is not. LLM02 describes a trust-boundary error specific to how LLMs are being integrated into application stacks: the application treats the model response as already-validated content and passes it to a downstream sink (a database query, a browser context, a shell, an outbound HTTP request) without classification or sanitization. The result is that the LLM becomes the attacker's hands: a prompt-injection or capability-extraction attack turns into server-side or client-side execution in the host application.

I want to walk through the LLM02 categories, the architectural trust error most applications carry forward from web-app development without re-evaluating, and the gateway-layer controls that contain blast radius before the bad output reaches a sink.

What LLM02 covers

LLM02 names four common output paths where insecure handling produces concrete attack outcomes.

The first is downstream code execution: model output flows to a shell, eval, exec, or similar sink. A prompt that elicits or injects a shell command from the model produces command execution in the host.

The second is downstream data store injection: model output flows into a SQL query, NoSQL query, or document store mutation. A prompt that produces a crafted string yields SQL injection or document injection.

The third is downstream browser rendering: model output flows into HTML that the user's browser renders. A prompt that produces script content yields cross-site scripting.

The fourth is downstream HTTP egress: model output drives a webhook target, a redirect URL, or an outbound API call. A prompt that produces a chosen URL yields server-side request forgery.
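As a rough illustration of what destination-specific handling can look like for three of these sinks, here is a minimal Python sketch using only the standard library. The function names and the `ALLOWED_EGRESS_HOSTS` allowlist are assumptions made for this example, not part of any OWASP guidance or product API.

```python
import html
import ipaddress
import shlex
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the application is permitted to call.
ALLOWED_EGRESS_HOSTS = {"hooks.example.com"}


def encode_for_html(model_output: str) -> str:
    """Escape model output before placing it in an HTML text or attribute context."""
    return html.escape(model_output, quote=True)


def encode_for_shell_arg(model_output: str) -> str:
    """Quote model output so a POSIX shell treats it as one literal argument."""
    return shlex.quote(model_output)


def is_allowed_egress_url(model_output: str) -> bool:
    """Accept only https URLs whose host is allowlisted and is not an IP literal."""
    try:
        parts = urlsplit(model_output.strip())
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme != "https" or host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # reject raw IP literals such as 169.254.169.254
    except ValueError:
        pass  # not an IP literal; fall through to the allowlist check
    return host in ALLOWED_EGRESS_HOSTS
```

The data-store sink is left out on purpose: the usual fix there is parameter binding rather than encoding. For the shell sink, passing an argument list to a process API without invoking a shell is generally preferable to quoting; the quoting helper only shows the encoding idea.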

In each case the model is not the vulnerability. The application's trust boundary is. The model is being treated as a trusted producer of content when it is closer to a user-controlled input source.

The trust-boundary error

Application security has decades of received wisdom about input validation. Never trust user input; sanitize at the trust boundary; encode output for the destination context. The wisdom was written when the trust boundary was straightforward to draw: the HTTP request from the browser is the input; the application layer is the trust boundary; the database, the rendered HTML, and the outbound HTTP are the sinks.

LLM integration scrambles the boundary. Model output looks like content the application produced. The path from prompt to model to application to sink runs entirely through the application's own code. Developers carry forward the trust they had in their own functions and apply it to the model output by default. The application becomes a confused deputy that treats attacker-influenced model output as application-generated content.

The boundary is in the wrong place. The model is a user-controlled input source whenever the prompt is influenceable by an attacker. That includes direct prompt injection (the user types the prompt), indirect prompt injection (the prompt is built from documents the user can poison), and tool-output injection (the prompt is built from a tool response the user can influence). In all three cases, the model output is downstream of attacker-controllable text.

Why model guardrails do not close LLM02

Model guardrails are designed to refuse certain output classes (illegal content, dangerous instructions, sensitive personal data). They are not designed to enforce the syntactic constraints downstream sinks require. A model can produce well-formed SQL that is also injection-malicious without violating any guardrail policy. A model can produce well-formed HTML that contains script tags without violating any policy. The guardrails are policy-shaped; the sinks are syntax-shaped.
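To make the policy-versus-syntax gap concrete, here is a small self-contained sketch using Python's built-in sqlite3 module. The table, the rows, and the `model_output` string are invented for illustration; the string contains nothing a content guardrail would plausibly refuse, yet it changes the meaning of a query it is concatenated into.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)

# Hypothetical model output: inoffensive as content, hostile as SQL syntax.
model_output = "alice' OR '1'='1"

# Unsafe: the model output becomes part of the SQL text and is parsed as syntax.
unsafe_sql = f"SELECT id FROM orders WHERE customer = '{model_output}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the model output is bound as a value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT id FROM orders WHERE customer = ?", (model_output,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # prints "3 0"
```

The concatenated query returns every row, while the parameterized query returns none because no customer has that literal name. The difference comes entirely from how the application handles the string at the sink, not from anything the model was or was not allowed to say.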

Model guardrails are also not a security control under any reasonable threat model. They are probabilistic and trained, not deterministic and enforced. An attacker who controls part of the prompt can drive a model to produce content that the application then trusts. The architectural fix is at the application's trust boundary, not inside the model.

Gateway-layer controls

A policy gateway at the AI request and response boundary contains LLM02 blast radius in three ways.

First, response classification with sink-aware policy. Every model response is parsed for content classes the application's sinks would treat as code: SQL keywords, shell metacharacters, script tags, URL schemes, base64-encoded payloads. The response carries the classification result downstream as policy input. Applications that sink the response to a shell or database can require a clean classification or apply destination-specific encoding before the sink.
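A toy version of this kind of response classification might look like the sketch below. The pattern set, the class names, and the `Classification` type are assumptions for illustration only; a handful of regular expressions will produce both false positives and misses, and base64 payload detection is omitted entirely.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only; a real classifier needs far more than this.
SINK_PATTERNS = {
    "sql": re.compile(r"\b(select|insert|update|delete|drop|union)\b|--|'\s*or\s*'", re.I),
    "shell": re.compile(r"[;&|`]|\$\("),
    "script": re.compile(r"<\s*script\b|javascript:|\bon\w+\s*=", re.I),
    "url": re.compile(r"\b[a-z][a-z0-9+.-]*://", re.I),
}


@dataclass
class Classification:
    classes: set = field(default_factory=set)

    @property
    def clean(self) -> bool:
        """True when no sink class matched the response."""
        return not self.classes


def classify_response(text: str) -> Classification:
    """Return the sink classes whose patterns appear in the response text."""
    matched = {name for name, pattern in SINK_PATTERNS.items() if pattern.search(text)}
    return Classification(matched)
```

For example, `classify_response("<script>fetch('https://evil.example/x')</script>").classes` contains both `"script"` and `"url"`, while an ordinary sentence with no markup or links comes back with `clean` set to `True`.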

Second, identity-bound egress policy. Tool responses that an agent will pass to an egress-capable downstream tool (an email-send, a webhook, a file-write) are bound to a policy that restricts what classes can propagate. A response classified as containing internal data cannot be passed by the agent to an external webhook in the same session. This is the data-exfiltration-via-tool-output control point the agentic Top 10 framework names.
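One way to picture a session-scoped egress rule is the sketch below. The tool names, the `internal_data` label, and the `Session` class are hypothetical; the point is only that the decision depends on what the session has already seen, not on the single response in hand.

```python
from dataclasses import dataclass, field

# Hypothetical tool names and labels, for illustration only.
EXTERNAL_TOOLS = {"send_email", "post_webhook"}
BLOCKED_FOR_EXTERNAL = {"internal_data"}


@dataclass
class Session:
    principal: str
    seen_labels: set = field(default_factory=set)

    def record_response(self, labels: set) -> None:
        """Remember every classification label observed in this session."""
        self.seen_labels |= labels

    def may_call(self, tool: str) -> bool:
        """Deny external tools once the session has touched a blocked label."""
        if tool in EXTERNAL_TOOLS and self.seen_labels & BLOCKED_FOR_EXTERNAL:
            return False
        return True
```

In use, a session that has recorded a response labeled `internal_data` is refused `post_webhook` from then on, while internal tools remain available. This sketch is deliberately coarse: it blocks on session history rather than tracking which data actually flows into the outbound call.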

Third, per-decision audit. The response classification, the policy applied, and the destination sink the application reported are committed to a per-request audit record. When an LLM02 incident happens (a database mutation that should not have happened, a script tag that rendered, a webhook to an attacker domain), the record reconstructs whether the response triggered the policy, what classification was assigned, and whether the destination sink reported its handling.
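A per-decision audit record could be shaped roughly as follows. Every field name here is an assumption chosen to mirror the items listed above; storing a digest of the response instead of the raw body is one possible design choice, not a requirement.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    principal: str
    classes: tuple         # sink classes found in the response
    policy: str            # name of the policy that was applied
    decision: str          # e.g. "allow", "allow_with_advisory", "block"
    reported_sink: str     # sink the application said it would use
    response_sha256: str   # digest instead of the raw response body
    timestamp: str


def build_audit_record(request_id, principal, response_text, classes,
                       policy, decision, reported_sink) -> AuditRecord:
    """Assemble an immutable record for one policy decision."""
    digest = hashlib.sha256(response_text.encode("utf-8")).hexdigest()
    return AuditRecord(
        request_id=request_id,
        principal=principal,
        classes=tuple(sorted(classes)),
        policy=policy,
        decision=decision,
        reported_sink=reported_sink,
        response_sha256=digest,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize to a single JSON line suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Sorting the classes and the JSON keys keeps records stable across runs, which makes later diffing and searching simpler.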

Where the gateway boundary ends

LLM02 covers attacks that ultimately execute in the application's own host. The gateway sits between the model and the application, upstream of the application's choice of sink. The gateway cannot see whether the application sanitized the response before sinking it; the application owns that.

What the gateway can do is signal to the application what it is receiving. A response that carries a high SQL-injection signal, together with a sink_advisory field naming the sink classes the application should sanitize for, is more useful than a response with no classification at all. The application still has to act on the advisory. The gateway gives it the information to act.
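On the application side, acting on such an advisory might look like the sketch below. Only the `sink_advisory` field name comes from the text; the envelope layout, the `SinkRejected` exception, and the `accept_for_sink` helper are assumptions for illustration.

```python
class SinkRejected(Exception):
    """Raised when a response is flagged for the sink it is about to reach."""


def accept_for_sink(envelope: dict, sink_class: str) -> str:
    """Return the response text only if the advisory does not name this sink."""
    advisory = envelope.get("sink_advisory", [])
    if sink_class in advisory:
        raise SinkRejected(f"response flagged for sink class {sink_class!r}")
    return envelope["text"]


# Hypothetical envelope as the application might receive it.
envelope = {
    "text": "alice' OR '1'='1",
    "sink_advisory": ["sql"],
}
```

Here `accept_for_sink(envelope, "sql")` raises `SinkRejected`, while the same envelope is accepted for a sink class the advisory does not name. Passing this check is not a substitute for sink-side sanitization: a response with an empty advisory should still be bound as a parameter or encoded for its destination.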

This is the same boundary that applies to web-app input validation: the WAF or proxy can flag a suspect payload, but the application's own sanitization at the sink is still required. The gateway is an additional layer that catches a class of errors the application would otherwise carry into production.

DeepInspect

This is the gap DeepInspect closes for the response side of the LLM02 problem. DeepInspect sits inline between the model and the calling application, classifies every response against a configurable taxonomy that includes the LLM02 sink classes (SQL, shell, script, URL), and writes a per-decision audit record that captures the classification, the policy applied, and the request context.

The architecture is identity-aware: the response classification is tied to the calling principal, which means policy can differ between an internal developer prototype session and a customer-facing endpoint without changing application code. The audit record commits before the response returns to the application, so the evidence persists even if a downstream incident takes the application down.

For platform teams that are mapping LLM02 controls against the agentic Top 10 and the OWASP AISVS chapter-6 verification claims, the gateway covers the runtime response surface. The application still owns sink-side sanitization; the gateway gives the application a classified response and a policy decision to act on.

If you are reading the LLM02 category against your current AI integration architecture and finding the trust boundary in the wrong place, let's talk today.