How does LLM DLP handle API calls to self-hosted models?

The control is model-agnostic. It operates on the HTTP request layer regardless of whether the model is hosted by the public provider (OpenAI, Anthropic, Bedrock, Azure OpenAI, Vertex) or self-hosted inside the corporate environment (Llama, Mistral, Together's hosted endpoints, Anyscale, on-prem inference behind internal VIPs). The architectural property the control depends on is the HTTPS request boundary. Where the model runs is a separate concern.

What if employees bypass the proxy and call the LLM directly?

The deployment pattern uses network routing, identity provider integration, and endpoint configuration to force AI-bound traffic through the proxy. Employees on the corporate network find the proxy in the egress path because the network policy routes traffic to known LLM endpoints through the proxy. Employees off the corporate network use the proxy because the LLM provider's authentication accepts only the corporate-issued credentials, which the proxy issues. Employees who bypass the proxy by using a personal credential and a personal device fall under the shadow AI category, which the proxy detects at the discovery layer (browser extension, CASB integration, network telemetry on AI endpoints) and which the policy treats as a separate violation.

Does LLM DLP work for embedded AI in SaaS tools?

Embedded AI in SaaS tools (the customer service platform that summarizes tickets with an LLM, the pricing engine that scores risk, the productivity suite that drafts content) is one of the hardest cases. The LLM DLP control sees only the traffic that routes through it. SaaS tools that call the LLM from the vendor's environment are visible only at the SaaS API layer where the data leaves the corporate boundary, not at the LLM request layer where the vendor's stack calls the model. The architectural answer is the vendor due care path: contracts require vendor-side logs that the deployer can request on demand, and procurement diligence ensures the vendor produces those logs. The LLM DLP control covers AI traffic the enterprise itself originates.

How does LLM DLP map to NIST AI RMF?

NIST AI RMF Govern function expects policy and accountability. LLM DLP produces the policy enforcement point. Manage function expects incident response and remediation evidence. LLM DLP produces the per-decision audit record that supports both. Map function expects context understanding (use cases, stakeholders, risks). LLM DLP feeds the use case inventory because the proxy sees every AI call and produces the inventory structurally. Measure function expects measurable evidence of system performance. LLM DLP produces per-decision metrics that feed the measurement.

LLM DLP vs Traditional DLP: Why the Two Controls Operate on Different Data Channels

Traditional DLP inspects file movements, email egress, and known data shapes on the corporate network. LLM DLP inspects prompt content and model responses at the AI request boundary. The two controls operate on different data channels, see different evidence, and satisfy different compliance obligations. Cloud Radix found that 77% of employees using unauthorized AI tools paste sensitive business data into unsanctioned models, and 86% of IT leaders are blind to those interactions because traditional DLP was tuned for files and email, not for the JSON request body that carries the prompt content. I want to walk through what each control actually sees, where each one is blind, and how the 2026 compliance set treats both.

What traditional DLP sees

Traditional DLP runs at three points in the enterprise data path: the endpoint agent on the user's laptop, the network appliance at the egress boundary, and the email gateway in front of the corporate mail relay. Each point inspects movements of files and structured data against rules that look for known shapes.

The endpoint agent watches file system events, USB writes, clipboard copies into authorized destinations, and uploads from the local file system to web destinations. The network appliance inspects HTTPS bodies through TLS interception, scanning for byte patterns that match SSN, credit card, or known document fingerprints. The email gateway scans outbound message bodies and attachments against the same rule set.

The rules engine produces a label on the file or the message body: contains PII, contains PHI, contains source code, contains a fingerprinted document. The enforcement decision applies at the file or message level: block the upload, quarantine the attachment, encrypt the email.

That is the architectural scope traditional DLP was designed for, and the architectural scope it satisfies well for file and email egress.

What traditional DLP misses on AI traffic

The architectural assumption traditional DLP makes is that sensitive content moves as files or as structured records. AI traffic violates that assumption.

When an employee pastes 800 lines of source code into ChatGPT, the data travels as an HTTPS POST to api.openai.com with a JSON request body. The endpoint agent sees a clipboard paste into a browser tab; it does not parse the resulting JSON request. The network appliance sees an encrypted HTTPS body; even after TLS interception, the prompt content sits inside a JSON messages array at a depth the appliance's rule engine was not tuned to read. The email gateway is not in the path at all.

Three structural failures recur:

The first is identity correlation. API calls authenticated with personal API keys do not map to corporate identity. A network gate sees a TLS session from a workstation IP to api.openai.com with an authorization header the gate cannot match to the corporate IdP.

The second is data classification. DLP classifies documents and structured records, not prompt context windows. A prompt is not a document. The classification rules that find an SSN inside a PDF do not generalize to finding a PHI segment inside a 20,000-token context window that combines retrieved documents, system instructions, prior conversation turns, and the user's question.

The third is policy enforcement. The Netwrix figure is that only 37% of organizations have any AI-related governance policy in place. Even when the policy exists, the enforcement point traditional DLP provides operates one layer above the LLM API call, which means the decision happens against file or HTTPS body patterns, not against prompt content.

What LLM DLP sees

LLM DLP operates at the AI request boundary. The control terminates the outbound HTTPS session to the LLM provider, reads the structured JSON request body, parses the prompt content out of the provider-specific shape (the messages array for OpenAI-compatible APIs, the input field for Anthropic, the request body for Bedrock invoke calls), and applies classifiers that decide what data classifications the prompt contains.

The control reads identity context from the token the caller supplies. SSO assertions, OIDC bearer tokens, workload identity certificates, and agent identity claims all bind to a verified user or agent inside the corporate identity provider. The classification feeds a policy decision: this identity, with this role, against this data classification, against this model destination, is or is not authorized.

On the return path, the control reads the model response, classifies the response content, and applies the same policy. A response that contains PHI routed to a caller without PHI authorization is redacted or blocked. The decision produces a per-decision audit record that captures identity, role, classification, policy version, and decision outcome.

The architectural property that differs from traditional DLP is that LLM DLP operates on the prompt and response as first-class data fields, not as opaque bytes inside an HTTPS body.

The two controls in the same enterprise stack

LLM DLP does not replace traditional DLP. The two controls inspect different data channels and the enterprise typically needs both. Traditional DLP continues to inspect file movements, email egress, and clipboard activity. LLM DLP inspects the prompt and response on traffic to LLM endpoints.

Traditional DLP catches the user who downloads a PHI document from the EMR and emails it externally. LLM DLP catches the same user who pastes the contents of that PHI document into ChatGPT. The exfiltration channel differs. The control that sees each channel differs.

The deployment pattern that works in 2026 enterprises has traditional DLP at endpoint, network, and email, and an LLM DLP control at the AI API call layer. The two controls share the same identity directory, the same data classification taxonomy, and the same audit store. The policy decisions produce records that reconcile against the same enterprise risk register.

What the 2026 compliance set expects

EU AI Act Article 12 requires automatic logging of high-risk AI system events over the system lifetime. Article 19 specifies the log content: timestamps, input data, identification of natural persons, retention of at least six months. The August 2, 2026 deadline applies. Traditional DLP records file egress events, not LLM request events. The Article 12 obligation requires the LLM request event, with the prompt content, the identity of the caller, and the policy decision attached.

The Fannie Mae LL-2026-04 disclosure-on-demand obligation, effective August 6, 2026 per the Cooley legal analysis, expects lenders to produce evidence of how AI tools handled specific decisions. The record covers the prompt, the response, the model used, the policy in effect. Traditional DLP produces file movement records, not AI decision records.

Texas TRAIGA took effect January 1, 2026. The California AI Transparency Act took effect the same day. The Colorado AI Act takes effect February 1, 2027. Each of these expects evidence at the AI decision layer that traditional DLP architectures cannot produce.

The architectural answer is to deploy both controls and reconcile the evidence they produce against the enterprise risk register. The architectural mistake is to treat the existing traditional DLP investment as sufficient for the AI request layer.

The audit independence property

LLM DLP that operates as a separate proxy from the application that calls the model produces audit records with write-path independence. The application that made the call never had write access to the audit store. The audit record commits before the model response returns to the application. The application cannot suppress the record by crashing, cannot rewrite the record because it has no write access, and cannot selectively log because the proxy logs every decision.

That property is what distinguishes the LLM DLP audit record from an application-internal log. Article 12 expects the log to be admissible as evidence in regulatory review. A log the application controls is a self-attestation artifact. A log written by an independent proxy is the system of record.

Traditional DLP produces an independent audit log for file movements. The same property is expected at the AI request layer, and only an external control at that layer can produce it.

DeepInspect

This is exactly what DeepInspect provides on the LLM DLP side of the stack. DeepInspect sits between authenticated users and agents and any HTTP-accessible LLM endpoint. The proxy inspects the prompt body on the request path and the model response body on the return path, applies identity-aware policy at both points, and writes a per-decision audit record that the application has no write path to.

The classifier covers PHI under HIPAA, PII under GDPR Article 4, MNPI under SEC and FINRA, PCI under PCI DSS, source code, and policy-defined organization-specific classifications. The decision is deterministic. The same prompt under the same policy under the same identity returns the same outcome.

Enforcement overhead runs under 50 milliseconds in internal DeepInspect testing, against LLM inference latency of 500 milliseconds to 5 seconds. The proxy operates alongside the existing traditional DLP investment and does not require ripping out the endpoint or network DLP architecture.

If your traditional DLP investment is mature and you have no inspection point at the LLM request layer, book a demo today.

The traditional DLP architecture can be extended through TLS interception at the network appliance and a rule update that targets AI provider domains. The extension produces a partial inspection: the appliance sees the HTTPS body for traffic that routes through the corporate network, but does not see the prompt content as a first-class field. The depth of the rule engine and the prompt parsing required are not what traditional DLP was tuned for, which means the rule maintenance cost is high and the false-negative rate is elevated. The extension also misses traffic from off-network endpoints (remote workers without the appliance in path, mobile devices, BYOD laptops that the corporate proxy does not cover). The architectural answer is to deploy an LLM-specific control at the AI request layer in addition to the network DLP investment.