Can we run this at the endpoint instead of at a proxy?

An endpoint agent covers only the applications it hooks into, not the full request surface. A proxy at the request layer covers every AI request the enterprise makes, regardless of the source application. The proxy topology also avoids installing an agent on every developer or user machine.

How does the classifier handle prompts that mix classified and non-classified content?

The classifier operates on the full request body and identifies the specific spans that carry classified content. The redaction transformation replaces the classified spans while leaving the surrounding prompt intact. The AI model receives the redacted prompt and produces a response based on the non-classified content.
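A minimal sketch of span-level redaction, assuming the classifier returns character offsets and a data-class label for each classified span. The `Span` type, the `redact` function, and the `PERSON`/`ORG` labels are hypothetical names used for illustration, not a documented interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int       # inclusive character offset into the prompt
    end: int         # exclusive character offset into the prompt
    data_class: str  # hypothetical label, e.g. "PERSON" or "ORG"


def redact(prompt: str, spans: list[Span]) -> str:
    """Replace each classified span with a placeholder; keep all other text."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            raise ValueError("overlapping spans are not handled in this sketch")
        pieces.append(prompt[cursor:span.start])  # unclassified text, unchanged
        pieces.append(f"[{span.data_class}]")     # placeholder for the span
        cursor = span.end
    pieces.append(prompt[cursor:])                # trailing unclassified text
    return "".join(pieces)


prompt = "Summarize my call with Jane Smith at Acme Corp about their renewal."
name_at = prompt.index("Jane Smith")
org_at = prompt.index("Acme Corp")
spans = [
    Span(name_at, name_at + len("Jane Smith"), "PERSON"),
    Span(org_at, org_at + len("Acme Corp"), "ORG"),
]
redacted = redact(prompt, spans)
print(redacted)  # Summarize my call with [PERSON] at [ORG] about their renewal.
```

The model still receives a complete, grammatical instruction, so it can answer from the non-classified content around the placeholders.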

Does the inspection layer affect latency?

The classifier and policy engine add latency in the low tens of milliseconds per request. The proxy's own network path adds latency comparable to a regional load balancer. The total added latency is typically under 60 milliseconds. For most AI use cases (where the model itself takes 1-10 seconds), the added latency is not user-visible.

Can we selectively bypass the inspection for specific traffic?

The policy engine supports explicit bypass rules for specific identities, routes, or use cases. Bypass rules run through the same policy change management process as other rules. The audit record captures the bypass as an event. Most organizations avoid bypass rules for regulatory reasons: the auditor treats bypass as a control exception that has to be justified.

How does egress monitoring at the AI layer interact with existing DLP for non-AI traffic?

The two layers run in parallel. The network DLP continues to cover the file transfers, email attachments, and other traditional egress paths. The AI request-layer inspection covers AI-specific traffic. The two layers produce independent audit records that the SIEM can correlate.

LLM Egress Monitoring: Inspecting the Prompt at the Boundary Before It Reaches the Model Provider

Traditional egress monitoring inspects outbound network traffic against a network-DLP catalog. The catalog was designed for file transfers to Dropbox, email attachments to Gmail, and web form submissions to competitor sites. LLM prompts leave the enterprise as HTTPS request bodies to api.openai.com, api.anthropic.com, bedrock-runtime.us-east-1.amazonaws.com, and generativelanguage.googleapis.com. The network DLP inspects the SNI header, but the encrypted body remains opaque. Even where a TLS-terminating proxy sits in-line and the body is decrypted, the DLP pattern set does not recognize prompt content the way it recognizes credit card numbers or file signatures. According to Zscaler's ThreatLabz 2026 AI Threat Report, enterprises moved 18,033 TB of data into AI tools in the past year, up 93% year over year. Only 37% of organizations run any AI governance policy per Netwrix. The gap is architectural.

I want to walk through the failure modes of network-layer egress monitoring on LLM traffic, the inspection-layer architecture that produces prompt-aware egress control, the enforcement decisions the layer supports at each request, and the operational records the layer produces for the security team.

The failure modes of network-layer DLP on LLM traffic

Network DLP tools inspect traffic at the perimeter or at the endpoint. Each deployment topology fails on LLM traffic in a specific way.

Perimeter DLP that sees encrypted TLS traffic cannot inspect the request body. The tool sees the destination hostname (api.openai.com) and the request size. The pattern rules the tool applies to unencrypted HTTP traffic do not apply to encrypted HTTPS traffic. The tool records the destination and the volume, and reports on the aggregate. The tool cannot tell whether the request body contained a customer name, a credit card number, or a source code snippet.

Perimeter DLP with SSL inspection can decrypt the traffic, but the DLP pattern rules were built for structured data. The rules recognize credit card numbers, Social Security numbers, and file signatures. The rules do not recognize the prompt content that carries the customer name inside a sentence like "please summarize the meeting notes from my call with Jane Smith at Acme Corp about their renewal." The sentence carries a customer name and a company name, but the DLP rule set has no pattern that matches the sentence.
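The mismatch can be illustrated with two simplified structured-data patterns. The regular expressions below are rough stand-ins written for this example, not the rule set of any particular DLP product.

```python
import re

# Simplified structured-data patterns (illustrative only).
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")   # 13-16 digits, optional separators
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")     # US Social Security number layout


def pattern_hits(text: str) -> list[str]:
    """Return the names of the patterns that match somewhere in the text."""
    patterns = (("card", CARD), ("ssn", SSN))
    return [name for name, pattern in patterns if pattern.search(text)]


structured = "charge card 4111 1111 1111 1111 for the renewal"
prompt = ("please summarize the meeting notes from my call with "
          "Jane Smith at Acme Corp about their renewal")

print(pattern_hits(structured))  # the card pattern matches
print(pattern_hits(prompt))      # nothing matches the customer and company names
```

The second call returns an empty list: the prompt contains a customer name and a company name, but nothing in it has the fixed shape a pattern rule looks for.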

Endpoint DLP on the browser sees the request before TLS encryption. But endpoint DLP typically covers keyboard input, clipboard content, and file downloads. It does not cover the JSON payload the browser posts to api.openai.com from a browser extension or from an internal application that authenticates the user's session and posts programmatically.

Endpoint DLP on the developer's IDE cannot cover the AI coding assistant traffic to Copilot or Cursor. Those tools authenticate to a different endpoint and use a different request pattern.

Shadow-AI browser extensions authenticate to consumer AI providers using the employee's personal account. The employee's traffic goes through the browser to chat.openai.com or claude.ai. Endpoint DLP that inspects the browser at the API level does not see this traffic, and network DLP sees only the encrypted connection to the destination.

The upshot is that the network and endpoint DLP layers see a fraction of the LLM egress traffic and inspect a fraction of what they see. The enforcement layer for LLM egress has to sit at the AI request layer specifically.

The inspection-layer architecture

The inspection layer is a stateless proxy that sits between the authenticated user or agent and the AI provider. Every request that leaves the enterprise for an AI provider goes through the proxy first.

The proxy terminates TLS at the enterprise boundary. The proxy sees the request body in cleartext at inspection time. The proxy re-encrypts the request to the AI provider using the enterprise's provider credentials, so the AI provider sees a normal API request.

The proxy runs the classifier on the request body. The classifier recognizes the data classes the enterprise's policy prohibits: customer PII, PHI, PCI data, source code, financial forecasts, legal privilege materials, and other classified content.

The proxy resolves the requester identity. The identity has to come from the enterprise identity provider, not from the AI provider's own account model. The proxy uses the enterprise SSO token or the enterprise-issued API key that binds to a specific service account.

The proxy applies the policy. The policy consults the identity, the target route, the classified data classes in the request, and the applicable rule set. The policy produces an enforcement decision (permit, deny, or one of the routing and escalation outcomes described below) plus, where the request is permitted, an optional transformation such as replacing a classified span with a placeholder.

The proxy produces the audit record. The record captures the identity, timestamp, target endpoint, model version, classifier verdict, applied policy, decision, and (where relevant) the redaction applied.
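One way to picture the record is as a flat, serializable structure with one field per item listed above. The field names, the example values, and the `to_json` helper are assumptions made for this sketch, not a documented schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    identity: str                     # resolved enterprise identity
    timestamp: str                    # ISO 8601, UTC
    target_endpoint: str              # AI provider host the request was bound for
    model_version: str                # model named in the request
    classifier_verdict: tuple         # data classes detected in the body
    applied_policy: str               # identifier of the rule set that was applied
    decision: str                     # enforcement decision taken
    redaction: Optional[dict] = None  # placeholder counts, when redaction applied

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for hashing or signing later.
        return json.dumps(asdict(self), sort_keys=True)


record = AuditRecord(
    identity="user:jdoe@example.com",                  # hypothetical user
    timestamp=datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc).isoformat(),
    target_endpoint="api.openai.com",
    model_version="example-model-v1",                  # hypothetical model name
    classifier_verdict=("PII",),
    applied_policy="policy-2025-01",                   # hypothetical policy id
    decision="permit_with_redaction",
    redaction={"PERSON": 1},
)
print(record.to_json())
```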

The enforcement decisions the layer supports

The inspection layer supports six enforcement decisions per request.

Permit unchanged. The request goes through to the AI provider as sent. Applied when the policy allows the identity, the route, and the data classes.

Permit with redaction. The request goes through, but the classified data classes are replaced with placeholders. Applied when the policy allows the identity and the route but restricts a data class. The redaction is reversible only inside the enterprise's audit record, not at the AI provider.

Deny. The request is blocked at the proxy. The user receives a deny response with the applicable error message. Applied when the policy prohibits the identity, the route, or a data class.

Route to alternative provider. The request is routed to a different AI provider based on the classification. Applied in multi-provider deployments where certain data classes have to go to certain providers (for example, PHI has to go to a BAA-covered provider).

Route to alternative model. The request is routed to a different model on the same provider. Applied where a cheaper or more specialized model is appropriate for the specific request.

Escalate for human review. The request is queued for human review before the response is returned. Applied for high-risk requests where the policy requires oversight.

The decisions are deterministic. The same identity, route, request body, and policy produce the same decision every time. Determinism is a property auditors and regulators expect from an enforcement layer.
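A sketch of how the six decisions and the determinism property could fit together, assuming the policy is an ordered rule list evaluated first-match-wins with a default deny. The `Rule` fields, role names, route names, and data-class labels are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PERMIT = "permit"
    PERMIT_WITH_REDACTION = "permit_with_redaction"
    DENY = "deny"
    ROUTE_PROVIDER = "route_to_alternative_provider"
    ROUTE_MODEL = "route_to_alternative_model"
    ESCALATE = "escalate_for_human_review"


@dataclass(frozen=True)
class Rule:
    role: str        # "*" matches any role
    route: str       # "*" matches any route
    data_class: str  # "*" matches any request, classified or not
    decision: Decision


def decide(role, route, data_classes, rules):
    """Pure function of its inputs: no clock, randomness, or external state."""
    for rule in rules:  # first matching rule wins
        if rule.role not in ("*", role):
            continue
        if rule.route not in ("*", route):
            continue
        if rule.data_class != "*" and rule.data_class not in data_classes:
            continue
        return rule.decision
    return Decision.DENY  # default deny when no rule matches


RULES = (
    Rule("*", "*", "PCI", Decision.DENY),
    Rule("*", "*", "PHI", Decision.ROUTE_PROVIDER),
    Rule("*", "*", "PII", Decision.PERMIT_WITH_REDACTION),
    Rule("engineer", "openai/chat", "*", Decision.PERMIT),
)

first = decide("engineer", "openai/chat", frozenset({"PII"}), RULES)
second = decide("engineer", "openai/chat", frozenset({"PII"}), RULES)
print(first, first == second)
```

Because `decide` reads nothing beyond its arguments, replaying a logged request against the same rule set reproduces the logged decision, which is what makes the audit trail checkable.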

The operational records the layer produces

The audit record the inspection layer produces feeds three downstream systems.

The SIEM. The proxy forwards classifier verdicts and policy events as detection signals. The SIEM correlates the AI signals with other security signals in the enterprise.

The data protection function. The privacy officer and the data protection officer use the record series as evidence of ongoing monitoring under GDPR Article 32 (security of processing) and Article 35 (data protection impact assessment).

The compliance function. The CISO and the compliance officer use the record series for SOC 2, ISO 42001, HIPAA, and EU AI Act audit evidence.

The record has to be produced with the tamper-evident properties the audit expects. The record has to sit in storage the application cannot modify, carry a cryptographic integrity signature, and be indexed for the queries the downstream systems run.
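Tamper evidence can be sketched with an HMAC hash chain, where each entry's integrity tag covers the record and the previous entry's tag. This is a stand-in technique chosen for brevity: the text above does not name a scheme, and a real deployment might use asymmetric signatures and write-once storage instead. The `seal` and `verify_chain` names and the demo key are hypothetical.

```python
import hashlib
import hmac
import json


def seal(record: dict, prev_sig: str, key: bytes) -> dict:
    """Bind a record to its predecessor with a keyed integrity tag."""
    payload = json.dumps(record, sort_keys=True).encode()
    sig = hmac.new(key, prev_sig.encode() + payload, hashlib.sha256).hexdigest()
    return {"record": record, "prev_sig": prev_sig, "sig": sig}


def verify_chain(entries: list, key: bytes) -> bool:
    """Recompute every tag in order; any edit, removal, or reorder fails."""
    prev = ""
    for entry in entries:
        if entry["prev_sig"] != prev:
            return False
        payload = json.dumps(entry["record"], sort_keys=True).encode()
        expected = hmac.new(key, prev.encode() + payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev = entry["sig"]
    return True


key = b"demo-key-not-for-production"  # in practice this would come from a KMS
chain = []
prev_sig = ""
for rec in ({"identity": "user:jdoe", "decision": "deny"},
            {"identity": "svc:summarizer", "decision": "permit"}):
    entry = seal(rec, prev_sig, key)
    chain.append(entry)
    prev_sig = entry["sig"]

print(verify_chain(chain, key))
```

Changing a stored decision after the fact, or dropping an entry from the middle of the chain, makes `verify_chain` return `False` for anyone who holds the key.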

The interaction with the identity layer

Egress monitoring at the AI request layer depends on the identity layer producing the requester identity at the inspection point. Two identity patterns are common.

User-authenticated requests. The enterprise SSO token flows through the request. The proxy verifies the token, resolves the user's role and permissions, and applies the policy accordingly.

Service-account authenticated requests. The service account or agent authenticates with an enterprise-issued API key. The proxy verifies the key, resolves the service account's identity and permitted routes, and applies the policy.
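The two patterns can be sketched as a single resolution step at the proxy. Everything named here is an assumption for illustration: the `X-Enterprise-Api-Key` header, the in-memory lookup tables, and the `Identity` fields. In particular, the token lookup is a placeholder for real verification of a signed token against the enterprise identity provider.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    subject: str
    kind: str                    # "user" or "service_account"
    permitted_routes: frozenset  # routes this identity may call


# Placeholder for tokens already verified against the identity provider.
VERIFIED_SSO_TOKENS = {
    "sso-token-abc": Identity("jdoe@example.com", "user",
                              frozenset({"openai/chat"})),
}

# Enterprise-issued API keys, stored as hashes rather than in cleartext.
API_KEY_HASHES = {
    hashlib.sha256(b"svc-key-123").hexdigest(): Identity(
        "svc-summarizer", "service_account", frozenset({"anthropic/messages"})),
}


def resolve_identity(headers: dict) -> Optional[Identity]:
    """Return the enterprise identity for a request, or None if unauthenticated."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):                     # user-authenticated path
        return VERIFIED_SSO_TOKENS.get(auth[len("Bearer "):])
    api_key = headers.get("X-Enterprise-Api-Key")      # service-account path
    if api_key:
        digest = hashlib.sha256(api_key.encode()).hexdigest()
        return API_KEY_HASHES.get(digest)
    return None


user = resolve_identity({"Authorization": "Bearer sso-token-abc"})
service = resolve_identity({"X-Enterprise-Api-Key": "svc-key-123"})
unknown = resolve_identity({})
print(user, service, unknown)
```

A request that resolves to `None` carries no enterprise identity, so the policy has nothing to evaluate it against.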

Shadow-AI requests where the traffic bypasses the enterprise identity (an employee using a personal ChatGPT account through a personal device on the corporate network, for example) require a different treatment. The perimeter has to detect the traffic to consumer AI endpoints from corporate-managed networks and block it at the network layer. The AI request-layer proxy applies only after the traffic reaches an enterprise-authenticated path.
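The network-layer check can be as simple as matching the destination hostname against a list of consumer AI endpoints. The sketch below uses the two consumer hosts named earlier; the function name and the idea of feeding it the SNI hostname are assumptions, and a production list would be longer and maintained centrally.

```python
# Consumer AI hosts mentioned in this article; a real list would be broader.
CONSUMER_AI_HOSTS = frozenset({"chat.openai.com", "claude.ai"})


def is_consumer_ai_host(sni_host: str) -> bool:
    """True if the hostname is a listed consumer AI host or a subdomain of one."""
    host = sni_host.strip().rstrip(".").lower()
    return any(host == listed or host.endswith("." + listed)
               for listed in CONSUMER_AI_HOSTS)


for candidate in ("Claude.ai", "chat.openai.com", "api.anthropic.com"):
    print(candidate, is_consumer_ai_host(candidate))
```

API hosts such as api.anthropic.com are deliberately absent from the list: that traffic is expected to reach the provider through the enterprise-authenticated proxy path rather than being blocked outright.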

DeepInspect

The DeepInspect gateway implements the inspection layer described above. The gateway terminates TLS at the enterprise boundary, runs the classifier on the request body, resolves the enterprise identity, applies the policy, produces the audit record, and forwards the request to the AI provider with the enterprise's provider credentials. The gateway supports permit, permit-with-redaction, deny, route-to-alternative-provider, route-to-alternative-model, and escalate-for-human-review decisions per request.

The gateway integrates with Okta, Entra ID, Google Workspace, and other enterprise identity providers. The gateway integrates with Splunk, Datadog, Chronicle, and Sentinel as SIEM destinations. The gateway integrates with the enterprise's Vault or KMS as the credential source for provider API keys.

If your team is replacing network DLP with AI request-layer inspection or building the enforcement layer for shadow AI, book a technical deep dive at deepinspect.ai.