Shadow AI Monitoring: What You Can Actually See and Where the Inspection Layer Has To Sit
Most shadow AI monitoring stops at the DNS layer or the CASB. Both miss the actual data leaving the organization because the prompt is the data, and the prompt sits inside an encrypted POST body. This piece walks through the four monitoring layers, what each one sees, where each one is blind, and the inspection architecture that produces evidence an EU AI Act or HIPAA auditor will accept.

86% of IT leaders are completely blind to employee AI interactions (Cloud Radix). 77% of employees who use unauthorized AI tools admit to pasting sensitive business data into the prompts. The detection time for shadow AI breaches sits at 247 days according to the IBM Cost of a Data Breach Report, six days longer than the standard breach detection median. The gap between what monitoring tools see and what the organization needs to see is the entire reason that detection window stays open.
I want to walk through the four layers of monitoring most organizations have available, what each one actually inspects, and where the inspection has to sit to produce evidence a regulator or insurer will accept.
Four monitoring layers
Most enterprise networks already have four candidate layers for AI traffic monitoring: DNS resolution logs, network-edge TLS metadata, CASB integration with sanctioned SaaS, and inline inspection at the AI request boundary. Each layer has a different visibility profile. Treating them as interchangeable is the most common architectural mistake.
DNS logs show which domains are resolved by which corporate devices. This produces a list of which devices, and by attribution which employees, have looked up api.openai.com, claude.ai, copilot.microsoft.com, and the long tail of consumer AI tools. The data class is "fact of use." The data leaving the organization remains invisible.
Network-edge TLS metadata adds SNI, certificate fingerprint, byte counts, and connection timing. The AI provider was contacted, the session ran for ninety seconds, fourteen kilobytes were sent and three megabytes were received. The data class is "shape of use." The prompt content remains encrypted.
CASB integration with sanctioned SaaS exposes AI feature usage inside the SaaS product (Microsoft 365 Copilot, Google Workspace Gemini, Notion AI). The data class is "documented enterprise-app activity." Shadow AI usage outside the CASB-integrated products remains unseen.
Inline inspection at the AI request boundary terminates the TLS connection in a proxy under organizational control, inspects the prompt and response, applies policy, and produces a per-decision audit record. The data class is "evidence." The earlier three layers do not produce this.
What DNS-only monitoring sees
A DNS-only deployment shows that an employee resolved api.openai.com from a corporate laptop at 14:32:07. It does not show what was sent, what was returned, or whether the connection succeeded. A security team operating from DNS evidence alone can answer "did this person use ChatGPT this week" and nothing more specific.
The strength of DNS monitoring is coverage. Every corporate device using corporate DNS produces records, which captures the long tail of shadow AI tools the security team has not yet heard of. The weakness is granularity. A regulator asking "what data did this employee send to OpenAI on March 12" gets no answer from DNS logs.
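As a rough sketch of that discovery use: assuming DNS logs can be exported as simple (timestamp, device, queried name) records, building the "fact of use" inventory is a grouping operation. The record shape, the sample data, and the short watchlist below are illustrative assumptions, not a specific resolver's log format.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class DnsRecord(NamedTuple):
    """Hypothetical, simplified DNS query log entry."""
    timestamp: str  # ISO 8601 string (sample values below are made up)
    device: str     # corporate device identifier
    qname: str      # queried domain name


# Illustrative watchlist; a real one is far longer and changes constantly.
AI_DOMAINS = {"api.openai.com", "claude.ai", "copilot.microsoft.com"}


def matches_watchlist(qname: str, watchlist: set[str]) -> Optional[str]:
    """Return the watchlist entry that qname equals or is a subdomain of."""
    name = qname.rstrip(".").lower()
    for domain in watchlist:
        if name == domain or name.endswith("." + domain):
            return domain
    return None


def fact_of_use(records: Iterable[DnsRecord]) -> dict[str, set[str]]:
    """Map each device to the AI domains it resolved. Says nothing about content."""
    seen: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        domain = matches_watchlist(rec.qname, AI_DOMAINS)
        if domain is not None:
            seen[rec.device].add(domain)
    return dict(seen)


logs = [
    DnsRecord("2025-03-12T14:32:07", "laptop-017", "api.openai.com."),
    DnsRecord("2025-03-12T14:33:10", "laptop-017", "intranet.example.com"),
    DnsRecord("2025-03-12T15:01:44", "laptop-042", "claude.ai"),
]
inventory = fact_of_use(logs)
```

The output answers "which devices touched which AI domains" and nothing else. No field in the input could answer the regulator's question about what was sent.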
DNS-only monitoring is acceptable as a discovery tool for the first phase of an AI governance program. Treating it as the steady-state control is where organizations get into trouble. The EU AI Act's record-keeping rules for high-risk AI systems (Article 12 on what logs must capture, Article 19 on how long they are kept) call for identification of the natural persons involved in certain cases and for detail sufficient to reconstruct risk situations. DNS records satisfy neither.
What TLS metadata adds
TLS metadata adds session-level structure that DNS lacks. The SNI, together with the certificate fingerprint where the handshake exposes it (TLS 1.3 encrypts the server certificate, so passive sensors often have only the SNI), identifies the destination service (OpenAI vs Anthropic vs a self-hosted endpoint). Byte counts hint at upload payload size (a 200 KB upload to api.openai.com is unlikely to be a single prompt). Session duration shows how long the conversation ran.
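A minimal sketch of what "shape of use" looks like in practice, assuming the network edge emits one flow record per TLS session. The TlsSession fields and the summarize helper are hypothetical names. The 200 KB threshold is taken from the example above and is not a tuned value.

```python
from dataclasses import dataclass

# Threshold taken from the 200 KB example in the text; tune it for real traffic.
LARGE_UPLOAD_BYTES = 200 * 1000


@dataclass(frozen=True)
class TlsSession:
    """Hypothetical flow record built from network-edge TLS metadata."""
    sni: str
    bytes_sent: int
    bytes_received: int
    duration_seconds: float


def summarize(session: TlsSession) -> dict:
    """Describe the shape of a session. The payload itself is never read."""
    return {
        "destination": session.sni,
        "large_upload": session.bytes_sent >= LARGE_UPLOAD_BYTES,
        "duration_seconds": session.duration_seconds,
    }


# The ninety-second session described earlier: 14 KB sent, 3 MB received.
chat = summarize(TlsSession("api.openai.com", 14_000, 3_000_000, 90.0))
# A session whose upload is too large to be a single typed prompt.
bulk = summarize(TlsSession("api.openai.com", 250_000, 20_000, 12.0))
```

The second session gets flagged as a large upload, which is a reason to look closer. It is still an inference from size, with no view of what the 250 KB contained.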
The hard limit is the encrypted payload. Without TLS inspection configured for AI provider domains, the security team can infer activity shape but cannot inspect content. Configuring TLS inspection for AI domains is operationally heavy: it requires certificate pinning bypasses, has implications for application stability when providers rotate certificates, and produces a stream of decrypted traffic that has to be inspected by something.
When organizations do configure TLS inspection for AI traffic, they typically route the decrypted stream into network DLP. Network DLP was built to classify documents flowing through SMTP and HTTP file transfers. It is poorly tuned to inspect the context window of a multi-turn conversation with an LLM. The architecture works for detecting credit card numbers in plaintext. It fails to recognize a source-code snippet pasted into a prompt as a secret leak.
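To make the mismatch concrete, here is a simplified sketch of the pattern-based detection that document-oriented DLP relies on: a digit-run regex plus a Luhn checksum. It is an illustrative stand-in, not any vendor's engine, and the sample prompts are invented.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return Luhn-valid card-like numbers found in text."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits


# A well-known test card number in a prompt: the pattern fires.
card_prompt = "Refund card 4111 1111 1111 1111 for this customer."
# A pasted snippet with a hard-coded credential: nothing for the pattern to match.
code_prompt = 'Why does this fail?\nDB_PASSWORD = "hunter2-prod"\nconnect(password=DB_PASSWORD)'

card_hits = find_card_numbers(card_prompt)
code_hits = find_card_numbers(code_prompt)
```

The card number is caught and the credential is not. Catching the second case means classifying what the prompt is about, which is a different problem from matching a known data format.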
What CASB integration covers
A CASB integrated with the corporate Microsoft 365 tenant exposes Copilot prompts and responses. Integrated with the corporate Google Workspace tenant, it exposes Gemini in Docs and Sheets. Integrated with the corporate Notion workspace, it exposes Notion AI activity. The CASB sees what the SaaS vendor's audit API exposes, which is enterprise-app activity inside the bounds of corporate identity.
This coverage misses three large categories. Personal-account access to consumer AI from corporate devices, where the user authenticates with a personal Google or Microsoft account, bypasses the CASB because the SaaS-side activity is on a different tenant. Direct API calls from engineering tools to api.openai.com or api.anthropic.com using static API keys bypass the CASB because no SaaS sits in the path. Embedded AI features inside SaaS tools that the CASB does not yet integrate with operate outside CASB visibility entirely.
CASB integration covers approved enterprise AI usage well. It is structurally unable to produce evidence about the broader shadow AI population.
Inline inspection at the AI request boundary
Inline inspection terminates the outbound AI traffic in a proxy under organizational control. The proxy decrypts the request, inspects the prompt, applies policy (block, redact, allow), forwards the request to the model API if permitted, inspects the response, and commits an audit record before returning the response to the application.
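The request flow above can be sketched in a few functions. Everything here is a toy under stated assumptions: the SSN-style regex and the "INTERNAL ONLY" marker stand in for real classification, the names are hypothetical, and the model call is an injected function so the sketch runs without a network. It is not a description of how any particular product implements the proxy.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Toy policy inputs, assumed for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN-like strings
BLOCK_MARKER = "INTERNAL ONLY"                       # hypothetical label


@dataclass(frozen=True)
class Decision:
    action: str  # "allow", "redact", or "block"
    prompt: str  # the prompt as it may be forwarded


def apply_policy(prompt: str) -> Decision:
    """Block labelled content, redact SSN-like strings, otherwise allow."""
    if BLOCK_MARKER in prompt:
        return Decision("block", "")
    redacted = SSN_PATTERN.sub("[REDACTED]", prompt)
    if redacted != prompt:
        return Decision("redact", redacted)
    return Decision("allow", prompt)


def handle_request(
    user: str,
    prompt: str,
    forward: Callable[[str], str],
    audit_log: list,
) -> Optional[str]:
    """Inspect, enforce, forward if permitted, then audit before returning."""
    decision = apply_policy(prompt)
    response = None
    if decision.action != "block":
        # Forward only the (possibly redacted) prompt to the model API.
        response = forward(decision.prompt)
        # Response inspection: reuse the same redaction on the way back.
        response = SSN_PATTERN.sub("[REDACTED]", response)
    # Commit the audit record before the response goes back to the caller.
    audit_log.append({"user": user, "action": decision.action})
    return response


def fake_model(prompt: str) -> str:
    """Stand-in for the model API."""
    return "echo: " + prompt


audit_log: list = []
allowed = handle_request("alice", "Summarize this memo", fake_model, audit_log)
redacted = handle_request("bob", "SSN is 123-45-6789", fake_model, audit_log)
blocked = handle_request("carol", "INTERNAL ONLY roadmap", fake_model, audit_log)
```

The ordering is the point of the sketch: the policy decision happens before the model sees anything, and the audit entry is written on every path, including the blocked one.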
This is the only layer that produces records containing the verified natural person behind the request, the data classification applied to the prompt, the policy that governed the decision, and the outcome. It is the only layer whose records satisfy EU AI Act Article 19, NIST AI RMF Pillar 3 action lineage, and HIPAA audit-control requirements for AI-handled PHI.
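A sketch of what such a per-decision record might look like, with the four fields named above. The tamper-evidence mechanism is my assumption for illustration: each record is signed with HMAC-SHA256 over its canonical JSON plus the previous signature, so editing or removing an earlier record breaks verification of every later one. Field names, the key handling, and the chaining scheme are hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, prev_signature: str, key: bytes) -> str:
    """HMAC-SHA256 over the previous signature plus canonical JSON of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    message = prev_signature.encode("utf-8") + payload
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_record(log: list, record: dict, key: bytes) -> None:
    """Append a record chained to the signature of the one before it."""
    prev = log[-1]["signature"] if log else ""
    log.append({"record": record, "signature": sign_record(record, prev, key)})


def verify_log(log: list, key: bytes) -> bool:
    """Recompute the chain; any edited or dropped record makes this False."""
    prev = ""
    for entry in log:
        expected = sign_record(entry["record"], prev, key)
        if not hmac.compare_digest(expected, entry["signature"]):
            return False
        prev = entry["signature"]
    return True


KEY = b"demo-key-not-for-production"  # a real key would live in a KMS or HSM
log: list = []
append_record(log, {
    "natural_person": "alice@example.com",
    "data_class": "source-code",
    "policy_version": "2025-01",
    "outcome": "block",
}, KEY)
append_record(log, {
    "natural_person": "bob@example.com",
    "data_class": "public",
    "policy_version": "2025-01",
    "outcome": "allow",
}, KEY)

ok_before = verify_log(log, KEY)
log[0]["record"]["outcome"] = "allow"  # simulate after-the-fact tampering
ok_after = verify_log(log, KEY)
```

An HMAC chain only proves integrity to someone holding the key. An auditor-facing design would more likely use asymmetric signatures or an external timestamping service, which this sketch does not cover.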
The architectural cost is the operational footprint of a proxy in the AI traffic path. The performance cost is enforcement overhead, which DeepInspect internal testing measures at under 50 ms. Against the 500 ms to 5-second LLM inference baseline, that is at most about 10% added latency on the fastest calls and about 1% on the slowest, which users rarely notice. The compliance cost of not having this layer becomes visible the first time a regulator asks for evidence.
What an integrated monitoring architecture looks like
A monitoring architecture that produces audit evidence combines all four layers. DNS logs cover the discovery surface and catch shadow AI tools the security team has not yet listed. TLS metadata adds session shape for unsanctioned tool sessions where inline inspection cannot be inserted. CASB covers sanctioned SaaS-embedded AI usage at the enterprise-app level. Inline inspection produces the per-decision record at the AI request boundary for sanctioned and unsanctioned tools routed through the corporate proxy.
The four layers serve different purposes and produce different evidence. Stacking them produces a defensible monitoring program. Substituting any single layer for the full architecture produces gaps a regulator or insurer will identify.
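The division of labor can be written down as a small lookup. This sketch encodes the four layers and their data classes as described in this piece, and asks which layers observe a given session. The Session flags and the assumption that every session crosses corporate DNS and the network edge are simplifications of mine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Session:
    """Hypothetical description of one AI session from a corporate device."""
    sanctioned_saas: bool      # AI feature inside a CASB-integrated SaaS tenant
    via_corporate_proxy: bool  # traffic routed through the inline inspection proxy


@dataclass(frozen=True)
class Layer:
    name: str
    data_class: str
    sees_content: bool
    applies: Callable[[Session], bool]


# Assumes the device uses corporate DNS and egresses through the network edge.
LAYERS = (
    Layer("DNS logs", "fact of use", False, lambda s: True),
    Layer("TLS metadata", "shape of use", False, lambda s: True),
    Layer("CASB", "documented enterprise-app activity", True, lambda s: s.sanctioned_saas),
    Layer("Inline inspection", "evidence", True, lambda s: s.via_corporate_proxy),
)


def coverage(session: Session) -> list[str]:
    """Names of the layers that observe this session at all."""
    return [layer.name for layer in LAYERS if layer.applies(session)]


def content_visible(session: Session) -> bool:
    """True if at least one applicable layer can see prompt content."""
    return any(layer.sees_content and layer.applies(session) for layer in LAYERS)


# Personal-account use of a consumer AI tool, not routed through the proxy.
personal = Session(sanctioned_saas=False, via_corporate_proxy=False)
# The same tool once its traffic is routed through the inline proxy.
proxied = Session(sanctioned_saas=False, via_corporate_proxy=True)

personal_layers = coverage(personal)
proxied_layers = coverage(proxied)
```

For the personal-account session only the two metadata layers apply, so the organization knows the session happened but not what it contained. Routing the same traffic through the proxy is what turns content visibility on.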
DeepInspect
This is exactly what DeepInspect does. DeepInspect sits inline between authenticated users and any HTTP-based LLM endpoint. Every request is inspected against organizational policy: identity context, data classification, sanctioned tool list, per-role permissions. The decision is enforced before the request reaches the model, and a per-decision audit record is committed before the response returns to the application.
The record contains the verified natural person, the policy version, the data class, the outcome, and a tamper-evident signature. For organizations building or refreshing their shadow AI monitoring program ahead of the August 2 EU AI Act enforcement date, the inline inspection layer is the one that turns network observation into regulatory evidence. Book a demo today.
Frequently asked questions
- Can we monitor shadow AI without breaking employee privacy expectations?
Monitoring AI traffic is no different in principle from monitoring email or web browsing for security purposes, and most jurisdictions accept the practice provided the organization gives notice in the employee handbook and the monitoring is proportionate to the security purpose. The disclosure framing matters: a clear notice that AI prompts containing certain data classes will be inspected and blocked at the corporate network boundary is a defensible position. Surreptitious monitoring without notice is the position that creates legal exposure. Coordinate with HR and legal on the disclosure language and the data-retention period before deploying inline inspection.
- Does the AI provider's audit log replace inline inspection?
OpenAI Enterprise, Anthropic Claude Enterprise, and similar provider-side audit logs cover only the sessions authenticated to that enterprise tenant. Personal-account access to the same providers from corporate devices is invisible to the enterprise tenant's audit log. Direct API access using non-enterprise keys is invisible. Other AI providers the organization has not yet contracted with are invisible. Provider-side logs are a useful supplementary control for the sanctioned-usage subset. They are unable to substitute for inspection at the organizational network boundary, which is where the full shadow AI surface presents.
- How long should monitoring records be retained?
Retention is set by the longest applicable obligation. EU AI Act Article 19 requires at least six months for high-risk AI system logs. HIPAA requires six years for records related to PHI access. Financial-services record-keeping under SEC, FINRA, and EU equivalents extends to seven years or longer. The practical retention period for a regulated organization is the longest single obligation that applies to any data class that may transit AI traffic. Architectures should support seven-year retention as a reasonable default, with the option to extend for jurisdictions with longer requirements.
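The "longest applicable obligation" rule is a one-line maximum. The sketch below uses only the periods quoted in this answer. The dictionary keys and the function name are hypothetical, and which obligations actually apply is a question for counsel, not for code.

```python
# Retention periods in months, as quoted in the answer above.
RETENTION_MONTHS = {
    "eu_ai_act_high_risk_logs": 6,   # at least six months
    "hipaa_phi": 6 * 12,             # six years
    "financial_services": 7 * 12,    # seven years (may be longer)
}


def required_retention_months(applicable: set[str]) -> int:
    """Return the longest retention period among the applicable obligations."""
    return max((RETENTION_MONTHS[name] for name in applicable), default=0)


healthcare_only = required_retention_months({"eu_ai_act_high_risk_logs", "hipaa_phi"})
all_three = required_retention_months(set(RETENTION_MONTHS))
```

An organization subject to the EU AI Act and HIPAA lands on 72 months. Adding a financial-services obligation moves it to 84, which is where the seven-year default comes from.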
- What is the smallest viable monitoring stack for an organization just starting an AI governance program?
The minimum viable starting point is DNS logging for discovery, CASB integration for the sanctioned SaaS surface, and inline inspection in front of the one or two AI endpoints the organization has already approved (typically OpenAI and one internal model). This three-layer stack covers approximately 80% of the typical shadow AI surface for a mid-market organization, produces evidence sufficient to satisfy most current audit requests, and provides the runway to expand inline inspection coverage as additional AI tools enter the approved list. The remaining 20% (personal-account access on corporate devices, embedded AI inside SaaS the CASB does not yet integrate) is addressed in the next phase once the foundation is operating.
- Is network DLP sufficient to monitor AI traffic if we configure TLS inspection?
Network DLP can inspect decrypted AI traffic if TLS inspection is configured for AI provider domains, but DLP classification engines were designed to inspect documents and structured data flowing through file-transfer protocols. The classification of prompt content (a paragraph of natural language, possibly including code, possibly including names, possibly including business context) is a different problem from classifying a CSV upload. Most network DLP tools produce high false-positive rates on prompt traffic and miss the AI-specific data classes (vector embeddings of sensitive content, model fingerprinting attempts, prompt injection payloads) entirely. Network DLP is the right tool for the problem it was built for. AI request inspection is a different problem that benefits from purpose-built tooling.