How to Find Shadow AI Inside Your Organization: A Five-Source Detection Pipeline
Shadow AI lives in the browser tab next to the approved SaaS. The detection stack the security team built for shadow IT does not surface the signal. This piece walks through a five-source detection pipeline (network egress, endpoint telemetry, IdP claims, expense aggregation, approved-route gap analysis), the joining identity that ties the sources together, and the prioritization framework for triaging the patterns the pipeline surfaces.

Cloud Radix found that 86% of IT leaders are completely blind to shadow AI interactions. IBM's Cost of a Data Breach Report 2026 found that one in five breached organizations experienced breaches linked to shadow AI. The detection gap is widely acknowledged. The detection mechanism is not. Most security teams assume that the shadow IT stack (DNS log review, OAuth grant inventory, expense report analysis, SSO integration request review) will surface shadow AI the same way it surfaces shadow SaaS. The assumption fails for three reasons: AI traffic leaves from the standard corporate browser, consumer AI tools do not ask for corporate OAuth or SSO, and the consumer tier of every major AI tool is free, so most usage never reaches an expense report.
I want to walk through a five-source detection pipeline that closes the visibility gap, the joining identity that ties the sources together, the prioritization framework for triaging the patterns the pipeline surfaces, and the migration pattern from off-route shadow AI to approved-route AI usage.
Source 1: Network egress inspection
The corporate network egress carries the AI traffic out to the AI provider endpoints. A web proxy, SASE node, or NGFW that terminates TLS for the egress traffic reads the AI provider POSTs.
The detection rules read three signals on the egress. The first signal is the destination domain. A POST to api.openai.com, api.anthropic.com, generativelanguage.googleapis.com, chat.openai.com, claude.ai, gemini.google.com, perplexity.ai, or any other AI provider domain produces a route candidate. The second signal is the request body classification. A classifier passes over the request body and tags the data classes the prompt contains: PII, source code, financial data, customer behavioral data. The third signal is the response body classification. A response body that returns content matching a sensitive-data pattern (for example, the model summarizing customer records back to the user) flags the round-trip.
The egress source produces high-fidelity signal on devices that traverse the corporate network. The source has blind spots on devices that bypass the corporate network (BYOD on home Wi-Fi, mobile data, public networks).
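The three egress signals can be sketched in a few lines of Python. This is a minimal illustration, not a production classifier: the domain set is the list from the paragraph above, and the regex patterns, the EgressSignal type, and the inspect function are hypothetical names chosen for the example. It assumes the proxy has already terminated TLS and hands over decoded bodies.

```python
import re
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlsplit

# Domains from the article's list; a real deployment would maintain a longer feed.
AI_DOMAINS = {
    "api.openai.com", "api.anthropic.com", "generativelanguage.googleapis.com",
    "chat.openai.com", "claude.ai", "gemini.google.com", "perplexity.ai",
}

# Hypothetical, deliberately crude patterns; real classifiers are far richer.
PATTERNS = {
    "pii": re.compile(r"\b\d{3}-\d{2}-\d{4}\b|[\w.+-]+@[\w-]+\.[\w.]+"),
    "source_code": re.compile(r"\bdef \w+\(|\bclass \w+|#include\s*<"),
    "financial": re.compile(r"\b(?:invoice|revenue|IBAN)\b", re.IGNORECASE),
}


@dataclass
class EgressSignal:
    host: str                                         # signal 1: destination domain
    request_classes: set = field(default_factory=set)   # signal 2: request body classes
    response_classes: set = field(default_factory=set)  # signal 3: response body classes


def classify(text: str) -> set:
    """Return the names of every pattern that matches the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def inspect(method: str, url: str, request_body: str,
            response_body: str) -> Optional[EgressSignal]:
    """Return a signal for a POST to an AI provider domain, else None."""
    host = (urlsplit(url).hostname or "").lower()
    is_ai = any(host == d or host.endswith("." + d) for d in AI_DOMAINS)
    if method.upper() != "POST" or not is_ai:
        return None
    return EgressSignal(host, classify(request_body), classify(response_body))
```

Matching on the exact host or a dot-prefixed suffix avoids flagging look-alike hosts that merely contain a provider name.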
Source 2: Endpoint telemetry
The endpoint side of the same activity is read by EDR or DLP agents that run on the corporate device. The agent reads three additional signals.
The first signal is the clipboard activity. A copy from a source system (Salesforce, the support tool, the warehouse query interface) followed by a paste into a browser tab open to an AI domain produces a clipboard-to-AI signal. The second signal is the form submission to AI domains. The browser-side instrumentation reads the form submission event, captures the form fields, and flags fields containing patterns that match the data classifications the deployer cares about. The third signal is the file upload to AI domains. The browser-side instrumentation reads the file upload event and flags files matching sensitive-file patterns.
The endpoint source produces signal on the corporate device regardless of the network path. The source has blind spots on devices outside the corporate fleet (personal laptops, personal phones).
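The clipboard-to-AI signal is a correlation between two endpoint events. A rough sketch, assuming the agent emits copy and paste events with a user, a target, and a timestamp; the ClipboardEvent shape, the source-application labels, and the 120-second window are all assumptions for illustration, not fields of any particular EDR or DLP product.

```python
from dataclasses import dataclass

AI_DOMAINS = {"chat.openai.com", "claude.ai", "gemini.google.com", "perplexity.ai"}
SOURCE_APPS = {"salesforce", "support_tool", "warehouse_query"}  # hypothetical labels
WINDOW_SECONDS = 120  # assumed correlation window


@dataclass(frozen=True)
class ClipboardEvent:
    user: str
    kind: str    # "copy" or "paste"
    target: str  # source app label for a copy, browser tab host for a paste
    ts: float    # seconds since epoch


def clipboard_to_ai(events):
    """Return (copy, paste) pairs where a copy from a source system is followed
    by a paste into an AI domain by the same user within the window."""
    last_copy = {}
    hits = []
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.kind == "copy":
            # Any copy overwrites the clipboard; only source-system copies count.
            last_copy[ev.user] = ev if ev.target in SOURCE_APPS else None
        elif ev.kind == "paste" and ev.target in AI_DOMAINS:
            copy = last_copy.get(ev.user)
            if copy is not None and ev.ts - copy.ts <= WINDOW_SECONDS:
                hits.append((copy, ev))
    return hits
```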
Source 3: IdP claims and OAuth grants
The IdP surfaces the natural-person identity of every employee. The OAuth grant log surfaces the AI integrations the employee authorized into corporate scopes.
The detection rules read two signals on the IdP side. The first signal is the OAuth grant to AI integrations on corporate scopes. An employee who authorizes an "AI assistant" application to read their corporate calendar or email through the IdP creates a grant the security team can review. The second signal is the SCIM provisioning of users into AI tools. A SCIM event that creates the user in an AI tool surfaces the tool the team is using and the user pool the team has provisioned.
The IdP source has high fidelity for AI tools that integrate via corporate OAuth or SCIM. The source has blind spots for consumer AI tools the employee uses with personal accounts.
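The OAuth grant review can be approximated as a filter over the grant log. The sketch below assumes the log has been exported into simple records; the OAuthGrant shape, the keyword list, and the scope names are hypothetical and would need to be mapped onto whatever the IdP actually exports.

```python
import re
from dataclasses import dataclass

AI_KEYWORDS = {"ai", "assistant", "copilot", "gpt"}               # hypothetical name hints
SENSITIVE_SCOPES = {"calendar.read", "mail.read", "files.read"}  # hypothetical scope names


@dataclass(frozen=True)
class OAuthGrant:
    user: str
    app_name: str
    scopes: frozenset


def grants_to_review(grants):
    """Flag grants where an AI-looking app holds at least one sensitive scope."""
    flagged = []
    for grant in grants:
        # Compare whole words so "Mail Merge" does not match the keyword "ai".
        tokens = set(re.findall(r"[a-z0-9]+", grant.app_name.lower()))
        risky = grant.scopes & SENSITIVE_SCOPES
        if tokens & AI_KEYWORDS and risky:
            flagged.append((grant.user, grant.app_name, sorted(risky)))
    return flagged
```

Name matching of this kind is only a first pass; an application that hides its AI function behind a neutral name needs a curated application list instead.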
Source 4: Expense aggregation
The expense system carries the trail of subscriptions employees paid for and then expensed back to the company. The detection rules read three signals.
The first signal is the SaaS line item against an AI vendor. A line item against OpenAI, Anthropic, Perplexity, ChatGPT Plus, Claude Pro, or a vertical AI tool surfaces the personal-tier subscription. The second signal is the recurring monthly amount that matches the AI provider price tier (USD 20 for ChatGPT Plus, USD 20 for Claude Pro, USD 30 for Copilot, USD 20 for Perplexity Pro). The third signal is the line-item description that mentions an AI tool by name.
The expense source has low absolute coverage (most consumer AI usage is free tier and unexpensed) but high signal density on the records that surface. A single expense line is often the first hint of a team's standardized tool.
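The three expense signals reduce to string and amount checks on each line item. A minimal sketch, assuming each line item exposes a vendor, a description, an amount in USD, and a recurring-monthly flag; the function name and those field names are illustrative, and the price set holds only the tiers quoted above.

```python
from decimal import Decimal

# Vendor and product names from the paragraph above, lowercased for matching.
AI_NAMES = ("openai", "anthropic", "perplexity", "chatgpt", "claude")
# Monthly USD price tiers quoted in the article; extend as pricing changes.
PRICE_TIERS = {Decimal("20.00"), Decimal("30.00")}


def expense_signals(vendor: str, description: str, amount_usd: str,
                    recurring_monthly: bool) -> list:
    """Return the names of the expense signals a single line item triggers."""
    signals = []
    if any(name in vendor.lower() for name in AI_NAMES):
        signals.append("ai_vendor")
    amount = Decimal(amount_usd).quantize(Decimal("0.01"))
    if recurring_monthly and amount in PRICE_TIERS:
        signals.append("price_tier")
    if any(name in description.lower() for name in AI_NAMES):
        signals.append("ai_in_description")
    return signals
```

The price-tier check alone is weak evidence, since many unrelated subscriptions also cost USD 20, so it is most useful in combination with one of the other two signals.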
Source 5: Approved-route gap analysis
The fifth source is internal to the approved AI deployment. The deployer's approved AI route produces a per-decision audit record series via the inspection layer at the AI request boundary. The detection rules compare the approved-route activity per employee to the egress and endpoint activity per employee and surface the off-route gap.
The signal is the gap between the two counts. An employee who shows three approved-route requests and 47 egress POSTs to AI provider domains in the same week has a 47-to-3 egress-to-approved ratio. The ratio matters more than the absolute count. The employee with three approved-route requests and three egress POSTs is at parity. The employee with 47 egress POSTs against three approved-route requests is the migration candidate.
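The ratio can be computed per employee per week. The sketch below follows the worked numbers in the paragraph above; the threshold of 2.0 and the handling of employees with zero approved-route requests are assumptions, not values from the article.

```python
import math

RATIO_THRESHOLD = 2.0  # hypothetical cut-off for "migration candidate"


def off_route_ratio(approved_requests: int, egress_posts: int) -> float:
    """Egress POSTs per approved-route request; 1.0 is parity."""
    if approved_requests == 0:
        # Assumption: any egress activity with no approved usage ranks highest.
        return math.inf if egress_posts > 0 else 0.0
    return egress_posts / approved_requests


def migration_candidates(weekly_counts: dict) -> list:
    """weekly_counts maps employee id -> (approved_requests, egress_posts).
    Returns the ids above the threshold, highest ratio first."""
    ratios = {emp: off_route_ratio(a, e) for emp, (a, e) in weekly_counts.items()}
    flagged = [emp for emp, ratio in ratios.items() if ratio > RATIO_THRESHOLD]
    return sorted(flagged, key=lambda emp: ratios[emp], reverse=True)
```

With the article's numbers, off_route_ratio(3, 47) is roughly 15.7 and off_route_ratio(3, 3) is exactly 1.0.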
Joining the sources on a single identity
The five sources produce different event shapes. The joining axis is the natural-person identifier the SSO carries.
The join key is the SSO claim. The egress proxy reads it from the device's authentication state. The EDR agent reads it from the device login. The IdP supplies it natively. The expense system carries the employee identifier on each expense line. The approved-route inspection layer reads it from the SSO claims propagated to the AI request.
The output is a per-employee timeline of AI activity across all five sources, ordered by time, ready for the security team's triage.
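The join itself is a time-ordered merge keyed on the employee identifier. A minimal sketch, assuming each source has already been normalized into a common event shape carrying the SSO-derived identifier; the Event type and its field names are hypothetical.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float          # event time, seconds since epoch
    employee_id: str   # natural-person identifier from the SSO claim
    source: str        # "egress", "endpoint", "idp", "expense", "approved_route"
    detail: str


def build_timelines(*sources) -> dict:
    """Merge any number of event lists into per-employee timelines ordered by time."""
    ordered = (sorted(src, key=lambda e: e.ts) for src in sources)
    timelines = defaultdict(list)
    # heapq.merge walks the pre-sorted inputs without building one large sorted list.
    for event in heapq.merge(*ordered, key=lambda e: e.ts):
        timelines[event.employee_id].append(event)
    return dict(timelines)
```

In practice the harder work sits upstream of this function: mapping the expense system's employee number and the device login onto the same identifier the IdP issues.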
The prioritization framework
The triage queue surfaces hundreds of patterns per week in a typical mid-market deployment. The prioritization framework reads four axes per pattern.
The first axis is the data classification the off-route activity touched. PII, regulated-data (PHI, PCI), and source code rank above general-purpose content. The classification comes from the classifier passes the egress and endpoint sources applied.
The second axis is the volume. A user with one off-route activity per month sits below a user with daily off-route activity. The volume reads from the joined timeline.
The third axis is the role. A user in a regulated-data role (compliance, finance, healthcare-adjacent) ranks above a user in a general-productivity role. The role comes from the IdP.
The fourth axis is the migration friction. A user already provisioned on the approved AI tool is a one-step migration (turn off the off-route habit). A user not yet provisioned is a two-step migration (provision plus onboard). The provisioning state comes from the SCIM events in the IdP source.
The triage team works the queue top-down. The pattern with the highest combined score gets the first conversation, the migration, and the policy enforcement at the egress.
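One way to turn the four axes into a combined score is a simple additive model. The sketch below is illustrative only: the weights, the volume buckets, and the choice to rank low-friction (already provisioned) users slightly higher are all assumptions, since the article does not specify how the axes are combined.

```python
from dataclasses import dataclass

# Hypothetical weights; sensitive classes rank above general-purpose content.
DATA_CLASS_WEIGHT = {"pii": 3, "phi": 3, "pci": 3, "source_code": 3, "general": 1}
ROLE_WEIGHT = {"regulated": 2, "general": 1}


@dataclass(frozen=True)
class Pattern:
    employee_id: str
    data_classes: frozenset   # from the egress and endpoint classifiers
    off_route_per_month: int  # from the joined timeline
    role: str                 # "regulated" or "general", from the IdP
    provisioned: bool         # already on the approved AI tool


def score(p: Pattern) -> int:
    data = max((DATA_CLASS_WEIGHT.get(c, 1) for c in p.data_classes), default=1)
    # Assumed buckets: roughly daily, roughly weekly, occasional.
    volume = 3 if p.off_route_per_month >= 20 else 2 if p.off_route_per_month >= 4 else 1
    role = ROLE_WEIGHT.get(p.role, 1)
    # Assumption: a one-step migration is a quicker win, so it gets a small boost.
    friction = 2 if p.provisioned else 1
    return data + volume + role + friction


def triage_queue(patterns) -> list:
    """Return patterns ordered highest combined score first."""
    return sorted(patterns, key=score, reverse=True)
```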
DeepInspect
DeepInspect runs as the inspection layer on the approved AI routes and produces the per-decision audit record series. The product terminates the AI provider TLS, reads the request and response, evaluates identity-bound policy per route, applies pass, modify, redact, or block decisions, and commits per-decision audit records to a tamper-evident store. The record series is source 5 of the shadow AI detection pipeline.
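The article does not describe how the store achieves tamper evidence. One common general technique is a hash chain, where each record's digest covers the previous record's digest, so any later modification breaks verification. The sketch below illustrates that technique only; it is not a description of DeepInspect's implementation, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder digest for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always produces the same digest.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> dict:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "record": record, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Recompute every digest; any edited or reordered entry fails the check."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects modification but does not prevent it; anchoring the latest digest somewhere the writer cannot rewrite is what makes the evidence useful.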
The product also integrates with the corporate egress proxy and the EDR telemetry to consume sources 1 and 2, joins the events on the natural-person identifier, and emits the per-employee shadow AI timeline the security team triages. The triage tooling surfaces the prioritization-framework score per user and the migration recommendation.
If your security team is trying to close the 86% blind-to-shadow-AI gap before the next regulator review, let's talk today.
Frequently asked questions
- How long does the discovery pipeline take to surface a baseline?
A typical deployment produces a meaningful baseline within two weeks of the five sources being wired in. The first week captures the steady-state pattern across the employee population. The second week starts to catch the periodic patterns the first week misses; longer cycles (monthly expense cycles, quarterly review cycles) fill in as the pipeline keeps running. The triage queue typically begins at week three.
- Does the pipeline require us to terminate TLS on the corporate egress?
The egress source produces the highest-fidelity classification when TLS is terminated and the request body is read. A deployer that does not terminate TLS at the egress can still produce a lower-fidelity signal (the destination domain and the JA3 fingerprint) and rely more heavily on the endpoint and approved-route sources. The five-source pipeline tolerates missing sources as long as at least three of the five remain.
- How does the pipeline handle privacy concerns about reading employee AI activity?
The pipeline reads activity on corporate devices and corporate network paths, both of which are within scope of the typical acceptable-use policy. The discovery records carry the activity classification (data classes touched, route used) and a reference to the source that produced the signal. The records do not store the verbatim prompt content unless the deployer explicitly opts in for higher-risk classifications. The data-protection officer reviews the pipeline scope before deployment.
- What happens after the migration to the approved route?
The approved-route inspection layer produces the per-decision audit record series for the migrated user's AI activity. Those records support the EU AI Act Article 12 record-keeping obligation, where it applies, and the deployer's internal AI policy. The off-route signal for the migrated user falls to baseline. The triage queue's job for that user is done.
- Does the pipeline catch a user who uses an AI tool through an in-house chatbot the user built?
The in-house chatbot the user built reaches the AI provider endpoints the same way the consumer AI tools do. The egress source reads the request body the chatbot emits and produces the same signal. The in-house chatbot becomes a route candidate the security team reviews and migrates onto the approved route along with the user.