Inline Enforcement for Enterprise AI.
DeepInspect is a policy enforcement gateway for enterprise AI. The product sits in the request path between user-facing applications, agents, and any AI endpoint, including OpenAI, Anthropic, Azure OpenAI, Google Gemini, AWS Bedrock, and self-hosted models. Every request traverses the gateway, which evaluates identity, environment, data sensitivity, and the active policy profile before the request reaches the model or tool. The decision is deterministic and version-traceable, produced synchronously on the request path.
The enforcement path is short and predictable. A request arrives at the gateway with headers that identify the calling user, application, and session. The application attaches these headers after it has authenticated the end user through its own identity provider. The gateway verifies the gateway access token on the request to confirm the caller is authorized to reach the gateway, reads the identity context the application propagated, retrieves the current policy profile for the route, and runs the deterministic policy engine against the request payload. The engine inspects payload contents for sensitive data classifications, evaluates role-based restrictions, and produces one of four enforcement outcomes: allow, redact, tokenize, or block. The model response returns through the gateway and is captured into the same request record. The outcome and the inputs that produced it are written to the forensic store with a per-record cryptographic signature before the response is released to the caller. Symmetric response-side enforcement, where the same outcomes apply to the model's reply before it reaches the caller, is on the roadmap.
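The per-record signature described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the source does not name the signature scheme, so HMAC-SHA256 is an assumption, and the key, field names, and helper functions are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical signing key; a real deployment would load it from a key manager.
SIGNING_KEY = b"example-signing-key"


def sign_record(record: dict, key: bytes = SIGNING_KEY) -> dict:
    """Return a copy of the record with an HMAC-SHA256 signature attached."""
    # Canonical JSON (sorted keys, fixed separators) so the same record
    # always serializes to the same bytes.
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {**record, "signature": signature}


def verify_record(signed: dict, key: bytes = SIGNING_KEY) -> bool:
    """Recompute the signature over every field except the signature itself."""
    record = {k: v for k, v in signed.items() if k != "signature"}
    expected = sign_record(record, key)["signature"]
    return hmac.compare_digest(expected, signed.get("signature", ""))


# Hypothetical record: the outcome plus the inputs that produced it.
record = {
    "request_id": "req-001",
    "outcome": "redact",
    "policy_version": "v12",
    "inputs": {"role": "finance", "classifications": ["PII"]},
}
signed = sign_record(record)
```

Changing any field after signing, for example flipping the outcome from redact to allow, causes `verify_record` to return False, which is the property a forensic reviewer would rely on.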
For every AI interaction, the gateway:
Fail-closed behavior applies at every layer. When the model endpoint is unreachable, the gateway returns a block rather than passing the request through silently. A policy evaluation error returns a block, and so does a policy version past its validity window. Default-deny is the starting state of the policy engine, and access is granted only where a matching rule exists in the active profile.
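The fail-closed rules above can be illustrated with a short sketch. The `Profile` type, the rule signature, and the `decide` function are hypothetical names introduced for illustration; the point is only that every failure path, and the absence of a matching rule, ends in a block.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional

# A rule returns an outcome string when it matches, or None when it does not.
Rule = Callable[[dict], Optional[str]]


@dataclass
class Profile:
    version: str
    valid_until: datetime
    rules: List[Rule]


def decide(profile: Profile, request: dict, now: datetime, endpoint_reachable: bool) -> str:
    if not endpoint_reachable:
        return "block"  # unreachable model endpoint
    if now > profile.valid_until:
        return "block"  # policy version past its validity window
    try:
        for rule in profile.rules:
            outcome = rule(request)
            if outcome is not None:
                return outcome
    except Exception:
        return "block"  # policy evaluation error
    return "block"  # default-deny: no matching rule


def allow_analysts(request: dict) -> Optional[str]:
    return "allow" if request.get("role") == "analyst" else None


def broken_rule(request: dict) -> Optional[str]:
    raise RuntimeError("evaluation failed")


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
active = Profile("v1", now + timedelta(days=1), [allow_analysts])
expired = Profile("v0", now - timedelta(days=1), [allow_analysts])
broken = Profile("v2", now + timedelta(days=1), [broken_rule])
```

Only the first case in the usage below reaches an allow; the guest, the unreachable endpoint, the expired profile, and the failing rule all resolve to a block.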
Cross-provider governance is a core property of the gateway. The policy layer is decoupled from the provider-specific transport, so a rule written for a Claude deployment applies the same enforcement to a GPT-4 deployment without being rewritten. The same policy profile governs OpenAI, Anthropic, Azure OpenAI, Google Gemini, Bedrock, and self-hosted endpoints uniformly.
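One way to picture the decoupling of policy from transport is a thin adapter per provider feeding a single rule. The sketch below is an assumption about structure, not a description of the gateway's internals: the provider names, payload shapes, and the blocked term are all hypothetical, and real provider schemas are not modeled.

```python
from typing import Callable, Dict


# Hypothetical payload shapes for two made-up transports.
def from_provider_a(payload: dict) -> str:
    # provider_a carries the prompt as a list of message objects.
    return "\n".join(m["content"] for m in payload["messages"])


def from_provider_b(payload: dict) -> str:
    # provider_b carries the prompt as a single string.
    return payload["prompt"]


ADAPTERS: Dict[str, Callable[[dict], str]] = {
    "provider_a": from_provider_a,
    "provider_b": from_provider_b,
}


def contains_blocked_term(text: str) -> bool:
    """A single rule, written once, with no knowledge of the transport."""
    return "project-falcon" in text.lower()


def enforce(provider: str, payload: dict) -> str:
    adapter = ADAPTERS.get(provider)
    if adapter is None:
        return "block"  # unknown transport: fail closed
    text = adapter(payload)
    return "block" if contains_blocked_term(text) else "allow"
```

The rule function never sees a provider name, so adding a third transport means adding an adapter and leaving the rule untouched.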
DeepInspect enforces AI usage at the point of action.
How Does DeepInspect Enforce AI Policies Inline?
Every request follows the same path: application to gateway to model, then the model response back through the gateway to the application. The gateway handles authentication, policy evaluation, payload transformation, and record writing in a single pass. Sidecars and async processes stay out of the request path to preserve the sub-50ms overhead budget.
For each evaluated request, the gateway answers four questions. Does the caller hold the identity and role required by the policy? Does the request payload contain sensitive data classifications that require redaction or tokenization? Does the active policy profile permit this combination of actor, data, and destination model? Do downstream constraints, like rate limits or destination availability, apply? If any answer produces a block or a transformation, the gateway applies the action deterministically and records the outcome before releasing the response.
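The four questions can be read as an ordered series of checks. The sketch below is a simplified illustration under stated assumptions: the `Request` and `Policy` types and their fields are hypothetical, and the "actor, data, destination" check is reduced to a set of permitted (role, destination) pairs.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Request:
    role: str
    classifications: Set[str]
    destination: str
    within_rate_limit: bool = True


@dataclass
class Policy:
    allowed_roles: Set[str]
    permitted: Set[Tuple[str, str]]  # (role, destination) pairs
    transform_for: Dict[str, str]    # classification -> "redact" or "tokenize"


def evaluate(req: Request, policy: Policy) -> str:
    # 1. Does the caller hold the required identity and role?
    if req.role not in policy.allowed_roles:
        return "block"
    # 3. Does the profile permit this actor and destination together?
    if (req.role, req.destination) not in policy.permitted:
        return "block"
    # 4. Do downstream constraints such as rate limits apply?
    if not req.within_rate_limit:
        return "block"
    # 2. Does the payload carry a classification that needs a transformation?
    for label in sorted(req.classifications):
        if label in policy.transform_for:
            return policy.transform_for[label]
    return "allow"


policy = Policy(
    allowed_roles={"finance", "hr"},
    permitted={("finance", "model-a"), ("hr", "model-a")},
    transform_for={"PII": "redact"},
)
```

The blocking checks run before the transformation check here so that a request that would be blocked anyway is never transformed; that ordering is a design choice of the sketch, not something the source specifies.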
How Do Applications Integrate with the Gateway?
Applications integrate by pointing their AI client at the DeepInspect gateway URL. The gateway is payload-agnostic and does not require the application to adopt an OpenAI-compatible schema: whatever request format the application already uses to talk to its model continues to work. The gateway accepts the request, verifies the gateway access token to confirm the caller is authorized to reach it, reads the end-user identity context the application has attached to the call, evaluates the active policy profile for that route, and forwards the approved request to the upstream model provider. The existing application code stays intact, and the URL is the only configuration that changes.
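From the application side, the integration described above might look like the following sketch. Everything specific here is an assumption for illustration: the gateway URL, the use of a Bearer token in the Authorization header, and the identity header names are hypothetical, since the source does not document them. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical gateway URL; in practice this replaces the provider URL in config.
GATEWAY_URL = "https://gateway.example.com/v1/chat"


def build_request(payload: dict, gateway_token: str, user_id: str,
                  app_id: str, session_id: str) -> urllib.request.Request:
    headers = {
        "Content-Type": "application/json",
        # Assumed: the gateway access token travels as a Bearer token.
        "Authorization": f"Bearer {gateway_token}",
        # Hypothetical header names for the identity context the app propagates.
        "X-End-User-Id": user_id,
        "X-Application-Id": app_id,
        "X-Session-Id": session_id,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(GATEWAY_URL, data=body, headers=headers, method="POST")


# The payload keeps whatever shape the application already sends to its model.
payload = {"prompt": "Summarize last quarter's results."}
req = build_request(payload, "gw-token-123", "user-42", "helpdesk", "sess-9")
```

Passing `req` to `urllib.request.urlopen` would send it; that call is left out so the sketch runs without a network.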
Agent frameworks that orchestrate multi-step tool use route tool calls through the gateway the same way. Model Context Protocol (MCP) invocations and other tool calls traverse the same path: the gateway recognizes the tool-call payload, applies policy to each individual tool invocation, and returns the approved or blocked response to the orchestrating agent. Per-tool allowlists scope which agent identities can reach which tools, and the data-classification engine inspects tool-call arguments and tool responses on both sides of the MCP server. Tool-access policy evaluates in the same engine that evaluates model-call policy, so a single policy profile governs both dimensions.
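A per-tool allowlist can be sketched as a mapping from agent identity to the tools it may reach. The agent names, tool names, and the `authorize_tool_call` function are hypothetical and serve only to show the default-deny lookup.

```python
from typing import Dict, Set

# Hypothetical allowlist: agent identity -> tools that identity may invoke.
TOOL_ALLOWLIST: Dict[str, Set[str]] = {
    "support-agent": {"search_tickets", "get_customer"},
    "billing-agent": {"get_invoice"},
}


def authorize_tool_call(agent_id: str, tool_name: str,
                        allowlist: Dict[str, Set[str]] = TOOL_ALLOWLIST) -> str:
    """Allow only when the tool is listed for this agent; otherwise block."""
    allowed_tools = allowlist.get(agent_id, set())
    return "allow" if tool_name in allowed_tools else "block"
```

An agent that is missing from the mapping gets an empty set, so unknown agents and unlisted tools both resolve to a block.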
Policy profiles are authored in the control plane. A profile groups a set of policies together with their per-role action maps. Policies reference identity claims, data classifications, destination categories, and temporal attributes. When a profile changes, the control plane replays recent production traffic against the draft profile in a staging environment, which surfaces the decisions the new profile would produce before the change takes effect. Promotion to production is explicit and recorded, and the previous profile version stays available for rollback.
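The replay step amounts to running the same recorded requests through both profile versions and reporting where the decisions differ. In the sketch below, profiles are modeled as plain functions and the request fields are hypothetical; it illustrates the comparison only, not the control plane's actual mechanism.

```python
from typing import Callable, List, Tuple

Profile = Callable[[dict], str]  # maps a recorded request to an outcome


def replay(traffic: List[dict], current: Profile, draft: Profile) -> List[Tuple[str, str, str]]:
    """Return (request_id, current_outcome, draft_outcome) for each changed decision."""
    changes = []
    for request in traffic:
        before, after = current(request), draft(request)
        if before != after:
            changes.append((request["id"], before, after))
    return changes


def current_profile(request: dict) -> str:
    return "allow"


def draft_profile(request: dict) -> str:
    # The draft adds redaction for requests classified as PII.
    return "redact" if "PII" in request["classifications"] else "allow"


traffic = [
    {"id": "req-1", "classifications": []},
    {"id": "req-2", "classifications": ["PII"]},
    {"id": "req-3", "classifications": ["PCI"]},
]
changes = replay(traffic, current_profile, draft_profile)
```

With this sample traffic, only `req-2` changes, moving from allow to redact, which is the kind of difference a reviewer would inspect before promoting the draft.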
Data classifiers run inside the gateway in the request path and use a combination of deterministic pattern detectors and classification models. PHI, PII, PCI, and customer-owned data classes each have dedicated detector profiles. Per-role action overrides let the same policy profile produce different outcomes for different user roles, so a Finance user might receive tokenized data while an HR user receives redacted data and other roles are blocked outright.
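The per-role action map can be sketched with one deterministic detector and a role lookup. The regular expression, the role names, and the token format are assumptions for illustration; in particular, the truncated hash stands in for a real tokenization service and is not a secure scheme.

```python
import hashlib
import re
from typing import Optional, Tuple

# Illustrative pattern detector for US SSN-shaped strings.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Per-role action map: roles that are not listed fall through to block.
ROLE_ACTIONS = {"finance": "tokenize", "hr": "redact"}


def _tokenize(match: re.Match) -> str:
    # Stand-in token: a short digest of the matched value.
    digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()[:8]
    return f"<tok:{digest}>"


def apply_policy(role: str, text: str) -> Tuple[str, Optional[str]]:
    """Return (outcome, transformed_text); the text is None when blocked."""
    if not SSN_PATTERN.search(text):
        return "allow", text
    action = ROLE_ACTIONS.get(role, "block")
    if action == "tokenize":
        return action, SSN_PATTERN.sub(_tokenize, text)
    if action == "redact":
        return action, SSN_PATTERN.sub("[REDACTED]", text)
    return "block", None
```

The same input therefore yields a tokenized payload for a Finance user, a redacted payload for an HR user, and a block for any other role.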
What Else Runs Alongside Deterministic Enforcement?
Beyond the deterministic detectors, DeepInspect supports user-defined policies expressed in natural language. A compliance officer describes a governance rule in plain language, and the gateway evaluates it at request time using a configured LLM or SLM. The LLM or SLM is selectable per customer, so the evaluation model stays inside the customer’s preferred trust boundary. Natural-language evaluation is non-deterministic by design, and the evaluation, the inputs, and the decision are captured in the forensic store alongside the deterministic rule evaluations that ran in the same request.
Cost and token usage are tracked per AI interaction. The gateway records the input-token count, output-token count, and model-reported cost for every forwarded request, giving the finance team a picture of where AI spend accumulates and the security team a usage baseline to alert against.
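A per-interaction usage record lends itself to simple aggregation. The sketch below assumes a hypothetical record shape (`application`, `input_tokens`, `output_tokens`, `cost`) and totals it per application; the field names and sample values are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_usage(records: List[dict]) -> Dict[str, dict]:
    """Total input tokens, output tokens, and model-reported cost per application."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "cost": 0.0})
    for record in records:
        bucket = totals[record["application"]]
        bucket["input_tokens"] += record["input_tokens"]
        bucket["output_tokens"] += record["output_tokens"]
        bucket["cost"] += record["cost"]
    return dict(totals)


# Hypothetical usage records, one per forwarded request.
records = [
    {"application": "helpdesk", "input_tokens": 1200, "output_tokens": 300, "cost": 0.25},
    {"application": "helpdesk", "input_tokens": 800, "output_tokens": 200, "cost": 0.5},
    {"application": "research", "input_tokens": 5000, "output_tokens": 900, "cost": 0.125},
]
usage = summarize_usage(records)
```

The same totals could feed both a spend report and a baseline for usage alerts.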
The complete transaction, including the original request, the transformed request, the upstream response, and the transformed response, is written to a customer-configurable object store. Customers choose the storage target and the retention policy that matches their compliance posture. Offline forensic analysis runs on a scheduled cadence against this transaction data to surface anomalous behavior that inline deterministic detection is unable to catch. The offline analysis uses a customer-configured LLM or SLM, so the analysis model stays inside the customer’s preferred trust boundary.
Webhooks in the control plane push audit data to customer-configured endpoints, which lets existing SIEM, data-warehouse, and observability stacks consume the audit stream through standard integration patterns.