Built for Security Teams.
DeepInspect is split across two planes. The control plane handles policy authoring, versioning, and configuration. The enforcement plane runs inline with application traffic and applies the active policy to every AI request. The two planes are separated so a policy change is explicit, reviewable, and version-controlled before it takes effect in production. Operations teams roll forward and roll back policies through the control plane without touching application code.
What Are the Core Components of the DeepInspect Platform?
The platform ships as a deployable gateway, a control-plane service, and a forensic store. The gateway is stateless and horizontally scalable, which allows it to be replicated across availability zones and scaled independently of the control plane. End-user authentication stays with the calling application and its identity provider (OIDC, SAML, or the enterprise’s own SSO). The application attaches the resulting identity context to each call, and the gateway verifies a DeepInspect-issued access token to confirm the caller is authorized to reach it before reading that identity context into every policy evaluation. The policy engine evaluates rules deterministically, so the same inputs always produce the same enforcement outcome across environments.
Failure behavior: the gateway fails closed by default, so a request that cannot be evaluated is blocked rather than forwarded.
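The fail-closed default can be illustrated with a minimal sketch. The names `Decision`, `enforce`, and the evaluator functions are hypothetical and are not DeepInspect's API; the only point shown is that any failure to reach a decision becomes a deny.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def enforce(evaluate: Callable[[dict], Decision], request: dict) -> Decision:
    """Run a policy evaluator; any failure to decide is treated as a deny."""
    try:
        decision = evaluate(request)
    except Exception as exc:  # evaluator crashed, timed out, and so on
        return Decision(False, f"fail-closed: {type(exc).__name__}")
    if not isinstance(decision, Decision):
        # An evaluator that returns nothing usable is also treated as a deny.
        return Decision(False, "fail-closed: evaluator returned no decision")
    return decision


# Hypothetical evaluators used only to exercise the wrapper.
def broken_evaluator(request: dict) -> Decision:
    raise TimeoutError("policy store unreachable")


def allow_all(request: dict) -> Decision:
    return Decision(True, "rule: allow")
```

In this sketch, `enforce(broken_evaluator, {})` yields a deny whose reason names the exception type, while `enforce(allow_all, {})` passes the evaluator's decision through unchanged.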
How Does the DeepInspect Gateway Deploy in Production?
Traffic enters the gateway from the application layer on the ingress side. The gateway verifies the DeepInspect-issued access token presented by the caller to confirm the caller is authorized to reach it, reads the identity context the application has attached to the request, evaluates policy on the request, transforms the payload if required, and forwards the approved request to the downstream model or tool. The model response returns through the gateway, where it is captured into the request record. Each request writes one signed entry to the forensic store at the close of the request lifecycle, before the response is released to the caller. Symmetric response-side policy enforcement against the same rule set is on the roadmap; today, response content is recorded but not enforced inline.
Runtime behavior
Deployment modes
Self-hosted deployments run inside the customer’s VPC or on-premises environment and keep request payloads inside the customer network boundary. Cloud-hosted deployments run in DeepInspect’s managed environment for customers that prefer a SaaS operational model. Hybrid deployments place the gateway as a network proxy in front of egress traffic, which fits environments where retrofitting application-level integration is impractical.
Scaling and High Availability
Gateway instances are stateless and interchangeable. A fleet of gateway pods behind a load balancer scales horizontally with request volume, with each pod holding a read-only copy of the active policy version in local memory. Policy updates propagate through the control plane as atomic version switches, so a given request evaluates against exactly one policy version end to end.
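One way to picture an atomic version switch is a holder that swaps a reference to an immutable policy object, while each request pins the version it started with. This is a sketch under assumptions: `PolicyVersion`, `PolicyHolder`, and the rule format (data class mapped to "allow" or "deny") are hypothetical and do not describe DeepInspect's internals.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class PolicyVersion:
    version: str
    rules: Mapping[str, str]  # hypothetical format: data class -> "allow" | "deny"


class PolicyHolder:
    """Holds the active policy; switching replaces the whole object at once."""

    def __init__(self, initial: PolicyVersion) -> None:
        self._lock = threading.Lock()
        self._active = initial

    def snapshot(self) -> PolicyVersion:
        # A request calls this once and uses the result end to end.
        with self._lock:
            return self._active

    def switch(self, new: PolicyVersion) -> None:
        with self._lock:
            self._active = new


def evaluate(policy: PolicyVersion, data_class: str) -> Tuple[str, str]:
    # Data classes with no rule are denied in this sketch.
    return policy.version, policy.rules.get(data_class, "deny")


holder = PolicyHolder(PolicyVersion("v1", MappingProxyType({"public": "allow"})))
pinned = holder.snapshot()          # request A starts under v1
holder.switch(PolicyVersion("v2", MappingProxyType({"public": "deny"})))
in_flight = evaluate(pinned, "public")            # still evaluated under v1
fresh = evaluate(holder.snapshot(), "public")     # new requests see v2
```

Because the request keeps its own reference to the immutable snapshot, a switch that lands mid-request cannot produce a decision that mixes rules from two versions.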
The forensic store is an append-only ledger backed by durable object storage. Each record carries its own HMAC-SHA256 signature computed at commit time over the canonicalized record body, so a single record can be verified independently without traversing the rest of the ledger. Writes are acknowledged synchronously before the gateway releases the response, which guarantees that the record persists even if the application-side response is lost in transit. Readers consume the store through a query API that supports time-window and actor filters and returns each record with its signature intact.
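Per-record signing and independent verification can be sketched with Python's standard `hmac` and `hashlib` modules. The canonicalization shown here (JSON with sorted keys and compact separators) and the record layout are assumptions for illustration; the source does not specify DeepInspect's canonical form or how the signing key is managed.

```python
import hashlib
import hmac
import json


def canonicalize(body: dict) -> bytes:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sign_record(body: dict, key: bytes) -> dict:
    """Return the record body together with its HMAC-SHA256 signature."""
    signature = hmac.new(key, canonicalize(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_record(record: dict, key: bytes) -> bool:
    """Verify one record on its own, without reading any other record."""
    expected = hmac.new(
        key, canonicalize(record["body"]), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, record["signature"])


key = b"demo-key-not-for-production"  # hypothetical key for the example only
record = sign_record({"actor": "alice", "decision": "allow"}, key)
tampered = {"body": {"actor": "alice", "decision": "deny"},
            "signature": record["signature"]}
```

Here `verify_record(record, key)` succeeds and `verify_record(tampered, key)` fails, because the signature covers the canonicalized body. Sorting keys before signing means two bodies with the same fields in a different order produce the same signature.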
The control plane runs as a separate service with its own availability profile. A control-plane outage freezes policy versions at the last-known-good state on every gateway pod, which lets enforcement continue during a management-plane incident. Replay and rollback workflows resume when the control plane recovers, and every record committed during the incident remains independently verifiable through its per-record signature.
Routing, Failover, and Cost Telemetry
The same gateway that evaluates policy also selects the upstream model. Per-route configuration declares a primary provider, eligible failover providers, and tier-based routing rules that pick a cost-appropriate model for the inbound payload. The selection runs after policy admits the request, so cost optimization never overrides the constraint that policy permits a given destination for a given data class.
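The ordering described above, policy first and cost second, can be sketched as a filter followed by a minimum. The `Provider` fields, tier names, and cost figures are hypothetical; the sketch only shows that cost selection operates inside the set of destinations policy has already permitted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Provider:
    name: str
    tier: str           # hypothetical tiers: "small" or "large"
    cost_per_1k: float  # hypothetical cost per 1,000 tokens


def select_provider(
    candidates: Iterable[Provider],
    permitted: Set[str],
    required_tier: str,
) -> Optional[Provider]:
    """Pick the cheapest provider among those policy permits for this request."""
    eligible = [
        p for p in candidates
        if p.name in permitted and p.tier == required_tier
    ]
    # No eligible provider means the request is not forwarded at all.
    return min(eligible, key=lambda p: p.cost_per_1k) if eligible else None


candidates = [
    Provider("cheap-public", "small", 0.10),
    Provider("private-small", "small", 0.40),
]
# Assume policy permits only the private provider for this data class.
choice = select_provider(candidates, {"private-small"}, "small")
```

In this example the cheaper `cheap-public` provider is never considered, because policy excluded it before cost was compared.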
Health checks track latency, error rate, and rate-limit headroom per provider. When a provider degrades, the gateway shifts traffic to the next eligible provider in the route configuration without changing the calling code. Token counts and provider-reported cost land on the same forensic record as the policy decision, so cost telemetry and security telemetry come from one source of truth.
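Failover across an ordered route can be sketched as a walk down the provider list that returns the first healthy entry. The threshold values, field names, and the choice to treat a provider with no health data as unhealthy are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Health:
    latency_ms: float
    error_rate: float           # fraction of failed requests, 0.0 to 1.0
    rate_limit_headroom: float  # fraction of quota still available


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; real values would come from route configuration.
    max_latency_ms: float = 2000.0
    max_error_rate: float = 0.05
    min_headroom: float = 0.10


def is_healthy(h: Health, t: Thresholds) -> bool:
    return (
        h.latency_ms <= t.max_latency_ms
        and h.error_rate <= t.max_error_rate
        and h.rate_limit_headroom >= t.min_headroom
    )


def pick_upstream(
    route: List[str],
    health: Dict[str, Health],
    thresholds: Thresholds = Thresholds(),
) -> Optional[str]:
    """Return the first provider in route order that passes all health checks."""
    for name in route:
        h = health.get(name)
        if h is not None and is_healthy(h, thresholds):
            return name
    return None  # every provider is degraded or unknown


route = ["primary", "secondary"]
health = {
    "primary": Health(latency_ms=300.0, error_rate=0.20, rate_limit_headroom=0.5),
    "secondary": Health(latency_ms=450.0, error_rate=0.01, rate_limit_headroom=0.6),
}
chosen = pick_upstream(route, health)
```

With the primary's error rate above the assumed threshold, the sketch selects `secondary`; the calling code passes the same route either way and does not change.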