How is Databricks AI Gateway different from DeepInspect?

Databricks AI Gateway is the Databricks-native control surface for LLM traffic inside Databricks Model Serving, with Unity Catalog-bound principals, per-principal rate limiting, AI guardrails, and payload archives in Delta tables. DeepInspect is an identity-bound policy enforcement layer at the HTTP request boundary that operates against any LLM endpoint, classifies prompt data, evaluates per-route policy bundles, and commits per-decision audit records formatted for EU AI Act Article 12 review. Databricks AI Gateway covers the Databricks-resident traffic. DeepInspect covers the workload's full AI traffic surface.

Can Databricks AI Gateway replace DeepInspect for a regulated workload?

For workloads that run entirely inside Databricks Model Serving and where the audit format the regulator accepts is the Databricks payload archive, possibly. For workloads that span Databricks endpoints and external SaaS LLMs, the payload archive captures only the Databricks-resident portion of the traffic. Calls that go directly from the application to OpenAI, Anthropic, or another provider do not pass through Databricks AI Gateway and produce no payload archive record. The audit pipeline ends up with partial coverage, a gap that a review under EU AI Act Article 12 or Fannie Mae LL-2026-04 is likely to surface.
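To make the coverage gap concrete, here is a minimal sketch that splits an inventory of LLM calls into those that would leave a payload archive row and those that would not. The `LlmCall` type, its field names, and the sample inventory are hypothetical and only illustrate the reasoning above; they are not part of either product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LlmCall:
    app: str                      # calling application (hypothetical label)
    upstream: str                 # e.g. "databricks" or an external provider
    via_databricks_gateway: bool  # True only if the call transits Databricks AI Gateway

def archive_coverage(calls):
    """Return (archived, missing, ratio) for a list of LlmCall objects."""
    archived = [c for c in calls if c.via_databricks_gateway]
    missing = [c for c in calls if not c.via_databricks_gateway]
    # An empty inventory has nothing missing, so treat it as fully covered.
    ratio = len(archived) / len(calls) if calls else 1.0
    return archived, missing, ratio

# Hypothetical inventory: three Databricks-resident calls, one direct provider call.
inventory = [
    LlmCall("notebook", "databricks", True),
    LlmCall("scoring-job", "databricks", True),
    LlmCall("support-bot", "databricks", True),
    LlmCall("support-bot", "external-provider", False),
]
archived, missing, ratio = archive_coverage(inventory)
```

Any entry in `missing` is a call the payload archive alone cannot evidence.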

Can DeepInspect replace Databricks AI Gateway?

For deployments that already manage Databricks Model Serving traffic with Databricks-native controls and want a cross-endpoint regulatory audit layer above it, DeepInspect addresses Databricks model serving endpoints as one of the cleared upstreams. For deployments that want the Databricks-native attribution model, per-principal rate limiting on Databricks principals, and the Delta-table payload archive feature, DeepInspect does not replace Databricks AI Gateway. The two compose.

How does the deployment topology work when both are in production?

Application traffic addresses DeepInspect as the OpenAI-compatible endpoint. DeepInspect evaluates the policy and commits the audit record. DeepInspect forwards the cleared request to the upstream endpoint, which may be a Databricks model serving endpoint or an external SaaS LLM. If the upstream is Databricks, Databricks AI Gateway captures the Databricks-side attribution and payload. Databricks-internal notebook traffic that addresses a Databricks model serving endpoint passes through Databricks AI Gateway directly, and the DeepInspect audit pipeline ingests the payload table rows for the cross-endpoint consolidated audit record.

What about the Databricks AI Guardrails feature versus DeepInspect's classification engine?

The Databricks AI Guardrails feature applies keyword filters and PII detection at the gateway layer. The PII detection uses a Databricks-managed model and operates on both the prompt and the completion. DeepInspect's classification engine operates against a configurable set of regulated data types (PII, PHI, MNPI, source code, source-licensed content, regulated jurisdictional data). It attaches the classification outcome to the audit record, and the policy bundle makes the pass-block-modify decision based on the classification and the identity context. The two can run together for layered controls: Databricks AI Guardrails catch some content at the gateway layer, while DeepInspect's classification and policy enforcement carry the regulatory audit obligation.
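As a toy illustration of classifying a prompt against a configurable set of data types, the sketch below uses a few regular expressions. The patterns, the labels, and the `classify` function are assumptions made for this example. A production classifier would be far more thorough, and nothing here describes how either product detects sensitive data.

```python
import re

# Hypothetical, deliberately simple patterns keyed by data-type label.
PATTERNS = {
    "PII": [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-shaped number
        re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),      # email-shaped string
    ],
    "PHI": [
        re.compile(r"\b(diagnosis|prescription)\b", re.IGNORECASE),
    ],
    "source_code": [
        re.compile(r"\bdef \w+\(|\bclass \w+[:(]"),   # Python-looking definitions
    ],
}

def classify(prompt: str) -> set:
    """Return the set of data-type labels whose patterns match the prompt."""
    return {
        label
        for label, patterns in PATTERNS.items()
        if any(p.search(prompt) for p in patterns)
    }
```

The returned set is the kind of classification outcome a policy decision could consume and an audit record could store.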

DeepInspect vs Databricks AI Gateway: Where the Mosaic Layer Stops and Regulatory Audit Starts

Databricks AI Gateway is the Databricks-native control surface for LLM traffic, shipped as part of Mosaic AI Gateway. It handles routing across Databricks Foundation Model APIs, external provider endpoints (OpenAI, Anthropic, Bedrock, Vertex), and customer-provisioned models. The gateway exposes usage attribution against Unity Catalog identities, rate limits per principal, AI guardrails for keyword and PII filtering, and payload tables that record requests and responses to Delta tables for offline review. DeepInspect sits at the HTTP request boundary outside Databricks and answers a different question. It enforces identity-bound policy on prompt content for any LLM endpoint the calling application addresses, classifies prompt data against the regulated data types the organization recognizes, and commits a per-decision audit record that a reviewer under EU AI Act Article 12 or a Fannie Mae LL-2026-04 lender record review accepts.

I want to walk through what Databricks AI Gateway does, what DeepInspect does, and where the responsibilities split for regulated workloads that span Databricks and non-Databricks endpoints.

TL;DR

Databricks AI Gateway is the LLM traffic control plane inside the Databricks lakehouse, with Unity Catalog-bound identities, per-principal rate limiting, AI guardrails, and Delta-table payload logging. DeepInspect enforces identity-bound policy on prompt content for any LLM endpoint and produces per-decision audit records formatted for regulatory review. Workloads that span Databricks endpoints and external SaaS LLMs run DeepInspect in front of the application and use Databricks AI Gateway for the Databricks-internal traffic, which preserves the Databricks-native attribution and adds the cross-endpoint regulatory audit layer.

Databricks AI Gateway: what it is and where it sits

Databricks AI Gateway runs as a layer inside Databricks Model Serving. The configuration applies to model serving endpoints, including the Foundation Model APIs (pay-per-token endpoints for Llama, DBRX, Claude on Databricks, GTE embeddings, BGE embeddings, and others) and the external model endpoints that Databricks brokers (OpenAI, Anthropic, Cohere, Bedrock, Vertex, Azure OpenAI). The gateway sits between the calling Databricks workspace user or service principal and the model endpoint.

The feature set covers operational concerns inside the Databricks identity boundary. Rate limiting applies per Unity Catalog principal (workspace user, service principal). Usage tracking attributes token spend per principal and per endpoint, with the data landing in Unity Catalog system tables. The AI guardrails feature applies keyword filters, PII detection on the request and response, and topic filtering at the gateway layer. The payload table feature writes inbound prompts and outbound completions to Delta tables in Unity Catalog, where downstream Databricks notebooks consume the data for offline review.
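For readers unfamiliar with per-principal rate limiting, the following is a generic fixed-window sketch. It is not Databricks' implementation. The class name, parameters, and the injectable clock are assumptions chosen so the behavior is easy to follow.

```python
import time

class PerPrincipalRateLimiter:
    """Allow at most max_calls per principal in each fixed time window."""

    def __init__(self, max_calls, window_seconds=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # principal -> (window_start, calls_in_window)

    def allow(self, principal: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(principal, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.max_calls:
            self._windows[principal] = (start, count)
            return False
        self._windows[principal] = (start, count + 1)
        return True
```

Each principal gets an independent counter, so one noisy service principal cannot exhaust another user's budget.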

The architectural sweet spot for Databricks AI Gateway is the Databricks-resident workload. A data science team running inference inside Databricks Model Serving gets attribution, rate limiting, and a payload archive on the Databricks-native operator surface. The control plane assumes the caller is a Databricks principal and the model endpoint is a Databricks model serving endpoint.

What DeepInspect is and where it sits

DeepInspect sits at the HTTP request boundary, addressable from any application that calls any LLM endpoint over HTTP. It evaluates identity-bound policy on every request, classifies prompt data against the regulated data types the organization recognizes, and commits a per-decision audit record with cryptographic integrity. The decisions are deterministic, fail-closed, and independent of the model's behavior.
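One common way to give an audit record cryptographic integrity is to sign a canonical serialization of it. The sketch below uses HMAC-SHA256 from the Python standard library. The key, the field names, and the choice of HMAC are assumptions for illustration, since the text above does not specify DeepInspect's actual signing scheme.

```python
import hashlib
import hmac
import json

# Hypothetical key for the example; a real deployment would load it from a KMS or HSM.
SIGNING_KEY = b"example-key-not-for-production"

def _canonical(record: dict) -> bytes:
    # Sorted keys and fixed separators make the serialization deterministic.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_record(record: dict, key: bytes = SIGNING_KEY) -> dict:
    """Return a copy of the record with an HMAC-SHA256 signature attached."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "signature": signature}

def verify_record(signed: dict, key: bytes = SIGNING_KEY) -> bool:
    """Recompute the signature over every field except 'signature' and compare."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(signed.get("signature", "")))
```

Changing any field after signing makes `verify_record` return `False`, which is the property a reviewer relies on when reading the record later.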

The feature set covers five areas. Identity attribution happens at the model API call and draws on the application's identity primitive: the natural-person identity, the tenant, the role, and the route context, not the API key alone. Policy enforcement is per route, covering different application surfaces (the support route, the developer route, the legal route, the underwriting route). Prompt-level data classification covers PII, PHI, MNPI, source code, source-licensed content, and regulated jurisdictional data. Policy decisions pass, block, or modify the request. Each decision is written in the per-decision audit record format that downstream audit pipelines consume.
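A per-route policy decision of this kind can be sketched as a lookup plus a few set checks. The `PolicyBundle` shape, the role names, and the rules in `BUNDLES` are hypothetical and are not DeepInspect's configuration format. The point is the deterministic, fail-closed shape of the decision.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyBundle:
    version: str
    allowed_roles: frozenset
    block_types: frozenset   # data types that block the request outright
    modify_types: frozenset  # data types that are allowed only after modification

# Hypothetical bundles keyed by route.
BUNDLES = {
    "support": PolicyBundle("v1", frozenset({"support_agent"}),
                            frozenset({"PHI", "MNPI"}), frozenset({"PII"})),
    "underwriting": PolicyBundle("v1", frozenset({"underwriter"}),
                                 frozenset({"MNPI"}), frozenset({"PII"})),
}

def decide(route: str, role: str, data_types) -> str:
    """Return 'pass', 'block', or 'modify' for one request."""
    bundle = BUNDLES.get(route)
    # Fail closed: an unknown route or a role outside the bundle is blocked.
    if bundle is None or role not in bundle.allowed_roles:
        return "block"
    found = frozenset(data_types)
    if found & bundle.block_types:
        return "block"
    if found & bundle.modify_types:
        return "modify"
    return "pass"
```

Because the function depends only on the route, the role, and the classification outcome, the same inputs always yield the same decision, independent of the model's behavior.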

The architectural sweet spot for DeepInspect is the regulated workload that spans LLM endpoints. An organization addressing Databricks model endpoints from some applications and external SaaS LLMs from other applications needs a single audit record format across both. The deployer obligations under Article 26, the audit obligations under Article 12, the lender record obligations under Fannie Mae LL-2026-04, and the sector regimes (HIPAA, DORA, FedRAMP, ISO 42001) apply to the prompt and the decision regardless of which endpoint served the inference.

Where the two products overlap

Both products produce records of LLM requests. Both products attach identity metadata. Both products can apply content filters at the request boundary. The overlap is at the surface level. The underlying scope and audit format differ.

Databricks AI Gateway's identity context is the Unity Catalog principal: the workspace user or service principal that called the model serving endpoint. The payload table captures the inbound prompt and the outbound completion, with the principal and the endpoint attached. The AI guardrails apply keyword and PII filters at the gateway.

DeepInspect's identity context is the natural-person identity attached at the application's identity primitive, with the tenant, the role, the route, and the policy version. The audit record carries the policy decision, the data classification outcome, the policy version active at the time, and the cryptographic integrity signature. The audit format is what a regulator reviewing the deployment under Article 12 expects to see.

Both products produce records. Only one of them produces a record format that survives the regulatory review without translation.

Feature comparison

| Feature | Databricks AI Gateway | DeepInspect |
|---|---|---|
| HTTP proxy for LLM traffic | Inside Databricks Model Serving | Standalone, in front of any endpoint |
| Multi-provider routing | Yes, via external model endpoints | Forwards to a configured upstream |
| Usage attribution | Unity Catalog principal | Natural-person identity from IdP |
| Per-principal rate limiting | Yes | Out of scope |
| Payload archive | Delta tables in Unity Catalog | Per-decision audit record |
| AI guardrails (keyword, PII) | Yes (gateway-level) | Yes, plus classification engine |
| Per-route policy bundle | Endpoint-level | Yes, per-route policy bundle |
| Identity attribution at the model API call | Databricks principal | Natural-person from IdP |
| Per-decision audit record format | Payload table row | Cryptographically signed audit record |
| Article 12 audit format | Payload archive plus translation | Native format |
| Fannie Mae LL-2026-04 lender record format | Payload archive plus translation | Native format |
| Cross-endpoint coverage (Databricks + external SaaS) | Databricks-bound | Any HTTP LLM endpoint |

Pick Databricks AI Gateway if

Pick Databricks AI Gateway if the LLM workload runs primarily inside Databricks Model Serving, the callers are Databricks principals, and the attribution model fits the Unity Catalog identity boundary. The Databricks-native operator surface gives the data science team a single console for model traffic, usage attribution, and payload archives.

Pick Databricks AI Gateway if the team's lakehouse practice already uses Unity Catalog as the identity and access boundary and the AI guardrails (keyword filters, PII detection) cover the use case at the depth the team needs.

Pick DeepInspect if

Pick DeepInspect if the AI workload spans Databricks endpoints and external SaaS LLMs (OpenAI direct, Anthropic direct, Azure OpenAI from a non-Databricks application, vendor-embedded AI) and the audit pipeline needs a single record format across all of them. The cross-endpoint coverage is the architectural distinction.

Pick DeepInspect if the workload is subject to EU AI Act Article 12, Fannie Mae LL-2026-04, HIPAA, DORA, FedRAMP, ISO 42001, or any sector regime that requires identity-bound per-decision audit records. DeepInspect produces the record format that the regulator accepts.

Pick both if the deployment needs Databricks-native lakehouse integration and cross-endpoint regulatory audit. The composition pattern works in production today.

Composition pattern in production

The deployment topology that runs in production combines the two layers based on which application surface owns the call. For application traffic, the application points its OpenAI-compatible SDK at DeepInspect. DeepInspect verifies the caller's identity from the application's identity primitive, applies the data classification rules, evaluates the policy bundle for the route, commits the per-decision audit record, and forwards the cleared request to the upstream endpoint. If the upstream is a Databricks model serving endpoint, the request transits Databricks AI Gateway, which applies the Databricks-native rate limit and payload archive. If the upstream is an external SaaS LLM, the request goes directly to the provider.
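The ordering of steps in the application-traffic path can be sketched as a single function that takes each stage as a callable. All names here (`handle_request` and the request keys) are hypothetical. The sketch only shows the sequence described above and the fail-closed behavior, not DeepInspect's internals.

```python
def handle_request(request, verify_identity, classify, evaluate, commit_audit, forward):
    """Run the checks in order; only a committed, non-blocked request reaches the upstream."""
    try:
        identity = verify_identity(request)
        data_types = classify(request["prompt"])
        decision = evaluate(request["route"], identity, data_types)
    except Exception:
        # Fail closed: any error in identity, classification, or policy blocks the request.
        identity, data_types, decision = None, set(), "block"

    # Commit the audit record before forwarding. If this raises, nothing is forwarded.
    commit_audit({
        "identity": identity,
        "route": request.get("route"),
        "data_types": sorted(data_types),
        "decision": decision,
        "upstream": request.get("upstream"),
    })

    if decision == "block":
        return {"status": 403, "decision": "block"}
    # A "modify" decision would rewrite the prompt at this point; omitted in this sketch.
    return forward(request)
```

Passing the stages in as callables keeps the sketch independent of any particular identity provider, classifier, or upstream, whether that upstream is a Databricks model serving endpoint or an external provider.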

For Databricks notebook traffic that calls a Databricks model serving endpoint from within the workspace, Databricks AI Gateway captures the principal, the payload, and the usage attribution. The DeepInspect audit pipeline ingests the payload table rows for the cross-endpoint audit record consolidation, applying the policy version and the data classification metadata in the lakehouse layer.

The audit pipeline carries the natural-person identity (or the Databricks principal for notebook traffic), the route, the policy version, the data classification outcome, the policy decision outcome, the upstream endpoint that served the request, and the integrity signature.
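The fields listed above can be sketched as one record type, with a helper that maps a payload table row into the same shape. The `AuditRecord` class, the `from_payload_row` helper, and the row keys `principal` and `endpoint` are hypothetical. Real payload table column names may differ, and the signature is left unset here because signing is a separate step.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AuditRecord:
    identity: str
    identity_kind: str        # "natural_person" or "databricks_principal"
    route: str
    policy_version: str
    classification: tuple     # sorted data-type labels
    decision: str             # "pass", "block", or "modify"
    upstream: str
    signature: Optional[str] = None

def from_payload_row(row: dict, policy_version: str, classification, decision: str,
                     route: str = "notebook") -> AuditRecord:
    """Map a (hypothetical) payload table row into the consolidated record shape."""
    return AuditRecord(
        identity=row["principal"],
        identity_kind="databricks_principal",
        route=route,
        policy_version=policy_version,
        classification=tuple(sorted(classification)),
        decision=decision,
        upstream=row["endpoint"],
    )
```

With both traffic paths producing the same record type, downstream review code needs only one schema regardless of which upstream served the request.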

Pricing approach

Databricks AI Gateway is included in the Databricks Model Serving offering. Pricing follows the Databricks consumption model for serving endpoints, payload table storage, and Unity Catalog system table usage. The Databricks team publishes the per-endpoint and per-token rates separately.

DeepInspect's pricing is communicated through sales conversations and depends on the deployment regime, the workload volume, and the audit-record retention requirements. The cost is meaningfully lower than the cost of an audit miss under EU AI Act Article 12, Fannie Mae LL-2026-04, or a sector regime.

DeepInspect

DeepInspect sits between calling applications and any LLM endpoint over HTTP. It evaluates identity-bound policy on every request, classifies prompt data against the regulated data types the organization recognizes, commits per-decision audit records with cryptographic integrity, and produces the record format that EU AI Act Article 12 and Fannie Mae LL-2026-04 reviewers accept. The architecture composes with Databricks AI Gateway by addressing Databricks model serving endpoints as one of the cleared upstreams, which preserves Databricks AI Gateway's lakehouse-native attribution and adds the cross-endpoint regulatory audit layer.

The composition gives organizations the Databricks-native attribution and payload archive they want from Mosaic AI Gateway and the per-decision audit records they need across Databricks endpoints and external SaaS LLMs. The audit pipeline consumes one record format regardless of which upstream served any given request, which keeps the regulatory review tractable across a mixed deployment.

If you are running Databricks AI Gateway today and the EU AI Act August 2 deadline applies to a workload that spans Databricks and external endpoints, let's talk.