← Blog

Databricks AI Gateway Alternatives: When the Mosaic Layer Does Not Cover the Workload

Databricks AI Gateway, part of Mosaic AI Gateway, is the Databricks-native control surface for LLM traffic inside Databricks Model Serving. Teams whose AI workload spans Databricks endpoints and external SaaS LLMs (or who run inference outside Databricks entirely) pick a different layer. This piece walks through the credible Databricks AI Gateway alternatives across four use cases: open-source operational gateway, hosted multi-provider routing, application-side observability, and identity-bound enforcement for regulated workloads. Each option is evaluated against what Databricks AI Gateway covers and where the alternative fits better.

ByParminder Singh· Founder & CEO, DeepInspect Inc.
Comparisons & Alternativesdatabricks-ai-gatewaymosaic-ai-gatewayalternativescomparisoninline-enforcementeu-ai-act
Databricks AI Gateway Alternatives: When the Mosaic Layer Does Not Cover the Workload

Databricks AI Gateway, part of Mosaic AI Gateway, is the Databricks-native control surface for LLM traffic inside Databricks Model Serving. It attributes usage to Unity Catalog principals, applies AI guardrails (keyword filters, PII detection), writes payload tables to Delta tables in Unity Catalog, and routes across Databricks Foundation Model APIs and external model endpoints that Databricks brokers (OpenAI, Anthropic, Bedrock, Cohere, Vertex, Azure OpenAI). The product fits Databricks-resident workloads where the lakehouse identity boundary and the Databricks-native operator surface match the team's workflow. Teams whose AI workload spans Databricks endpoints and external SaaS LLMs, or who run inference outside Databricks entirely, or who are subject to EU AI Act Article 12 or sector audit requirements that the Databricks payload archive falls short of, pick a different layer. I want to walk through the credible Databricks AI Gateway alternatives, by use case, and where each one fits.

TL;DR

Databricks AI Gateway covers the Databricks-resident LLM traffic use case with Unity Catalog-bound identities, per-principal rate limiting, AI guardrails, and Delta-table payload archives. Alternatives by use case: Kong AI Gateway or LiteLLM for an open-source operational gateway outside Databricks, Portkey for a hosted multi-provider gateway with observability, Helicone or Langfuse for application-side observability, and DeepInspect for identity-bound policy enforcement and per-decision audit records that span Databricks and non-Databricks endpoints under Fannie Mae LL-2026-04 and EU AI Act Article 12.

Use case 1: open-source operational gateway outside Databricks

Teams whose LLM traffic does not run primarily inside Databricks Model Serving pick an operational gateway that does not depend on the Databricks runtime.

Kong AI Gateway

Kong AI Gateway is the AI-focused plugin family on the Kong data plane: multi-provider LLM routing via the AI Proxy plugin, semantic caching, prompt templates, prompt guards (regex allow and deny lists), and per-consumer token attribution. The product fits teams already running Kong as their HTTP data plane that want AI plugins on the same operator surface.

LiteLLM

LiteLLM is an open-source LLM proxy with an OpenAI-compatible API surface across 100+ providers. The proxy handles routing, retries, fallbacks, virtual keys with per-team budgets, and rate limits. Self-hosted deployment runs as a Python process. Teams that want an SDK-compatible multi-provider proxy without operating Kong pick LiteLLM.

The architectural distinction versus Databricks AI Gateway is the identity boundary. Both Kong AI Gateway and LiteLLM are agnostic to the identity model and the runtime substrate; they run in front of any LLM endpoint. Databricks AI Gateway assumes the caller is a Unity Catalog principal and the endpoint is a Databricks model serving endpoint.

Use case 2: hosted multi-provider gateway with observability built in

Teams that want a hosted (or self-hosted enterprise) LLM gateway with multi-provider routing plus observability on the same control plane pick a closed-source platform.

Portkey

Portkey is an LLM gateway and observability platform with routing across 200+ providers, retries, fallbacks, conditional routing, caching, load balancing, cost tracking, traces, evaluations, prompt management, and guardrails. The hosted tier covers small and medium deployments; the enterprise tier supports self-hosted deployment.

The architectural distinction versus Databricks AI Gateway is the runtime model. Portkey runs as a hosted gateway (or self-hosted enterprise) that addresses any provider; Databricks AI Gateway runs inside Databricks Model Serving and addresses Databricks-brokered endpoints. Teams whose AI workload spans many providers and many calling applications outside the lakehouse pick Portkey.

Use case 3: application-side observability

Teams that want trace and span visibility into LLM application behavior, prompt experimentation, and evaluation pipelines pick an observability-first product.

Helicone

Helicone is an open-source LLM observability platform with an async proxy and a self-hosted gateway. The dashboard exposes captured calls by user, model, route, custom property, latency, and cost. Caching, rate limiting, retries, and fallbacks ship as observability-adjacent features.

Langfuse

Langfuse is an open-source LLM observability platform that captures traces via in-process SDKs. The trace captures multi-step spans, prompt template versions, evaluation scores, and user feedback. The platform supports prompt experimentation workflows and side-by-side completion comparison.

The architectural distinction versus Databricks AI Gateway is the trace model. Helicone and Langfuse capture application-side traces with custom property metadata that the application threads through. Databricks AI Gateway writes payload tables tied to the Databricks request identifier and the Unity Catalog principal. Teams whose AI engineering team needs offline review of LLM behavior outside the lakehouse pick Helicone or Langfuse.

Use case 4: identity-bound enforcement and regulatory audit records

Teams subject to EU AI Act Article 12, Fannie Mae LL-2026-04, HIPAA, DORA, FedRAMP, ISO 42001, or any sector regime that requires identity-bound per-decision audit records pick an enforcement-first product.

DeepInspect

DeepInspect sits at the HTTP request boundary as a separate enforcement layer. It evaluates identity-bound policy on every request, classifies prompt data against the regulated data types the organization recognizes, and commits a per-decision audit record with cryptographic integrity. The decisions are deterministic, fail-closed, and independent of the model's behavior.

The architectural distinction versus Databricks AI Gateway is the audit format and the cross-endpoint scope. Databricks AI Gateway's payload archive captures the Databricks-resident half of the workload. DeepInspect's per-decision audit records cover the full AI traffic surface across Databricks endpoints and external SaaS LLMs, with one record format that the regulator under Article 12 accepts. The record carries the natural-person identity (from the application's identity primitive, not the Databricks principal alone), the policy version active at decision time, the data classification outcome, the policy decision outcome, and the cryptographic integrity signature.

DeepInspect composes with Databricks AI Gateway for workloads that span Databricks and non-Databricks endpoints. The composition pattern: application traffic addresses DeepInspect, which evaluates the policy and commits the audit record before forwarding to the upstream (Databricks endpoint or external SaaS LLM); Databricks-internal notebook traffic still passes through Databricks AI Gateway directly, with the DeepInspect audit pipeline ingesting the payload table rows for the consolidated cross-endpoint audit record.

Picking between the alternatives

The right alternative depends on what the team needs from the LLM traffic layer outside the Databricks runtime.

  • Open-source operational gateway outside Databricks: Kong AI Gateway (on Kong data plane) or LiteLLM (Python proxy).
  • Hosted multi-provider gateway: Portkey.
  • Application-side observability: Helicone (proxy) or Langfuse (SDK).
  • Identity-bound enforcement and regulatory audit records: DeepInspect.
  • Cross-endpoint workload (Databricks plus external SaaS LLMs) with regulatory audit: DeepInspect plus Databricks AI Gateway (composed).

Most production deployments for AI workloads that touch Databricks end up with two layers: the Databricks-native control plane for the lakehouse-resident half and a cross-endpoint audit layer for the full traffic surface. The two compose because the lakehouse-internal attribution and the cross-endpoint regulatory audit obligation are different responsibilities.

DeepInspect

DeepInspect sits between calling applications and any LLM endpoint over HTTP. It evaluates identity-bound policy on every request, classifies prompt data against the regulated data types the organization recognizes, commits per-decision audit records with cryptographic integrity, and produces the record format that EU AI Act Article 12 and Fannie Mae LL-2026-04 reviewers accept. The architecture composes with Databricks AI Gateway by addressing Databricks model serving endpoints as one of the cleared upstreams for application traffic, while Databricks AI Gateway covers the lakehouse-internal notebook traffic.

The composition gives organizations the Databricks-native attribution and payload archive they want from Mosaic AI Gateway and the per-decision audit records they need across Databricks endpoints and external SaaS LLMs. The audit pipeline consumes one record format regardless of which upstream served any given request, which keeps the regulatory review tractable across a mixed deployment.

If you are running Databricks AI Gateway today and the EU AI Act August 2 deadline applies to a workload that spans Databricks and external endpoints, let's talk.

Frequently asked questions

What is the closest alternative to Databricks AI Gateway for non-Databricks workloads?

For the operational gateway use case alone, Kong AI Gateway and LiteLLM are the closest open-source alternatives. Both run in front of any LLM endpoint and do not assume a lakehouse runtime. For the payload archive use case, Helicone and Langfuse capture application-side traces that serve a similar offline-review purpose.

Can Databricks AI Gateway cover a workload that spans Databricks and external SaaS LLMs?

For the Databricks-resident half of the traffic, yes. The payload archive captures the calls that pass through Databricks Model Serving. For the external SaaS LLM calls that go directly from the application to OpenAI, Anthropic, or another provider without transiting Databricks, the payload archive does not capture them. The cross-endpoint audit pipeline ends up with partial coverage, which the regulator under EU AI Act Article 12 or Fannie Mae LL-2026-04 will catch.

Can I run Databricks AI Gateway and DeepInspect together?

Yes. The composition pattern is DeepInspect at the request boundary for application traffic (handling identity-bound policy, classification, and the per-decision audit record across Databricks and non-Databricks upstreams), and Databricks AI Gateway for the lakehouse-internal notebook traffic that addresses Databricks model serving endpoints directly. The DeepInspect audit pipeline ingests the Databricks payload table rows for the consolidated cross-endpoint audit record.

When does the Databricks AI Gateway use case stop covering the workload?

When the workload's calling applications run outside Databricks and address external SaaS LLMs directly, the payload archive does not capture the cross-endpoint traffic. When the workload is subject to EU AI Act Article 12, Fannie Mae LL-2026-04, HIPAA, DORA, FedRAMP, ISO 42001, or any sector regime that requires identity-bound per-decision audit records carrying the natural-person identity (not the Databricks principal alone), the payload archive falls short of the format the regulator applies.

What about Databricks AI Guardrails versus DeepInspect's classification engine?

Databricks AI Guardrails apply keyword filters and PII detection at the Databricks gateway. The PII detection uses Databricks-managed models. DeepInspect's classification engine operates against a configurable set of regulated data types (PII, PHI, MNPI, source code, source-licensed content, regulated jurisdictional data), with the classification outcome attached to the per-decision audit record and the policy bundle making the pass-block-modify decision based on classification, identity, and route. The two can run together for layered controls: the Databricks Guardrails catch some content at the lakehouse gateway, and DeepInspect's classification and policy enforcement carry the regulatory audit obligation across the full traffic surface.