LiteLLM vs an AI Security Gateway: What Each One Does and Where They Compose
LiteLLM is an open-source LLM proxy that normalizes the API surface across more than 100 model providers and handles routing, retries, fallbacks, cost tracking, and basic key management. An AI security gateway sits at the same network position but answers a different question: identity-bound policy on prompt content, data classification at the request boundary, and a per-decision audit record that holds up under EU AI Act Article 12 review. The two products compose in production deployments. This piece walks through what each one does, where they overlap, and where the architectural responsibilities split.

LiteLLM is an open-source LLM proxy that normalizes the API surface across OpenAI, Anthropic, Bedrock, Azure OpenAI, Vertex, Cohere, Mistral, and more than a hundred other model providers. It exposes a single OpenAI-compatible API endpoint and translates the request to the upstream provider's native format. The feature set covers routing, retries, fallbacks, model-cost tracking, basic key management, and team-scoped budgets. An AI security gateway sits at the same network position but answers a different question: identity-bound policy on prompt content, data classification at the request boundary, and a per-decision audit record that holds up under EU AI Act Article 12 review. The two products are compatible and compose in production deployments. The architectural question is which layer owns which responsibility.
I want to walk through what LiteLLM does, what an AI security gateway does, where the responsibilities overlap, and how the two layers compose.
TL;DR
LiteLLM normalizes the LLM API surface and handles operational concerns: routing, retries, fallback, cost tracking, and team budgets. An AI security gateway enforces identity-bound policy on prompt content and produces per-decision audit records for regulatory review. Production deployments use LiteLLM for routing and an AI security gateway in front of LiteLLM (or in front of the upstream provider directly) for policy enforcement.
LiteLLM: what it is and where it sits
LiteLLM is an open-source Python package that ships an LLM proxy server. The proxy speaks the OpenAI API surface (Chat Completions, Embeddings, Images) and translates inbound requests to the native API of the upstream provider. The application code that uses the OpenAI SDK can point at LiteLLM and call Claude, Gemini, Llama, or any other supported provider without rewriting the integration.
The LiteLLM feature set covers operational concerns. Routing rules send requests to different providers based on the request attributes (cost-optimized routing, latency-optimized routing, regional routing). Retries and fallbacks handle provider failures (a 429 from OpenAI falls back to Anthropic). Cost tracking attributes spend to teams, projects, and users based on the calling key. Virtual keys let an administrator issue per-team API keys that LiteLLM authenticates against its own database. Team budgets and rate limits provide soft caps that LiteLLM enforces.
The architectural sweet spot for LiteLLM is the multi-provider operational layer. A team that wants the OpenAI SDK as its single integration surface, plus the ability to swap providers without code changes, plus per-team spend tracking, gets all of that from LiteLLM. The deployment is open-source self-hosted, which is attractive for engineering teams that prefer to own the operational substrate.
An AI security gateway: what it is and where it sits
An AI security gateway sits at the same network position as LiteLLM but answers a different question. It enforces identity-bound policy on the prompt content. It classifies prompt data against the regulated data types the organization recognizes. It commits a per-decision audit record with cryptographic integrity that an EU AI Act Article 12 reviewer or a Fannie Mae LL-2026-04 lender record reviewer will accept. The decisions are deterministic, fail-closed, and independent of the model's behavior.
The feature set covers identity attribution at the model API call (natural-person identity attached from the application's identity primitive, not the API key alone), per-route policy enforcement (different rules for the support route, the developer route, the legal route), prompt-level data classification (PII, PHI, MNPI, source code, source-licensed content), policy decisions that pass, block, or modify the request, and the per-decision audit record format that downstream audit systems consume.
The architectural sweet spot for an AI security gateway is the regulated workload. An organization that is the data controller for prompts crossing into a model provider needs evidence that satisfies the deployer obligations under Article 26, the audit obligations under Article 12, the lender record obligations under Fannie Mae LL-2026-04, and the sector-specific regimes (HIPAA, DORA, FedRAMP, ISO 42001) that the workload is subject to.
Where the two products overlap
Both products run as an HTTP proxy at the same network position. Both products can authenticate the caller, attach metadata to the request, and write a log of every request. The overlap is at the surface level. The underlying responsibilities differ.
LiteLLM's authentication is API-key based against its own database of virtual keys. The virtual key is associated with a team, a budget, and a rate limit. The log carries the virtual key and the request fingerprint. The log is structured and useful for operations.
An AI security gateway's authentication is identity-token based against the organization's identity provider. The identity carries the natural person, the tenant, the role, and the route context. The audit record carries the policy version, the data classification outcome, the policy decision outcome, and the cryptographic integrity signature. The audit record is structured to consume regulatory review.
Both products produce records of requests. Only one of them produces records of policy decisions with the metadata that regulatory review expects.
Where the responsibilities split
The clean split in production deployments is operational vs regulatory. LiteLLM owns the operational concerns: which provider serves which request, what the retry policy is, how much each team spent this month, what the per-team rate limit is. The AI security gateway owns the regulatory concerns: who is calling, what data is in the prompt, what policy evaluated the request, and what the per-decision audit record says.
The deployment topology that follows is either of two patterns. In the first pattern, the application calls the AI security gateway, the security gateway evaluates policy and forwards to LiteLLM, LiteLLM routes to the upstream provider. The security gateway sees the natural-person identity and the prompt content; LiteLLM sees the request after the security gateway has cleared it. In the second pattern, LiteLLM and the security gateway are co-deployed in front of the provider, with the security gateway evaluating policy on the request and LiteLLM handling the upstream routing on the cleared request. The first pattern is more common in production because it preserves the clean responsibility split.
Pick LiteLLM if
Pick LiteLLM if the primary need is multi-provider routing with operational features like retries, fallbacks, cost tracking, and team budgets, and the regulatory audit requirement is satisfied elsewhere (or the workload does not trigger a regulatory audit requirement). LiteLLM is the strongest open-source choice for the operational layer and the self-hosted deployment model suits teams that prefer to own the substrate.
Pick LiteLLM if the application surface needs to be the OpenAI SDK while the actual model traffic spans providers. The translation layer is the largest single benefit of LiteLLM for engineering teams that want a single integration surface across many providers.
Pick an AI security gateway if
Pick an AI security gateway if the workload is subject to EU AI Act Article 12, Fannie Mae LL-2026-04, HIPAA, DORA, FedRAMP, or any sector regime that requires identity-bound per-decision audit records. The security gateway produces the record format that the regulator accepts. LiteLLM's operational logs satisfy the existence requirement but fail the traceability requirement.
Pick an AI security gateway if the organization is the data controller for prompts that cross into model providers and the security team needs prompt-level data classification, identity attribution, and policy enforcement. The security gateway adds the regulatory layer that LiteLLM was not designed to provide.
Pick both if the deployment needs operational multi-provider routing and regulatory audit. The composition pattern works in production today.
Composition pattern in production
The deployment topology that runs in production combines the two layers. The application calls the AI security gateway (the addressable endpoint that the application points its OpenAI SDK at). The security gateway verifies the caller's identity from the application's identity primitive, applies the data classification rules, evaluates the policy bundle, commits the per-decision audit record, and forwards the cleared request to LiteLLM. LiteLLM routes the request to the upstream provider (OpenAI, Anthropic, Bedrock, etc.) based on its operational rules and returns the response. The security gateway commits the response handling decision and forwards to the application.
The audit record carries the natural-person identity, the route, the policy version, the data classification outcome, the policy decision outcome, the upstream provider that LiteLLM selected, the model and version that served the request, and the integrity signature. The operational log carries the LiteLLM routing decision, the retries, the fallback if any, the cost attribution, and the team budget consumption. The two layers compose without overlap.
Pricing approach
LiteLLM is open-source under the MIT license. Self-hosted deployment is free. The hosted LiteLLM proxy and the LiteLLM enterprise offering have their own pricing that the LiteLLM team publishes separately. The total cost depends on the hosting model and the support tier.
DeepInspect's pricing is communicated through sales conversations and depends on the deployment regime, the workload volume, and the audit-record retention requirements. The cost is meaningfully lower than the alternative of buying a regulatory audit miss in the EU AI Act, Fannie Mae, or HIPAA review.
DeepInspect
DeepInspect sits between the calling applications and any LLM endpoint over HTTP. It evaluates identity-bound policy on every request, classifies prompt data, commits per-decision audit records with cryptographic integrity, and produces the record format that EU AI Act Article 12 and Fannie Mae LL-2026-04 reviewers accept. The architecture composes with LiteLLM by sitting in front of it, which preserves LiteLLM's operational benefits while adding the regulatory layer that LiteLLM was not designed to provide.
The composition gives organizations the multi-provider routing they want from LiteLLM and the per-decision audit records they need for the workload to survive regulatory review. The audit pipeline consumes one record format regardless of which upstream provider LiteLLM selected for any given request, which keeps the regulatory review tractable.
If you are running LiteLLM in production and the security review is asking for identity-bound audit records, let's talk.
Frequently asked questions
- How is LiteLLM different from an AI security gateway?
LiteLLM normalizes the LLM API surface and handles operational concerns like routing, retries, fallbacks, cost tracking, and team budgets. An AI security gateway enforces identity-bound policy on the prompt content and produces per-decision audit records that satisfy regulatory review. The two products run at the same network position but answer different questions. LiteLLM's logs satisfy the existence requirement of an audit; an AI security gateway's audit records satisfy the traceability requirement that the EU AI Act Article 12 and the Fannie Mae LL-2026-04 review apply.
- Can LiteLLM replace an AI security gateway?
For unregulated workloads, possibly. LiteLLM's virtual keys, team budgets, and request logs cover the operational layer that a small team needs. For workloads subject to EU AI Act Article 12, Fannie Mae LL-2026-04, HIPAA, DORA, FedRAMP, or any sector regime that requires identity-bound per-decision audit records, LiteLLM's logs alone fall short of the record format that the regulator expects. The records lack natural-person identity attribution at the model API call, the policy version that evaluated the decision, the data classification outcome, and the cryptographic integrity signature that decouples the audit record from the application that took the action.
- Can an AI security gateway replace LiteLLM?
For deployments that already have multi-provider operational routing handled elsewhere, possibly. An AI security gateway typically supports forwarding to a configurable upstream endpoint, which means it can address a specific provider directly without needing LiteLLM as a translation layer. For deployments that want the operational features LiteLLM provides (cost tracking, team budgets, retries, fallbacks across many providers), the AI security gateway does not replace LiteLLM. The two layers compose, which is the common production pattern.
- How does the deployment topology work when both LiteLLM and an AI security gateway are in production?
The application calls the AI security gateway. The security gateway evaluates the policy and commits the audit record. The security gateway forwards the cleared request to LiteLLM. LiteLLM selects an upstream provider based on its routing rules and forwards the request. The response flows back through LiteLLM to the security gateway to the application. The security gateway sees the prompt content and the identity context. LiteLLM sees the request after the security gateway has cleared it. The audit record carries both the security gateway's policy outcome and the LiteLLM routing outcome so the operator can reconstruct the full request path.
- What about other multi-provider proxies (Portkey, Helicone, OpenRouter)?
The same composition pattern applies. Portkey, Helicone, and OpenRouter all sit at the multi-provider operational layer. Each one offers a variation on routing, caching, observability, and cost tracking. None of them produce the identity-bound per-decision audit record that the regulatory review expects. The architectural responsibility split is the same: the multi-provider proxy owns the operational layer and the AI security gateway owns the regulatory layer. The composition in production is identical regardless of which multi-provider proxy the team chose.