
AI Gateway vs LLM Router: The Architectural Distinction That Matters for Enforcement

An LLM router picks the cheapest or fastest model for a given prompt. An AI gateway evaluates whether the request is permitted before any model receives it. The router optimizes cost and latency. The gateway enforces identity-bound policy and produces a per-decision audit record. This piece walks through the architectural distinction, where the two functions overlap, and why an enterprise running regulated workloads needs the gateway capability regardless of whether routing is in scope.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Platform & Architecture · ai-gateway · llm-router · engineering · architecture · policy-enforcement

An LLM router and an AI gateway sound interchangeable in marketing copy. They are not. A router answers "which model should this request go to" by looking at cost, latency, and capability fit. A gateway answers "is this caller permitted to make this request at all" by looking at identity, data classification, and policy state. The router optimizes spend and tail latency. The gateway enforces deterministic policy and produces the audit record. Most production deployments need both, but they belong to different control planes and they fail under different conditions.

I want to walk through what each one actually does at the request layer, where the two overlap, and which capabilities a regulated workload cannot get from a router alone.

The LLM router

An LLM router intercepts an LLM call and decides which provider, model, or self-hosted endpoint handles it. The decision is usually driven by a small ruleset: prompt length, expected output tokens, latency budget, cost per call, or a quality score from a side-channel evaluator.
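A minimal sketch of that kind of ruleset, assuming the router knows each model's price, median latency, and context window. The model names, prices, and latencies are hypothetical, not figures from any real provider. Note that the caller's identity appears nowhere in the function signature: the router only chooses among models it already has credentials for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # hypothetical USD price per 1,000 tokens
    p50_latency_ms: int        # observed median latency
    max_context_tokens: int    # context window size


def route(prompt_tokens: int, expected_output_tokens: int,
          latency_budget_ms: int, options: list[ModelOption]) -> ModelOption:
    """Pick the cheapest model that fits the prompt and meets the latency budget."""
    total_tokens = prompt_tokens + expected_output_tokens
    candidates = [
        m for m in options
        if m.max_context_tokens >= total_tokens and m.p50_latency_ms <= latency_budget_ms
    ]
    if not candidates:
        raise LookupError("no model satisfies the routing constraints")
    # Cost-only objective: nothing here asks whether the caller may send this prompt.
    return min(candidates, key=lambda m: m.cost_per_1k_tokens * total_tokens / 1000)


OPTIONS = [
    ModelOption("small-fast", 0.2, 400, 16_000),
    ModelOption("mid-tier", 1.0, 900, 128_000),
    ModelOption("large", 5.0, 2_500, 200_000),
]

print(route(2_000, 500, 1_000, OPTIONS).name)     # small-fast
print(route(30_000, 1_000, 1_000, OPTIONS).name)  # mid-tier: prompt too long for small-fast
```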

Tools like LiteLLM, OpenRouter, and Portkey operate primarily at this layer. Their unified API surface lets a single client library reach OpenAI, Anthropic, Bedrock, Vertex, or a self-hosted Llama deployment without changing call sites. The routing decision itself typically takes a few hundred microseconds of rule evaluation; the upstream call's latency dominates the total.

Routers are valuable. They consolidate provider credentials, normalize response formats, and turn provider switching from a multi-week refactor into a configuration change.

What the router does not decide

A router does not decide whether the caller is permitted to make the call. It does not decide whether the prompt contains data the caller is allowed to send to a model. It does not produce an admissible record of who made the decision under which policy. It optimizes the choice of model among the set of providers the calling service is already authorized to reach.

The AI gateway

An AI gateway sits in the same physical position as the router but operates at a different decision layer. Before any model receives a request, the gateway evaluates it against three sets of inputs: the verified identity of the natural person and of the agent acting for them, the classification of the data inside the prompt, and the policy in effect at that moment for that identity, that data class, and that target model.

The policy decision has exactly three deterministic outcomes: permit, redact, or deny. If the request is denied, no model receives it. If the request is redacted, a modified version reaches the model. If the request is permitted, the gateway records the decision with full context before forwarding.
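A minimal sketch of that evaluation, assuming the identities have already been verified upstream (for example by an identity provider) and the prompt has already been classified. The roles, data classes, model names, and the policy table are hypothetical. The lookup is deterministic, and anything not explicitly listed is denied.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PERMIT = "permit"
    REDACT = "redact"
    DENY = "deny"


@dataclass(frozen=True)
class Request:
    person_id: str     # verified natural person (hypothetical field)
    agent_id: str      # verified agent or service acting for that person
    role: str
    data_class: str    # classifier output, e.g. "public" or "pii"
    target_model: str


# Hypothetical policy table: (role, data_class, target_model) -> decision.
POLICY = {
    ("analyst", "public", "external-llm"): Decision.PERMIT,
    ("analyst", "pii", "external-llm"): Decision.REDACT,
    ("clinician", "phi", "internal-llm"): Decision.PERMIT,
}


def evaluate(req: Request) -> Decision:
    """Deterministic policy lookup; missing identity or a missing rule means deny."""
    if not req.person_id or not req.agent_id:
        return Decision.DENY
    return POLICY.get((req.role, req.data_class, req.target_model), Decision.DENY)


req = Request("u-123", "agent-7", "analyst", "pii", "external-llm")
print(evaluate(req).value)  # redact
```

Defaulting to deny is what makes the gateway fail closed: a new role, data class, or model is unreachable until someone writes a rule for it.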

The gateway's three deterministic outputs

For each request, the gateway produces three artifacts: a decision (permit, redact, or deny) that the upstream caller observes; a per-decision audit record that captures identity, role, policy version, data classification, decision outcome, and timestamp; and an enforcement effect on the model traffic that follows from the decision. The first two are independent of the model's behavior. The third is the actual control over what the model receives.
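The post does not specify how records are made tamper-evident; hash chaining is one common technique, and the sketch below assumes it. Each record carries the fields listed above plus the hash of the previous record, so editing, deleting, or reordering a stored record breaks verification. The class and field names are hypothetical. In practice the latest hash would also need to be anchored somewhere the gateway operator cannot rewrite, since anyone able to rewrite the whole chain could recompute every hash.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log in which each record commits to the previous record's hash."""

    def __init__(self) -> None:
        self.records: list[dict] = []
        self._last_hash = GENESIS

    def append(self, *, person_id: str, role: str, policy_version: str,
               data_class: str, decision: str) -> dict:
        body = {
            "person_id": person_id,
            "role": role,
            "policy_version": policy_version,
            "data_class": data_class,
            "decision": decision,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._last_hash,
        }
        record = {**body, "hash": _digest(body)}
        self.records.append(record)
        self._last_hash = record["hash"]
        return record

    def verify(self) -> bool:
        """Recompute the chain; an edited, removed, or reordered record fails."""
        prev = GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True


log = AuditLog()
log.append(person_id="u-123", role="analyst", policy_version="2026-03-01.4",
           data_class="pii", decision="redact")
# Only after the record is committed would the (redacted) request be forwarded.
print(log.verify())                    # True
log.records[0]["decision"] = "permit"  # tamper with the stored record
print(log.verify())                    # False
```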

Feature comparison

The two control planes overlap on the surface and diverge underneath. The table below maps the capabilities most enterprise buyers ask about.

| Capability | LLM router | AI gateway |
|---|---|---|
| Provider abstraction | Yes, native | Yes, downstream of policy |
| Cost-based model selection | Yes, native | Optional, lower priority |
| Latency-based failover | Yes, native | Yes, with policy on data class |
| Identity-bound policy | No | Yes, native |
| Per-decision audit log | No | Yes, native |
| Fail-closed under upstream outage | Configurable | Default |
| PII detection and redaction | Plug-in, optional | Native |
| Tamper-evident records | No | Yes |
| Policy versioning at the decision moment | No | Yes |
| Conformity with EU AI Act Article 12 | No | Yes |

The router's job ends at "the cheapest model that meets the latency budget answered the call." The gateway's job ends at "the request was either denied with a recorded reason, redacted under a recorded policy, or permitted with a recorded decision."

Pick a router if

The deployment ships with three properties. The data flowing through the LLM is non-sensitive or already-anonymized at the source. The audit obligation is satisfied at a higher layer (e.g. a separate service captures decisions for compliance reporting). The variable that matters most is per-token cost and time-to-first-token.

In that profile, the router earns its keep on the cost curve. On a high-volume workload, a 30% reduction in average per-call cost, for example, can pay for the operational overhead of running the router in production.

Pick a gateway if

The deployment ships with three different properties. The workload is in scope of EU AI Act Article 12, Fannie Mae LL-2026-04, the HIPAA Security Rule, or a comparable per-decision records mandate. Different callers carry different authorization to the same model based on role, data classification, or per-route policy. The audit record needs to be admissible without depending on the application that originated the call.

In that profile, the gateway is the control point. The cost-optimization story is secondary to the records and enforcement story.

The 22-second window argument

I walked through the Mandiant M-Trends 2026 finding on the speed of attacker handoff. The median time between initial access and handoff to a secondary threat group collapsed from over eight hours in 2022 to 22 seconds in 2025. At that tempo, asynchronous controls cannot prevent damage. A router that picks the cheapest model after the prompt is assembled records the choice in a billing log. The control that stops the request before it reaches the model is the gateway.

End-to-end enforcement overhead at the gateway measured under 50 ms in DeepInspect's production tests. LLM inference typically takes 500 ms to 5 seconds, so the overhead is at most about 10 percent of the model's response time and closer to 1 percent for longer generations.

DeepInspect

This is the architecture DeepInspect was built to provide. DeepInspect sits at the AI request boundary as a stateless proxy between any application and any LLM. Per-request, the gateway evaluates identity, data classification, and policy state. The decision is recorded with full context before the model receives the request. Routing decisions, when they apply, happen downstream of the policy decision.

The split is the point. The policy layer answers "permitted under which policy" and the routing layer answers "which model under what cost target." Conflating the two collapses the audit record into the routing log, which is exactly the suppression and selective-logging pattern the self-attestation argument was meant to defeat.

If you need both routing economics and gateway enforcement, the gateway runs in front of the router. The gateway decides whether the call is permitted. The router, if invoked, picks the model among the authorized set.
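A minimal sketch of that ordering, assuming the gateway's policy output is the set of models a given role may reach for a given data class. The roles, model names, and prices are hypothetical. The router step can only narrow the permitted set, never widen it, and a denied request never reaches the router.

```python
from typing import Optional

# Hypothetical authorization: (role, data_class) -> models that role may reach.
AUTHORIZED = {
    ("analyst", "public"): {"small-fast", "mid-tier", "large"},
    ("analyst", "pii"): {"mid-tier"},  # e.g. only a contractually covered endpoint
}

# Hypothetical price per 1,000 tokens for each model.
COST_PER_1K = {"small-fast": 0.2, "mid-tier": 1.0, "large": 5.0}


def gateway_then_router(role: str, data_class: str) -> Optional[str]:
    # Gateway step: compute the permitted set from policy. Empty means deny.
    permitted = AUTHORIZED.get((role, data_class), set())
    if not permitted:
        return None  # denied: neither the router nor any model sees the request
    # Router step: cost optimization, restricted to the permitted set.
    return min(permitted, key=lambda model: COST_PER_1K[model])


print(gateway_then_router("analyst", "public"))  # small-fast
print(gateway_then_router("analyst", "pii"))     # mid-tier
print(gateway_then_router("intern", "pii"))      # None
```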

Book a demo today.

Frequently asked questions

Can the same component do both routing and gateway enforcement?

A single component can run both functions if the architecture cleanly separates the two decision layers. The risk is conflation: a router that adds policy plugins as an afterthought tends to produce a routing log with policy annotations rather than a policy-decision record. The cleanest pattern is to run the gateway as the inbound entry point and let the router operate downstream of the policy decision. Identity-bound rate limits, data-classification redaction, and per-decision records belong to the gateway. Cost-aware model selection belongs to the router. When the two stay in their lanes, the audit record retains its evidentiary value and the router retains its cost-optimization headroom.

Does a router satisfy EU AI Act Article 12?

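A router does not produce the records Article 12 and Article 19 require. The router's per-request log records the model choice and the upstream response. The Article 12 obligation is broader: logs must support identifying situations that may present a risk, post-market monitoring, and monitoring of the system's operation, and for the remote biometric identification systems covered by Article 12(3) they must include the input data for which a search led to a match and the natural persons who verified the result. In practice, an admissible record of a given decision needs the identity of the natural person, the input data, the policy that governed the decision, and a tamper-evident record committed before the response returns. Router logs typically lack identity context (the upstream service identity is not the natural person) and lack policy state (the routing rule is not the deployment policy). A gateway that produces an admissible per-decision record covers the obligation at the request layer. A router added behind a gateway does not break the record. A router run without a gateway leaves the obligation unmet.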

Where does LiteLLM sit in this taxonomy?

LiteLLM is primarily a router with a plugin surface that adds policy and observability. The project ships with auth, rate limiting, and budget controls inside the router process. The June 2026 CVE wave, including CVE-2026-12773, illustrates the architectural tension: when the router process holds long-lived provider credentials and exposes its own auth, the router itself becomes a high-value attack surface. The gateway pattern avoids that surface by binding every call to a verified identity and holding no long-lived provider keys at the policy layer. The two patterns are complementary in principle and frequently confused in practice.

How does the gateway handle multi-provider failover without becoming a router?

A gateway can implement failover without taking on routing-as-cost-optimization. The failover policy specifies a primary endpoint, a fallback endpoint, and the conditions that trigger the fallback. The policy is data-classification-aware: PII-tagged traffic might be permitted to fail over to a HIPAA-aligned endpoint but not to a public model, regardless of cost or latency. The same per-decision record captures which endpoint received the call and why. The gateway is doing routing in the limited sense of choosing among permitted endpoints. The router pattern, by contrast, optimizes across endpoints with no policy constraint.
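A minimal sketch of data-classification-aware failover, assuming each endpoint is tagged with the data classes it is cleared to receive and a health checker supplies the set of healthy endpoints. Endpoint names and tags are hypothetical. Endpoints are tried in a fixed preference order with no cost term, the policy filter is applied before the health check, and the reason string is what would go into the per-decision record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    name: str
    allowed_data_classes: frozenset[str]  # hypothetical clearance tags


def choose_endpoint(data_class: str, primary: Endpoint, fallbacks: list[Endpoint],
                    healthy: set[str]) -> tuple[Optional[str], str]:
    """Return (endpoint name or None, reason for the audit record)."""
    for position, ep in enumerate([primary, *fallbacks]):
        if data_class not in ep.allowed_data_classes:
            continue  # policy first: never fail over to an endpoint not cleared for this data
        if ep.name in healthy:
            return ep.name, "primary" if position == 0 else "fallback"
    # Fail closed: no permitted endpoint is healthy, so no model receives the request.
    return None, f"deny: no healthy endpoint permitted for {data_class}"


PRIMARY = Endpoint("hipaa-primary", frozenset({"public", "pii"}))
FALLBACKS = [
    Endpoint("hipaa-secondary", frozenset({"public", "pii"})),
    Endpoint("public-model", frozenset({"public"})),
]

# Both HIPAA-aligned endpoints are down; only the public model is healthy.
print(choose_endpoint("pii", PRIMARY, FALLBACKS, healthy={"public-model"}))
# (None, 'deny: no healthy endpoint permitted for pii')
print(choose_endpoint("public", PRIMARY, FALLBACKS, healthy={"public-model"}))
# ('public-model', 'fallback')
```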

What's the right place to put PII detection, router or gateway?

At the gateway. PII detection is a classification decision that feeds the policy decision. If detection runs after routing, the routing decision has already used the prompt content (length, embedding similarity, etc.) before the classification was applied. Classification at the gateway lets the policy decide whether to permit the prompt at all, redact specific fields, or deny the request, and the record captures the classification outcome at the same instant. Routers that ship PII plug-ins typically run the detection inside the routing process, which means the prompt has already reached the routing process's memory before the classification fires.
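A minimal sketch of classification and redaction running at the gateway, before any routing logic sees the prompt. The two regular expressions are deliberately simple stand-ins (an assumption for illustration); production detectors cover far more entity types and formats. The function returns the data class the policy decision needs, the detected entity types the record captures, and the redacted prompt used if the decision is redact.

```python
import re

# Deliberately simple, hypothetical detectors; real classifiers are much broader.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_and_redact(prompt: str) -> tuple[str, list[str], str]:
    """Return (data_class, detected entity types, redacted prompt)."""
    found = [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]
    redacted = prompt
    for name, pattern in PATTERNS.items():
        redacted = pattern.sub(f"[{name.upper()}]", redacted)
    data_class = "pii" if found else "public"
    return data_class, found, redacted


print(classify_and_redact("Contact jane.doe@example.com, SSN 123-45-6789."))
# ('pii', ['email', 'us_ssn'], 'Contact [EMAIL], SSN [US_SSN].')
```

Because this runs first, the policy can deny the request outright, and a router downstream only ever receives the redacted text.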