Can one process implement both the gateway and the router?

Yes, and many production deployments do. The architectural split holds inside a single process: the request must pass the policy decision before the routing choice, and the audit record must commit before the router acts. When one process handles both, the internal control-flow order matters more than the process boundary.
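To make the ordering concrete, here is a minimal sketch of a single-process bundle. The class, method names, and the role check are hypothetical and stand in for real policy and storage; the only point is the sequence of calls inside `handle`.

```python
from dataclasses import dataclass, field


@dataclass
class SingleProcessProxy:
    """Hypothetical bundle: gateway and router logic in one process."""
    audit_log: list = field(default_factory=list)  # stand-in for an append-only audit store
    calls: list = field(default_factory=list)      # records the order of internal steps

    def decide(self, role: str) -> str:
        self.calls.append("policy")
        # Toy policy (assumption): only the "analyst" role is permitted.
        return "permit" if role == "analyst" else "deny"

    def commit_audit(self, role: str, decision: str) -> None:
        self.calls.append("audit")
        self.audit_log.append({"role": role, "decision": decision})

    def route(self) -> str:
        self.calls.append("route")
        return "primary-endpoint"

    def handle(self, role: str):
        decision = self.decide(role)        # 1. policy decision
        self.commit_audit(role, decision)   # 2. audit commit, including denials
        if decision != "permit":
            return None
        return self.route()                 # 3. routing only after steps 1 and 2
```

A denied request stops after step 2, so the denial still leaves an audit record even though the router never runs.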

Which layer handles rate limiting?

Rate limiting is usually enforced at the gateway when the limit is per-user, per-role, or per-tenant, because those attributes come from the verified identity. Rate limiting at the router usually enforces per-upstream-endpoint limits (respecting provider quotas) or per-model latency budgets. Deployments often need both: identity-aware limits at the gateway to enforce policy quotas and per-endpoint limits at the router to respect provider constraints.
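A rough sketch of the two limits working together, assuming a simplified counter with no clock or window reset. The limiter class, the quota values, and the `admit` function are illustrative names, not part of any product API.

```python
from collections import defaultdict


class CountLimiter:
    """Simplified limiter: counts calls per key inside one window (no clock, for illustration)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, key: str) -> bool:
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


gateway_limit = CountLimiter(limit=2)  # keyed by the verified tenant (policy quota)
router_limit = CountLimiter(limit=3)   # keyed by upstream endpoint (provider quota)


def admit(tenant: str, endpoint: str) -> str:
    # The identity-aware limit is checked first, at the gateway.
    if not gateway_limit.allow(tenant):
        return "rejected: tenant quota"
    # The per-endpoint limit is checked afterwards, at the router.
    if not router_limit.allow(endpoint):
        return "rejected: provider quota"
    return "ok"
```

The two counters are keyed differently on purpose: one tenant can exhaust its own quota without touching the provider quota, and several tenants together can exhaust a provider quota while each stays under its own limit.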

Does the router see the prompt content?

Usually yes, because routing decisions can depend on prompt characteristics (token count, embedded tool calls, requested output format). The router should not classify the prompt for policy purposes; that work belongs to the gateway. If the router needs to inspect prompt content for routing (for example, to route long-context prompts to a long-context model), the inspection is metadata-level, not policy-level.

What about response-side controls?

Response-side controls (content filtering, PII redaction on outputs, transparency-marker attachment) sit in the gateway, not the router. The gateway evaluates the response against the same policy that authorized the request and can rewrite, redact, or block the response before it returns to the caller. The router's role in the response path is transport (return the upstream response); the policy decision on the response is the gateway's.
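As a toy illustration of a response-side control, the sketch below applies a response policy outcome to the upstream text. The regular expression (a US SSN-like pattern) and the function name are assumptions chosen for the example; real output filtering would be far broader.

```python
import re

# Hypothetical pattern: matches strings shaped like a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def filter_response(text: str, outcome: str):
    """Apply the gateway's response decision to the upstream response text."""
    if outcome == "deny":
        return None                                   # block the response entirely
    if outcome == "redact":
        return SSN_PATTERN.sub("[REDACTED]", text)    # rewrite before returning to the caller
    return text                                       # permit: pass through unchanged
```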

How does this map to Kong or LiteLLM?

Kong ships as an API gateway with routing, rate limiting, and (via plugins) policy evaluation. LiteLLM ships as a unified LLM proxy with routing, cost tracking, and (via LiteLLM Proxy) some access-control features. Both bundle gateway and router responsibilities. The question for an architecture review is where identity resolution happens, where policy evaluation happens, and whether the audit-write path is isolated from the operational log path. If the bundle's policy plugin runs after the routing choice, the bundle is behaving as a router with an attached policy checker, not a gateway.

Does this split apply to routing tools like [OpenRouter](https://openrouter.ai/) or [LangChain routing](https://python.langchain.com/docs/expression_language/how_to/routing/)?

OpenRouter and LangChain routing utilities sit at the router layer. They choose which model to call. They do not implement identity-aware policy enforcement or produce compliance-grade audit records. Deployments that use OpenRouter or LangChain routing still need a separate gateway layer in front of them for policy enforcement and audit.

LLM gateway vs LLM router: what each component does and why the enforcement layer sits in only one of them

Ask three architecture teams to describe their LLM gateway and you get three descriptions of an LLM router. The two components sit at different layers, produce different audit records, and fail for different reasons. Collapsing them under a single "gateway" label produces predictable audit gaps: identity context missing from routing decisions, policy state missing from the audit record, and no per-request evidence of which model actually served which prompt.

The gateway is the enforcement layer. The router is the traffic-shaping layer. They can share a process boundary. They do not share responsibilities.

I want to walk through what each component does at the request layer, the fields each one records, and why the decision to permit or deny a request must sit in the gateway rather than the router.

The gateway

The LLM gateway is the identity-aware policy enforcement point between an authenticated user or agent and any LLM. It is model-agnostic. It sits at the AI request boundary and produces a per-decision audit record for every request that passes through it.

What the gateway evaluates

Per request, the gateway resolves the caller identity from the incoming credential (OAuth token, mTLS certificate, or agent identity token), attaches the role and authorization context, classifies the prompt against a data sensitivity taxonomy, evaluates the request against per-route and per-role policies, and produces the permit, redact, or deny decision. Only after the permit decision does the request move to the routing layer.
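The evaluation sequence can be sketched as a small pipeline. Everything named here is hypothetical: the token table stands in for real credential validation, the keyword classifier stands in for a data sensitivity taxonomy, and the policy table is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical credential store; a real gateway would validate an OAuth token or mTLS certificate.
TOKENS = {
    "tok-123": {"user": "alice", "role": "clinician"},
    "tok-456": {"user": "bob", "role": "contractor"},
}

# Hypothetical per-role policy: data classification -> decision.
POLICY = {
    "clinician": {"public": "permit", "phi": "permit"},
    "contractor": {"public": "permit", "phi": "redact"},
}


@dataclass(frozen=True)
class Decision:
    user: Optional[str]
    role: Optional[str]
    classification: str
    outcome: str  # "permit", "redact", or "deny"


def classify(prompt: str) -> str:
    # Toy classifier (assumption): a single keyword marks the prompt as PHI.
    return "phi" if "patient" in prompt.lower() else "public"


def evaluate(token: str, prompt: str) -> Decision:
    identity = TOKENS.get(token)              # 1. resolve identity from the credential
    classification = classify(prompt)         # 2. classify the prompt
    if identity is None:
        return Decision(None, None, classification, "deny")
    # 3. evaluate per-role policy; anything without a rule is denied
    outcome = POLICY.get(identity["role"], {}).get(classification, "deny")
    return Decision(identity["user"], identity["role"], classification, outcome)
```

Only a `Decision` with outcome `permit` (or `redact`, after redaction) would be handed on to the routing layer.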

What the gateway records

The per-decision audit record contains the verified identity, the role in effect, the data classification applied to the prompt, the policy version, the decision outcome, the timestamp, and a cryptographic signature. This record is committed before the request reaches the router or the model. The application that made the request has no write access to the audit record.
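A minimal sketch of a tamper-evident record, assuming HMAC-SHA256 over a canonical JSON encoding as a stand-in for whatever signature scheme a real gateway uses. The key, field names, and helper functions are illustrative; a production system would more likely use an asymmetric signature and a managed key.

```python
import hashlib
import hmac
import json

# Hypothetical key held only by the gateway's audit-write path.
SIGNING_KEY = b"demo-key-held-only-by-the-gateway"


def _canonical(record: dict) -> bytes:
    # Sorted keys and fixed separators make the encoding deterministic.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_record(record: dict) -> dict:
    signature = hmac.new(SIGNING_KEY, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "signature": signature}


def verify_record(signed: dict) -> bool:
    record = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(SIGNING_KEY, _canonical(record), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signed.get("signature", ""))
```

Changing any field after signing, for example flipping a `deny` to a `permit`, makes verification fail, which is what makes the record usable as evidence.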

The router

The LLM router is the traffic-shaping component that decides which model, endpoint, or provider handles a permitted request. It operates after the gateway's policy decision.

What the router evaluates

Per request, the router evaluates the routing rules against the request metadata: the model family requested, the workload class, the latency budget, the cost budget, the current provider health, the fallback chain, and any tenant-level provider allowlist. The router returns the chosen upstream endpoint.
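A simplified sketch of that evaluation, assuming the fallback chain is an ordered list and every constraint is a hard filter. The `Endpoint` fields and the `choose_endpoint` function are hypothetical; real routers typically weigh these signals rather than filter on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    healthy: bool
    p95_latency_ms: int
    cost_per_1k_tokens: float


def choose_endpoint(chain, allowlist, latency_budget_ms, cost_budget):
    """Return the first endpoint in the fallback chain that meets every constraint, or None."""
    for ep in chain:
        if (
            ep.name in allowlist                         # tenant-level provider allowlist
            and ep.healthy                               # current provider health
            and ep.p95_latency_ms <= latency_budget_ms   # latency budget
            and ep.cost_per_1k_tokens <= cost_budget     # cost budget
        ):
            return ep
    return None
```

Nothing in this function knows who the caller is or what the policy permitted, which is the point: the router works on metadata it is handed.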

What the router records

The router's log records the upstream endpoint choice, the reason for the choice (primary, fallback, health-driven), the upstream latency, and the retry state if any. This record is operational, not compliance evidence. It answers "which model served this request" without recording who asked or what the policy permitted.

Why the enforcement layer sits in the gateway

Three architectural reasons place the enforcement decision at the gateway rather than the router.

Identity context must precede routing

The routing decision often depends on the tenant, the user's role, or the workload class. A router that reads those fields directly from the request metadata is trusting the caller to be honest about them. A gateway that resolves identity from the credential and attaches the verified role ensures the router never routes on caller-asserted metadata. The order of operations matters: verify identity, then route.
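One way to picture this is a gateway step that discards whatever the caller claimed about itself and replaces it with values derived from the credential. The token table and function name below are assumptions made for the example.

```python
# Hypothetical mapping from a validated credential to its verified identity attributes.
VERIFIED = {"tok-123": {"tenant": "acme", "role": "analyst"}}

IDENTITY_FIELDS = ("tenant", "role")


def attach_verified_identity(token: str, request_metadata: dict) -> dict:
    identity = VERIFIED.get(token)
    if identity is None:
        raise PermissionError("unknown credential")
    # Drop anything the caller asserted about its own identity...
    cleaned = {k: v for k, v in request_metadata.items() if k not in IDENTITY_FIELDS}
    # ...then attach the values resolved from the credential.
    return {**cleaned, **identity}
```

A caller that sends `role: admin` in its metadata still reaches the router as `analyst`, because the router only ever sees the verified fields.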

The audit record must be produced before the router acts

If the router acts first, the audit record shows only what happened after the routing choice. A denied request never produces an audit record, because it never reaches the router. A permitted request produces its audit record only after the routing side effect has already occurred. The compliance question "was this request permitted under the policy in effect" has no evidence in the router's log because the router did not make that determination.

Fail-closed behavior lives at the gateway

The default deny posture applies to the policy decision. A router that fails open on a policy ambiguity routes the request to a fallback model. A gateway that fails closed on a policy ambiguity blocks the request and produces an audit record showing the denial. The failure semantics differ; the compliance obligation aligns with the gateway's posture.
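A small sketch of the fail-closed posture, assuming policy ambiguity shows up as a missing rule. The rule table, log list, and function name are hypothetical.

```python
# Hypothetical rule table: (role, classification) -> outcome.
RULES = {("analyst", "public"): "permit"}

audit_log = []  # stand-in for the gateway's audit store


def decide_fail_closed(role: str, classification: str) -> str:
    try:
        outcome = RULES[(role, classification)]
        reason = "matched rule"
    except KeyError:
        # No rule covers this request: deny rather than guess.
        outcome = "deny"
        reason = "no matching rule, default deny"
    # The denial is recorded just like a permit would be.
    audit_log.append(
        {"role": role, "classification": classification, "outcome": outcome, "reason": reason}
    )
    return outcome
```

The contrast with a fail-open router is that the ambiguous case ends in a recorded denial instead of a silent fallback.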

What the enforcement boundary permits

The gateway's permit decision constrains what the router is allowed to route to. The permit outcome does not just mean "let this through"; it includes the set of upstream endpoints the request is authorized to reach.

For example, a request from a healthcare tenant carrying a PHI classification receives a permit decision that authorizes routing to the HIPAA-BAA-covered upstream endpoints and forbids routing to any endpoint outside the BAA. The router reads the authorized-endpoint set from the permit decision and chooses within it. If the router's fallback chain contains an endpoint outside the authorized set, the router either skips the fallback or fails the request.

The pattern generalizes. The gateway's permit decision carries three fields: the classification tag, the authorized-endpoint set, and the policy version. The router respects all three.
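The three-field permit and the router's obligation to stay inside it can be sketched as follows. The `Permit` type, endpoint names, and `route_within_permit` function are illustrative, and the sketch takes the "skip the fallback, otherwise fail" behavior described above literally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permit:
    classification: str
    authorized_endpoints: frozenset
    policy_version: str


def route_within_permit(permit: Permit, fallback_chain, healthy) -> str:
    """Pick the first healthy endpoint in the chain that the permit authorizes."""
    for endpoint in fallback_chain:
        if endpoint not in permit.authorized_endpoints:
            continue  # skip fallbacks outside the authorized set
        if endpoint in healthy:
            return endpoint
    # Nothing authorized is available: fail the request instead of widening the set.
    raise RuntimeError("no authorized endpoint available")
```

With a PHI permit that authorizes only two BAA-covered endpoints, a healthy endpoint outside that set is never chosen, even when it is the only healthy option.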

Beyond the two-component split

Kong, LiteLLM, Portkey, and MLflow AI Gateway all ship as bundles that combine some gateway and some router functionality in a single process. The bundling is a packaging choice. The architectural split holds regardless of how the process boundary is drawn. When evaluating a bundle, the question is not whether the vendor calls it a gateway. The question is whether the request evaluation sequence resolves identity from the credential, applies policy against the verified identity, and produces the per-decision audit record before the routing choice.

Deployments that bundle gateway and router into a single process still need to isolate the audit-write path from the router's operational log path. Otherwise the router's log becomes the audit record, and the router's failure modes become the audit record's failure modes.
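One possible reading of that isolation, sketched with two separate sinks: the audit commit is on the critical path and its failure would stop the request, while the operational log is best-effort. The class names and the choice to swallow operational log errors are assumptions made for the example.

```python
class AuditStore:
    """Stand-in for the compliance audit store (separate from operational logging)."""

    def __init__(self):
        self.records = []

    def commit(self, record: dict) -> None:
        self.records.append(dict(record))


class OpsLog:
    """Stand-in for the router's operational log; can be made to fail for the demo."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.lines = []

    def write(self, line: str) -> None:
        if self.fail:
            raise OSError("ops log unavailable")
        self.lines.append(line)


def handle(audit: AuditStore, ops: OpsLog, decision_record: dict, endpoint: str) -> str:
    # Compliance path: an exception here propagates and stops the request.
    audit.commit(decision_record)
    try:
        ops.write(f"routed to {endpoint}")
    except OSError:
        pass  # operational logging is best-effort and never touches the audit record
    return endpoint
```

Because the two sinks share no code path, an outage in the operational log leaves the audit record intact.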

DeepInspect

This is the architecture DeepInspect was built to provide. DeepInspect sits at the AI request boundary as the identity-aware policy enforcement layer between authenticated users or agents and any LLM. Every request is evaluated against per-route, per-role policies using the identity context the credential carries. The permit decision carries the classification tag, the authorized-endpoint set, and the policy version, which the downstream routing layer respects.

Every decision produces a per-decision audit record with identity, role, policy version, data classification, decision outcome, and timestamp. The record is signed, tamper-evident, and committed on the gateway's write path before the request reaches the router. The routing layer's operational log records the upstream endpoint choice for cost and observability purposes; the audit record answers the compliance question independently.
