Can a single product be both API gateway and AI gateway?

A product can carry both functions if the implementation cleanly separates the two decision layers. The risk is conflation: an API gateway with prompt-inspection plug-ins added afterward tends to record the prompt decision as an annotation on the access log rather than as an independent record with its own integrity controls. The cleanest pattern is to run the AI gateway as a distinct concern from the API gateway, with the AI gateway's records held in a separate, append-only log store that survives the API gateway's rotation policies. Operationally, the two functions can share infrastructure. Architecturally, they need separate decision and record layers.

Does the AI gateway support gRPC or only HTTP?

The DeepInspect implementation targets HTTP-based LLM APIs. The major model providers (OpenAI, Anthropic, Bedrock, Vertex, Azure OpenAI) ship HTTPS APIs, and self-hosted LLMs (vLLM, TGI, llama.cpp servers) usually expose an HTTP-compatible interface. gRPC support is feasible at the protocol level but is not the primary deployment pattern in production today. Where a deployment requires gRPC, the AI gateway can run as a sidecar or as a downstream of a gRPC-to-HTTP shim.

What happens if the prompt classifier fails to identify PII in the request?

Two configurations are possible. Fail-open: the gateway forwards the request and records the classification as low-confidence in the audit record. Fail-closed: the gateway denies the request and records the reason. Regulated deployments default to fail-closed. A soft-policy mode is also available: for prompts the classifier could not classify, it applies the conservative-fallback rule and records the call as redacted. In every mode, the audit record captures which mode was active and what the classifier returned. The architectural property is that policy can change without changing the application, and the records survive policy changes because each one carries the policy version in effect at the moment of decision.
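A minimal sketch of the two modes, assuming a prompt the classifier could not classify. The names `FailMode`, `AuditRecord`, and `handle_unclassified` are hypothetical and are not taken from any product API; the point is only that the record carries the active mode, the classifier output, and the policy version.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailMode(Enum):
    FAIL_OPEN = "fail-open"
    FAIL_CLOSED = "fail-closed"


@dataclass(frozen=True)
class AuditRecord:
    outcome: str                      # "permit" or "deny"
    reason: str
    fail_mode: str                    # which mode was active
    policy_version: str               # policy version at the moment of decision
    classifier_output: Optional[str]  # what the classifier returned, verbatim


def handle_unclassified(
    fail_mode: FailMode,
    policy_version: str,
    classifier_output: Optional[str] = None,
) -> AuditRecord:
    """Decide what to do with a prompt the classifier could not classify."""
    if fail_mode is FailMode.FAIL_OPEN:
        # Forward the request, but flag the weak classification in the record.
        outcome, reason = "permit", "classification low-confidence"
    else:
        # Deny the request and record why.
        outcome, reason = "deny", "classifier could not classify prompt"
    return AuditRecord(outcome, reason, fail_mode.value, policy_version, classifier_output)
```

Switching `fail_mode` changes the outcome without touching the calling application, and each record still states which policy version produced it.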

Where do retries and circuit breakers belong, API gateway or AI gateway?

Retries and circuit breakers are a service-resilience concern and belong at the gateway closest to the call's failure mode. Network-level retries (connection timeouts, 5xx from the upstream service) belong at the API gateway. AI-traffic-specific retries (rate-limit responses from the model provider, policy redactions that the application can retry after redaction) belong at the AI gateway. Circuit-breaker policies that say "if the policy classifier is down, fail-closed on PII-tagged routes" belong at the AI gateway because the decision depends on the policy state.
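The circuit-breaker rule quoted above could look roughly like the sketch below. The class name, the threshold of three consecutive failures, and the choice to keep non-PII routes flowing while the breaker is open are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClassifierBreaker:
    """Opens after a run of consecutive classifier failures."""
    failure_threshold: int = 3      # assumed value, tune per deployment
    consecutive_failures: int = 0

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.failure_threshold


def admit(breaker: ClassifierBreaker, route_is_pii_tagged: bool) -> bool:
    """Return True if the call may proceed to the model."""
    if breaker.is_open and route_is_pii_tagged:
        return False  # classifier is down: fail-closed on PII-tagged routes
    return True       # assumption: other routes keep flowing


breaker = ClassifierBreaker()
for _ in range(3):
    breaker.record_failure()
print(admit(breaker, route_is_pii_tagged=True))   # False
print(admit(breaker, route_is_pii_tagged=False))  # True
```

The decision reads policy state (which routes are PII-tagged), which is why it sits at the AI gateway and not at the API gateway.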

AI Gateway vs API Gateway: What Changes When the Payload Is a Prompt

An API gateway sits between clients and services and decides whether the HTTP call should pass, where it should go, and at what rate. An AI gateway sits in the same network position and decides whether the call should pass given who is asking, what data the prompt contains, and what policy applies to that combination. The API gateway treats the body as a payload to forward. The AI gateway treats the body as the substance of the policy decision. Most enterprise teams discover this distinction the first time a compliance review asks "what did the model see at 11:47 last Tuesday" and the API gateway's access log answers the question one layer too high.

I want to walk through where the two control planes overlap, where the AI gateway adds a decision layer the API gateway cannot reach, and where the two should sit together in production.

The API gateway

An API gateway is the reverse proxy that normalizes ingress for service traffic. The canonical responsibilities are authentication, authorization, rate limiting, routing, observability, and protocol translation. Products in this category include Kong, AWS API Gateway, Apigee, Tyk, and the open-source Envoy ingress patterns.

The decisions the API gateway makes are typically header-based: the auth token in the Authorization header, the API key in the X-API-Key header, the route in the URL path, and the rate-limit accumulator keyed on the auth subject or the API key. The body of the request is forwarded as-is to the upstream service. The gateway does not look inside.
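To make the header-only model concrete, here is a toy gateway that authenticates on `X-API-Key`, rate-limits per key with a fixed one-minute window, routes on the path prefix, and never inspects the body. It is an illustrative sketch, not how Kong, Apigee, or any other product is implemented; the class and route names are made up.

```python
import time
from collections import defaultdict


class HeaderOnlyGateway:
    def __init__(self, api_keys, routes, limit_per_minute):
        self.api_keys = set(api_keys)
        self.routes = dict(routes)        # path prefix -> upstream name
        self.limit = limit_per_minute
        self.counters = defaultdict(int)  # (api_key, minute) -> request count

    def handle(self, path, headers, body, now=None):
        """Return (status, upstream, body). Only headers and path are read."""
        now = time.time() if now is None else now
        key = headers.get("X-API-Key")
        if key not in self.api_keys:
            return 401, None, None
        window = int(now // 60)
        self.counters[(key, window)] += 1
        if self.counters[(key, window)] > self.limit:
            return 429, None, None
        for prefix, upstream in self.routes.items():
            if path.startswith(prefix):
                return 200, upstream, body  # body forwarded untouched
        return 404, None, None
```

A prompt containing a social security number passes through `handle` exactly like any other payload, because nothing in the decision path reads `body`.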

What the API gateway optimizes for

The API gateway optimizes for service-level concerns: keeping bad clients out, distributing load, and providing a single observability surface for north-south traffic. It is service-side glue. The model is "the upstream service knows what to do with this payload; my job is to deliver it under the right preconditions."

That model holds when the payload semantics live in the upstream service. It breaks when the payload contains data the policy decision depends on, which is exactly the case for AI traffic.

The AI gateway

An AI gateway runs in the same network position as the API gateway and makes a decision the API gateway is structurally unable to make. The decision uses three inputs: the verified identity of the natural person and the agent, the classification of the data inside the prompt, and the policy in effect at the moment for that identity, that data class, and that target model.

The three inputs the API gateway does not have

Identity context for AI traffic is rarely the upstream service identity. The service calls OpenAI with a static API key the service holds. The natural person on whose behalf the service is acting is invisible at the API key layer. The AI gateway needs identity context to be supplied (Pillar 1 of the NIST AI agent identity and authorization framework) and enforces against it (Pillars 2 and 3).

Data classification at the prompt level is invisible to the API gateway. The gateway forwards the body. The classification of the body (what kind of data it contains, what protected attributes appear, what risk class applies) has to be evaluated at the AI gateway layer.

Policy versioning at the moment of decision is the third missing dimension. The API gateway logs the auth subject and the upstream response status. It does not record the policy version that governed the decision because the policy lives in the upstream service.
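The three inputs can be put together in a small decision function. This is a sketch under stated assumptions: the regex classifier stands in for a real data classifier, the role and model names are invented, the policy table format is hypothetical, and the default-deny for unmatched combinations is a design choice of the example.

```python
import re
from dataclasses import dataclass

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass(frozen=True)
class Identity:
    person: str  # the natural person
    agent: str   # the agent acting on their behalf
    role: str


@dataclass(frozen=True)
class Decision:
    outcome: str         # "permit", "redact", or "deny"
    data_class: str
    policy_version: str  # version in effect when the decision was made


def classify(prompt: str) -> str:
    """Toy classifier: a regex match stands in for real data classification."""
    if EMAIL.search(prompt) or SSN_LIKE.search(prompt):
        return "pii"
    return "general"


POLICY = {
    "version": "example-v1",
    # (role, data class, target model) -> outcome
    "rules": {
        ("analyst", "general", "model-a"): "permit",
        ("analyst", "pii", "model-a"): "redact",
        ("support", "general", "model-a"): "permit",
    },
}


def decide(identity: Identity, prompt: str, target_model: str,
           policy: dict = POLICY) -> Decision:
    data_class = classify(prompt)
    # Unmatched combinations are denied (an assumption of this sketch).
    outcome = policy["rules"].get((identity.role, data_class, target_model), "deny")
    return Decision(outcome, data_class, policy["version"])
```

None of the three lookup keys is available from headers alone: the role comes from the identity context, the data class from the body, and the version from the policy store.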

The three artifacts the AI gateway produces

The AI gateway produces a decision (permit, redact, deny), an enforcement effect on the model traffic, and a per-decision audit record. The records are signed and tamper-evident. They are committed before the model response returns to the application. They satisfy EU AI Act Article 12 and Article 19 at the granularity those provisions require.
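One common way to make a log signed and tamper-evident is to chain each record to the hash of the previous one and sign the result. The sketch below does this with HMAC-SHA256 from the Python standard library. It illustrates the general technique and is not a description of any product's record format; a production system would more likely use asymmetric signatures with keys held in a KMS, and the function names here are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # prev_hash of the first record


def _canonical(obj: dict) -> bytes:
    # Stable serialization so the same record always hashes the same way.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_record(log: list, fields: dict, key: bytes) -> dict:
    """Append a record that is chained to its predecessor and signed."""
    prev_hash = log[-1]["record_hash"] if log else GENESIS
    body = {"fields": fields, "prev_hash": prev_hash}
    payload = _canonical(body)
    record = {
        **body,
        "record_hash": hashlib.sha256(payload).hexdigest(),
        "signature": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }
    log.append(record)
    return record


def verify_log(log: list, key: bytes) -> bool:
    """Recompute every hash and signature; any edit or reordering fails."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        payload = _canonical({"fields": record["fields"], "prev_hash": record["prev_hash"]})
        if hashlib.sha256(payload).hexdigest() != record["record_hash"]:
            return False
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["signature"]):
            return False
        prev_hash = record["record_hash"]
    return True
```

Changing a field in an earlier record after the fact, for example flipping `deny` to `permit`, makes `verify_log` return `False` for the whole log.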

Feature comparison

The two control planes occupy the same network position and answer different questions.

| Capability | API gateway | AI gateway |
|---|---|---|
| Auth at the request boundary | Yes, header-based | Yes, identity-context-based |
| Rate limiting by API key | Yes | Yes, plus per-identity, per-role, per-data-class |
| Routing across upstream services | Yes | Yes, with policy on data class |
| Prompt-content classification | No | Yes |
| Identity-bound policy on payload | No | Yes |
| Per-decision audit record | Access log only | Yes, tamper-evident, signed |
| Policy versioning at decision moment | No | Yes |
| PII redaction in the request body | No | Yes |
| Fail-closed on prompt classifier failure | No | Configurable |
| Compatibility with EU AI Act Article 12 records | No | Yes |

The API gateway's job is to keep the wrong clients out and to deliver the right requests to the right services. The AI gateway's job is to ensure the model never receives a request the policy does not permit, and to leave behind a record that proves it.

Pick an API gateway if

The deployment ships with three properties. The payload semantics live in the upstream service, not the gateway. The auth model is service-to-service or client-to-service with header-based tokens. The observability story is request-level (latency, error rates, throughput) rather than decision-level.

In that profile, the API gateway is the right tool. It handles the ingress concerns at the right granularity and gets out of the way of the upstream service.

Pick an AI gateway if

The deployment ships with three different properties. The payload contains data the policy decision depends on. Different callers carry different authorization to the same model based on role, data classification, or per-route policy. The audit record needs to be admissible without depending on the application.

In that profile, the AI gateway is the control point. The API gateway, if also present, sits in front and handles the service-level ingress; the AI gateway sits between the API gateway and the LLM providers and handles the prompt-level decision.

Stack them together

The two can run in series in production. The API gateway terminates the inbound HTTPS, validates the client auth, applies rate limiting at the service level, and routes the request to the AI gateway. The AI gateway extracts identity context from the request (typically a delegated-token header that the calling service attaches), classifies the prompt, applies the policy, records the decision, and forwards the permitted call to the upstream model.
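The AI gateway's half of that sequence can be sketched as a single function with each step injected as a callable. All parameter names are hypothetical. The ordering is the point: the decision is recorded before anything is forwarded, which is a stricter choice than committing before the response returns, and a denied call never reaches the model.

```python
from typing import Callable, Optional


def handle_ai_request(
    headers: dict,
    prompt: str,
    *,
    extract_identity: Callable[[dict], Optional[str]],
    classify: Callable[[str], str],
    apply_policy: Callable[[Optional[str], str], str],
    redact: Callable[[str], str],
    record: Callable[[dict], None],
    forward: Callable[[str], str],
) -> Optional[str]:
    """Run one request through the AI gateway steps, in order."""
    identity = extract_identity(headers)         # 1. identity context
    data_class = classify(prompt)                # 2. prompt classification
    outcome = apply_policy(identity, data_class) # 3. permit / redact / deny
    record({"identity": identity, "data_class": data_class, "outcome": outcome})  # 4. audit
    if outcome == "deny":
        return None                              # the model never sees the prompt
    outbound = redact(prompt) if outcome == "redact" else prompt
    return forward(outbound)                     # 5. forward the permitted call
```

In a stacked deployment the API gateway has already authenticated the client and applied rate limits by the time this function runs, so none of that logic appears here.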

The two control planes carry their own responsibilities. The API gateway handles client-to-service ingress. The AI gateway handles service-to-LLM egress with policy. The audit record at the AI gateway is the artifact that survives compliance review, not the access log at the API gateway.

DeepInspect

This is the architecture DeepInspect was built to provide. DeepInspect sits at the AI request boundary as a stateless proxy between any application and any LLM. The deployment pattern most common in production has DeepInspect running downstream of an existing API gateway, with the upstream application supplying identity context via signed headers. Every request DeepInspect processes is evaluated against the identity, the data classification of the prompt, and the policy in effect.

Every decision produces a per-decision audit record containing identity, role, policy version, data sensitivity, decision outcome, and timestamp. The record is signed and tamper-evident. The record is committed before the model response returns to the application. The API gateway's access log records the service-to-AI-gateway hop. The AI gateway's per-decision record describes what actually happened to the prompt.


The application is responsible for attaching the natural-person identity to the request. This is Pillar 1 of the NIST AI agent identity and authorization framework. The pattern is a signed delegation token: the application identifies the natural person via its own session, fetches a short-lived token from the identity provider that binds the natural-person identity to the application's outbound call, and includes the token as a header when calling the AI gateway. The AI gateway verifies the token signature, extracts the identity context, and uses it in the policy decision. If the upstream application does not attach identity, the gateway records the call as service-account-attributed, which is a known shortcoming the deployer can address at the application layer over time.