AI Gateway: The Architectural Component That Sits Between Calling Identities and LLM Endpoints
An AI gateway is the architectural component that sits between calling identities (users, agents, services) and LLM endpoints. It terminates the TLS connection bound for the AI provider, evaluates identity-bound policy, applies a pass, redact, or block decision, commits a per-decision audit record, and forwards the request. The category covers four distinct shapes today: developer-tooling proxies, enterprise observability gateways, identity-aware enforcement gateways, and inference-side guardrails libraries. Only one of the four produces the audit record EU AI Act Article 12 reviewers accept.

An AI gateway is the architectural component that sits between calling identities (users, agents, services) and LLM endpoints. The gateway terminates the provider-bound TLS connection at its own boundary, reads the request body, evaluates identity-bound policy against the prompt, applies a pass, redact, or block decision, commits a per-decision audit record, and forwards the request to the model over a new connection. The term covers four distinct product shapes in the market today, and the audit record each shape produces differs in a way that matters when an EU AI Act Article 12 reviewer or a DORA Article 19 supervisor pulls the logs.
I want to walk through the four shapes the AI gateway category covers, the inspection target each shape actually sees, the audit record each shape commits, and the architectural pattern that satisfies the regulatory record-keeping mandate.
The four shapes that get called an AI gateway
The four product shapes share a similar label but address different operational problems.
Shape 1: developer-tooling proxies
Tools like Helicone, LangSmith, Langfuse, and similar developer-tooling proxies sit between application code and the LLM provider, capture prompts and responses for debugging and prompt-engineering workflows, provide latency and cost dashboards, and support prompt versioning and evaluations. The audience is the application developer. The audit value is operational telemetry: which prompts work, which fail, how much each call cost, and how the application's prompt corpus evolves.
These proxies are configured per application or per engineering team. The identity context they capture is the API key or the application identifier. The proxy does not see the enterprise IdP identity behind the request, the data classification, or the policy state. The record is useful for development and incident debugging, but it is not the contemporaneous, identity-bound, classification-aware record the regulator expects.
Shape 2: enterprise observability gateways
Tools like Portkey, LiteLLM (deployed as a service), and the AI gateway features in API management platforms (Kong, Apigee, AWS API Gateway extensions) provide enterprise-grade routing, rate limiting, cost tracking, and a unified API across multiple LLM providers. The audience is the platform team. The audit value is operational governance: traffic shaping, rate limits, vendor failover, and cost attribution.
The observability gateway typically operates on the LLM API surface and may not see the calling user's enterprise identity unless explicit identity propagation is wired through the calling application. The record covers the API call but not the policy state or the per-prompt classification at decision time.
Shape 3: identity-aware enforcement gateways
The third shape is the architectural pattern that satisfies the record-keeping mandate. The enforcement gateway terminates the provider-bound TLS connection, attaches the enterprise IdP identity to the request through header propagation or an SSO-aware proxy mode, evaluates per-route, per-role policies against the prompt classification, applies a pass, redact, or block decision, and commits the per-decision record with identity, classification, policy state, and decision outcome. The record is the artifact the regulator accepts.
DeepInspect sits in this shape. The audience is the CISO and the head of compliance. The audit value is regulatory: the records satisfy EU AI Act Articles 12 and 19, DORA Article 19, and adjacent contemporaneous-record obligations.
Shape 4: inference-side guardrails libraries
Llama Guard, NeMo Guardrails, AWS Bedrock Guardrails (the inference-side components), and similar libraries run inside the application or inside the inference path. They apply pattern-based or model-based filters to prompts and responses. The audience is the application engineer. The audit value is content safety at the model boundary, not policy enforcement at the request boundary.
The library is part of the same software that the application runs, so the audit record it produces is generated by the system whose behavior the regulator is auditing. This is the self-attestation problem: the system under audit vouches for itself. The library can complement an identity-aware enforcement gateway but does not replace it for the regulatory record.
The inspection target the term should refer to
In a regulated enterprise deployment, "AI gateway" should refer to the inspection layer that satisfies four operational requirements simultaneously.
Identity attached at the request layer
The enterprise IdP identity travels with each request. The gateway sees the user, the agent, and the role behind the call, not only the calling application's service credential.
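As a rough illustration of identity attached at the request layer, the sketch below reads a user, a role, and an optional agent identifier from propagated request headers. The header names and the `CallerIdentity` type are hypothetical, and the sketch assumes an upstream SSO-aware proxy has already authenticated the caller. A real deployment would more likely verify a signed token than trust raw headers.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical header names; a real deployment would use whatever its
# SSO-aware proxy or IdP integration actually emits.
USER_HEADER = "x-auth-user"
ROLE_HEADER = "x-auth-role"
AGENT_HEADER = "x-agent-id"


@dataclass(frozen=True)
class CallerIdentity:
    user: str
    role: str
    agent: Optional[str]  # None when no agent identifier was propagated


def extract_identity(headers: Mapping[str, str]) -> Optional[CallerIdentity]:
    """Return the caller identity, or None if user or role is missing."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    user = normalized.get(USER_HEADER, "")
    role = normalized.get(ROLE_HEADER, "")
    if not user or not role:
        return None  # the caller of this function should treat None as "block"
    agent = normalized.get(AGENT_HEADER) or None
    return CallerIdentity(user=user, role=role, agent=agent)
```

A request that carries only a service API key yields `None` here, which is the case the paragraph above warns about: the gateway would know the application but not the person or agent behind it.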
Prompt-level classification at request time
The prompt is evaluated for sensitive categories (PII, PHI, financial data, source code, internal financial projections) at the moment of the call. The classification is committed to the audit record before the model receives the prompt.
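A minimal sketch of request-time classification, assuming simple regular-expression detectors. The labels and patterns are illustrative placeholders, not a production detection set; real classifiers typically combine many more patterns with model-based detection.

```python
import re
from typing import Dict, Pattern, Set

# Illustrative patterns only; production classifiers need far broader
# coverage (names, PHI, source code, context-aware detection, and so on).
PATTERNS: Dict[str, Pattern[str]] = {
    "pii.email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "pii.us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "financial.card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def classify_prompt(prompt: str) -> Set[str]:
    """Return the set of sensitive-category labels detected in the prompt."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(prompt)}
```

The returned label set is what would be written to the audit record and handed to policy evaluation before the prompt is forwarded.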
Per-route, per-role policy enforcement
The gateway applies policies that depend on the route (which model, which endpoint, which task), the role (which user population is calling), and the data classification. The policy decides pass, redact, or block per request. The decision is contemporaneous.
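One way to picture per-route, per-role enforcement is a lookup table keyed by route and role, where the most restrictive matching rule wins. The route names, roles, labels, and policy version below are hypothetical, and the deny-by-default behavior for unknown combinations is an assumption of this sketch.

```python
from enum import IntEnum
from typing import Dict, Iterable, Tuple


class Decision(IntEnum):
    # Ordered so that max() picks the most restrictive outcome.
    PASS = 0
    REDACT = 1
    BLOCK = 2


# Hypothetical policy table: (route, role) -> {classification label: decision}.
POLICY_VERSION = "2026-01-example"
POLICIES: Dict[Tuple[str, str], Dict[str, Decision]] = {
    ("chat/general", "analyst"): {"pii.email": Decision.REDACT, "source_code": Decision.BLOCK},
    ("chat/general", "engineer"): {"pii.email": Decision.REDACT, "source_code": Decision.PASS},
}


def decide(route: str, role: str, labels: Iterable[str]) -> Decision:
    """Return the most restrictive decision for the given route, role, and labels."""
    rules = POLICIES.get((route, role))
    if rules is None:
        return Decision.BLOCK  # no policy for this route and role: deny
    # A label the policy does not mention is treated as blocked.
    outcomes = [rules.get(label, Decision.BLOCK) for label in labels]
    return max(outcomes, default=Decision.PASS)
```

With this table, the same prompt containing source code is blocked for an analyst and passed for an engineer on the same route, which is the route-and-role dependence described above.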
Per-decision audit record
The audit record contains the identity, the role, the classification, the model and version called, the policy version, the decision outcome, and a cryptographic signature. The record is written by the gateway, independent of the application and independent of the LLM provider.
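The fields listed above can be sketched as a tamper-evident record. This example uses HMAC-SHA256 over a canonical JSON encoding, which is a symmetric integrity tag chosen here for brevity; a production gateway might instead use an asymmetric signature with keys held in a KMS or HSM. The field names and the demo key are assumptions.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Any, Dict

# Demo key only; a real gateway would load key material from a KMS or HSM.
SIGNING_KEY = b"demo-key-do-not-use"


def _canonical(fields: Dict[str, Any]) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(fields: Dict[str, Any], key: bytes = SIGNING_KEY) -> Dict[str, Any]:
    """Return a copy of the record with a timestamp and an HMAC tag attached."""
    record = dict(fields)
    record.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    record["signature"] = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return record


def verify_record(record: Dict[str, Any], key: bytes = SIGNING_KEY) -> bool:
    """Recompute the tag over every field except the signature and compare."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(key, _canonical(unsigned), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))
```

Changing any field after signing, for example flipping a decision from redact to pass, makes `verify_record` return `False`.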
Compliance angle
The EU AI Act Article 12 record-keeping mandate, Article 19 log retention requirements, and Article 26 deployer obligations all map to the identity-aware enforcement gateway pattern. The mandate takes effect August 2, 2026 for high-risk systems. DORA Article 19 imposes parallel obligations on financial entities. The Fannie Mae LL-2026-04 governance framework, which takes effect August 6, 2026, applies the same principles to mortgage origination. The architectural pattern that satisfies the August 2026 deadlines is the one that produces the record the regulator accepts.
DeepInspect
DeepInspect implements the identity-aware enforcement gateway pattern. It sits at the AI request boundary as an external enforcement gateway that operates as a stateless proxy between authenticated users or agents and any LLM endpoint. Every HTTP request is evaluated against per-route, per-role policies using identity context the calling application supplies. The per-decision audit record is committed by the proxy, independent of the application and independent of the LLM provider, before the model response returns.
The record contains a verified identity for the requester, the role and authorization context, the data classification applied to the prompt, the AI vendor and model actually called, the policy version that governed the decision, the decision outcome, and a cryptographic signature that makes post-hoc modification detectable. The enforcement is inline, fail-closed, and deterministic, and it operates with under 50 ms of overhead in internal testing, which is small relative to typical model inference time.
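To make "fail-closed" concrete, here is a generic sketch, not DeepInspect's implementation: if policy evaluation raises an error or returns something unexpected, the request is blocked instead of being forwarded. The function name and the string decision values are assumptions for illustration.

```python
from typing import Callable

VALID_DECISIONS = {"pass", "redact", "block"}


def enforce_fail_closed(evaluate: Callable[[str], str], prompt: str) -> str:
    """Return "pass", "redact", or "block"; any failure yields "block"."""
    try:
        decision = evaluate(prompt)
    except Exception:
        # The policy engine is unavailable or crashed: deny instead of
        # forwarding an unevaluated prompt to the model.
        return "block"
    if decision not in VALID_DECISIONS:
        return "block"  # an unrecognized outcome is treated as a failure
    return decision
```

The alternative, fail-open, would forward the request whenever the policy engine is down, which leaves calls that no policy ever evaluated.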
Book a technical deep dive at deepinspect.ai.
Frequently asked questions
- Is an AI gateway the same as an API gateway?
An API gateway operates at the HTTP layer for any API. An AI gateway is an API gateway specialized for LLM provider endpoints, with prompt-aware policy evaluation, model and version routing, and per-decision audit records that capture prompt classification. The two can be the same product where the API gateway adds AI-specific features; they can be separate products where the API gateway handles the general API surface and the AI gateway handles the LLM-specific surface.
- Where does an AI gateway sit relative to the LLM provider?
The gateway sits on the HTTP path between the calling identity and the LLM provider endpoint. The application calls the gateway as if it were the LLM provider. The gateway terminates the application's TLS connection, evaluates policy, commits the audit record, and forwards the request to the actual provider over a separate TLS connection. The provider responds to the gateway; the gateway commits the response record and returns the response to the application.
- Does the gateway add latency?
The latency overhead depends on policy complexity and deployment topology. In internal testing, a gateway co-located with the calling application or deployed at the network edge adds under 50 ms. LLM inference itself usually takes 500 ms to 5 seconds, so the gateway adds at most about 10% to the fastest calls and about 1% to the slowest.
- Can a model-side guardrails library replace an AI gateway?
The library and the gateway address different problems. The library applies content filters at the model boundary; the gateway applies identity-bound policy at the request boundary. A regulator asking for the audit record under Article 12 expects the gateway-style record with identity, classification, policy state, and decision outcome. The library's output is content safety, not the audit record. The two can coexist in the same deployment.
- What about gateways that the LLM provider runs?
LLM providers including OpenAI, Anthropic, AWS Bedrock, and Azure OpenAI provide their own admin and audit features. The features capture the provider-side view of the API call. The view does not include the enterprise IdP identity unless explicit identity propagation is configured, and the audit record is generated by the provider whose system is the subject of the audit. The regulator generally expects an enterprise-side record that is independent of the provider being audited.