AI gateway
An AI gateway is a network proxy that sits between authenticated callers (users, services, agents) and one or more LLM endpoints. The gateway terminates the caller's TLS connection, extracts identity from the request, classifies the prompt payload, evaluates a per-route policy in the request path, and either forwards the request to the model or returns a block. Every decision produces an audit record that names the subject, the data class, the policy version, and the outcome. An AI gateway differs from a generic LLM proxy because identity context is a first-class input to the decision, not metadata appended after the fact.
How an AI gateway sits in the request path
The gateway terminates the caller's TLS connection and decrypts the prompt payload. A classification model labels the payload (PII, PHI, source code, customer data, or a model-specific category). The policy decision point looks up the per-route and per-role rule for the verified subject and the data class. The gateway then either forwards the sanitized request to the destination LLM or returns a block to the caller. The DeepInspect benchmark holds the decision under 50 ms, which fits inside the latency budget that production AI applications already accept, since LLM inference itself takes 500 ms to 5 seconds.
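The classify, look up, then forward-or-block sequence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepInspect's implementation: the `POLICY` table, the route and role names, the `redact` action, the default-deny fallback, and the regex stand-in for the classification model are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy table: (route, role, data_class) -> action.
POLICY = {
    ("/v1/chat", "analyst", "none"): "allow",
    ("/v1/chat", "analyst", "PII"): "block",
    ("/v1/chat", "clinician", "PII"): "redact",
}

# Toy stand-in for a classification model: treats e-mail addresses as PII.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass(frozen=True)
class Decision:
    outcome: str            # "forward" or "block"
    data_class: str         # label assigned by the classifier
    prompt: Optional[str]   # sanitized prompt to forward, or None on block


def classify(prompt: str) -> str:
    """Label the payload; a real gateway would call a classification model."""
    return "PII" if EMAIL.search(prompt) else "none"


def decide(route: str, role: str, prompt: str) -> Decision:
    """Evaluate the per-route, per-role rule for the classified payload."""
    data_class = classify(prompt)
    # Assumption: anything without an explicit rule is denied.
    action = POLICY.get((route, role, data_class), "block")
    if action == "allow":
        return Decision("forward", data_class, prompt)
    if action == "redact":
        return Decision("forward", data_class, EMAIL.sub("[REDACTED]", prompt))
    return Decision("block", data_class, None)
```

In this sketch the role would come from the verified identity claim extracted earlier in the request path. Keying the lookup on route, role, and data class together is what makes identity an input to the decision instead of a label attached afterwards.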
What an AI gateway records
Every request produces a per-decision audit record. The record carries the verified subject claim, the route, the prompt classification verdict, the policy version hash, the decision reason code, the outcome, the destination model, and the latency. EU AI Act Article 12 refers to this granularity as traceability and event logging. Fannie Mae Lender Letter LL-2026-04 (effective August 6, 2026) requires the lender to produce per-decision evidence on demand. NIST AI RMF Pillar 3 calls the same primitive action lineage. The record format is the same; only the regulators' vocabulary differs.
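The fields listed above map naturally onto a small immutable record type. The following is a sketch only: the field names, the SHA-256 choice for the policy version hash, and the JSON-lines serialization are assumptions made for illustration, not a format mandated by any of the cited regulations.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    subject: str            # verified subject claim
    route: str              # gateway route that handled the request
    classification: str     # prompt classification verdict
    policy_version: str     # hash of the policy text in force
    reason_code: str        # why the decision came out this way
    outcome: str            # e.g. "forward" or "block"
    destination_model: str  # model the request was (or would be) sent to
    latency_ms: float       # time spent making the decision


def policy_hash(policy_text: str) -> str:
    """Derive a stable version identifier from the policy text (assumed SHA-256)."""
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()


def to_json_line(record: AuditRecord) -> str:
    """Serialize one decision as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical example values.
record = AuditRecord(
    subject="user:jane",
    route="/v1/chat",
    classification="PII",
    policy_version=policy_hash("deny PII for analysts"),
    reason_code="PII_NOT_PERMITTED_FOR_ROLE",
    outcome="block",
    destination_model="example-model",
    latency_ms=12.5,
)
line = to_json_line(record)
```

Hashing the policy text means a reviewer can tie each record to the exact rule set that produced it, even after the policy changes.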
Related reading
- Identity-Aware AI Gateway Architecture: How Inline Enforcement Binds Decisions to Users and Agents
An identity-aware AI gateway sits at the AI request boundary, attaches verified identity context to every model API call, evaluates per-route and per-role policies, and commits a per-decision audit record before the model response returns to the calling application. The architecture closes the post-authentication gap that most enterprise AI deployments have inherited from the credential-pooling pattern used by SDKs and proxy frameworks. This piece walks through the architectural building blocks, the call path, the audit primitives, and where the identity-aware gateway sits relative to existing IAM, API gateway, and DLP infrastructure.
- AI Inline Enforcement Architecture: Where the Policy Decision Sits and What It Has To Commit
AI inline enforcement runs the policy decision in the request path, before the model API call returns to the calling application. The architecture places a deterministic policy decision point between the application identity and the model endpoint and commits a per-decision audit record before the response is forwarded. This piece walks through the architectural components, the decision-time data shape, the failure modes the implementation has to handle, and the regulatory profile that the inline placement satisfies (EU AI Act Article 12, NIST AI agent identity and authorization Pillar 2 and Pillar 3, Fannie Mae LL-2026-04, DORA Article 6).
- AI Firewall: What It Actually Inspects, Where It Sits, and the Audit Record It Produces
The phrase "AI firewall" gets applied to four very different products. The category collapses when you ask what each one inspects, where in the request path the inspection happens, and whether the record series survives EU AI Act Article 12 review. This piece walks through the four product shapes that get marketed as AI firewalls, the architectural property each one has and lacks, the inspection target the term should refer to in a regulated deployment, and the audit record the inspection layer commits at decision time.