Does fail-closed hurt user experience?

Fail-closed produces clear errors during gateway incidents. Users see an explicit refusal rather than a missing or degraded response. In workloads where availability matters more than enforcement, the tradeoff favors fail-open. In workloads where enforcement matters more than availability, fail-closed is the right design.

Can the gateway fail-open temporarily during planned maintenance?

Yes, with the operator's explicit decision. A planned maintenance window where the operator accepts the policy gap can run in fail-open mode. The maintenance record is the artifact that documents the decision and the duration.

What about the case where the policy engine is the bottleneck?

A policy engine that is the bottleneck under load is an architectural problem distinct from the fail-mode question. The fix is to scale the policy engine, not to fail-open by default. Caching policy decisions per identity and per route reduces the engine load substantially.
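A minimal sketch of that caching idea, under stated assumptions: `DecisionCache`, its `evaluate` callback, and the 30-second TTL are hypothetical names and values chosen for illustration, not part of any specific gateway. The short TTL is a deliberate limit on how long a revoked permission can keep being served from cache.

```python
import time
from typing import Callable, Dict, Tuple


class DecisionCache:
    """Caches allow/deny decisions keyed by (identity, route) for a short TTL."""

    def __init__(self,
                 evaluate: Callable[[str, str], bool],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._evaluate = evaluate      # hypothetical call into the policy engine
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can control time
        self._entries: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def decide(self, identity: str, route: str) -> bool:
        key = (identity, route)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]           # still fresh: skip the engine
        decision = self._evaluate(identity, route)
        self._entries[key] = (decision, now + self._ttl)
        return decision
```

Repeated calls for the same identity and route inside the TTL reach the engine once. The cache reduces load; it does not answer what to do when the engine is down and the entry has expired, which is still the fail-mode question.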

How do you test the fail mode?

The operational practice is synthetic monitoring that injects controlled failures (policy engine timeout, identity service unreachable) and verifies the gateway's behavior against the configured fail mode. The monitoring runs continuously and raises an alert when the fail mode does not behave as configured.
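One way to sketch such a check, with several assumptions labeled: `send_probe` is a hypothetical function that sends a synthetic request to a route with a named fault injected and returns the HTTP status, and I assume the gateway signals a fail-closed refusal with HTTP 503. Neither is taken from a real product.

```python
from typing import Callable, Dict, List

# Failure types named in the text; the identifiers are illustrative.
FAULTS = ("policy_engine_timeout", "identity_service_unreachable")
REFUSED_STATUS = 503  # assumption: the gateway refuses with HTTP 503


def check_fail_modes(send_probe: Callable[[str, str], int],
                     configured: Dict[str, str]) -> List[str]:
    """Return one alert per (route, fault) whose behavior differs from the config."""
    alerts: List[str] = []
    for route, mode in sorted(configured.items()):
        for fault in FAULTS:
            refused = send_probe(route, fault) == REFUSED_STATUS
            expected_refusal = mode == "fail-closed"
            if refused != expected_refusal:
                observed = "refusal" if refused else "pass-through"
                alerts.append(f"{route} under {fault}: expected {mode}, got {observed}")
    return alerts
```

An empty list means every probed route behaved as configured; anything else is what the alerting should page on.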

Does fail-closed apply to read-only requests too?

The decision can be per-route or per-method. Some operators allow fail-open for low-stakes read-only operations and fail-closed for any write-side or action-taking request. The classification depends on what the workload actually does.
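As a rough illustration, assuming HTTP-style methods and a hand-maintained set of low-stakes routes (both are my assumptions, and the route names in the usage are made up), the per-method rule could look like this:

```python
from typing import FrozenSet

# Assumption: these methods are treated as read-only in this sketch.
READ_ONLY_METHODS: FrozenSet[str] = frozenset({"GET", "HEAD", "OPTIONS"})


def fail_mode_for(method: str, route: str, low_stakes_routes: FrozenSet[str]) -> str:
    """Fail-open only for read-only calls on routes explicitly marked low-stakes."""
    if method.upper() in READ_ONLY_METHODS and route in low_stakes_routes:
        return "fail-open"
    # Writes, action-taking requests, and unclassified routes stay fail-closed.
    return "fail-closed"
```

The design choice is that fail-open must be opted into twice (by method and by route), so an unclassified route never fails open by accident.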

How does the fail mode interact with circuit breakers?

A circuit breaker is a separate control that backs off the gateway's calls to the upstream model when the upstream is unhealthy. Circuit breakers and fail mode are orthogonal: the circuit breaker addresses upstream failure; the fail mode addresses gateway-internal failure. Both can be active simultaneously.
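To make the separation concrete, here is a minimal consecutive-failure circuit breaker for the upstream call. The class name, threshold, and cooldown are illustrative assumptions. Note the terminology clash: an "open" circuit blocks calls, which is the opposite of what "fail-open" means for the gateway.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops calling the upstream after repeated failures, then retries after a cooldown."""

    def __init__(self,
                 failure_threshold: int = 3,
                 cooldown_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def allow_call(self) -> bool:
        if self._opened_at is None:
            return True
        # After the cooldown, calls are allowed again; the next failure re-opens.
        return self._clock() - self._opened_at >= self._cooldown

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()
```

Nothing in this class knows about policy decisions. The breaker state is driven only by upstream outcomes, while the fail mode is consulted only when the gateway's own decision step fails.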

AI Gateway Fail-Open vs Fail-Closed: The Decision That Shapes Your Audit Trail

An AI gateway that sits inline between authenticated callers and the LLMs they use has to answer a structural question. When the gateway cannot reach a policy decision (the policy engine is down, the identity service is unreachable, a configuration cannot be loaded, or a key for signing the audit record is unavailable), does the request go through (fail-open) or get refused (fail-closed)? The answer shapes the audit trail, the regulatory posture, and the production behavior under degraded conditions. Most product teams want to fail-open because the user experience degrades visibly when the gateway refuses; most security and compliance teams want to fail-closed because an undecided request is an unaccountable request. Neither answer is universally right: the choice depends on the workload, its regulatory posture, and its actual cost of failure.

I want to walk through the tradeoffs of fail-open versus fail-closed, the cases where each mode is appropriate, the data-driven defaults, and the operational patterns that hold up under audit.

What the two modes actually do

Fail-open means that when the gateway cannot complete the policy decision, the gateway lets the request through to the upstream model and writes a degraded audit record that records the failure of the decision step. The user gets the response. The audit trail shows that policy was not evaluated for that request.

Fail-closed means that when the gateway cannot complete the policy decision, the gateway refuses the request and returns an error to the caller. The user gets a clear error. The audit trail shows the refusal and the reason.

A third option, fail-partial, exists in some architectures: the gateway lets the request through but applies a degraded policy (default-deny on sensitive routes, default-allow on routine routes). Fail-partial is operationally complex and is usually a sign that the architecture has not committed to a default. I leave it aside and focus on the binary choice.
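The binary choice can be sketched in a few lines. Everything here is hypothetical scaffolding: `PolicyUnavailable`, `handle`, the status strings, and the dict-shaped requests are illustrative names, not a real gateway API.

```python
from enum import Enum
from typing import Callable, Dict, List


class FailMode(Enum):
    OPEN = "fail-open"
    CLOSED = "fail-closed"


class PolicyUnavailable(Exception):
    """Raised when no policy decision can be reached (hypothetical)."""


def handle(request: Dict[str, str],
           decide: Callable[[Dict[str, str]], bool],
           call_model: Callable[[Dict[str, str]], str],
           fail_mode: FailMode,
           audit: List[Dict[str, str]]) -> Dict[str, str]:
    try:
        allowed = decide(request)
    except PolicyUnavailable as exc:
        if fail_mode is FailMode.CLOSED:
            # Fail-closed: refuse and record the refusal with its reason.
            audit.append({"status": "refused_fail_closed", "reason": str(exc)})
            return {"status": "error", "body": "policy decision unavailable"}
        # Fail-open: forward, but record that policy was not evaluated.
        audit.append({"status": "allowed_fail_open", "reason": str(exc)})
        return {"status": "ok", "body": call_model(request)}
    if not allowed:
        audit.append({"status": "denied", "reason": "policy"})
        return {"status": "error", "body": "denied by policy"}
    audit.append({"status": "allowed", "reason": "policy"})
    return {"status": "ok", "body": call_model(request)}
```

Both branches write an audit entry; the only difference is whether the model is called. That is the whole decision, which is why it deserves to be explicit rather than an accident of exception handling.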

The case for fail-open

Three arguments support fail-open in specific cases.

Availability is the headline argument. A customer-facing AI workflow that depends on the gateway has its availability bounded by the gateway's availability. If the gateway's availability is 99.9% and the upstream model's availability is 99.95%, the end-to-end availability cannot exceed the gateway's 99.9%, and if the two fail independently it is their product, roughly 99.85%. Fail-open removes the gateway from the availability ceiling at the cost of removing it from the security ceiling.
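The arithmetic behind that claim, as a small sketch. It assumes the gateway and the model sit in series and fail independently, which is a simplification; the function names are mine.

```python
HOURS_PER_YEAR = 24 * 365


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up (independent failures)."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def downtime_hours_per_year(availability: float) -> float:
    return (1.0 - availability) * HOURS_PER_YEAR


# Fail-closed: the request needs both the gateway and the model.
fail_closed = serial_availability(0.999, 0.9995)
# Fail-open (idealized): gateway failures no longer block the request.
fail_open = serial_availability(0.9995)
```

With the numbers from the example, the serial path comes to about 99.85%, or roughly 13 hours of unavailability a year, against a little over 4 hours for the model alone. The difference is the availability that fail-open buys, and it is bought with unenforced requests.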

The cost of refusal in non-sensitive workloads is the second argument. A free public AI tool that refuses requests during a gateway incident frustrates users and may not be worth the gain from refusing. The refusal does not protect anything substantive because the workload itself does not protect anything substantive.

The fail-over to a degraded mode is the third argument. A workload that has a secondary path (a smaller model, a cached response, a templated answer) can fail-open to the secondary path. The user gets a degraded response that is better than no response. The audit trail records the fail-over.

The case for fail-closed

Three counter-arguments support fail-closed in regulated or sensitive workloads.

Identity-and-authorization is the headline argument. A request that bypasses the policy decision is an unauthenticated and unauthorized request from the perspective of the policy regime. In a workflow where the user's authorization to call the model depends on the verified identity and the per-route policy, fail-open is operationally indistinguishable from no enforcement.

Compliance evidence is the second argument. Under the EU AI Act, the Article 19 logging obligation, the Article 26 monitoring obligation, and the Article 73 reporting obligation all depend on the operational record of what the system decided per call. An audit trail that includes "policy was not evaluated" entries during a gateway incident leaves a regulator with the conclusion that the workload ran without enforcement for the duration. Fail-closed produces clear records ("refused, gateway incident in progress") that survive the audit.

Blast-radius limitation is the third argument. A workflow where the wrong response is high-cost (clinical guidance, financial advice, legal interpretation) has more to lose from an unenforced response than from a refused request. Fail-closed limits the blast radius. Fail-open lets the blast through.

A workload-first framework

The fail-open versus fail-closed decision is not a gateway-wide setting in mature deployments. It is per-route, per-policy, and sometimes per-identity. The framework that holds up looks like this.

For each workload that the gateway serves, classify the workload on two axes. The first axis is the cost of a refused legitimate request (loss of UX, loss of revenue, missed window). The second axis is the cost of a permitted illegitimate request (data exposure, regulatory finding, safety harm, blast radius).

A workload with low refusal cost and high permitted-illegitimate cost should fail-closed. A workload with high refusal cost and low permitted-illegitimate cost can fail-open. A workload where both costs are high should be redesigned to reduce one of the two before the gateway's fail mode is chosen.

The mapping produces a per-route default. The gateway then enforces the per-route default when it cannot reach the policy decision for that route.
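The two-axis mapping can be written down directly. This is a sketch with a coarse low/high scale; the enum and function names are hypothetical. The framework above does not say what to do when both costs are low, so the fail-closed default in that branch is my assumption, labeled in the code.

```python
from enum import Enum


class Cost(Enum):
    LOW = "low"
    HIGH = "high"


def default_fail_mode(refusal_cost: Cost, illegitimate_cost: Cost) -> str:
    """Map the two cost axes to a per-route default."""
    if refusal_cost is Cost.HIGH and illegitimate_cost is Cost.HIGH:
        # Neither mode is acceptable yet: reduce one of the costs first.
        return "redesign"
    if illegitimate_cost is Cost.HIGH:
        return "fail-closed"
    if refusal_cost is Cost.HIGH:
        return "fail-open"
    # Both low: not covered by the framework; fail-closed is this sketch's assumption.
    return "fail-closed"
```

Returning "redesign" rather than picking a mode keeps the uncomfortable case visible in the route inventory instead of hiding it behind a default.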

Where the audit record fits

The audit record produced under each mode is different and important.

Under fail-closed, the audit record is the refusal: identity, request shape, policy version (last known), reason for failure (policy engine timeout, identity service unreachable, signing key unavailable), timestamp. The record is a clear signal in the trail that requests were refused for an operational reason. The regulator sees a workload that maintained its enforcement posture during the incident.

Under fail-open, the audit record is the degraded approval: identity, request shape, the absence of a current policy decision, the reason the decision could not be made, timestamp, and crucially, the model's response. The record signals to the regulator that the workload accepted requests it could not evaluate. The regulator's question is what the policy would have decided.

Mature gateway implementations write both kinds of records to the same audit trail with a clear status field. The status field is the entry point for the post-incident review.
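A sketch of one record shape that covers both modes, using the fields listed above. The field names, status values, and JSON-lines serialization are assumptions for illustration, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AuditRecord:
    status: str                               # "refused_fail_closed" or "allowed_fail_open"
    identity: str
    request_shape: str
    last_known_policy_version: Optional[str]  # None if no version could be loaded
    failure_reason: str                       # e.g. "policy engine timeout"
    timestamp: str                            # ISO 8601, supplied by the caller
    model_response: Optional[str] = None      # present only when the request went through


def to_json_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def degraded_records(records: Iterable[AuditRecord]) -> List[AuditRecord]:
    """Entry point for post-incident review: requests allowed without a decision."""
    return [r for r in records if r.status == "allowed_fail_open"]
```

Because both modes share one schema, the post-incident query is a filter on `status` rather than a join across two logs.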

Operational practices that hold up

Three operational practices recur in deployments where the fail-open or fail-closed decision was right.

Per-route classification documented in the policy file. The classification lives next to the policy, not in a separate document. Reviewers see the route, the policy, and the fail mode in one place.

Synthetic monitoring of the fail path. The on-call team has a runbook step that checks whether the fail mode is being exercised correctly during a gateway incident. Synthetic requests during an incident verify the behavior.

Post-incident review that includes the fail-mode behavior. The incident review covers the gateway's behavior under the failure, not just the failure itself. The review answers whether the fail mode protected what it should have protected and whether the user-facing behavior was acceptable.
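The first practice lends itself to an automated check: every route entry in the policy file declares its fail mode next to its policy. The structure below is a hypothetical in-memory form of such a file (route mapped to a dict with `policy` and `fail_mode` keys); real policy formats will differ.

```python
from typing import Dict, List

ALLOWED_FAIL_MODES = {"fail-open", "fail-closed"}


def validate_policy_file(routes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means every route declares a valid fail mode."""
    problems: List[str] = []
    for route, entry in sorted(routes.items()):
        mode = entry.get("fail_mode")
        if mode is None:
            problems.append(f"{route}: missing fail_mode")
        elif mode not in ALLOWED_FAIL_MODES:
            problems.append(f"{route}: unknown fail_mode {mode!r}")
    return problems
```

Running a check like this in review or CI means a new route cannot ship with an implicit fail mode.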

What the regulation has to say

The EU AI Act does not prescribe a fail mode. The closest the regulation comes is Article 15 (accuracy, robustness, and cybersecurity), which requires high-risk AI systems to be designed and developed so that they achieve an appropriate level of accuracy, robustness, and cybersecurity and perform consistently in those respects throughout their lifecycle. A workload that fails-open under a gateway incident has performed inconsistently from the perspective of the policy regime that was supposed to apply.

The Article 26 deployer obligation to monitor operation and suspend use when the system presents risks pushes in the same direction. A deployer that detects a gateway incident while the gateway is failing open has a continuity-of-enforcement question to answer. Fail-closed produces the clearer evidence record.

A pattern from a real production incident

A financial-services deployer ran an internal copilot through a gateway with fail-open as the default. During a 47-minute gateway incident, the copilot answered 12,000 requests that bypassed the policy engine. The post-incident review found that 14 of those requests would have been refused by the policy. The 14 included two requests for cross-tenant data and one for regulated information that the requester was not entitled to. The post-incident finding shaped the policy change: fail-closed became the default for any route that touches cross-tenant data or regulated information categories.

The pattern recurs. Fail-open is comfortable until the first incident. The first incident teaches the cost.
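The review step in that incident, replaying requests that went through unevaluated against the policy, can be sketched as follows. The record shape, the `allowed_fail_open` status value, and the `policy_allows` callback are assumptions; the sample route names in any usage are made up and do not describe the incident above.

```python
from typing import Callable, Dict, Iterable, List


def would_have_been_refused(
        records: Iterable[Dict[str, str]],
        policy_allows: Callable[[Dict[str, str]], bool]) -> List[Dict[str, str]]:
    """Replay fail-open records against the policy and return those it would have refused."""
    return [
        record for record in records
        if record.get("status") == "allowed_fail_open" and not policy_allows(record)
    ]
```

The replay only works if the fail-open audit record kept enough of the request (identity, route, request shape) for the policy to be evaluated after the fact.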

DeepInspect

This is the fail-mode architecture DeepInspect operates on. DeepInspect sits inline between authenticated users or agents and the LLMs they call, applies per-route fail-mode configuration (closed by default for regulated routes, configurable for less sensitive routes), and writes an audit record under both modes that records the policy state at the moment of the request.

For the gateway-level reliability and audit posture, DeepInspect's per-route fail-mode design lets the operator match the fail mode to the workload's risk profile. The audit trail is consistent across modes, so the post-incident review and the regulatory reporting both have the records they need.

If you are designing the fail-mode policy for a regulated AI workload, book a demo today.