Fail-closed AI gateway design: why the default failure mode is the security mode

A fail-closed AI gateway returns HTTP 503 when the policy decision point cannot reach a verdict, blocking the request rather than forwarding it. A fail-open gateway returns HTTP 200 with the upstream model response, treating the policy outage as a pass. The choice between the two postures determines whether a policy outage produces a security incident or a contemporaneous deny record. EU AI Act Article 12 and Article 26 expect the deny record. The four failure categories that test the design are policy timeout, identity provider outage, redaction engine outage, and audit write outage.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Platform & Architecture · fail-closed · ai-gateway · policy-enforcement · eu-ai-act · audit-logs · availability
A fail-closed AI gateway returns HTTP 503 Service Unavailable when the policy decision point cannot reach a verdict within the configured timeout. A fail-open gateway returns HTTP 200 with the upstream model response, treating the policy outage as an implicit pass. The two configurations produce different incident shapes during the same partial outage. The fail-closed gateway produces a deny record and a customer-visible error. The fail-open gateway produces a successful LLM response with no policy record, which is the exposure shape EU AI Act Article 12 and Article 26 treat as a record-keeping failure. The architectural default determines which incident shape the operator gets on the bad day.

I want to walk through what fail-closed means at the AI gateway, the four failure categories that test the design, the code-level pattern that distinguishes the two postures, and the regulatory alignment that makes fail-closed the only defensible default for a high-risk AI deployment.

The architectural argument for fail-closed defaults

A policy decision point exists to evaluate a request against the per-user, per-role, per-route, per-classification policy. The decision exists for a reason: the policy expresses what the organization permits and what the organization records. When the policy decision point is unreachable, the gateway has two paths. The fail-open path forwards the request to the LLM and returns the model response, on the implicit theory that availability of the AI application outweighs the policy gap. The fail-closed path blocks the request, returns 503, and writes a denied-decision record with the failure category. The fail-closed path treats the policy as load-bearing. The fail-open path treats the policy as advisory. A policy that exists for record-keeping purposes under EU AI Act Article 12 is load-bearing by construction.

Policy timeout handling

The first failure category is policy timeout. The policy decision point evaluates the request against the policy bundle within a configured budget, typically 50 to 100 milliseconds. When evaluation exceeds the budget, the gateway must choose. The fail-closed pattern denies the request, writes a deny record with the policy_timeout reason and the policy version in force, and returns 503.
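A minimal sketch of the fail-closed timeout path, assuming the policy evaluator is a plain Python callable that returns True or False. The names `Decision` and `decide_fail_closed`, the `policy_error` reason, and the 403 status for an ordinary policy deny are illustrative assumptions, not the API of any particular gateway.

```python
import concurrent.futures
import time
from dataclasses import dataclass

POLICY_BUDGET_SECONDS = 0.05  # 50 ms, the low end of the budget range above


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str
    policy_version: str
    http_status: int


def decide_fail_closed(evaluate, request, policy_version,
                       budget=POLICY_BUDGET_SECONDS):
    """Run evaluate(request) under a time budget; deny on timeout or error."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(evaluate, request)
    try:
        allowed = future.result(timeout=budget)
    except concurrent.futures.TimeoutError:
        # No verdict inside the budget: deny and record why.
        return Decision(False, "policy_timeout", policy_version, 503)
    except Exception:
        # Hypothetical reason code: the evaluator itself failed.
        return Decision(False, "policy_error", policy_version, 503)
    finally:
        # Do not block the request path waiting for a slow evaluator.
        pool.shutdown(wait=False)
    if allowed:
        return Decision(True, "policy_allow", policy_version, 200)
    return Decision(False, "policy_deny", policy_version, 403)
```

The returned `Decision` is what the gateway would write to the audit sink before responding. A fail-open variant would differ only in the timeout branch, which would return an allow, and that branch is where the two audit records diverge.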

The fail-open pattern is to log a warning, set the decision to "pass", and forward to the LLM. The two patterns produce divergent audit records. The fail-closed record contains the deny decision with the policy_timeout reason and the policy version that was in force. The fail-open record contains a pass with a "policy_unavailable" flag, which Article 12 reviewers read as a record of an undecided request. The deny record satisfies the contemporaneous record requirement. The pass-with-flag record does not.

Identity provider outage handling

The second category is IdP outage. The gateway verifies identity per request against the configured IdP. When the IdP is unreachable, the gateway cannot validate the JWT signature against the IdP public key or refresh the SSO session. The fail-closed default is to deny every request whose identity cannot be verified. The fail-open default is to accept the cached identity claim and forward the request. The cached-claim path is the path that produced the May 2026 Snowflake incident shape, where a stale session token granted access after the underlying identity had been revoked. The fail-closed default for IdP outage trades short-term availability for the property that no request reaches the LLM without verified identity. The audit record contains the deny decision with the idp_unreachable reason and the last successful verification time.
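The two IdP-outage postures can be sketched as follows. This is an illustration under stated assumptions: `verify_with_idp` stands in for whatever signature verification the gateway performs, `IdPUnreachable` is a hypothetical exception that verifier raises when the IdP cannot be contacted, and the `cached_claim` reason is invented for the fail-open branch.

```python
import time
from dataclasses import dataclass
from typing import Optional


class IdPUnreachable(Exception):
    """Raised by the (hypothetical) verifier when the IdP cannot be reached."""


@dataclass(frozen=True)
class IdentityDecision:
    allow: bool
    reason: str
    subject: Optional[str]
    last_verified_at: Optional[float]
    http_status: int


def check_identity(token, verify_with_idp, last_verified_at=None,
                   cached_subject=None, fail_closed=True):
    """Verify identity per request; on IdP outage, follow the chosen posture."""
    try:
        subject = verify_with_idp(token)
    except IdPUnreachable:
        if fail_closed:
            # Deny, and keep the last successful verification time for the record.
            return IdentityDecision(False, "idp_unreachable", None,
                                    last_verified_at, 503)
        # Fail-open: trust a cached claim that may have been revoked since.
        return IdentityDecision(True, "cached_claim", cached_subject,
                                last_verified_at, 200)
    return IdentityDecision(True, "identity_verified", subject,
                            time.time(), 200)
```

The fail-open branch is included only to show where the stale-claim exposure enters: it is the single line that returns an allow without a fresh verification.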

Redaction engine outage handling

The third category is the redaction engine. The redaction engine evaluates the prompt for sensitive content (PII, secrets, classified data) and applies the configured action: pass, redact, or block. When the redaction engine is unreachable, the gateway cannot determine the data classification. The fail-closed default treats the unclassified prompt as the highest-risk classification and applies the corresponding policy, which in most enterprise configurations is block. The fail-open default forwards the prompt as if it had been classified and found safe, which exposes the gap. The fail-closed pattern writes the audit record with the redaction_engine_unavailable reason and the conservative classification applied. The pattern aligns with the NIST AI RMF MEASURE function expectation that uncertainty is recorded as uncertainty, not as a verified safe outcome.
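The conservative-classification rule can be sketched in a few lines. The four classification levels, the level-to-action mapping, and the use of `ConnectionError` to signal an unreachable engine are assumptions made for the example; a real deployment would take all three from its own policy configuration.

```python
from enum import IntEnum


class Classification(IntEnum):
    # Hypothetical levels, ordered from lowest to highest risk.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical mapping from classification to the configured action.
ACTIONS = {
    Classification.PUBLIC: "pass",
    Classification.INTERNAL: "pass",
    Classification.CONFIDENTIAL: "redact",
    Classification.RESTRICTED: "block",
}


def classify_fail_closed(prompt, classify):
    """Classify the prompt; if the engine is unreachable, assume the worst."""
    try:
        level = classify(prompt)
        reason = "classified"
    except ConnectionError:
        # Unknown classification is treated as the highest-risk level.
        level = max(Classification)
        reason = "redaction_engine_unavailable"
    return {
        "classification": level.name,
        "action": ACTIONS[level],
        "reason": reason,
    }
```

The returned record carries both the reason and the classification that was applied, so the audit trail shows an assumed classification as assumed, not as a verified result.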

Audit write outage handling

The fourth category is audit-write outage. The gateway commits the per-decision audit record to the customer's audit sink (S3, Splunk HEC, Datadog Logs, Snowflake table) before the model response returns. When the audit sink is unreachable, the gateway has the same two options. The fail-closed pattern blocks the request, returns 503 with the audit_unavailable reason, and never calls the LLM.
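A minimal sketch of the audit commit as a precondition for the LLM call. The function and field names are hypothetical, `commit_audit` stands in for a synchronous write to whichever sink is configured, and any exception from it is treated as sink unavailability.

```python
import time


def handle_request(prompt, commit_audit, call_llm):
    """Forward to the LLM only after the audit record is durably committed."""
    record = {
        "decision": "allow",
        "reason": "policy_allow",
        "timestamp": time.time(),
    }
    try:
        # Precondition: the commit must succeed before the LLM is called.
        commit_audit(record)
    except Exception:
        # Sink unreachable: block the request instead of forwarding unrecorded.
        return {"status": 503, "error": "audit_unavailable"}
    return {"status": 200, "body": call_llm(prompt)}
```

The ordering is the point of the sketch: `call_llm` sits after the commit, so an unrecorded request cannot reach the model. A fail-open variant would enqueue the record for retry and call the model regardless.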

The fail-open pattern queues the record for retry and forwards the request. The fail-closed pattern is the only path that satisfies the EU AI Act Article 12 contemporaneous-record requirement and the DORA Article 19 retention requirement. A request that reaches the LLM without a committed audit record is, for regulatory purposes, an unrecorded interaction. The fail-closed default treats the audit commit as a precondition for the LLM call, not a side effect. The latency cost is the round-trip to the audit sink, which in practice is 10 to 30 milliseconds for a regional configuration.

DeepInspect

DeepInspect implements fail-closed defaults for each of the four failure categories. The policy decision point evaluates against the policy bundle within a 50 millisecond budget; on timeout, the proxy writes the policy_timeout deny record and returns HTTP 503. The identity layer verifies the JWT signature against the IdP public key per request; on IdP unreachability, the proxy writes the idp_unreachable deny record and returns 503. The redaction engine evaluates the prompt against the configured classifiers; on engine unavailability, the proxy applies the highest-classification policy and records the redaction_engine_unavailable reason. The audit sink commit is a precondition for the LLM forward; on sink unreachability, the proxy returns 503 with audit_unavailable.

The behavior produces a customer-visible error during partial outages of the policy infrastructure. The behavior also produces the contemporaneous deny record that EU AI Act Article 12, Article 26, DORA Article 19, and Fannie Mae LL-2026-04 each expect. The architectural choice is whether the bad day produces an availability incident or a record-keeping incident. The fail-closed default produces the availability incident, which the operator can mitigate through redundancy of the policy infrastructure.

Frequently asked questions

Does fail-closed mean every dependency outage takes down AI access?

Fail-closed means every uncovered dependency outage produces a deny decision. The architectural mitigation is redundancy of the dependencies. The policy decision point runs as a multi-replica deployment with health checks and load-balanced routing. The IdP is fronted by the IdP vendor's high-availability layer (Okta, Entra ID, Auth0 each operate multi-region active-active). The redaction engine is co-located with the proxy to eliminate the cross-region failure mode. The audit sink is a high-availability service the customer operates. The fail-closed default forces the operator to engineer the dependencies for the availability the AI application requires, rather than papering over the dependency gap with fail-open behavior.

Where in the EU AI Act is the fail-closed expectation actually written?

EU AI Act Article 12 requires that a high-risk AI system technically allow the automatic recording of events (logs) over the lifetime of the system. Article 26 requires the deployer to keep the logs the system automatically generates, to the extent those logs are under the deployer's control, for at least six months. Neither article uses the phrase "fail-closed". The reading taken here is that a request reaching the AI system without a generated log falls short of the record-keeping mandate. A fail-open gateway that forwards requests during an audit-write outage produces unlogged requests, which is the violation shape. Fail-closed is the architectural default that prevents the unlogged-request outcome.

How does fail-closed interact with circuit breakers?

A circuit breaker is the engineering pattern for handling repeated downstream failures: open the circuit, fail fast, retry after a backoff. The circuit breaker and fail-closed are compatible. The circuit breaker controls when the gateway stops trying the dependency; fail-closed controls what the gateway returns when the dependency is unreachable. The fail-closed configuration sets the circuit-open behavior to return 503 with the deny record. The fail-open configuration sets the circuit-open behavior to return the LLM response with a policy_unavailable flag. The circuit breaker is the mechanism; fail-closed is the policy.
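The split between mechanism and policy can be sketched with a small breaker. The class below is a simplified illustration and not a production breaker; the threshold, reset interval, reason strings, and the 403 status for an ordinary deny are all assumptions.

```python
import time


class CircuitBreaker:
    """Mechanism: decides whether the dependency should be tried at all."""

    def __init__(self, failure_threshold=3, reset_after=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_attempt(self):
        if self.opened_at is None:
            return True
        # After the backoff, let a trial call through (half-open).
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def decide(breaker, evaluate, request):
    """Policy: fail-closed, so an open circuit or a failed call yields 503."""
    if not breaker.allow_attempt():
        return 503, "policy_unavailable_circuit_open"
    try:
        allowed = evaluate(request)
    except Exception:
        breaker.record_failure()
        return 503, "policy_unreachable"
    breaker.record_success()
    return (200, "policy_allow") if allowed else (403, "policy_deny")
```

Switching this sketch to fail-open would mean changing only the two 503 returns in `decide`; the breaker class would stay the same.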

What is the latency budget for the fail-closed audit commit?

The audit commit happens before the model response returns. The latency cost depends on the audit sink. S3 with cross-region replication is 50 to 100 milliseconds. Splunk HEC over HTTPS is 20 to 50 milliseconds. Datadog Logs intake is 30 to 60 milliseconds. A Kafka producer with acks=1 to a customer-operated cluster is 5 to 15 milliseconds. The fail-closed pattern requires the operator to pick an audit sink whose commit latency fits the LLM call budget. In practice, the Kafka-to-cold-storage pattern with a producer ack on the hot path satisfies both the latency budget and the durability requirement.
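As a rough illustration of fitting a sink to a budget, the sketch below encodes the latency ranges quoted above and filters on the worst case. The dictionary keys and the function name are invented for the example, and the figures are this article's estimates, not measurements.

```python
# (best_ms, worst_ms) commit latency per sink, taken from the ranges above.
SINK_COMMIT_MS = {
    "s3_cross_region": (50, 100),
    "splunk_hec": (20, 50),
    "datadog_logs": (30, 60),
    "kafka_acks_1": (5, 15),
}


def sinks_within_budget(budget_ms):
    """Return sinks whose worst-case commit latency fits the budget."""
    return sorted(
        name
        for name, (_best, worst) in SINK_COMMIT_MS.items()
        if worst <= budget_ms
    )


print(sinks_within_budget(50))  # ['kafka_acks_1', 'splunk_hec']
```

Filtering on the worst case and not the typical case is a deliberate choice here: the commit sits on the hot path, so the tail of the distribution is what the caller experiences.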

Does fail-closed apply to read-only LLM calls?

The fail-closed default applies to every LLM call the gateway forwards. The distinction between read-only and write-side is a property of the downstream application, not the LLM call. An LLM call that reads a customer's documents to produce a summary is read-only from the application's perspective and a data-flow event from the policy perspective. The data-flow event is the unit Article 12 records. The fail-closed default applies the same way to summarization, classification, RAG retrieval, and generation. The category does not change the record-keeping requirement.