AI policy version control: how to treat gateway policy like code
AI gateway policy, which governs which users can call which models with which data, lives in YAML, evolves with the organization, and carries the same regression risk as application code. Treating the policy as code means git-backed storage, semantic versioning of policy bundles, audit-log tagging of every decision with the policy version hash, blue/green policy rollout, and shadow-mode evaluation before promotion. The NIST AI RMF MAP and MANAGE functions ask the questions the version-control discipline answers.

An AI gateway policy is the YAML or JSON artifact that expresses which identity, in which role, can call which model, on which route, with which data classification, and with which redaction. The policy evolves with the organization. New roles get added. New models get approved. New classifications get carved out. A policy change that broadens access by one role on one route can unblock a team or open an exposure path, depending on the surrounding context. A policy change that narrows access can satisfy a compliance review or break a production workflow. The change set carries the same regression risk as application code. Treating the policy as code, with the same git-backed storage and deployment discipline that application code receives, is the architectural baseline that the NIST AI RMF MAP and MANAGE functions expect of an operating AI deployment.
I want to walk through the five disciplines that compose policy version control: git-backed storage, semantic versioning of bundles, audit-log tagging with the policy version hash, blue/green rollout, and shadow-mode evaluation. The result is a policy lifecycle with the same auditability and rollback properties that mature application deployments have.
Git-backed policy storage
The policy lives in a git repository. The repository structure separates the policy bundles by environment (dev, staging, prod) and, in a multi-tenant deployment, by tenant. Every policy change is a commit, every commit is signed, and every signed commit goes through the same code review the application code receives.
The git history is the policy history. The reviewer sees the diff between the previous policy and the proposed policy. The CI pipeline runs the policy validator (schema check, conflict check, dead-rule check) before the merge. The discipline is the standard GitOps pattern applied to a non-code artifact.
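The three CI checks named above can be sketched in a few dozen lines. This is a minimal sketch, not any particular vendor's validator: the bundle layout (rules with id, priority, role, route, classification, and effect fields, "*" as a wildcard, lower priority numbers evaluated first) and the route names are hypothetical. The conflict check here only compares rules with identical selectors; a real validator would also have to reason about overlapping wildcards.

```python
from dataclasses import dataclass

# Hypothetical bundle schema for illustration: each rule matches on role, route,
# and classification ("*" is a wildcard) and yields "allow" or "deny".
# Lower priority numbers are evaluated first.
REQUIRED_RULE_KEYS = {"id", "priority", "role", "route", "classification", "effect"}
VALID_EFFECTS = {"allow", "deny"}
SELECTOR_KEYS = ("role", "route", "classification")


@dataclass
class Finding:
    check: str    # "schema", "conflict", or "dead_rule"
    rule_id: str
    detail: str


def covers(a: dict, b: dict) -> bool:
    """True if every request matched by rule b is also matched by rule a."""
    return all(a[k] == "*" or a[k] == b[k] for k in SELECTOR_KEYS)


def validate_bundle(bundle: dict) -> list[Finding]:
    findings: list[Finding] = []
    rules = bundle.get("rules", [])

    # Schema check: required keys present and a known effect.
    for rule in rules:
        missing = REQUIRED_RULE_KEYS - set(rule)
        if missing:
            findings.append(Finding("schema", str(rule.get("id", "?")), f"missing {sorted(missing)}"))
        elif rule["effect"] not in VALID_EFFECTS:
            findings.append(Finding("schema", rule["id"], f"unknown effect {rule['effect']!r}"))
    if findings:
        return findings  # the remaining checks assume well-formed rules

    ordered = sorted(rules, key=lambda r: r["priority"])
    for i, later in enumerate(ordered):
        for earlier in ordered[:i]:
            same_selector = all(earlier[k] == later[k] for k in SELECTOR_KEYS)
            if same_selector and earlier["priority"] == later["priority"] and earlier["effect"] != later["effect"]:
                # Conflict check: identical match, identical priority, opposite effects.
                findings.append(Finding("conflict", later["id"], f"contradicts {earlier['id']}"))
            elif earlier["priority"] < later["priority"] and covers(earlier, later):
                # Dead-rule check: an earlier rule already decides every request this one matches.
                findings.append(Finding("dead_rule", later["id"], f"unreachable behind {earlier['id']}"))
                break
    return findings


bundle = {
    "version": "2.14.3",
    "rules": [
        {"id": "r1", "priority": 10, "role": "*", "route": "/v1/chat", "classification": "restricted", "effect": "deny"},
        {"id": "r2", "priority": 20, "role": "support_agent", "route": "/v1/chat", "classification": "restricted", "effect": "allow"},
        {"id": "r3", "priority": 30, "role": "support_agent", "route": "/v1/chat", "classification": "internal", "effect": "allow"},
        {"id": "r4", "priority": 30, "role": "support_agent", "route": "/v1/chat", "classification": "internal", "effect": "deny"},
    ],
}

for finding in validate_bundle(bundle):
    print(finding)
```

In the sample bundle, r2 never fires because r1 already denies every role on that route and classification, and r3 and r4 contradict each other at the same priority. Both are static defects a reviewer can miss in a diff.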
Semantic versioning of policy bundles
The policy bundle carries a semantic version. The major version increments on incompatible changes (a removed rule, a changed default, a schema change). The minor version increments on additive changes (a new rule, a new role, a new route). The patch version increments on rule refinements that do not change the policy contract. A bundle version such as 2.14.3 reads as: major 2 (the current policy generation), minor 14 (14 additive releases since major 2), patch 3 (3 refinements since minor 14). The version is the unit the rollback operates on and the label the audit record carries alongside the bundle hash, which together let a regulator reconstruct the policy in force at the time of a specific decision.
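The bump rules can be encoded so that CI proposes the next version from the change set instead of leaving it to the author. A minimal sketch, assuming a hypothetical upstream diff step that labels each change with one of the change kinds below; only the mapping from change kind to bump level comes from the rules above.

```python
# Hypothetical change labels; the bump level assigned to each one follows the
# versioning rules described above.
BUMP_FOR_CHANGE = {
    "rule_removed": "major",
    "default_changed": "major",
    "schema_changed": "major",
    "rule_added": "minor",
    "role_added": "minor",
    "route_added": "minor",
    "rule_refined": "patch",
}
LEVELS = ("patch", "minor", "major")  # ordered from smallest to largest


def next_version(current: str, changes: list[str]) -> str:
    """Return the bundle version a release with these changes should carry."""
    if not changes:
        return current
    major, minor, patch = (int(part) for part in current.split("."))
    # The most significant change in the release decides the bump.
    level = max((BUMP_FOR_CHANGE[c] for c in changes), key=LEVELS.index)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(next_version("2.14.3", ["rule_refined"]))                   # 2.14.4
print(next_version("2.14.3", ["role_added", "rule_refined"]))     # 2.15.0
print(next_version("2.14.3", ["rule_added", "default_changed"]))  # 3.0.0
```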
Audit-log tagging with the policy version hash
Every per-decision audit record carries the policy version hash. The hash is the SHA-256 digest of the policy bundle artifact itself; the record keys on it rather than on the human-readable version string, so that the lookup stays unambiguous across renamed branches and rebased commits.
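A minimal sketch of the tagging, using Python's hashlib and an in-memory registry that stands in for whatever bundle registry a deployment actually uses. The record fields, the placeholder bundle bytes, and the commit identifier are all hypothetical. The digest is taken over the exact bytes of the shipped artifact, not a re-serialization, since two serializations of the same YAML can differ byte for byte.

```python
import hashlib
import json
from datetime import datetime, timezone


def bundle_digest(bundle_bytes: bytes) -> str:
    """SHA-256 over the exact bytes of the shipped bundle artifact."""
    return "sha256:" + hashlib.sha256(bundle_bytes).hexdigest()


# Hypothetical registry mapping each digest to its version label and source commit.
REGISTRY: dict[str, dict] = {}


def register_bundle(bundle_bytes: bytes, version: str, commit: str) -> str:
    digest = bundle_digest(bundle_bytes)
    REGISTRY[digest] = {"version": version, "commit": commit}
    return digest


def audit_record(request_id: str, role: str, route: str, classification: str,
                 decision: str, digest: str) -> dict:
    return {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "route": route,
        "classification": classification,
        "decision": decision,
        "policy_version": REGISTRY[digest]["version"],  # label, for humans
        "policy_version_hash": digest,                    # artifact reference, for the record
    }


bundle_v1 = b"version: 2.14.3\nrules: []\n"                 # placeholder bundle bytes
digest = register_bundle(bundle_v1, "2.14.3", "commit-a")  # placeholder commit id
record = audit_record("req-001", "support_agent", "/v1/chat", "internal", "allow", digest)
print(json.dumps(record, indent=2))

# Auditor's path: audit record -> bundle digest -> version label and git commit.
print(REGISTRY[record["policy_version_hash"]])

# A hotfix shipped under the same version string still gets a distinct digest.
hotfix_digest = bundle_digest(b"version: 2.14.3\nrules: [hotfix]\n")
print(hotfix_digest != digest)  # True
```

The last comparison is the point of the hash: a bundle re-shipped under the same version string with different content gets a different digest, so the audit record still resolves to the artifact that was actually in force.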
The version hash gives the auditor the exact policy artifact that governed the decision. The signed git commit gives the auditor the change history that produced that artifact. The chain runs from the audit record to the policy bundle to the git commit to the human reviewer. That chain is the kind of traceability the EU AI Act Article 12 record-keeping requirement calls for, and treating the policy as code produces it as a by-product.
Blue/green policy rollout
A policy change goes through blue/green deployment. The current policy (blue) continues to handle production traffic. The new policy (green) is deployed alongside it, and a portion of traffic is routed to it. The rollout compares the decision distribution under the new policy against the decision distribution under the current policy. If the two diverge in a way the operator did not anticipate (for example, the new policy blocks 3 percent more support_agent calls than the old policy on the same traffic), the operator has the data to investigate before the rollout reaches 100 percent. The pattern requires two pieces of infrastructure: a traffic splitter at the gateway entry, which decides whose decision is enforced for each request, and a parallel evaluation path that emits the decision the green policy would have made on the requests blue still serves, without enforcing it, so that both distributions are measured on the same traffic.
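A minimal sketch of the two pieces, assuming policies are plain functions from a request dict to "allow" or "deny" (a hypothetical interface; the routes and the green rule are made up for illustration). The split hashes the request ID into 10,000 buckets rather than drawing a random number, so a given request ID is always assigned to the same side. Green is evaluated on every request so its decision distribution can be compared with blue's on identical traffic, while only the assigned side's decision is enforced.

```python
import hashlib
from typing import Callable

Policy = Callable[[dict], str]  # a policy maps a request to "allow" or "deny"


def routes_to_green(request_id: str, green_percent: float) -> bool:
    """Deterministic split: a given request_id always lands on the same side."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return bucket < green_percent * 100  # 10,000 buckets, so 1 percent = 100 buckets


def handle(request: dict, blue: Policy, green: Policy, green_percent: float) -> tuple[str, dict]:
    blue_decision = blue(request)
    green_decision = green(request)  # parallel evaluation on every request
    use_green = routes_to_green(request["request_id"], green_percent)
    enforced = green_decision if use_green else blue_decision
    comparison = {"blue": blue_decision, "green": green_decision,
                  "enforced_by": "green" if use_green else "blue"}
    return enforced, comparison


# Hypothetical policies: green starts denying support_agent calls on /v1/export.
def blue(req: dict) -> str:
    return "allow"


def green(req: dict) -> str:
    return "deny" if (req["role"], req["route"]) == ("support_agent", "/v1/export") else "allow"


requests = [{"request_id": f"req-{i}", "role": "support_agent",
             "route": "/v1/export" if i % 10 == 0 else "/v1/chat"} for i in range(10_000)]
results = [handle(r, blue, green, green_percent=5.0) for r in requests]

green_share = sum(c["enforced_by"] == "green" for _, c in results) / len(results)
divergence = sum(c["blue"] != c["green"] for _, c in results) / len(results)
print(f"enforced by green: {green_share:.1%}, blue/green disagreement: {divergence:.1%}")
```

The disagreement rate is measured over all traffic, not just the 5 percent green enforces, which is what lets the operator see the divergence before widening the split.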
Shadow-mode evaluation
Shadow mode is the limiting case of the blue/green pattern: the green policy evaluates every request, in parallel with the blue policy, and emits a decision record without affecting the request path. The shadow record carries the same fields the production audit record carries, plus a shadow flag. The operator compares the shadow record against the production record for every request, computes the divergence rate per role, per route, and per classification, and decides whether each divergence reflects an intended policy change or an unintended regression. This is the mechanism that catches regressions before they force a rollback. A policy that looked correct in code review and passed CI validation can still produce, say, a 12 percent block-rate divergence on a segment of traffic the reviewer had no data for; the shadow comparison surfaces that divergence before production traffic sees the new policy.
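The per-segment comparison can be sketched as follows, assuming each production record and its shadow counterpart share role, route, classification, and decision fields (hypothetical field names). The 100-observation threshold reuses the coverage target from the FAQ below, and the coverage gate treats a segment that never appeared during the shadow window as uncovered, so a tail role cannot pass by absence.

```python
from collections import defaultdict

DIMENSIONS = ("role", "route", "classification")


def divergence_report(pairs, min_observations: int = 100) -> dict:
    """pairs: (production_record, shadow_record) for the same request.
    Returns per-segment counts, divergence rate, and a coverage flag."""
    counts = defaultdict(lambda: {"n": 0, "diverged": 0})
    for prod, shadow in pairs:
        diverged = prod["decision"] != shadow["decision"]
        for dim in DIMENSIONS:
            seg = counts[(dim, prod[dim])]
            seg["n"] += 1
            seg["diverged"] += int(diverged)
    return {
        key: {**c, "rate": c["diverged"] / c["n"], "covered": c["n"] >= min_observations}
        for key, c in counts.items()
    }


def coverage_met(report: dict, expected_segments) -> bool:
    """Coverage gate only; segments never observed count as uncovered.
    Deciding whether a divergence was intended remains a human review step."""
    return all(report.get(seg, {}).get("covered", False) for seg in expected_segments)


# Hypothetical traffic: 150 support_agent requests, 15 of which the candidate would block.
pairs = []
for i in range(150):
    prod = {"role": "support_agent", "route": "/v1/chat", "classification": "internal", "decision": "allow"}
    shadow = {**prod, "decision": "deny" if i < 15 else "allow", "shadow": True}
    pairs.append((prod, shadow))

report = divergence_report(pairs)
print(report[("role", "support_agent")])
print(coverage_met(report, [("role", "support_agent"), ("role", "analyst")]))  # False: analyst never observed
```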
DeepInspect
DeepInspect treats the policy as the primary versioned artifact. Policies live in a customer-controlled git repository, with the schema validated by the DeepInspect policy CI. The policy bundle artifact is signed and tagged with the semantic version and the bundle hash. The gateway loads the policy from a registry the customer operates, with the version hash exposed as a Prometheus metric and as a field on every audit record. Blue/green rollout is configured per route, with the traffic split adjustable in real time and the divergence metric exposed on the operations dashboard. Shadow mode runs against the candidate policy for the operator-defined evaluation window before the promotion, with the shadow audit records committed to the customer audit sink under a separate prefix to keep them distinct from the production record.
The pattern produces the property the NIST AI RMF MAP function asks for (the policy artifact is identifiable, versioned, and traceable to the decision it governed) and the property the MANAGE function asks for (the rollback path is real, the divergence is measured, the audit record reflects the policy in force).
Frequently asked questions
- Why a hash on the audit record when the version string is already there?
The version string is the human-readable label the policy team uses to discuss the bundle. The hash is the cryptographic identifier of the artifact. The two can diverge: a release manager can re-tag a build, a rebase can change the commit graph, a hotfix can ship under the same version string with different content. The hash gives the auditor the unambiguous artifact reference, independent of the labeling. The pattern matches the way container deployments record both the image tag and the image digest: the tag is for humans, the digest is for the record.
- How does shadow mode handle stateful policy effects like rate limits?
Shadow mode handles stateless decisions cleanly: the candidate policy evaluates the request and emits the would-have-been decision. Stateful effects require care. A rate-limit decision in shadow mode does not increment the production rate-limit counter; otherwise the shadow evaluation affects the production decision. The pattern is to evaluate the rate-limit predicate against a parallel counter that the shadow policy maintains, with the counter reset between shadow runs. The same applies to circuit-breaker state, retry budgets, and quota tracking. The principle is that shadow mode is observation, not enforcement, and the implementation has to honor the principle through state separation.
- What is the policy CI pipeline expected to catch?
The policy CI runs schema validation (the YAML conforms to the policy schema), conflict detection (two rules do not produce contradictory decisions for the same identity and route), dead-rule detection (a rule is never reachable because a higher-priority rule shadows it), reference validation (the roles, routes, and classifications named in the rules exist in the corresponding registries), and signature verification (the bundle commit is signed by an authorized reviewer). The CI pipeline does not run the policy against production traffic; that is the shadow-mode evaluation. The CI pipeline catches the static defects; the shadow mode catches the behavioral regressions.
- How long should shadow mode run before promoting a policy?
The duration depends on the traffic volume and the segment coverage. The operational target is to observe the candidate policy against every role, every route, and every classification at least 100 times before promotion. For a gateway handling 10,000 decisions per hour spread evenly across 20 routes and 5 classifications, each route sees 500 decisions per hour and each classification 2,000, so the 100-observation target is met in about 12 minutes, with the routes as the binding dimension. If coverage is instead counted per route-and-classification pair, each of the 100 pairs sees 100 decisions per hour and the target takes about an hour. For a gateway with skewed traffic, where a tail role triggers once an hour, the target requires 100 hours of shadow mode. The discipline is to wait until the coverage target is met, not until the calendar window expires.
- Does the policy version need to align with the application version?
No. The policy lifecycle is independent of the application lifecycle. The application can deploy a new release without a policy change. The policy can promote a new bundle without an application change. The two artifacts are versioned separately, with the audit record carrying both: the application version (from the request context) and the policy version (from the gateway). The independence lets the policy team iterate on the policy without coordinating with every application team, which is the property the MAP and MANAGE functions implicitly require for an organization with multiple AI applications and a single policy regime.
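The state separation described in the rate-limit answer above can be sketched with two independent counters. This is a minimal sketch: the fixed-window algorithm, the class names, and the limits are assumptions chosen for illustration, and an injectable clock keeps the example deterministic. The production counter only ever sees the enforced path, so the candidate's stricter limit shows up as would-have-been denials without touching production budget.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Per-identity fixed-window counter. Fixed windows are an assumed stand-in
    for whatever rate-limit algorithm a real gateway uses; old windows are
    never evicted in this sketch."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = defaultdict(int)

    def allow(self, identity: str) -> bool:
        key = (identity, int(self.clock() // self.window))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True

    def reset(self) -> None:
        self.counts.clear()


class ShadowedRateLimit:
    """Production and shadow predicates keep separate counters, so the shadow
    evaluation never consumes production budget."""

    def __init__(self, prod_limit: int, shadow_limit: int,
                 window_seconds: float = 60.0, clock=time.monotonic):
        self.prod = FixedWindowLimiter(prod_limit, window_seconds, clock)
        self.shadow = FixedWindowLimiter(shadow_limit, window_seconds, clock)

    def evaluate(self, identity: str) -> tuple[str, str]:
        enforced = "allow" if self.prod.allow(identity) else "deny"
        would_have_been = "allow" if self.shadow.allow(identity) else "deny"  # observed, not enforced
        return enforced, would_have_been

    def start_shadow_run(self) -> None:
        self.shadow.reset()  # reset the shadow counter between shadow runs


def frozen_clock() -> float:
    return 0.0  # fixed time so the example stays inside one window


limiter = ShadowedRateLimit(prod_limit=5, shadow_limit=3, clock=frozen_clock)
for _ in range(5):
    print(limiter.evaluate("alice"))
```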