How is this different from regular log shipping?

Regular log shipping moves application logs to a central store. The integrity guarantees are typically operational (TLS in transit, access controls at rest) rather than cryptographic (signatures, hash chain, checkpoints). For an audit record that has to survive forensic examination, the cryptographic controls add the proof of integrity that the operational controls alone do not produce.

Does the gateway need to be tamper-resistant hardware?

Not necessarily, but the signing key has to be protected. Hardware security modules are one way. Cloud-provider key management services with restricted operator access are another. The choice depends on the regulatory and operational context of the deployment.

What about logs from before this architecture was in place?

Pre-existing logs that lack the chain-of-custody substrate are still usable in some contexts (operational investigation, internal review) but have weaker standing in regulatory or legal proceedings. The architectural fix produces forward-going integrity; historical logs remain what they are.

How does this work for high-volume workloads?

Per-record signing adds latency on the order of single-digit milliseconds on current hardware. The hash chain adds a sequential dependency, because each record needs the hash of the record before it, and that dependency has to be managed for throughput. Production deployments batch at the chain level (signing a window of records as a single Merkle root) to amortize the cost.
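The batching idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: the function name `merkle_root`, the choice of SHA-256, the leaf and interior prefixes, and the rule of promoting an unpaired node are all illustrative choices that do not come from any specific deployment.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Return a Merkle root over a window of serialized records."""
    if not records:
        raise ValueError("cannot build a Merkle root over an empty window")
    # Leaves and interior nodes use distinct prefixes so a leaf hash can
    # never be mistaken for an interior node hash.
    level = [_h(b"\x00" + r) for r in records]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(_h(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired node is promoted unchanged
        level = nxt
    return level[0]


window = [b'{"id":1}', b'{"id":2}', b'{"id":3}']
root = merkle_root(window)
tampered = merkle_root([b'{"id":1}', b'{"id":2}', b'{"id":4}'])
print(root.hex())
print(root != tampered)
```

One signature over `root` then covers every record in the window, so the signing cost is paid once per window instead of once per record.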

What's the role of an external transparency log?

The transparency log is the off-system store of checkpoints. Publishing checkpoints to a transparency log the operator does not directly control adds an external witness to the chain. The transparency log does not store the records themselves; it stores the cryptographic anchors.

How does this map to the OWASP and NIST frameworks?

OWASP AISVS 1.0 includes verification requirements on audit logging in its operational chapters. Pillar 3 of the NIST AI agent identity and authorization framework names action lineage as a control surface. Both frameworks point at the same substrate: per-decision records that are signed, retained, and verifiable. The chain of custody is the implementation pattern.

AI Audit Log Chain of Custody: What Forensic Integrity Requires at the Request Boundary

An AI audit log that has to survive a regulatory inquiry or a legal proceeding needs more than the data it captures. The log needs a chain of custody: the proof that the record at the moment of inquiry is the record that was written at the moment of the decision, that nobody has modified it in between, and that the writer and the reader are the entities they claim to be. The chain of custody applies to the AI request-and-response log as much as to physical evidence in any other regulated context. The standard a regulator or a court applies is not "we have the log"; the standard is "we have the log and we can prove what happened to it from the moment of write to the moment of presentation."

I want to walk through the requirements for a defensible chain of custody, the failure modes that break it, the cryptographic and operational controls that produce a defensible chain, and the architectural pattern that holds up under examination.

What chain of custody actually requires

Chain of custody in any evidence context establishes five facts about a piece of evidence.

The identity of the evidence at the moment of capture. What was captured, who captured it, when, and where.

The handling between capture and storage. Who moved the evidence, when, and what was done with it during the move.

The storage. Where the evidence sits between collection and use, who can access it, and how the storage records access.

The integrity of the evidence over time. Whether the evidence at the moment of use is identical to the evidence at the moment of capture, and the proof of that identity.

The handoff. Who presents the evidence and on whose authority, with the chain of prior holders documented.

For an AI audit log, each fact translates to a specific artifact at the system layer.

The translation to AI audit logs

The identity at the moment of capture corresponds to the per-decision record the gateway writes. The record names the request, the response, the identity that initiated the call, the policy version that applied, the model version that was invoked, and the timestamp. The record is what was captured.
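A per-decision record of this shape might look like the sketch below. The class name `DecisionRecord`, the field names, and the sample values are hypothetical; the only point is that the record is serialized deterministically, so the same content always yields the same bytes and therefore the same digest.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class DecisionRecord:
    request: str
    response: str
    identity: str
    policy_version: str
    model_version: str
    timestamp: str  # ISO 8601, UTC


def canonical_bytes(record: DecisionRecord) -> bytes:
    """Serialize with sorted keys and fixed separators so equal records give equal bytes."""
    return json.dumps(
        asdict(record), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")


def record_digest(record: DecisionRecord) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


original = DecisionRecord(
    request="Summarize the loan file",
    response="Summary text",
    identity="user:alice",
    policy_version="policy-v7",
    model_version="model-2025-01",
    timestamp="2025-01-15T12:00:00Z",
)
altered = replace(original, response="Different summary")
print(record_digest(original) == record_digest(altered))  # prints False
```

The deterministic serialization matters because any later signature or hash is computed over bytes, not over the abstract record.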

The handling between capture and storage corresponds to the path the record takes from the gateway's write step to the durable storage. The path includes any in-flight buffer, any transport, any intermediary system. Each hop introduces a custody question: did the record arrive unchanged?

The storage corresponds to the durable log store. The store has access controls, an access log, and a retention policy.

The integrity over time corresponds to cryptographic protection on the record. A signature at write time, verified at read time, establishes that the record has not been modified.

The handoff corresponds to the workflow that produces the record for the inquiry. Who extracted it, who verified the signature, who attests to the record being the one that was written.

Failure modes that break the chain

Three failure modes recur in AI audit deployments.

The first is the writeable log store. A log store that the application that wrote the record can also modify after the fact has no integrity guarantee. The custody chain ends at the moment the record is in a store the writer can change. Self-attestation by the application that ran the AI call is the canonical example: the application wrote the record, the application can modify it, the record cannot be presented as independent evidence.

The second is the unwitnessed transport. A record that travels from the gateway to a log store over an unsigned channel can be altered in transit by anyone in the path. The transport has to be cryptographically sealed (TLS plus signing at the source) or the chain breaks at the wire.

The third is the unsigned record. A record can sit in a tamper-resistant store and still lack a signature that verifies against a key the writer controlled. This is the first failure in reverse: the store can claim the record was written at time T, but nothing proves that what the store exposes now is what was written at time T rather than something substituted at time T plus delta.

The cryptographic controls

A defensible chain of custody for AI audit logs uses four cryptographic controls.

A per-record signature. The gateway signs each record at the moment of write with a key the gateway controls. The signature includes a timestamp from a clock the gateway trusts. The signature is over the record content (the prompt, the response, the identity, the policy version) so any modification to the content invalidates the signature.
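The sign-then-verify step can be sketched as follows. One loud assumption: HMAC-SHA256 stands in for the signature here only so the example runs on the Python standard library. HMAC is symmetric, so whoever can verify can also forge; a real gateway would use an asymmetric scheme (Ed25519, for example) so that readers hold only a public key. The function names and the demo key are hypothetical.

```python
import hashlib
import hmac
import json


def _payload(record: dict) -> bytes:
    # Deterministic serialization: the signature covers exactly these bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> dict:
    """Attach a tag computed over the full record content, timestamp included."""
    tag = hmac.new(key, _payload(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": tag}


def verify_record(signed: dict, key: bytes) -> bool:
    expected = hmac.new(key, _payload(signed["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


key = b"demo-key-not-for-production"  # stand-in for a key held in an HSM or KMS
signed = sign_record(
    {
        "prompt": "Approve the invoice?",
        "response": "Yes",
        "identity": "agent:billing",
        "policy_version": "policy-v7",
        "timestamp": "2025-01-15T12:00:00Z",
    },
    key,
)
tampered = {
    "record": {**signed["record"], "response": "No"},
    "signature": signed["signature"],
}
print(verify_record(signed, key), verify_record(tampered, key))
```

Changing any field after signing makes verification fail, which is the property the per-record signature exists to provide.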

A periodic hash chain. The records in a window are linked into a hash chain. Each record's signature includes a reference to the prior record's hash. Removing a record from the chain breaks the link. The chain is the defense against selective deletion.
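A minimal sketch of the linking, assuming SHA-256 and an all-zero genesis value (both illustrative choices, as are the function names):

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def entry_hash(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(chain: list[dict], record: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "record": record, "hash": entry_hash(prev, record)})


def verify_chain(chain: list[dict]) -> bool:
    """Walk the chain from the genesis value and recompute every link."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


chain: list[dict] = []
for n in range(3):
    append(chain, {"decision_id": n, "response": f"answer {n}"})

with_gap = [chain[0], chain[2]]  # the middle record has been removed
print(verify_chain(chain), verify_chain(with_gap))
```

Removing a record from the middle is detected because the next entry no longer points at its predecessor. Removing records from the end leaves a shorter chain that still verifies, which is why a chain needs an anchor outside itself.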

A signed checkpoint. At regular intervals (every hour, every day), the gateway publishes a signed checkpoint that names the current head of the hash chain. The checkpoint can be published to an external trusted store or to a transparency log. The checkpoint is the defense against bulk replacement of the chain.
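A checkpoint can be as small as the head hash, a record count, and an issue time, all under one signature. The sketch below is hypothetical in its field names and function names, and it again uses HMAC-SHA256 purely as a standard-library stand-in for an asymmetric signature.

```python
import hashlib
import hmac
import json

FIELDS = ("head_hash", "record_count", "issued_at")


def _payload(body: dict) -> bytes:
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_checkpoint(head_hash: str, record_count: int, issued_at: str, key: bytes) -> dict:
    body = {"head_hash": head_hash, "record_count": record_count, "issued_at": issued_at}
    return {**body, "signature": hmac.new(key, _payload(body), hashlib.sha256).hexdigest()}


def checkpoint_matches(checkpoint: dict, chain_hashes: list[str], key: bytes) -> bool:
    """Check the checkpoint's signature, then check the chain still contains its head."""
    body = {name: checkpoint[name] for name in FIELDS}
    expected = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, checkpoint["signature"]):
        return False
    n = checkpoint["record_count"]
    return 1 <= n <= len(chain_hashes) and chain_hashes[n - 1] == checkpoint["head_hash"]


key = b"demo-key-not-for-production"
hashes = [hashlib.sha256(f"record-{i}".encode()).hexdigest() for i in range(3)]
checkpoint = make_checkpoint(hashes[1], 2, "2025-01-15T13:00:00Z", key)

replaced = [hashlib.sha256(f"forged-{i}".encode()).hexdigest() for i in range(3)]
print(checkpoint_matches(checkpoint, hashes, key), checkpoint_matches(checkpoint, replaced, key))
```

A chain that has grown since the checkpoint still matches it; a chain that was rebuilt from scratch, or truncated below the checkpointed count, does not.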

A separate read-only verification key. The verification key (the public key that validates the signatures) is distributed independently of the log store. The reader of the log uses the verification key to confirm signatures and the hash chain. The verifier does not depend on the log store for the key.

The operational controls

The cryptographic controls do not stand alone. They depend on operational practices.

Key management. The signing key has to be protected. A key that the same operator can extract from the gateway and use to forge records does not protect anything. Key handling typically involves a hardware security module or a key management service with restricted operator access and an access audit trail.

Clock integrity. The timestamp in each record has to come from a trusted clock. A clock that the writer can skew can produce records dated to fit an attacker's narrative. Clock integrity typically depends on an NTP source the operator trusts and a clock-skew alarm.
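A clock-skew alarm reduces to comparing the gateway's clock against a reference reading and flagging any gap beyond a threshold. In the sketch below the reference time is passed in as a plain value; in practice it would come from the trusted NTP source. The two-second threshold and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(seconds=2)  # hypothetical threshold; set per deployment


def skew_alarm(local_now: datetime, reference_now: datetime,
               max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True when the local clock is too far from the reference, in either direction."""
    return abs(local_now - reference_now) > max_skew


reference = datetime(2025, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
slightly_fast = reference + timedelta(milliseconds=300)
backdated = reference - timedelta(seconds=5)
print(skew_alarm(slightly_fast, reference), skew_alarm(backdated, reference))
```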

Access controls on the log store. Read access has to be auditable. Write access has to be append-only and authenticated. The store records who read the log and when.

Retention controls. The log has to be retained for the regulatory or contractual period. A log that aged out before the inquiry leaves no evidence to present. Under the EU AI Act, Article 19 sets a minimum of six months for providers, and Article 26(6) sets the same minimum for deployers. Sector-specific obligations can be longer (HIPAA requires six years for compliance documentation; financial-services obligations are of similar duration).

A pattern that holds up

A defensible architecture for AI audit logs has four components in production.

The gateway as the writer. The gateway sees every request and response, has the identity context, and produces the per-decision record. The record is signed at the moment of write.

A separate log store, append-only, with the gateway as the only writer and explicit read access for the audit team and the regulator. The store is independent of the application that ran the AI call.

A transparency-log checkpoint published to a store outside the operator's direct control. The checkpoint is the periodic snapshot that prevents bulk replacement of the chain.

A verification client used by the audit team and the regulator that validates signatures, checks the hash chain, and confirms the records against the checkpoint. The client is the read-side of the custody chain.
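The read side can be sketched as a single function that returns a list of findings, where an empty list means every check passed. Everything named here (`build_entry`, `verify_log`, the entry layout) is hypothetical, and HMAC-SHA256 again stands in for an asymmetric signature so the example runs on the standard library; a real client would hold only the public verification key.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64


def _link_hash(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def _sign(key: bytes, entry_hash: str) -> str:
    return hmac.new(key, entry_hash.encode("ascii"), hashlib.sha256).hexdigest()


def build_entry(prev_hash: str, record: dict, key: bytes) -> dict:
    """Writer side, included only so the example has data to verify."""
    h = _link_hash(prev_hash, record)
    return {"prev_hash": prev_hash, "record": record, "hash": h, "signature": _sign(key, h)}


def verify_log(entries: list[dict], checkpoint_head: str, key: bytes) -> list[str]:
    findings = []
    seen = []
    prev = GENESIS
    for i, entry in enumerate(entries):
        # Hash chain: the entry must point at its predecessor and match its content.
        if entry["prev_hash"] != prev or entry["hash"] != _link_hash(prev, entry["record"]):
            findings.append(f"entry {i}: hash chain broken")
        # Signature: the stored hash must carry a valid tag from the writer's key.
        if not hmac.compare_digest(_sign(key, entry["hash"]), entry["signature"]):
            findings.append(f"entry {i}: bad signature")
        prev = entry["hash"]
        seen.append(prev)
    # Checkpoint: the externally published head must appear in this chain.
    if checkpoint_head not in seen:
        findings.append("checkpoint head not found in chain")
    return findings


key = b"demo-key-not-for-production"
log = []
prev = GENESIS
for n in range(3):
    log.append(build_entry(prev, {"decision_id": n}, key))
    prev = log[-1]["hash"]
checkpoint_head = log[1]["hash"]

edited = list(log)
edited[1] = {**log[1], "record": {"decision_id": 99}}
print(verify_log(log, checkpoint_head, key))
print(verify_log(edited, checkpoint_head, key))
```

Returning findings instead of a single boolean gives the audit team output it can attach to the inquiry response as the proof-of-integrity artifact.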

How this lines up with regulatory expectations

Article 12 of the EU AI Act requires high-risk systems to record events automatically, in logs that ensure traceability over the system's lifetime. Article 19 requires providers to retain those logs for at least six months, and Article 26(6) places the same minimum on deployers. The Article 73 incident reporting and the Article 26 monitoring obligations depend on the records. The chain of custody is what turns the records into evidence.

The Fannie Mae LL-2026-04 mortgage AI governance framework requires lenders to produce evidence of AI decisions on subcontractor work. The chain of custody is what makes the evidence usable in a regulatory inquiry. Application-controlled logs do not satisfy the standard; the records need to live where the application that ran the decision cannot modify them.

The NIST AI agent identity and authorization framework Pillar 3 names "action lineage" as a control surface. The lineage is the record of who authorized this, under which policy, at what moment, with what outcome. The lineage is the chain of custody applied to the per-decision record.

What an inquiry actually asks for

A regulatory inquiry into an AI decision typically requests four artifacts.

The per-decision records for the specific decision under inquiry, including the prompt, response, identity, policy version, and model version.

The proof that the records have not been modified since they were written. The verification client output that confirms signatures and the hash chain.

The access log on the log store, showing who has read the records since they were written.

The chain-of-custody attestation from the operator, naming the systems and processes that produced and protected the records.

A deployer that can produce all four within the inquiry's response window has a defensible position. A deployer that has only some of the artifacts has a partial position and an explanation to make for the gaps.

DeepInspect

This is the chain-of-custody substrate DeepInspect was built to provide. DeepInspect sits inline between authenticated users or agents and the LLMs they call, writes a per-decision audit record signed at the moment of capture, links the records into a hash chain, and exposes the verification artifacts the audit team and the regulator need.

For the chain-of-custody use case specifically, DeepInspect produces the per-decision records, the signatures, the hash chain, and the periodic checkpoints. The application that ran the AI call is not in the trust path for the records. The records survive the application's failure modes (suppression, crash-loss, selective omission).

If you are preparing for a high-risk AI deployment that has to produce evidence under regulatory inquiry, let's talk today.