← Blog

Why you need an AI system of record for audit readiness

UK AISI put agent task-completion duration on a two-month doubling curve. Quarterly audit cadences fall behind almost immediately. The gap looks like an audit calendar problem, but the mechanism underneath is a missing system of record for AI decisions, written synchronously at decision time, identity-bound, and signed inline.

ai-security · ai-governance · audit · compliance · agentic-ai · system-of-record

The UK AI Security Institute's April 27 findings put agent task-completion duration on a two-month doubling curve. A workflow that takes four hours today takes eight hours in June, sixteen hours in August, and runs multi-day by November. Every frontier model AISI red-teamed had exploitable weaknesses. Norton Rose Fulbright's Rethinking Governance for Agentic AI (April 2026) reaches the same operational conclusion from the legal side: existing oversight assumptions break when agents act for weeks without a human in the loop. Internal audit teams I talk to are still on a quarterly cadence.

Underneath the cadence problem is a missing system of record for AI decisions.

[Image: AI system of record. Photo by Viktor Talashuk on Unsplash]

The assumption that broke

Historically, compliance has rested on a handful of authoritative systems: HRIS for employees, ERP for financial transactions, CRM for customer interactions. Auditors pulled records from these silos because the write-time data was authoritative and the records were evidence-grade.

AI has no equivalent. Most enterprises reconstruct AI decisions from five lossy sources: LLM provider logs, application audit logs, SIEM events, gateway proxy logs, and developer traces. The reconstruction happens after an incident is opened. By then the agent has been retired, the prompts have rolled out of 14-day retention, and the model has shipped a new version. I covered the building blocks of evidence-grade audit in How to Build a Defensible AI Audit Trail. This post is about where that evidence has to live.

What an AI system of record contains

Every AI decision writes one record at decision time, and that record is the evidence the auditor reads directly instead of reconstructing from logs.

Illustrative per-call schema:

{
  "decision_id": "<uuid>",
  "parent_decision_id": "<uuid | null>",
  "timestamp": "<iso-8601>",
  "human_identity": "<verified-session-id>",
  "agent_identity": "<agent-id>@<version>",
  "model": "<provider>/<model-id>@<version>",
  "input_classification": ["pii", "phi", "source-code", "internal"],
  "policy_evaluated": "<policy-id>@<version>",
  "policy_decision": "allow | deny | redact",
  "policy_rationale": "<ruleset-trace>",
  "tool_calls": [{"tool": "<tool-id>", "args_hash": "<hash>", "result_hash": "<hash>"}],
  "output_classification": ["..."],
  "audit_signature": "<cryptographic-signature>"
}

The record is generated on the same code path that evaluated the policy, with the decision context captured in memory before the response returns. A cryptographic signature is computed over that payload at capture, so the signed record is durable and tamper-evident from the moment of the decision. The signing key lives outside the writing service's trust boundary (HSM or a separate signing service), so the process that emits the record cannot rewrite it after the fact, and a periodic hash-chain anchor catches retroactive deletes. The fan-out to long-term storage can be queued; the signature freezes the payload at the source.
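A minimal sketch of that capture path, using HMAC from Python's standard library. The hard-coded key, field names, and `sign_decision` helper are illustrative stand-ins; in practice the key material would live in an HSM or a separate signing service, outside the writing process:

```python
import hashlib
import hmac
import json

# Illustrative stand-in for key material held outside the writing
# service's trust boundary (HSM or separate signing service).
SIGNING_KEY = b"demo-key-not-for-production"

def sign_decision(record: dict, prev_anchor: str) -> dict:
    """Canonicalize the decision payload and sign it at capture; chain
    each record's anchor to the previous one so a retroactive delete
    breaks every later anchor."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    anchor = hashlib.sha256((prev_anchor + signature).encode()).hexdigest()
    return {**record, "audit_signature": signature, "chain_anchor": anchor}

signed = sign_decision(
    {"decision_id": "d-001", "policy_decision": "deny",
     "timestamp": "2026-04-27T10:00:00Z"},
    prev_anchor="0" * 64,
)
```

Because the payload is canonicalized before signing, any later edit to any field invalidates the signature, and the anchor chain makes a silently deleted record detectable.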

How the record differs from a log

Engineering teams sometimes assume their existing log aggregator already serves this role. Five mechanical properties separate the two:

| Property | Log aggregator | System of record |
| --- | --- | --- |
| Capture point | Emitted from app code after the fact | Generated at the enforcement boundary, at decision time |
| Authority | Copy of state held elsewhere | Authoritative record of the decision |
| Schema | Loose, source-dependent | Enforced, versioned |
| Tamper evidence | None | HMAC computed over the payload at capture |
| Loss model | Can drop under backpressure | Signed payload is durable from the moment of capture |
| Audit path | Reconstruction across sources | Direct query |

Why this matters when autonomy doubles

Quarterly audit assumes the work being audited completed inside the audit period. Multi-day and multi-week agent workflows cross audit boundaries. The October auditor is reviewing a chain that began in July, escalated in August, and triggered an external system action in September. The full chain is scattered across providers, gateways, and SIEMs.

A system of record has the chain. Every decision is linked by parent_decision_id. The query result in October is identical to the result in September.
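As a sketch, walking that chain is a dictionary lookup over the record store rather than a cross-source reconstruction. The record contents and IDs here are hypothetical:

```python
def decision_chain(records: list[dict], leaf_id: str) -> list[dict]:
    """Follow parent_decision_id links from a leaf decision back to
    the root, returning the chain oldest-first."""
    by_id = {r["decision_id"]: r for r in records}
    chain, cursor = [], by_id.get(leaf_id)
    while cursor is not None:
        chain.append(cursor)
        cursor = by_id.get(cursor.get("parent_decision_id"))
    return list(reversed(chain))

# Hypothetical workflow that crossed three audit months.
records = [
    {"decision_id": "jul-01", "parent_decision_id": None},
    {"decision_id": "aug-07", "parent_decision_id": "jul-01"},
    {"decision_id": "sep-03", "parent_decision_id": "aug-07"},
]
chain = decision_chain(records, "sep-03")
```

The October query and the September query run the same traversal over the same signed records, which is why the results are identical.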

Where the system of record sits

Two architectural properties matter, and both are easy to get wrong:

  • Inline. Every inference and every tool call passes through the enforcement boundary. The SOR write is the same code path as the policy evaluation. No sampling, no async pipeline that can drop records under load. The case for inline enforcement at machine speed is laid out in 22-Second Breach Windows Mean Your AI Enforcement Must Be Inline.
  • Identity-bound. Each record carries the verified human identity that authorized the agent. Only a propagated, verified human identity carries that signal; service-account API keys collapse it back to the agent. The identity has to be propagated into the request and verified at the boundary. The post-authentication gap that makes this hard is covered in Securing the Inference Lifecycle.
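A sketch of the identity check at the boundary. The header names and token shape are assumptions for illustration, not a real interface:

```python
def bind_identity(headers: dict) -> dict:
    """At the enforcement boundary, accept only a propagated human
    session; a bare service-account key collapses the identity signal
    back to the agent."""
    session = headers.get("x-verified-human-session")  # hypothetical header
    if session is None:
        raise PermissionError("request carries no propagated human identity")
    # Record both identities: who authorized, and which agent acted.
    return {"human_identity": session, "agent_identity": headers.get("x-agent-id")}

rec = bind_identity({"x-verified-human-session": "sess-42",
                     "x-agent-id": "billing-agent@3"})
```

A request that arrives with only a service-account credential fails the check, which is the point: the record must name the human who authorized the agent, not just the agent.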

Sidecar log shippers and post-hoc aggregation produce a partial reconstruction, not a system of record.

What changes operationally

Three shifts for compliance and platform teams:

  • An auditor queries the agent's decisions between two timestamps, filters for the calls that touched PHI, and pulls the denied requests with the policy that denied them. Audit operates as a query against the authoritative record.
  • Incident response loses its reconstruction step. Investigators query the SOR directly. Mean time to evidence collapses.
  • Agent decommissioning becomes provable. The retired agent's complete decision history is signed at write time and queryable indefinitely.
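The first shift can be sketched as a direct filter over the record store. Field names follow the illustrative per-call schema, and the in-memory list stands in for whatever store backs the SOR:

```python
from datetime import datetime

def audit_query(records, start, end, classification=None, decision=None):
    """Direct query over the authoritative record: a time window plus
    optional data-classification and policy-decision filters."""
    hits = []
    for r in records:
        ts = datetime.fromisoformat(r["timestamp"])
        if not (start <= ts <= end):
            continue
        if classification and classification not in r.get("input_classification", []):
            continue
        if decision and r.get("policy_decision") != decision:
            continue
        hits.append(r)
    return hits

records = [  # hypothetical decisions
    {"timestamp": "2026-07-02T09:00:00", "input_classification": ["phi"],
     "policy_decision": "deny", "policy_evaluated": "phi-egress@4"},
    {"timestamp": "2026-07-03T11:30:00", "input_classification": ["internal"],
     "policy_decision": "allow", "policy_evaluated": "default@1"},
]
denied_phi = audit_query(
    records,
    start=datetime(2026, 7, 1), end=datetime(2026, 7, 31),
    classification="phi", decision="deny",
)
```

The result already carries the policy that denied each call, so the auditor reads the answer instead of assembling it.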

The same record satisfies overlapping regulatory obligations: EU AI Act Article 12 logging, HIPAA's accounting-of-disclosures requirement when PHI flows through inference, and Fannie Mae LL-2026-04's model decision provenance. One record schema, multiple obligations.

DeepInspect

With agent autonomy doubling, the only way to keep pace is to write the evidence at decision time.

DeepInspect is the AI control plane that produces that record on every call. Identity, policy, classification, decision, and signature are emitted inline, per request, per tool call. The audit corpus is authoritative on the day the decision is made.

Let's talk about how DeepInspect can help you meet your audit requirements.