Why hash the prompt instead of storing the full text inline?

Long prompts and attachments push log record size past the ingestion limits of most SIEM systems. The hash-first pattern keeps the record compact while preserving the auditor's ability to verify the referenced content. The prompt content is stored separately in an object store with the same retention policy as the log record.

Does the schema support redaction of sensitive data in the prompt?

Yes. The prompt_ref.redaction_policy field records which redaction rules were applied. The redacted prompt is stored inline or by reference. The pre-redaction prompt is retained in a separate storage tier with restricted access, subject to the retention rules that apply to the underlying data category.

How does the schema handle streaming responses?

Streaming responses produce multiple response chunks. The log record captures the aggregate response after the stream closes. The token_usage field records the total prompt and completion tokens. Per-chunk logging is an optional extension for deployments that need finer granularity for latency analysis, and the extension is a separate record type with a foreign-key link to the parent record.
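The aggregate-after-close behaviour can be sketched as follows. This is a minimal illustration, not the schema's reference implementation: the chunk keys (text, completion_tokens), the parent_record_id link name, and the chunk_index field are all hypothetical names chosen for the example.

```python
import uuid

def aggregate_stream(parent_record_id, prompt_tokens, chunks):
    """Collapse streamed chunks into one response block and optional per-chunk records."""
    text_parts = []
    completion_tokens = 0
    chunk_records = []
    for index, chunk in enumerate(chunks):
        text_parts.append(chunk["text"])
        completion_tokens += chunk["completion_tokens"]
        # Optional extension record, linked back to the parent log record.
        chunk_records.append({
            "record_id": str(uuid.uuid4()),
            "parent_record_id": parent_record_id,  # hypothetical link field
            "chunk_index": index,
            "completion_tokens": chunk["completion_tokens"],
        })
    response = {
        "text": "".join(text_parts),
        "token_usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
        },
    }
    return response, chunk_records

response, chunk_records = aggregate_stream(
    "parent-1",
    12,
    [{"text": "Hel", "completion_tokens": 1}, {"text": "lo", "completion_tokens": 1}],
)
```

The parent record is written once, after the loop finishes, so a stream that is cut short never produces a half-populated token_usage.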

Can the schema be extended without breaking the retention record?

The record_version field is the extension anchor. A new schema version adds fields as optional. Existing records retain their original version and remain valid under the schema they were written against. The auditor's read path selects the schema by version. The extension pattern is standard forward-compatible JSON schema evolution.
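A rough sketch of a version-selecting read path, assuming two hypothetical schema versions ("1.0" and "1.1") and a deliberately tiny required-field set. The version strings and the provider_request_id field name are placeholders for illustration only.

```python
# Hypothetical per-version field sets; a real deployment would load full JSON Schemas.
REQUIRED_BY_VERSION = {
    "1.0": {"record_id", "record_version", "occurred_at", "recorded_at"},
    "1.1": {"record_id", "record_version", "occurred_at", "recorded_at"},
}
OPTIONAL_BY_VERSION = {
    "1.0": set(),
    "1.1": {"provider_request_id"},  # added as optional in the newer version
}

def validate_record(record):
    """Validate a record against the schema version it was written under."""
    version = record.get("record_version")
    if version not in REQUIRED_BY_VERSION:
        raise ValueError(f"unknown record_version: {version!r}")
    present = set(record)
    missing = REQUIRED_BY_VERSION[version] - present
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    unknown = present - REQUIRED_BY_VERSION[version] - OPTIONAL_BY_VERSION[version]
    if unknown:
        raise ValueError(f"fields not defined in version {version}: {sorted(unknown)}")
    return version
```

Because the new field is optional, a "1.1" record without provider_request_id is still valid, and a "1.0" record is never re-validated against "1.1".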

What identity provider representations does the schema support?

Any provider that emits a stable subject identifier. OIDC sub claims, SAML NameID values, and non-human identity records from a workload identity provider all map to the subject field. The idp_subject_claim field carries the original claim string, which lets the SIEM correlate with the identity provider's own audit stream without ambiguity.

Does the schema record the model provider's request ID?

Yes, as an optional field under provider-specific extensions. OpenAI's x-request-id header, Anthropic's request-id header, and AWS Bedrock's x-amzn-RequestId header all populate a provider-specific request-ID field. The field is optional because not every deployment routes to a provider that emits one.
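One way to populate the field is a small lookup over the provider's response headers. The sketch below assumes the inspection layer has the headers as a plain dict and uses hypothetical provider keys ("openai", "anthropic", "bedrock"); header names are compared in lower case because HTTP header names are case-insensitive.

```python
# Response header that carries the request ID, per provider (lower-cased).
REQUEST_ID_HEADERS = {
    "openai": "x-request-id",
    "anthropic": "request-id",
    "bedrock": "x-amzn-requestid",
}

def provider_request_id(provider, headers):
    """Return the provider's request ID, or None when the provider emits none."""
    header_name = REQUEST_ID_HEADERS.get(provider)
    if header_name is None:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get(header_name)
```

Returning None rather than raising keeps the field optional: the record is still written when the provider is unknown or the header is absent.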

A JSON Schema for AI Audit Logs: The Fields a Regulator, an Auditor, and a SIEM All Need in the Same Record

An AI audit log has three consumers. The regulator reads it during an Article 12 audit of a high-risk deployment. The internal auditor reads it during an ISO 42001 certification review. The SIEM reads it during an active incident investigation. Most deployments produce three log formats and reconcile them after the fact. The reconciliation step is where the failure modes accumulate: records that appear in one stream and not the others, timestamps that differ across streams, identity representations that follow a different convention for each consumer.

A single JSON schema written at the inspection layer satisfies all three consumers on the same record. I want to walk through the schema field by field: the identity representation, the request and policy-state representations, the decision and response, and the storage-layer contract that makes the schema durable across the six-month minimum retention window Article 19 sets.

Schema at a glance

The full schema is under 40 fields. The required subset is 14. The optional fields cover redaction metadata, response-side annotations, and provider-specific request-ID mappings.

The record_id is a UUID assigned at ingestion by the inspection layer. The record_version pins the schema version. The occurred_at is the time of the AI request. The recorded_at is the time the log record was written to storage. The two timestamps rarely differ by more than a millisecond, and a widening gap between them is itself a health signal.
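A minimal sketch of the four envelope fields and the gap calculation, assuming ISO 8601 timestamps in UTC. The function names and the default version string are illustrative and not part of the schema.

```python
import uuid
from datetime import datetime, timezone

def new_envelope(occurred_at, record_version="1.0"):
    """Build the envelope fields; recorded_at is stamped at write time."""
    recorded_at = datetime.now(timezone.utc)
    return {
        "record_id": str(uuid.uuid4()),
        "record_version": record_version,
        "occurred_at": occurred_at.isoformat(),
        "recorded_at": recorded_at.isoformat(),
    }

def write_lag_ms(record):
    """Gap between the request and the durable write, in milliseconds."""
    occurred = datetime.fromisoformat(record["occurred_at"])
    recorded = datetime.fromisoformat(record["recorded_at"])
    return (recorded - occurred).total_seconds() * 1000.0

envelope = new_envelope(datetime.now(timezone.utc))
```

A monitoring job can alert when write_lag_ms exceeds a threshold the deployment chooses, which turns the timestamp gap into the health signal described above.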

Identity representation

Article 12 requires the log, for certain high-risk systems, to identify the natural persons involved in verifying results. ISO 42001 requires the log to identify the responsible operator. The SIEM needs a stable subject identifier for correlation with the identity provider's audit stream. One schema satisfies all three by using a structured identity object.

The subject field is the stable identifier the identity provider issues. For a human, it maps to the user's sub claim in the OIDC token. For an agent, it maps to the agent's non-human-identity record. The on_behalf_of field carries the human identity when an agent acts on behalf of a person, which is the delegation pattern the NIST NCCoE AI agent identity and authorization project describes.
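The identity object might be assembled along these lines. The subject, on_behalf_of, and idp_subject_claim names come from the schema description; subject_type and the agent identifier format are assumptions added for the example.

```python
def identity_for_human(oidc_claims):
    """Map a human's OIDC claims to the identity object."""
    return {
        "subject": oidc_claims["sub"],
        "subject_type": "human",  # hypothetical discriminator
        "idp_subject_claim": oidc_claims["sub"],
    }

def identity_for_agent(agent_id, acting_for=None):
    """Map an agent's non-human identity, with optional delegation."""
    identity = {
        "subject": agent_id,
        "subject_type": "agent",  # hypothetical discriminator
        "idp_subject_claim": agent_id,
    }
    if acting_for is not None:
        # Delegation: the human on whose behalf the agent acts.
        identity["on_behalf_of"] = acting_for["sub"]
    return identity

delegated = identity_for_agent("agent:invoice-bot", acting_for={"sub": "user-123"})
```

Keeping on_behalf_of absent, rather than null, for autonomous agents lets a query distinguish delegated calls from undelegated ones with a simple key check.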

Request representation

The request field captures the AI call as it left the inspection layer, before the model saw it. The representation includes the model provider, the model name, the prompt (subject to redaction policy), and the tool-call surface if the deployment uses function-calling.

The prompt_ref field uses a hash-first pattern. The SHA-256 of the full prompt is always recorded. The prompt content is either inline (for short prompts) or referenced by an object-store pointer (for long prompts and attachments). The hash lets the auditor verify that the referenced content is the same content that was sent to the model. The redaction policy identifier lets the log record what redaction was applied without expanding the redaction rules inline.
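The hash-first pattern can be sketched as below. The 4096-byte inline threshold, the hash_alg, inline, and object_ref key names, and the object-store URI format are assumptions for illustration; only the hash and the redaction policy identifier are described by the schema.

```python
import hashlib

INLINE_LIMIT_BYTES = 4096  # hypothetical threshold for inline storage

def build_prompt_ref(prompt, redaction_policy, object_store_uri=None):
    """Always record the SHA-256; store content inline or by reference."""
    data = prompt.encode("utf-8")
    ref = {
        "hash_alg": "sha256",
        "hash": hashlib.sha256(data).hexdigest(),
        "redaction_policy": redaction_policy,
    }
    if len(data) <= INLINE_LIMIT_BYTES:
        ref["inline"] = prompt
    else:
        if object_store_uri is None:
            raise ValueError("long prompts need an object-store pointer")
        ref["object_ref"] = object_store_uri
    return ref

short_ref = build_prompt_ref("hello", "pii-v3")
long_ref = build_prompt_ref("x" * 10000, "pii-v3", "s3://audit-prompts/abc")
```

The hash is computed over the UTF-8 bytes, so whoever verifies the record later has to hash the stored bytes without re-encoding or normalizing them.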

Policy state representation

The policy field captures the state of the enforcement engine at the moment of decision. The regulator uses this field to answer the question "what rules were in force when this request was allowed or blocked."

The policy_version is a semver-style string. The policy_hash is a SHA-256 of the policy document that was active. The matched_rules array lists every rule that fired during evaluation. An allow decision may match multiple rules that all evaluate to allow. A block decision surfaces the rule that produced the block and the match reason.
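A sketch of how the policy block could be assembled from an evaluation trace. The evaluation entry shape (rule_id, matched, effect, reason), the decision key names, and the first-blocking-rule-wins choice are assumptions for the example, not behaviour the schema specifies.

```python
import hashlib

def build_policy_block(policy_version, policy_document, evaluations):
    """Return (policy block, decision) from the engine's per-rule results."""
    matched = [e for e in evaluations if e["matched"]]
    blocking = [e for e in matched if e["effect"] == "block"]
    policy = {
        "policy_version": policy_version,
        "policy_hash": hashlib.sha256(policy_document).hexdigest(),
        # Every rule that fired is listed, not only the deciding one.
        "matched_rules": [
            {"rule_id": e["rule_id"], "effect": e["effect"]} for e in matched
        ],
    }
    decision = {"outcome": "block" if blocking else "allow"}
    if blocking:
        # Assumption: the first blocking rule is reported as the cause.
        decision["blocking_rule"] = blocking[0]["rule_id"]
        decision["reason"] = blocking[0]["reason"]
    return policy, decision

policy, decision = build_policy_block(
    "2.4.1",
    b"rules: [...]",
    [
        {"rule_id": "allow-internal", "matched": True, "effect": "allow", "reason": ""},
        {"rule_id": "no-secrets", "matched": True, "effect": "block", "reason": "API key in prompt"},
        {"rule_id": "geo-fence", "matched": False, "effect": "block", "reason": ""},
    ],
)
```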

Decision and response

The decision object records the final outcome the inspection layer returned to the caller. The response object, when present, captures the model's output for downstream analysis and any output-side policy enforcement.

The response block is optional at the schema level because a blocked request never reaches the model, so no response is produced. Allow and redact decisions do produce a response, and for those the response block is populated.
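The conditional presence of the response block is easy to check at read time. A minimal sketch, assuming the decision object exposes a hypothetical outcome key with the values allow, redact, and block:

```python
def response_errors(record):
    """Return consistency errors between the decision and the response block."""
    outcome = record["decision"]["outcome"]  # hypothetical key name
    has_response = record.get("response") is not None
    if outcome == "block" and has_response:
        return ["block decision must not carry a response"]
    if outcome in ("allow", "redact") and not has_response:
        return [f"{outcome} decision is missing its response block"]
    return []
```

A check like this catches records where the two blocks disagree, which a plain optional-field schema would accept.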

Storage-layer contract

The schema is durable only when the storage backend enforces the properties the schema assumes. Four contract points make the schema durable across the retention window.

Write-once storage. The log record is written to a storage backend that enforces immutability at the storage layer. S3 Object Lock in compliance mode, Azure Blob Storage immutability policies, or Google Cloud Storage Bucket Lock all satisfy the contract.

Content-addressable prompts. The prompt_ref.hash field is verifiable against the referenced content. A regulator can compute the hash of the referenced object and confirm it matches the hash in the log record. Any tampering with the prompt content invalidates the hash.
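The verification step a regulator or auditor performs can be sketched in a few lines. It assumes the referenced object has already been fetched as raw bytes and that the record stores a lower-case hex SHA-256 under prompt_ref.hash; the function name is illustrative.

```python
import hashlib
import hmac

def verify_prompt(prompt_ref, fetched_bytes):
    """True when the fetched content matches the hash in the log record."""
    actual = hashlib.sha256(fetched_bytes).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, prompt_ref["hash"])

stored = b"Summarize the attached contract."
ref = {"hash": hashlib.sha256(stored).hexdigest()}
```

Here verify_prompt(ref, stored) succeeds, while changing even one byte of the stored content makes it fail.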

Separate write and read paths. The AI application writes logs. The auditor reads logs. The separation removes the failure mode where the AI application can modify what the auditor sees.

Retention policy at ingestion. The retention clock starts at recorded_at, not at last modification. Object-lock policies keyed off ingestion time are the common pattern.
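A sketch of a retention clock keyed off recorded_at. The 183-day figure is an assumption that approximates the six-month minimum; a deployment would substitute the period its own retention rules require and pass the resulting date to the storage backend's lock configuration.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 183  # assumption: six months approximated in days

def retain_until(recorded_at_iso, retention_days=RETENTION_DAYS):
    """Earliest time the record may be deleted, counted from recorded_at."""
    recorded_at = datetime.fromisoformat(recorded_at_iso)
    return recorded_at + timedelta(days=retention_days)

def may_delete(recorded_at_iso, now):
    """True only once the retention window has fully elapsed."""
    return now >= retain_until(recorded_at_iso)

deadline = retain_until("2025-01-01T00:00:00+00:00")
```

Because the deadline derives only from recorded_at, later metadata changes or object copies cannot reset or shorten the window.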

DeepInspect

This is the schema DeepInspect emits. DeepInspect sits inline between your users or agents and the LLM APIs they call. For every request and response, it produces a log record that matches the schema above and writes the record to an immutable storage backend at the moment of decision.

The schema is consumed by three downstream systems. Auditors read it during an Article 12 review. Compliance teams read it during ISO 42001 certification. SOC analysts read it during incident investigation. One record. Three consumers. No reconciliation.
