What is the minimum field set we need for compliance?

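The EU AI Act sets the baseline. Article 12 requires high-risk AI systems to record events automatically, and for remote biometric identification systems Article 12(3) lists the minimum: the period of each use (start and end timestamps), the reference database the input was checked against, the input data that led to a match, and the identity of the natural persons involved in verifying the results. Article 19 then governs how long those logs are kept. The canonical AI audit field set described in this article is a superset of that minimum. HIPAA, DORA, and SOC 2 controls all add fields on top.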

How long should we keep AI audit logs?

The floor is six months under EU AI Act Article 19. Financial services regulators typically require longer (DORA-scope institutions often run seven-year retention). Healthcare deployers under HIPAA typically run six-year retention. The retention plan should anchor to the longest applicable obligation.
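Anchoring to the longest obligation is a max over the obligations in scope. The sketch below tracks retention in months and uses placeholder periods that mirror the examples above (six months, six years, seven years). The obligation keys are hypothetical labels, and the figures are illustrations, not legal advice.

```python
import calendar
from datetime import date

# Placeholder retention periods in months, mirroring the examples in the text.
# The keys are hypothetical labels, not regulatory identifiers.
RETENTION_MONTHS = {
    "eu_ai_act_floor": 6,   # Article 19 minimum of six months
    "hipaa_style": 72,      # six years
    "dora_style": 84,       # seven years
}

def governing_obligation(applicable: list[str]) -> tuple[str, int]:
    """Return the (obligation, months) pair with the longest retention."""
    if not applicable:
        raise ValueError("at least one obligation must apply")
    name = max(applicable, key=lambda key: RETENTION_MONTHS[key])
    return name, RETENTION_MONTHS[name]

def add_months(d: date, months: int) -> date:
    """Calendar-aware month addition, clamping to the last day of short months."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def earliest_purge_date(recorded_on: date, applicable: list[str]) -> date:
    """First date a record may be deleted under the governing obligation."""
    _, months = governing_obligation(applicable)
    return add_months(recorded_on, months)
```

For example, `earliest_purge_date(date(2024, 3, 15), ["eu_ai_act_floor", "hipaa_style"])` returns `date(2030, 3, 15)`. Counting in calendar months rather than a fixed number of days avoids quietly undercounting leap days on multi-year retention.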

Should AI audit logs go in the same SIEM index as security events?

Usually no. AI audit logs tend to be higher volume than typical security events and carry longer retention requirements. A dedicated AI index with its own tier policy keeps the storage economics manageable and keeps regulatory retention separate from security retention.

How do we handle PII in audit logs?

The gateway can redact or hash regulated PII at record time and retain a separate, more restricted record with full content for the cases where regulatory inquiry requires it. The split-record pattern is standard in financial services audit logging and applies cleanly to AI.
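A minimal sketch of the split-record pattern, assuming the gateway hands the record builder a flat dict and already knows which keys hold regulated PII. The `PII_FIELDS` set is hypothetical; in practice it would come from the classifier. The sketch uses a keyed HMAC rather than a bare hash, because unkeyed hashes of low-entropy values such as email addresses or account numbers can be reversed by brute force.

```python
import hashlib
import hmac
import json

# Hypothetical set of regulated PII keys; a real gateway would take these
# from its classifier output (e.g. pii_fields_detected), not a fixed list.
PII_FIELDS = {"patient_name", "account_number", "email"}

def split_record(event: dict, hmac_key: bytes) -> tuple[dict, dict]:
    """Return (standard_record, restricted_record) for one gateway decision.

    The standard record replaces each PII value with a keyed HMAC-SHA256
    digest, so analysts can still correlate identical values without seeing
    them. The restricted record keeps full content, linked by request_id,
    and is meant for a separate store with tighter access control.
    """
    standard = {}
    for key, value in event.items():
        if key in PII_FIELDS:
            digest = hmac.new(hmac_key, str(value).encode("utf-8"), hashlib.sha256)
            standard[key] = "hmac-sha256:" + digest.hexdigest()
        else:
            standard[key] = value
    restricted = {
        "request_id": event["request_id"],
        "full_content": json.dumps(event, sort_keys=True),
    }
    return standard, restricted
```

The same key must be used across records for correlation to work, which also means the key belongs in the same restricted tier as the full-content records.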

What is the right tamper-evidence approach?

A chained signature construction, where each record's signature covers a hash of the previous record's signature, provides tamper-evidence at the level of the record sequence. Object-storage immutability (write-once retention locks such as S3 Object Lock, or the equivalent on other object stores) provides tamper-evidence at the storage layer. The two are complementary.
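The chaining can be sketched with Python's standard library. This assumes HMAC-SHA256 with a shared key as a stand-in for a real signature scheme (a production deployment would more likely use an asymmetric signature so verifiers never hold the signing key) and canonical JSON serialization. The `prev_signature_hash` field name is hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous signature" for the first record

def _canonical(record: dict) -> bytes:
    # Deterministic serialization so signer and verifier hash identical bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

def _prev_hash(prev_signature: str) -> str:
    return hashlib.sha256(prev_signature.encode("utf-8")).hexdigest()

def sign_chain(records: list[dict], key: bytes) -> list[dict]:
    """Attach prev_signature_hash and record_signature to each record, in order."""
    prev_sig = GENESIS
    signed = []
    for rec in records:
        body = dict(rec)
        body["prev_signature_hash"] = _prev_hash(prev_sig)
        sig = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        body["record_signature"] = sig
        signed.append(body)
        prev_sig = sig
    return signed

def verify_chain(signed: list[dict], key: bytes) -> bool:
    """Recompute every signature and check each link to its predecessor."""
    prev_sig = GENESIS
    for rec in signed:
        body = {k: v for k, v in rec.items() if k != "record_signature"}
        if body.get("prev_signature_hash") != _prev_hash(prev_sig):
            return False
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec.get("record_signature", "")):
            return False
        prev_sig = rec["record_signature"]
    return True
```

Editing, inserting, or deleting a record in the middle of the sequence breaks verification at that point. Deleting records from the end of the chain is not detectable from the chain alone, which is one reason the storage-layer lock is a complement rather than an alternative.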

AI Audit Log Formats for SIEM Ingestion: Field Mapping for Splunk, Sentinel, and Chronicle

A policy gateway at the AI request boundary produces a per-decision audit record for every AI call that traverses it. The record fields are the regulatory and forensic evidence stream: principal identity, agent identity if any, prompt classification, response classification, policy ID applied, decision outcome, request and response timestamps, and the request ID that ties the record back to the application's own logs.

That field set does not fit any traditional SIEM schema cleanly. SIEM ingestion was designed around firewall logs, endpoint events, IAM events, and application logs. Prompt classification and response classification are AI-specific. Agent-on-behalf-of identity is a relatively new field in identity logging that not all schemas support natively. The decision itself is a small enumerated value (pass, block, or allow-with-redaction), which fits most schemas, but the explanation for the decision (the policy ID and the classification reason) requires structured custom fields.

I want to walk through the canonical AI audit field set, the mapping decisions for the three SIEM platforms most common in regulated environments, and the pitfalls when AI evidence has to survive a regulatory inquiry months after the original event.

The canonical AI audit field set

A complete per-decision record for an AI request through a policy gateway typically contains fifteen to twenty fields. The required core fields are: request_id (UUID tying request to response and to application log), timestamp_request, timestamp_response, principal_identity (the human user or upstream identity), agent_identity (the agent identity if the call is on behalf of another principal), acts_on_behalf_of (the chain of delegation), model_provider, model_name, prompt_class (the classification taxonomy result for the prompt), response_class (the classification taxonomy result for the response), policy_id (the version-pinned policy applied), policy_decision (pass / block / allow-with-redaction), decision_reason (the policy clause that drove the decision), latency_ms_gateway (overhead introduced by the gateway), and record_signature (the signature over the record for tamper-evidence).

Optional fields that frequently appear: tool_invocations (list of tools the agent called in this turn), pii_fields_detected (the regulated PII categories the classifier matched), redaction_applied (which tokens were redacted from the response), system_prompt_extraction_signal (the strength of any extraction-attempt detection), injection_signal_strength (the prompt-injection classifier's confidence).
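As a concrete shape, the core and optional fields above can be captured in a typed record. This is a sketch assuming Python dataclasses; the field types (for example, `acts_on_behalf_of` as an ordered tuple of identities and the signal fields as floats) are assumptions, not a published schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class PolicyDecision(str, Enum):
    PASS = "pass"
    BLOCK = "block"
    ALLOW_WITH_REDACTION = "allow-with-redaction"

@dataclass(frozen=True)
class AIAuditRecord:
    # Required core fields
    request_id: str
    timestamp_request: datetime
    timestamp_response: datetime
    principal_identity: str
    agent_identity: Optional[str]        # None when no agent is involved
    acts_on_behalf_of: tuple[str, ...]   # delegation chain (ordering is an assumption)
    model_provider: str
    model_name: str
    prompt_class: str
    response_class: str
    policy_id: str
    policy_decision: PolicyDecision
    decision_reason: str
    latency_ms_gateway: float
    record_signature: str
    # Optional fields
    tool_invocations: tuple[str, ...] = ()
    pii_fields_detected: tuple[str, ...] = ()
    redaction_applied: tuple[str, ...] = ()
    system_prompt_extraction_signal: Optional[float] = None
    injection_signal_strength: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject records whose timestamps are out of order.
        if self.timestamp_response < self.timestamp_request:
            raise ValueError("response timestamp precedes request timestamp")

    def to_json(self) -> str:
        """Serialize with ISO 8601 timestamps and the decision as its string value."""
        data = asdict(self)
        data["timestamp_request"] = self.timestamp_request.isoformat()
        data["timestamp_response"] = self.timestamp_response.isoformat()
        data["policy_decision"] = self.policy_decision.value
        return json.dumps(data, sort_keys=True)
```

Making the record immutable and validating it at construction time is one way to get the stability the next paragraph asks for: a malformed record fails at the gateway, not months later in a query.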

The schema does not need to be exhaustive. It needs to be stable enough that a regulator's question six months after an incident can be answered by querying the records produced at the time.

Splunk: CIM, custom indexes, and the AI data model

Splunk's Common Information Model (CIM) is the normalization layer SOC analysts query against. The AI audit fields above do not map cleanly to a single CIM data model. The closest matches are Authentication (for principal_identity and agent_identity), Web (for request_id, timestamp_*, latency_ms_gateway), and Change (for policy_decision).

The practical pattern is a custom CIM extension or a dedicated AI data model. Some Splunk Enterprise Security customers have started building AI-specific data model add-ons that define fields like ai_prompt_class, ai_response_class, ai_policy_id, and ai_decision. The add-on approach lets SOC dashboards query across AI events and non-AI events using the standard CIM joins on identity and timestamp.
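Before records reach Splunk, the renaming can happen in the forwarding code. This sketch assumes records are sent through the HTTP Event Collector (HEC) JSON envelope. The rename map, the `ai_audit` index name, and the `ai:gateway:audit` sourcetype are hypothetical, and mapping `principal_identity` to CIM's `user` field is an assumption to check against your CIM version.

```python
from datetime import datetime

# Hypothetical rename map: the ai_* names follow the add-on fields mentioned
# above; mapping principal_identity to CIM's `user` field is an assumption.
FIELD_MAP = {
    "principal_identity": "user",
    "prompt_class": "ai_prompt_class",
    "response_class": "ai_response_class",
    "policy_id": "ai_policy_id",
    "policy_decision": "ai_decision",
}
# Keep the join key unprefixed so it matches the application's own logs.
PASSTHROUGH = {"request_id"}

def to_cim_fields(record: dict) -> dict:
    """Rename mapped fields and prefix every other gateway field with ai_."""
    event = {}
    for key, value in record.items():
        if key in FIELD_MAP:
            event[FIELD_MAP[key]] = value
        elif key in PASSTHROUGH or key.startswith("ai_"):
            event[key] = value
        else:
            event["ai_" + key] = value
    return event

def to_hec_payload(record: dict, index: str = "ai_audit") -> dict:
    """Wrap a record in an HEC event envelope.

    "time" is epoch seconds taken from timestamp_request, assumed to be an
    ISO 8601 string with an explicit UTC offset.
    """
    return {
        "time": datetime.fromisoformat(record["timestamp_request"]).timestamp(),
        "index": index,
        "sourcetype": "ai:gateway:audit",  # hypothetical sourcetype
        "event": to_cim_fields(record),
    }
```

Setting the event time from the request timestamp, rather than from arrival time at the indexer, keeps the timestamp-based joins meaningful when forwarding is delayed.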

Index decisions matter for retention. Regulated AI logs in scope of EU AI Act Article 19 require a minimum six-month retention. Financial services and healthcare deployers typically need longer. A dedicated AI audit index with a longer hot-warm-cold tier policy than the general security index keeps the regulatory retention from blowing up the storage cost of high-volume security data.

The field-extraction work for the AI records should happen at index time, not search time. Search-time extractions are convenient but break under data-format drift. AI gateway record schemas are still maturing; index-time extraction with strict typing catches schema changes early instead of silently producing nulls in dashboards.
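The same strictness can be enforced in code before records are forwarded at all, so drift surfaces as a hard error rather than a null column. This is a Python sketch of the idea, not Splunk configuration; the schema covers only a subset of the core fields, and the `SchemaDriftError` name is hypothetical.

```python
from datetime import datetime

# Expected types for a subset of the core fields (abbreviated for the sketch).
SCHEMA = {
    "request_id": str,
    "timestamp_request": str,
    "principal_identity": str,
    "policy_id": str,
    "policy_decision": str,
    "latency_ms_gateway": (int, float),
}
ALLOWED_DECISIONS = {"pass", "block", "allow-with-redaction"}

class SchemaDriftError(ValueError):
    """Raised when a record no longer matches the schema the pipeline expects."""

def validate_record(record: dict) -> dict:
    missing = SCHEMA.keys() - record.keys()
    if missing:
        raise SchemaDriftError(f"missing fields: {sorted(missing)}")
    for name, expected in SCHEMA.items():
        value = record[name]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise SchemaDriftError(f"{name}: unexpected type {type(value).__name__}")
    try:
        datetime.fromisoformat(record["timestamp_request"])
    except ValueError as exc:
        raise SchemaDriftError(f"timestamp_request: {exc}") from exc
    if record["policy_decision"] not in ALLOWED_DECISIONS:
        raise SchemaDriftError(f"policy_decision: unknown value {record['policy_decision']!r}")
    return record
```

A renamed field, a number that arrives as a string, or a new decision value all fail loudly here, which is exactly the class of change that search-time extraction would turn into silent gaps.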

Microsoft Sentinel: ASIM and custom tables

Microsoft Sentinel's Advanced Security Information Model (ASIM) defines schemas for common event categories. The relevant ASIM schemas for AI audit data are Authentication, Web Session, and Audit Event. None of them covers AI-specific fields such as prompt classification or response classification natively.

The recommended pattern is a custom log table for the AI audit records with a parser function that maps the gateway's native fields to a normalized form. The parser produces ASIM-conformant fields where possible (TargetUsername for the principal identity, SrcIpAddr for the calling network identity, EventResult for the policy decision) and carries the AI-specific fields as additional structured columns.
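In Sentinel the parser itself is a KQL function; the Python sketch below only illustrates the mapping logic, for example as a pre-ingestion transform. Treating allow-with-redaction as ASIM's `Partial` result is a judgment call rather than an ASIM rule, `source_ip` is a hypothetical gateway field, and the `Ai*` column names are hypothetical.

```python
# Assumed mapping of gateway decisions onto ASIM EventResult values
# (Success, Partial, Failure, NA).
EVENT_RESULT = {"pass": "Success", "allow-with-redaction": "Partial", "block": "Failure"}

# ASIM-conformant columns; source_ip is a hypothetical gateway field.
ASIM_COLUMNS = {"principal_identity": "TargetUsername", "source_ip": "SrcIpAddr"}

# AI-specific fields carried as extra columns; the column names are hypothetical.
AI_COLUMNS = {
    "prompt_class": "AiPromptClass",
    "response_class": "AiResponseClass",
    "policy_id": "AiPolicyId",
    "decision_reason": "AiDecisionReason",
    "agent_identity": "AiAgentIdentity",
}

def to_asim_row(record: dict) -> dict:
    """Map one gateway record onto ASIM columns plus AI-specific extras."""
    row = {"EventResult": EVENT_RESULT.get(record.get("policy_decision"), "NA")}
    for source, column in {**ASIM_COLUMNS, **AI_COLUMNS}.items():
        if source in record:
            row[column] = record[source]
    return row
```

Falling back to `NA` for an unrecognized decision keeps the row ingestible while making the gap visible in queries.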

Kusto Query Language (KQL) joins between the AI custom table and the ASIM Authentication parser (for example, the built-in _Im_Authentication function) are the principal query path. An analyst investigating an identity that triggered an AI policy block can join from the AI custom table to the normalized authentication events on TargetUsername to see the full session context around the AI events.
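The "session context" part of that join is a time-windowed match on the identity. The Python below mirrors that logic outside KQL for illustration; the 30-minute window is an assumption, and the rows are assumed to carry Sentinel's `TimeGenerated` column as `datetime` values.

```python
from datetime import datetime, timedelta

def session_context(ai_event: dict, auth_events: list[dict],
                    window: timedelta = timedelta(minutes=30)) -> list[dict]:
    """Return authentication events for the AI event's TargetUsername that
    fall within +/- window of the AI event, oldest first."""
    anchor: datetime = ai_event["TimeGenerated"]
    user = ai_event["TargetUsername"]
    matches = [
        e for e in auth_events
        if e["TargetUsername"] == user and abs(e["TimeGenerated"] - anchor) <= window
    ]
    return sorted(matches, key=lambda e: e["TimeGenerated"])
```

Bounding the join by time, and not only by identity, keeps the result focused on the session around the block instead of the user's entire authentication history.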

Sentinel's Analytics Rules can fire on AI-specific patterns: policy_decision == "block" with decision_reason == "system_prompt_extraction_attempt" is a high-fidelity signal for a deliberate attack. The rule should join to user-and-entity behavior analytics for the calling identity to reduce false positives from research and red-team activity.
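The rule's shape (a high-fidelity match, then suppression using identity context) can be sketched in Python. Here `sanctioned_testers` is a hypothetical stand-in for the UEBA and red-team context the rule should join to, and `min_attempts` is a hypothetical tuning knob, not a Sentinel setting.

```python
from collections import Counter

def is_extraction_block(event: dict) -> bool:
    """The high-fidelity pattern: a block driven by system-prompt extraction."""
    return (
        event.get("policy_decision") == "block"
        and event.get("decision_reason") == "system_prompt_extraction_attempt"
    )

def identities_to_alert(events: list[dict], sanctioned_testers: set[str],
                        min_attempts: int = 1) -> set[str]:
    """Identities with at least min_attempts extraction blocks, excluding
    identities known to be doing sanctioned research or red-team work."""
    counts = Counter(e.get("principal_identity") for e in events if is_extraction_block(e))
    return {
        identity for identity, n in counts.items()
        if n >= min_attempts and identity not in sanctioned_testers
    }
```

Suppressing by identity rather than loosening the match keeps the detection itself precise while the false positives from known testers are handled separately.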

Google Chronicle: UDM mapping

Google Chronicle's Unified Data Model (UDM) was designed to normalize a wide range of security telemetry into a single schema. UDM has event_type enumerations that cover most categories of security-relevant events; AI-specific event types are emerging but not all are first-class yet.

The pragmatic mapping uses the closest UDM event type (USER_RESOURCE_ACCESS for an AI call evaluated against policy) and carries the AI-specific fields in the additional structured map. The principal and target blocks carry identity. The security_result block carries the policy decision and the reason. The metadata block carries the request and response timestamps and the request ID.
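The mapping above can be sketched as a plain Python dict in UDM's shape. The UDM paths used here (metadata.event_type, metadata.product_log_id, principal.user.userid, target.resource.name, security_result action, rule_name and summary, and the additional map) and the action enum values are assumptions to check against Google's UDM field list before use. Because this sketch has no obvious metadata slot for the response timestamp, timestamp_response rides in additional alongside the other AI-specific fields.

```python
# Assumed mapping onto UDM security_result.action enum values.
UDM_ACTION = {
    "pass": "ALLOW",
    "block": "BLOCK",
    "allow-with-redaction": "ALLOW_WITH_MODIFICATION",
}

# Gateway fields with no UDM home in this sketch; carried in `additional`.
AI_ONLY_FIELDS = (
    "prompt_class", "response_class", "agent_identity",
    "acts_on_behalf_of", "timestamp_response", "latency_ms_gateway",
)

def to_udm_event(record: dict) -> dict:
    """Build a UDM-shaped event from one gateway record."""
    return {
        "metadata": {
            "event_type": "USER_RESOURCE_ACCESS",
            "event_timestamp": record["timestamp_request"],
            "product_log_id": record["request_id"],
        },
        "principal": {"user": {"userid": record["principal_identity"]}},
        "target": {"resource": {"name": record["model_name"]}},
        "security_result": [{
            "action": [UDM_ACTION.get(record["policy_decision"], "UNKNOWN_ACTION")],
            "rule_name": record["policy_id"],
            "summary": record["decision_reason"],
        }],
        "additional": {k: record[k] for k in AI_ONLY_FIELDS if k in record},
    }
```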

Chronicle's parser configuration lets a SOC define the field extraction at ingestion time. The parser should be version-controlled and tested against sample records before each schema change at the gateway. AI gateway schemas tend to add fields rather than rename them, but the parser's strict-mode behavior on new fields needs to be set deliberately.
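Testing against sample records can start as a field-set diff run in CI before a gateway schema change ships. In this sketch, `unknown_fields` is a hypothetical switch that makes the strict-mode decision explicit: `"fail"` blocks the rollout when the gateway emits fields the parser does not handle, and `"carry"` accepts them as pass-through extras.

```python
def schema_diff(parser_fields: set[str], samples: list[dict]) -> dict[str, set[str]]:
    """Compare the fields a parser handles with the fields seen in sample records."""
    seen = set().union(*(record.keys() for record in samples))
    return {
        "unhandled": seen - parser_fields,  # new gateway fields the parser ignores
        "absent": parser_fields - seen,     # parser fields no sample exercised
    }

def gate_parser_change(parser_fields: set[str], samples: list[dict],
                       unknown_fields: str = "fail") -> dict[str, set[str]]:
    """Pre-deployment check that forces a deliberate choice about new fields."""
    if unknown_fields not in ("fail", "carry"):
        raise ValueError("unknown_fields must be 'fail' or 'carry'")
    diff = schema_diff(parser_fields, samples)
    if unknown_fields == "fail" and diff["unhandled"]:
        raise RuntimeError(f"parser does not handle: {sorted(diff['unhandled'])}")
    return diff
```

The `absent` set is worth reviewing too: a parser field that no sample exercises is an untested path, not proof that the field is obsolete.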

Reference lists and entity context in Chronicle help correlate AI events with broader security context. An identity that appears in an AI block decision should be enriched with the user's HR-context entity record, the user's recent authentication events, and the user's recent endpoint activity. Chronicle's entity graph makes this a single query.

The retention and replay problem

AI audit records have a specific replay requirement that traditional SIEM data does not always carry. A regulator's inquiry months after an incident may ask to reconstruct the policy state at the moment of the decision, not just the decision itself. This requires the policy_id field to be version-pinned to a policy document that is itself retained for the same duration as the audit record.

Most SIEM retention plans do not retain the AI gateway's policy documents. The AI gateway has to retain them or commit them to a separate object store with the same retention. The SIEM record references the policy ID; the policy document store is the dereference target.
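One way to make that dereference reliable is to make policy_id content-addressed, so the ID itself proves which document was in effect. The sketch assumes a hypothetical `name@sha256:<digest>` ID format and uses an in-memory dict as a stand-in for a retained, write-once object store.

```python
import hashlib

class PolicyStore:
    """In-memory stand-in for a write-once policy document store."""

    def __init__(self) -> None:
        self._documents: dict[str, bytes] = {}

    def publish(self, name: str, document: bytes) -> str:
        """Store a policy version and return its version-pinned ID."""
        digest = hashlib.sha256(document).hexdigest()
        policy_id = f"{name}@sha256:{digest}"
        # setdefault never overwrites: re-publishing identical bytes is a no-op.
        self._documents.setdefault(policy_id, document)
        return policy_id

    def resolve(self, policy_id: str) -> bytes:
        """Dereference a policy_id from an audit record, verifying integrity."""
        if policy_id not in self._documents:
            raise KeyError(f"policy {policy_id} was not retained")
        document = self._documents[policy_id]
        _, _, digest = policy_id.partition("@sha256:")
        if hashlib.sha256(document).hexdigest() != digest:
            raise ValueError("stored document does not match its pinned digest")
        return document
```

With content addressing, a replay query that finds the document can also prove it is the exact text that was in effect; a missing document surfaces as an explicit retention failure rather than a silently substituted newer version.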

The retention plan also has to anticipate that the SIEM may be archived or migrated before the audit records age out. A regulator-evidentiary export format from the SIEM (typically JSON with cryptographic chaining for tamper-evidence) should be tested against a representative incident before it is needed for an actual inquiry.
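A rehearsal of that export can start as a simple round trip. This sketch assumes JSON Lines output plus a small manifest (record count, SHA-256 of the payload, and the last record_signature so the chain head is pinned). The manifest layout is hypothetical, and a full rehearsal would also re-verify the signature chain on the re-imported records.

```python
import hashlib
import json

def export_jsonl(records: list[dict]) -> tuple[bytes, dict]:
    """Serialize records as JSON Lines and build a manifest describing the export."""
    payload = b"".join(
        json.dumps(r, sort_keys=True).encode("utf-8") + b"\n" for r in records
    )
    manifest = {
        "record_count": len(records),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "last_record_signature": records[-1].get("record_signature") if records else None,
    }
    return payload, manifest

def verify_export(payload: bytes, manifest: dict) -> list[dict]:
    """Re-import an export and check it against its manifest."""
    if hashlib.sha256(payload).hexdigest() != manifest["sha256"]:
        raise ValueError("export digest mismatch")
    records = [json.loads(line) for line in payload.decode("utf-8").splitlines() if line]
    if len(records) != manifest["record_count"]:
        raise ValueError("record count mismatch")
    if records and records[-1].get("record_signature") != manifest["last_record_signature"]:
        raise ValueError("chain head does not match manifest")
    return records
```

Pinning the last signature in the manifest covers the case the chain alone cannot: records quietly dropped from the end of the export.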

DeepInspect

This is the gap DeepInspect's audit record set was designed to close for SIEM ingestion. DeepInspect sits inline between authenticated users or agents and the LLMs they call, enforces identity-bound policy on every request and response, and writes a per-decision audit record outside the calling application. The record schema covers the canonical AI audit field set above and emits in JSON with structured fields ready for Splunk, Microsoft Sentinel, or Google Chronicle ingestion with documented parser configurations.

The architecture is stateless and identity-aware: every request carries the principal identity and the agent identity, and the audit record commits before the response returns to the application. The policy ID is version-pinned to the policy document at the moment of decision, so a replay query against any historical record dereferences to the actual policy in effect at that moment. Record signatures use a chained construction so post-hoc edits are detectable.

For platform teams that have been deferring SIEM integration on AI traffic because the field set didn't fit ASIM, CIM, or UDM cleanly, the documented mapping is the starting point. Book a demo today.