Does Article 12 apply to general-purpose AI models or only deployer applications?

Article 12 applies to high-risk AI systems as classified under Annex III. The obligation falls on the deployer of the high-risk system. General-purpose AI model providers carry separate obligations under Article 53. If your organization is the deployer of a high-risk AI system, the logging obligation applies regardless of where the model is hosted.

Is hashing the prompt sufficient, or does the record have to store the prompt body?

Article 19 names the input data leading to the result as a required field. The Commission's Q&A guidance (April 2026 update) treats a cryptographic hash of the input combined with classification and policy metadata as compliant, on the basis that the deployer can reconstruct the request in a forensic context. Storing the prompt body in cleartext is not required and creates derivative GDPR exposure.

What is the relationship between Article 12 and Article 99 penalties?

Article 99 sets the penalty tier for high-risk non-compliance at €15 million or 3% of global annual turnover, whichever is higher. A failure to produce compliant logs under Article 12 falls into that tier. The supplying-misleading-information tier (€7.5 million / 1%) applies to a separate set of behaviors.
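
As a rough illustration of the "whichever is higher" rule, the ceiling for this tier can be computed as below. The function name is hypothetical, and the sketch only covers the €15 million / 3% tier described above; it is arithmetic, not legal advice.

```python
def fine_ceiling_eur(global_annual_turnover_eur: int) -> int:
    """Upper bound for the high-risk non-compliance tier, in whole euros."""
    fixed_cap = 15_000_000
    # 3% of turnover, using integer arithmetic to avoid float rounding.
    turnover_cap = global_annual_turnover_eur * 3 // 100
    return max(fixed_cap, turnover_cap)
```

For a deployer with €1 billion in turnover the turnover-based figure (€30 million) is the binding one; below €500 million in turnover the fixed €15 million figure is.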

How does Article 12 interact with DORA Article 19 for financial deployers?

DORA Article 19 sets ICT-incident reporting obligations for financial entities. The two regimes are independent but share the audit-record substrate. A financial deployer that builds a log set compliant with Article 19 of the AI Act will satisfy the documentation half of DORA Article 19 without separate infrastructure. The reporting cadence and the responsible authority differ.

Implementing EU AI Act Article 12 Logging: An Architectural Walkthrough

Article 12 of the EU AI Act takes effect on August 2, 2026 for high-risk AI systems. The text reads: "High-risk AI systems shall technically allow for the automatic recording of events (logs) over the lifetime of the system." Article 19 names the fields the record has to contain. Article 99 sets the penalty tier at €15 million or 3% of global annual turnover, whichever is higher. Most production AI stacks today have no compliant record at all. This guide walks through the architecture that closes the gap.

I want to walk through the four decisions an implementer has to make at the request layer, the audit-record schema that holds up under a regulator's read, and the failure modes of the application-controlled logging pattern that most teams default to.

The four request-layer decisions

Every AI request that touches a high-risk system crosses four decision points. Each one produces one field in the audit record. Miss any of them and the record fails an Article 19 review.

1. Identity resolution

The record has to name the natural person involved in the request. A shared service credential, an OAuth client ID, or a service account token is not sufficient on its own. The verified subject claim has to resolve to a human identifier (employee ID, customer ID, agent-on-behalf-of) before the request reaches the model. Identity resolution sits at the TLS termination point of the gateway. The gateway extracts the JWT, validates the signature against the identity provider, and writes the resolved subject into the request context.
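
A minimal sketch of the validation step, using only the Python standard library. It assumes an HS256-signed JWT and a hypothetical `subject_type` claim that the identity provider uses to mark human principals; neither is prescribed by the text above. Production identity providers typically sign with asymmetric keys (for example RS256) and also require expiry and audience checks, which are omitted here. `issue_test_token` exists only to make the example self-contained.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def issue_test_token(claims: dict, key: bytes) -> str:
    """Build an HS256 JWT for demonstration; a real token comes from the IdP."""
    header_b64 = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload_b64 = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    return f"{header_b64}.{payload_b64}.{_b64url_encode(sig)}"


def resolve_subject(token: str, key: bytes) -> str:
    """Verify the signature and return a human subject identifier, or raise."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected signing algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    # Hypothetical convention: service accounts carry subject_type != "human".
    if claims.get("subject_type") != "human" or not claims.get("sub"):
        raise PermissionError("subject does not resolve to a natural person")
    return claims["sub"]
```

The function raises rather than returning a placeholder, so a request without a resolvable human subject never reaches the later decision points.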

2. Payload classification

The record has to name the data class the request carried: PII, PHI, source code, customer contracts, or financial records. Classification runs against the decrypted prompt body. A combination of regex (SSN, credit card, MRN), named-entity recognition (person names, addresses), and content classifiers (intent, sensitivity tier) produces a verdict. The verdict is the data-class field in the audit record.
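
The regex layer of that pipeline might look like the sketch below. The patterns, the MRN format, and the class labels (`PII`, `PHI`, `FINANCIAL`, `NONE`) are illustrative assumptions; the named-entity and content-classifier layers are not shown.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
# Hypothetical MRN format: the literal "MRN" followed by 6 to 10 digits.
MRN_RE = re.compile(r"\bMRN[:# ]*\d{6,10}\b", re.IGNORECASE)


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to discard digit runs that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(prompt: str) -> list[str]:
    """Return the sorted data classes detected in the prompt."""
    classes = set()
    if SSN_RE.search(prompt):
        classes.add("PII")
    for match in CARD_RE.finditer(prompt):
        if luhn_ok(re.sub(r"\D", "", match.group())):
            classes.add("FINANCIAL")
    if MRN_RE.search(prompt):
        classes.add("PHI")
    return sorted(classes) or ["NONE"]
```

Returning an explicit `NONE` label keeps the data-class field populated for every request, which matters when the record is later checked for completeness.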

3. Policy lookup

The record has to name the policy version that decided the request. The decision point reads the per-route and per-role rule, returns pass or block with a reason code, and stamps the request with the policy version hash. The policy version hash is what a regulator uses to confirm the rule in effect at the time of the decision matches the rule the deployer attests they had in production.
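
One way to sketch the lookup and the version stamp, assuming a hypothetical rule set keyed by route and role and a SHA-256 over the canonical JSON form of the policy. The rule format, the reason codes, and the default-deny behavior for unknown route/role pairs are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical policy document: each (route, role) pair lists blocked data classes.
POLICY = {
    "version": "2026-07-01",
    "rules": {
        "/v1/chat|analyst": {"block": ["PHI"]},
        "/v1/chat|support": {"block": ["PHI", "PII"]},
    },
}


def policy_version_hash(policy: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal policies hash equally.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason_code: str
    policy_hash: str


def decide(policy: dict, route: str, role: str, data_classes: list[str]) -> Decision:
    version = policy_version_hash(policy)
    rule = policy["rules"].get(f"{route}|{role}")
    if rule is None:
        # No matching rule: deny by default rather than guess.
        return Decision(False, "NO_RULE_FOR_ROUTE_ROLE", version)
    blocked = sorted(set(data_classes) & set(rule["block"]))
    if blocked:
        return Decision(False, "BLOCKED_CLASS_" + blocked[0], version)
    return Decision(True, "PASS", version)
```

Because the hash covers the whole policy document, any edit to a rule produces a different stamp, which is what lets a reviewer match a decision to the exact rule text in force.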

4. Audit writing

The record has to be written before the model responds. Writing after the model returns the completion is too late. If the audit writer fails after the model has already executed, the deployer has an action with no evidence, which is the failure mode Article 12 is designed to prevent. The audit writer has to commit synchronously, and the gateway has to fail closed if the commit errors.
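
The ordering constraint can be sketched as a small wrapper: commit first, call the model only if the commit succeeded. `guarded_call`, `AuditCommitError`, and `file_commit` are hypothetical names, and the append-and-fsync writer is just one way to make "synchronous" concrete.

```python
import json
import os
from typing import Callable


class AuditCommitError(Exception):
    """Raised when the audit record could not be committed."""


def file_commit(path: str) -> Callable[[dict], None]:
    """Return a writer that appends one JSON line and forces it to disk."""
    def commit(record: dict) -> None:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # return only after the OS reports the write durable
    return commit


def guarded_call(
    record: dict,
    commit_audit: Callable[[dict], None],
    call_model: Callable[[], str],
) -> str:
    """Commit the audit record, then call the model. Fail closed on commit errors."""
    try:
        commit_audit(record)
    except Exception as exc:
        # The model is never invoked if the evidence could not be written.
        raise AuditCommitError("audit commit failed; request blocked") from exc
    return call_model()
```

The cost of this design is that audit-store latency and availability sit on the request path, which is the trade the fail-closed requirement implies.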

The audit-record schema

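The minimum field set for Article 19 compliance follows from the four decision points and the tamper-evidence requirement: the resolved subject, the data class, the policy version hash, the pass-or-block decision with its reason code, a hash of the prompt, and the writer_signature.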

The prompt content is hashed, not stored. Storing the cleartext prompt creates a derivative GDPR liability that most legal teams will not approve. The hash is sufficient for forensic reconstruction when paired with the policy version and the data class.
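
A record type along these lines could carry the fields this guide names. Apart from `writer_signature`, the field names are assumptions, as is the UTC timestamp; SHA-256 is used for the prompt hash as one reasonable choice, not a requirement stated above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    subject_id: str                 # resolved natural person
    data_classes: tuple[str, ...]   # classifier verdict
    policy_version_hash: str        # policy in force at decision time
    decision: str                   # "pass" or "block"
    reason_code: str
    prompt_sha256: str              # hash only; the prompt body is not stored
    recorded_at: str                # ISO 8601, UTC (assumed field)
    writer_signature: str = ""      # filled in by the audit writer


def build_record(
    subject_id: str,
    prompt: str,
    data_classes: list[str],
    policy_version_hash: str,
    decision: str,
    reason_code: str,
) -> AuditRecord:
    return AuditRecord(
        subject_id=subject_id,
        data_classes=tuple(sorted(data_classes)),
        policy_version_hash=policy_version_hash,
        decision=decision,
        reason_code=reason_code,
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
```

The prompt is passed in only to be hashed, so the cleartext never becomes part of the stored record.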

Retention

Article 19 sets a six-month minimum. Most financial deployers carry longer obligations under sector law (MiFID II, SEC 17a-4), so the practical retention floor is five to seven years for finance. Healthcare deployers carry HIPAA's six-year baseline.
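
As a small worked example, the earliest permissible deletion date can be taken as the record date plus the longest applicable period. The sector table below is an assumption that uses the upper figures quoted above (seven years for finance, six for healthcare); the actual periods depend on the deployer's own legal analysis.

```python
import calendar
from datetime import date
from typing import Optional

BASELINE_MONTHS = 6
# Assumed sector periods in months, taken from the figures in the text.
SECTOR_MONTHS = {"finance": 7 * 12, "healthcare": 6 * 12}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def earliest_deletion(recorded_on: date, sector: Optional[str] = None) -> date:
    """Longest of the baseline and any sector-specific retention period."""
    months = max(BASELINE_MONTHS, SECTOR_MONTHS.get(sector, 0))
    return add_months(recorded_on, months)
```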

Tamper-evidence

Each record's writer_signature is an HMAC of the prior record's signature plus the current record's content. The chain is verifiable on read. A regulator who replays the chain can detect any single altered byte. This is the tamper-evident property that survives a forensic review.
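
The chaining scheme described here can be sketched as follows, assuming HMAC-SHA256, a canonical JSON encoding of the record content, and an all-zero genesis value for the first record. Those three choices, and the function names, are assumptions.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed "prior signature" for the first record


def sign_record(key: bytes, prev_signature: str, content: dict) -> str:
    # HMAC over the prior signature plus the canonical form of this record.
    body = json.dumps(content, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev_signature.encode() + body, hashlib.sha256).hexdigest()


def append_record(chain: list, key: bytes, content: dict) -> None:
    prev = chain[-1]["writer_signature"] if chain else GENESIS
    chain.append({"content": content, "writer_signature": sign_record(key, prev, content)})


def verify_chain(chain: list, key: bytes) -> int:
    """Return the index of the first record that fails to verify, or -1 if clean."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        expected = sign_record(key, prev, rec["content"])
        if not hmac.compare_digest(expected, rec["writer_signature"]):
            return i
        prev = rec["writer_signature"]
    return -1
```

Replaying the chain requires the HMAC key, so whoever verifies needs access to it or to a party that holds it. A chain on its own also does not reveal truncation of the most recent records; detecting that would need the latest signature to be anchored somewhere outside the log.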

Where application-controlled logging fails

Most production AI stacks write audit records from inside the application that issues the AI request. Three failure modes are structural:

Selective logging. The application logs success paths and quietly misses edge-case failures. The Article 19 record set is incomplete.

Suppression. The application that wrote the log can modify the log. A compromised application can rewrite history. The chain of evidence breaks.

Loss on crash. The application crashes after the model responds but before the audit writer commits. The action was taken; the evidence is gone.

The architectural fix is to move audit writing out of the application and into the gateway that sits between the caller and the LLM endpoint. The application no longer carries the audit responsibility. The gateway writes the record from a position the application cannot modify after the fact.

The implementation sequence

A working implementation against the August 2, 2026 deadline looks like this:

1. Deploy the AI gateway between identity-bearing callers and the LLM endpoints. Terminate TLS at the gateway, not at the application.
2. Wire identity. The JWT or SAML claim has to resolve to the verified subject before the request leaves the gateway.
3. Wire classification. PII, PHI, contract content, source code. The classifier runs against the decrypted prompt body before the gateway forwards the request.
4. Wire policy. Per-route, per-role rules. Versioned, hashed, signed.
5. Wire the audit writer. Synchronous commit, fail-closed on error, HMAC chain on each record.
6. Confirm retention. Six months baseline. Sector-specific extensions where they apply.
7. Run a forensic dry-run. Generate 1,000 requests across the policy surface, verify each one produces a complete record, then verify the chain replays clean.
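
The completeness half of the dry-run could be sketched as below: generate requests that cycle through every route, role, and prompt combination, then flag any record with a missing field. The required-field names are assumptions, and the chain replay step is not shown here.

```python
import itertools

# Assumed field names; align these with the audit-record schema actually deployed.
REQUIRED_FIELDS = (
    "subject_id",
    "data_classes",
    "policy_version_hash",
    "decision",
    "prompt_sha256",
    "writer_signature",
)


def synthetic_requests(routes: list, roles: list, prompts: list, total: int) -> list:
    """Cycle through every (route, role, prompt) combination until total is reached."""
    combos = itertools.cycle(itertools.product(routes, roles, prompts))
    return list(itertools.islice(combos, total))


def incomplete_records(records: list) -> list:
    """Return (index, missing_fields) for each record lacking a required field."""
    problems = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "", [], ())]
        if missing:
            problems.append((i, missing))
    return problems
```

A dry-run passes this check when `incomplete_records` returns an empty list for the records produced by all generated requests, including the ones the policy blocked.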

The gating question for any architecture under review is whether a regulator can reconstruct what happened on the day in question without depending on application logs. If the answer is no, the architecture does not pass.

DeepInspect

DeepInspect implements every decision point named above as a single deployment: identity resolution at the TLS boundary, payload classification on the decrypted prompt body, per-route and per-role policy with versioned hashes, synchronous audit writing with HMAC chaining, and six-month retention with extension for sector law. The reference implementation runs the full chain under 50 ms per decision and produces records that satisfy Article 19 field-for-field.

We have an Article 12 mapping document that ties each requirement to a specific control in the platform. Request it from your account team.

If you are facing the August deadline, let's talk.