AI Data Lineage for Audit: Tracing a Model Decision Back to Its Inputs

AI data lineage for audit traces a model decision back to the inputs that produced it: the prompt content, the documents retrieved for augmentation, the policy in force, the identity of the caller, and the version of the model. Most deployments produce lineage that stops at the prompt and never reaches the retrieval source. The lineage that survives a regulatory inquiry has eight elements, lives outside the application, and is signed at the gateway.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Compliance & Regulation · data-lineage · audit · ai-governance · compliance · eu-ai-act · rag
When a regulator asks "what produced this AI decision?", the answer is a chain. The decision came from a model response. The model received a prompt. The prompt was assembled from a user query, system instructions, retrieved documents, and conversation history. Each component carries metadata about its source, its classification, and the policy that governed it. The lineage is the chain. Most deployments produce a chain that ends at the prompt and never reaches the retrieval source. The chain that survives the inquiry has eight elements, lives outside the application process, and is signed at the gateway.

I want to walk through the eight elements, the failure modes in the chains most deployments produce today, and the architecture that produces the chain a regulator accepts.

The eight elements of an auditable lineage chain

A lineage record that holds up under EU AI Act Article 12, NIST AI RMF MEASURE, and ISO 42001 Annex A audit cycles carries the eight elements below.

1. The natural-person identity

Who initiated the request. Resolved from the verified token at the gateway. This is the field most deployments get wrong: the application calls the model with a service credential, so the natural person is never attached to the request.

2. The agent identity (if any)

If a software agent acted between the user and the model, the agent's workload identity. Records the chain of responsibility for the decision.

3. The prompt content

The full text of the prompt that reached the model. Stored in a separate object, referenced by hash from the lineage record. The hash binding makes the record tamper-evident: changing the stored prompt breaks the chain.
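The hash binding can be illustrated with a minimal sketch. It assumes SHA-256 as the hash and uses an in-memory dictionary as a stand-in for the separate object store; the names `object_store`, `store_content`, and `verify_content` are hypothetical.

```python
import hashlib

# Hypothetical in-memory stand-in for a durable object store.
object_store: dict[str, bytes] = {}

def store_content(text: str) -> str:
    """Store text under its SHA-256 digest and return the digest."""
    data = text.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    object_store[digest] = data
    return digest

def verify_content(digest: str) -> bool:
    """Re-hash the stored bytes and compare them with the digest in the record."""
    data = object_store.get(digest)
    return data is not None and hashlib.sha256(data).hexdigest() == digest

# The lineage record carries only the digest, not the prompt text itself.
prompt_hash = store_content("System: answer from the cited guidelines only.\nUser: ...")
lineage_record = {"prompt_sha256": prompt_hash}
```

If the stored prompt is edited after the fact, `verify_content` returns `False` for the digest held in the lineage record, which is the tamper evidence described above.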

4. The retrieval sources

For RAG deployments, the document identifiers, the chunks, and the source classification of every document the retriever included in the prompt. The retrieval sources are the element most lineage implementations omit. A regulator asking why the model produced a specific clinical recommendation will trace back to the document the retriever pulled; if that document is not in the record, the lineage breaks.
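As a rough sketch of capturing this element, the function below turns retriever output into one lineage entry per chunk. The field names and the shape of the retriever output are assumptions, and hashing the chunk text (rather than storing it inline) is an illustrative choice that mirrors how the prompt is handled, not something the element list prescribes.

```python
import hashlib
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class RetrievalSource:
    document_id: str      # identifier of the source document
    chunk_id: str         # which chunk of that document entered the prompt
    classification: str   # source classification of the document
    chunk_sha256: str     # binds the entry to the exact chunk text

def capture_sources(retrieved: list[dict]) -> list[dict]:
    """Build one lineage entry for every chunk the retriever placed in the prompt."""
    entries = []
    for item in retrieved:
        digest = hashlib.sha256(item["text"].encode("utf-8")).hexdigest()
        entries.append(asdict(RetrievalSource(
            document_id=item["document_id"],
            chunk_id=item["chunk_id"],
            classification=item["classification"],
            chunk_sha256=digest,
        )))
    return entries

# Placeholder retriever output; identifiers and text are illustrative only.
retrieved = [
    {"document_id": "guideline-001", "chunk_id": "c3",
     "classification": "internal", "text": "chunk text one"},
    {"document_id": "guideline-002", "chunk_id": "c1",
     "classification": "internal", "text": "chunk text two"},
]
sources = capture_sources(retrieved)
```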

5. The data classification applied

The classification rules that ran against the prompt and against the retrieval sources. Records which tags applied (PII, PHI, financial, regulated) and which policy rules were triggered.

6. The policy version in force

The version identifier of the policy plane at the moment the decision was made. Policies change. A decision made under policy version 2026-06-15 may not satisfy policy version 2026-07-01. The version identifier is the field that anchors the decision to a specific rule set.
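One way to read this: the gateway stamps the version active at decision time, and an auditor can later cross-check that stamp against the policy release history. The sketch below assumes versions are identified by their effective date, as in the example above; the `POLICY_RELEASES` table and the function name are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical release history: (effective date, version id), sorted by date.
POLICY_RELEASES = [
    (date(2026, 6, 15), "2026-06-15"),
    (date(2026, 7, 1), "2026-07-01"),
]

def policy_version_in_force(on: date) -> str:
    """Return the latest policy version whose effective date is on or before `on`."""
    effective_dates = [effective for effective, _ in POLICY_RELEASES]
    idx = bisect_right(effective_dates, on)
    if idx == 0:
        raise ValueError(f"no policy version in force on {on}")
    return POLICY_RELEASES[idx - 1][1]
```

A decision dated 2026-06-20 resolves to version 2026-06-15, while one dated 2026-07-01 resolves to 2026-07-01, so a mismatch between the stamped version and the resolved one is visible at audit time.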

7. The model version and the model decision

The model identifier (provider, model name, version) and the response the model produced. The response is stored separately and referenced by hash, same as the prompt.

8. The signature

A cryptographic signature over the other seven elements of the record. The signature is produced by the gateway and verified by the audit pipeline. A record with a broken signature cannot be admitted as evidence.
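A minimal sketch of signing and verifying a record follows. It uses HMAC-SHA256 from the Python standard library purely as a stand-in so the example stays self-contained; because HMAC is symmetric, whoever verifies could also forge, so a gateway matching the description above would more plausibly use an asymmetric scheme such as Ed25519. The canonical JSON serialization and the function names are assumptions.

```python
import hashlib
import hmac
import json

def canonical_bytes(record: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_record(record: dict, key: bytes) -> str:
    """Sign the record fields (everything except the signature itself)."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()

def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_record(record, key), signature)
```

Sorting keys before serializing matters here: two dictionaries with the same fields in a different order must produce the same signature, otherwise a valid record could fail verification.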

The failure modes in application-produced lineage

A lineage chain produced by the application that ran the decision has three structural failures.

The first is selective recording. The application logs successful responses and skips errors. The regulator's request often centers on an error case (the model refused; the safety filter triggered; the response was incomplete); the record for that case is missing.

The second is loss on crash. The application records the lineage after the response is delivered. A crash between the response and the lineage write loses the record. The decision happened; the evidence did not.

The third is post-hoc edit. The same application that produced the decision controls the storage where the lineage record lives. The record can be modified by the same code path that produced it. A regulator asking for the original record has no way to verify that the stored record is what was written.

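These three failures are the concerns behind EU AI Act Article 12, which requires high-risk systems to record events automatically over the lifetime of the system. Article 19 sets the retention period for those logs (at least six months), and Article 12(3) lists identification of the natural persons involved in verifying results among the minimum log contents for remote biometric identification systems. Article 99 sets the penalty for failing to meet these obligations at up to 15 million euros or 3% of worldwide annual turnover, whichever is higher.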

The architecture that produces the chain a regulator accepts

The chain survives when the lineage record is produced by a layer outside the application process and signed by a key the application does not hold.

The application no longer owns the lineage. The gateway produces it. The signature is generated by a key the application does not have. The storage is reached by the audit pipeline, not by the application. The three failure modes above resolve because the layer that records the chain is outside the failure domain of the layer that produced the decision.
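The placement can be sketched as a gateway that wraps the model call. This is an illustration under stated assumptions, not a description of any particular product: the class name, the field names, the in-memory `audit_log`, and the use of HMAC-SHA256 as a stand-in signature are all hypothetical, and only a subset of the eight elements is recorded to keep the example short.

```python
import hashlib
import hmac
import json
import time
from typing import Callable

class LineageGateway:
    """Hypothetical gateway sitting between the application and the model."""

    def __init__(self, signing_key: bytes, call_model: Callable[[str], str]):
        self._key = signing_key           # held by the gateway, never by the application
        self._call_model = call_model
        self.audit_log: list[dict] = []   # stand-in for storage read by the audit pipeline

    def handle(self, user_id: str, prompt: str) -> str:
        record = {
            "natural_person": user_id,
            "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            "timestamp": time.time(),
        }
        response, error = None, None
        try:
            response = self._call_model(prompt)
            record["outcome"] = "ok"
            record["response_sha256"] = hashlib.sha256(response.encode("utf-8")).hexdigest()
        except Exception as exc:
            # Error cases are recorded too, so there is no selective recording.
            record["outcome"] = "error"
            record["error"] = repr(exc)
            error = exc

        # Sign and persist BEFORE anything is returned to the application,
        # so a crash after delivery cannot lose the record.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        record["signature"] = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        self.audit_log.append(record)

        if error is not None:
            raise error
        return response
```

The ordering is the design choice worth noting: the record is written on both the success and the error path, and it is written before the response leaves the gateway, which addresses the first two failure modes. The third is addressed by the key and the storage living outside the application.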

A worked example: clinical decision support

A clinical AI deployment surfaces this lineage pattern most clearly. The clinician asks a question; the application retrieves three internal guidelines and one external source; the model produces a recommendation; the clinician follows it.

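The audit-grade lineage record for this exchange carries all eight elements: the clinician's identity, the agent identity if one was involved, the prompt hash, the four retrieval sources with their classifications, the classification tags applied, the policy version, the model identifier with the response hash, and the gateway signature.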

When a regulator asks why the model recommended dose X for patient Y, the lineage record points to the four retrieval sources, the policy version, the model version, and the prompt and response by hash. The audit pipeline retrieves the stored prompt and response from durable storage and verifies the hashes match. The chain is end-to-end and verifiable.
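A sketch of what such a record and the auditor's hash check might look like is below. Every identifier and value is an illustrative placeholder rather than data from a real deployment, the field names are assumptions, and the signature is shown as a placeholder string because signing is a separate step.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Placeholder content standing in for the stored prompt and response.
prompt_text = "<assembled prompt: clinician question plus four retrieved chunks>"
response_text = "<model recommendation>"
durable_store = {
    sha256_hex(prompt_text): prompt_text,
    sha256_hex(response_text): response_text,
}

record = {
    "natural_person": "clinician-placeholder-id",          # element 1
    "agent": None,                                         # element 2 (no agent here)
    "prompt_sha256": sha256_hex(prompt_text),              # element 3
    "retrieval_sources": [                                 # element 4
        {"document_id": "internal-guideline-1", "origin": "internal"},
        {"document_id": "internal-guideline-2", "origin": "internal"},
        {"document_id": "internal-guideline-3", "origin": "internal"},
        {"document_id": "external-source-1", "origin": "external"},
    ],
    "classification_tags": ["PHI"],                        # element 5
    "policy_version": "2026-06-15",                        # element 6
    "model": {"provider": "example-provider",              # element 7
              "name": "example-model", "version": "v1"},
    "response_sha256": sha256_hex(response_text),
    "signature": "<gateway signature over the fields above>",  # element 8
}

def audit_check(record: dict, store: dict) -> list[str]:
    """Return the hash fields whose stored content is missing or altered."""
    failures = []
    for field in ("prompt_sha256", "response_sha256"):
        stored = store.get(record[field])
        if stored is None or sha256_hex(stored) != record[field]:
            failures.append(field)
    return failures
```

An empty list from `audit_check` means the stored prompt and response still match the hashes in the signed record; any entry in the list names the link where the chain breaks.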

How this interacts with the EU AI Act and NIST AI RMF

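EU AI Act Article 12 requires automatic recording over the lifetime of the system. The eight-element record above is the automatic record. Article 19 requires the automatically generated logs to be kept for at least six months, and Article 12(3) names identification of the natural persons involved in verifying results as minimum log content for remote biometric identification systems; element 1 records that identity for every decision. The signature in element 8 makes the record tamper-evident, which is the integrity the regulator presumes.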

The NIST AI RMF MEASURE function asks the organization to measure the running system's behavior. The lineage record is the measurement primitive: each decision contributes one record to the measurement set. The MANAGE function asks for controls; the gateway placement is the control.

ISO 42001 Annex A includes a control family on AI system traceability. The eight-element record is the traceability artifact.

DeepInspect

DeepInspect produces the eight-element lineage record at the gateway. The record carries the natural-person identity from the verified user session, the agent identity from the verified workload token, the prompt hash, the retrieval source identifiers, the data classification, the policy version, the model identifier, the response hash, and the gateway signature. The application produces the prompt and the retrieval; the gateway produces the audit-grade record.

The chain is verifiable end to end. The auditor reads the signed record, retrieves the prompt and response by hash from the durable store, verifies the hashes match, and walks the retrieval sources back to the documents. Book a mapping session at deepinspect.ai to walk through your lineage requirements against the gateway architecture.

Frequently asked questions

What about deployments that do not use RAG?

The element for retrieval sources is empty. The other seven elements still apply. The lineage record is shorter but the chain is the same.

Do we have to store the full prompt and response?

The signed record carries the hashes. The full prompt and response are stored in a separate object store, with retention bounded by the regulatory requirement (EU AI Act Article 19 sets a minimum of six months). The hashes in the signed record bind the stored content; modifications to the stored content break the chain.

How does the natural-person identity resolve when the user is acting under delegated authority?

The identity provider records both the authenticated user and the on-behalf-of relationship. The gateway records both in element 1. A clinical AI request initiated by a resident under an attending's authority records the resident and the attending. The audit chain shows both.

What is the storage cost?

The signed record is small (a few hundred bytes). The prompt and response storage scale with token volume. A deployment producing 1 million decisions per day with 1 KB average prompt and 2 KB average response generates roughly 3 GB of audit storage per day. The cost is bounded by the retention; six-month retention at this rate is roughly 540 GB.
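The arithmetic behind these figures, as a short check. It assumes decimal units (1 GB = 1,000,000 KB) and takes six months as 180 days; the signed records themselves are left out because they are small by comparison.

```python
DECISIONS_PER_DAY = 1_000_000
AVG_PROMPT_KB = 1
AVG_RESPONSE_KB = 2
RETENTION_DAYS = 180     # assumption: six months taken as 180 days
KB_PER_GB = 1_000_000    # assumption: decimal units

# Prompt plus response storage generated per day, in GB.
daily_gb = DECISIONS_PER_DAY * (AVG_PROMPT_KB + AVG_RESPONSE_KB) / KB_PER_GB

# Steady-state storage held once the retention window is full.
retained_gb = daily_gb * RETENTION_DAYS

print(f"{daily_gb:.0f} GB/day, {retained_gb:.0f} GB retained")
# prints "3 GB/day, 540 GB retained"
```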

How does this differ from MLflow or other ML lineage tools?

MLflow and similar tools track training-time lineage: dataset versions, model artifacts, training runs. AI data lineage for audit is the inference-time chain: which decision came from which inputs at which moment. The two are complementary. Training lineage answers what the model is; inference lineage answers what the model did.