
California AB 2013: What the Training Data Disclosure Means for Your AI Procurement

California AB 2013 took effect January 1, 2026. The law requires developers of generative AI systems made available to Californians to publish high-level documentation about the data used to train each model, including the source categories, the time period of collection, and whether personal information was included. The procurement team now has a public record to read before signing, and the audit team has a citable artifact for vendor due diligence. This walkthrough covers what the disclosure must contain, what it does not contain, and how the per-decision audit log fits.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Compliance & Regulation · california-ab-2013 · ai-transparency · training-data · procurement · compliance

California AB 2013, signed September 28, 2024 and effective January 1, 2026, requires developers of generative AI systems or services made available to Californians to publish documentation about the data used to train each model. The disclosure has to live on the developer's website and cover models released or substantially modified on or after January 1, 2022. Five months in, every major model provider has published a disclosure, and the variance across providers is what the procurement team should be reading for.

I want to walk through what the disclosure must contain, what it deliberately does not contain, and how the per-decision audit log on the deployer's side complements the developer's training-data record.

What the law requires the developer to publish

AB 2013, codified at California Civil Code Section 3111, requires the documentation to include:

The sources or owners of the datasets.

A description of how the datasets further the intended purpose of the generative AI system or service.

The number of data points included in the datasets, which can be in general ranges.

A description of the types of data points within the datasets.

Whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain.

Whether the datasets were purchased or licensed by the developer.

Whether the datasets include personal information or aggregate consumer information as defined in the California Consumer Privacy Act.

Whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the generative AI system or service.

The time period during which the data in the datasets were collected.

The dates the datasets were first used during the development of the generative AI system or service.

Whether the generative AI system or service used or continuously uses synthetic data generation in its development.
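As a reading aid, the items above can be turned into a checklist that the reviewer fills in once per model. The sketch below is illustrative only: the class name, the field names, and the helper are hypothetical labels for the listed items, not terms from the statute or from any vendor's format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DisclosureChecklist:
    """One free-text note per item in the list above; None means 'not found yet'."""
    sources_or_owners: Optional[str] = None
    purpose_description: Optional[str] = None
    data_point_count: Optional[str] = None
    data_point_types: Optional[str] = None
    ip_status: Optional[str] = None
    purchased_or_licensed: Optional[str] = None
    personal_information: Optional[str] = None
    cleaning_or_processing: Optional[str] = None
    collection_period: Optional[str] = None
    first_use_dates: Optional[str] = None
    synthetic_data: Optional[str] = None


def missing_items(checklist: DisclosureChecklist) -> list[str]:
    """Return the names of checklist items the reviewer has not filled in."""
    return [f.name for f in fields(checklist) if getattr(checklist, f.name) is None]


# Hypothetical partial review of one vendor's disclosure.
review = DisclosureChecklist(
    sources_or_owners="public web text; licensed corpora",
    collection_period="2020-2023",
)
print(missing_items(review))
```

Whatever `missing_items` returns is a natural starting list of follow-up questions for the vendor.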

What the disclosure does not contain

The law does not require the developer to publish the data itself, individual records, or the full training set. The disclosure is at the dataset level, in summary form. The procurement team gets the source categories and the time window; it does not get a row-level inventory.

The law also does not impose a quality standard on the data. A disclosure that says "publicly accessible internet text, scraped during 2020-2023, includes copyrighted material under fair-use rationale" satisfies the disclosure obligation; the procurement team makes its own judgment on what to do with that record.

What the procurement team gets

The disclosure gives the procurement team three things.

The first is the citable source for the vendor-due-diligence file. The disclosure can be referenced in the procurement record as an artifact the team read and consulted. The audit team's later question about whether the procurement team knew the training-data posture has a documented answer.

The second is a comparison reference across vendors. Two model providers in the same category whose disclosures differ materially are not equivalent for sensitive workloads. The disclosure lets the procurement team draw the comparison.
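One way to draw that comparison is to reduce each disclosure to the same few fields and list where the two differ. The following is a sketch under assumptions: the field names and sample values are invented for illustration and do not describe any real provider.

```python
def compare_disclosures(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return the fields where two disclosure summaries differ.

    A field missing from one summary is treated as "not stated".
    """
    diffs = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, "not stated")
        right = b.get(key, "not stated")
        if left != right:
            diffs[key] = (left, right)
    return diffs


# Hypothetical summaries a reviewer might key in by hand.
vendor_a = {
    "personal_information": "yes",
    "collection_period": "2019-2023",
    "synthetic_data": "yes",
}
vendor_b = {
    "personal_information": "no",
    "collection_period": "2019-2023",
}

for field_name, (left, right) in compare_disclosures(vendor_a, vendor_b).items():
    print(f"{field_name}: vendor A = {left}; vendor B = {right}")
```

Whether a given difference is material for a sensitive workload remains a judgment for the procurement team; the code only surfaces where the two documents diverge.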

The third is the seed for the deployer's own deeper questions. The disclosure surfaces what the developer is willing to commit to publicly; the contract negotiation extends from there with the customer's specific data-handling requirements.

How the disclosure interacts with the EU AI Act

EU AI Act Article 53 imposes a similar but more detailed obligation on providers of general-purpose AI (GPAI) models. The Article 53 transparency requirements include publishing a "sufficiently detailed summary" of the content used for training. AB 2013 and Article 53 cover the same ground at different depths; a developer that has published the Article 53 summary typically reuses material for the AB 2013 disclosure.

The procurement team that runs the disclosure check for California also reads the Article 53 summary where applicable. The two together give a fuller picture than either alone.

How the disclosure interacts with the deployer-side audit

AB 2013 governs the developer's training-data record. The deployer's per-decision audit log governs the runtime record: which user, which agent, which model, which policy version, which classification, which decision. The two records are independent but linked through the model identifier.


In each audit row, the training_data_disclosure_url and training_data_disclosure_version fields are pointers, not the disclosure text. The audit pipeline retains the pointers so the auditor can navigate from a decision back to the disclosure that was current at the time. A developer that updates the disclosure does not invalidate prior records; the version pointer retrieves the document the deployer relied on.
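A minimal sketch of what such an audit row could look like, assuming a flat record serialized as JSON. Only the two disclosure pointer names come from the text above; the remaining field names, the example URL, and the sample values are hypothetical placeholders, not a documented schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRow:
    """One runtime decision plus pointers to the developer's disclosure."""
    decision_id: str          # hypothetical identifier
    timestamp: str            # ISO 8601, UTC
    user_id: str
    agent_id: str
    model_id: str             # links the runtime record to the developer's record
    policy_version: str
    classification: str
    decision: str
    training_data_disclosure_url: str      # pointer, not the disclosure text
    training_data_disclosure_version: str  # version current at decision time


row = AuditRow(
    decision_id="dec-0001",
    timestamp=datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    user_id="user-123",
    agent_id="agent-7",
    model_id="example-model-v1",
    policy_version="policy-14",
    classification="internal",
    decision="allow",
    training_data_disclosure_url="https://example.com/ai/training-data-disclosure",
    training_data_disclosure_version="2026-01-01",
)

print(json.dumps(asdict(row), indent=2))
```

Making the record immutable (`frozen=True`) reflects the idea that a later disclosure update should not rewrite earlier rows.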

What the disclosure looks like across major providers

A five-month snapshot of major-provider disclosures:

OpenAI's disclosure lists publicly available internet text, third-party licensed datasets, and human-feedback data, with the time periods broken out per model family.

Anthropic's disclosure states: "Claude's training data is derived from a mix of publicly available information on the internet, non-public data from third parties, data provided by our users, and data generated internally." It breaks out the categories with collection windows.

Google's Gemini disclosure covers web text, code, image, audio, and video data with source categorization and Google's data-cleansing approach.

Meta's Llama disclosure covers the open data sources and the human-feedback training. The disclosure has been updated alongside the Llama 4 release.

Mistral and Cohere have published disclosures aligned to the same structure.

The variance is in the depth, the time windows, and the personal-information statements. The procurement team's read of the document is the audit team's evidence.

What enforcement looks like five months in

AB 2013 is enforced by the California Attorney General. In the five months since the January 1, 2026 effective date, enforcement has focused on developers who failed to publish at all. No major model developer has been sanctioned.

The California Privacy Protection Agency has signaled coordination with the AG's office on AI rules generally. The AB 2013 disclosure record is read alongside the CCPA's automated decision-making rules where personal data is involved.

DeepInspect

DeepInspect ties the developer's AB 2013 disclosure to the deployer's per-decision audit log. The audit row carries the model identifier, the URL of the developer's disclosure at the time of the decision, and the version pointer. The audit pipeline retains the pointer alongside the decision; the deployer's record of which model produced which decision under which disclosure is reconstructable.

The gateway runs in-line with sub-50 ms p95 enforcement overhead, as measured in internal DeepInspect testing. The deployer's own AB 2013 obligation (where the deployer is also a developer or fine-tuner) resolves against the same audit pipeline. Book a mapping session at deepinspect.ai to walk through the developer-deployer split against your current AI provider mix.

Frequently asked questions

Does AB 2013 apply to my company if we just use a model someone else trained?

The disclosure obligation falls on the developer. A pure user of a third-party model has no AB 2013 publication obligation but should reference the developer's disclosure in its own AI procurement and data protection impact assessment (DPIA) documentation.

What about fine-tuned models?

A fine-tuned model is treated as a new generative AI system or service if the modification is substantial. The fine-tuner takes on the disclosure obligation for the fine-tuning data, alongside the base model developer's disclosure for the base data.

How does this interact with the EU AI Act Article 53 summary?

Article 53 covers GPAI providers placing models on the EU market. AB 2013 covers developers making systems available to Californians. A model provider operating in both markets typically prepares a single summary that satisfies both, with the AB 2013-specific fields covered alongside the Article 53 elements.

What if the developer's disclosure changes after we sign?

The procurement record references the disclosure version at signing. The deployer's audit log captures the version pointer per decision. A material change to the disclosure is a procurement-review trigger; the audit log lets the deployer reconstruct which version applied to which decisions.
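The reconstruction described here can be sketched as a grouping over audit rows. This is an illustration under assumptions: rows are plain dictionaries, the field names other than training_data_disclosure_version are hypothetical, and the code flags any version change, leaving the judgment of whether a change is material to a human reviewer.

```python
from collections import defaultdict


def decisions_by_version(rows):
    """Group decision IDs by (model_id, disclosure version)."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["model_id"], row["training_data_disclosure_version"])
        grouped[key].append(row["decision_id"])
    return dict(grouped)


def versions_to_review(rows, version_at_signing):
    """Return disclosure versions in the log that differ from the one read at signing."""
    seen = {row["training_data_disclosure_version"] for row in rows}
    return sorted(seen - {version_at_signing})


# Hypothetical audit rows spanning a disclosure update.
audit_rows = [
    {"decision_id": "dec-0001", "model_id": "model-x",
     "training_data_disclosure_version": "2026-01-01"},
    {"decision_id": "dec-0002", "model_id": "model-x",
     "training_data_disclosure_version": "2026-01-01"},
    {"decision_id": "dec-0003", "model_id": "model-x",
     "training_data_disclosure_version": "2026-04-15"},
]

print(decisions_by_version(audit_rows))
print(versions_to_review(audit_rows, version_at_signing="2026-01-01"))
```

A non-empty result from `versions_to_review` would be the cue to reopen the procurement file and read the newer disclosure.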

Does the disclosure tell us about model bias or accuracy?

No. AB 2013 covers training data, not model behavior. Bias and accuracy questions go through separate evaluations, such as the fundamental rights impact assessment (FRIA) under the EU AI Act or the bias audit under New York City Local Law 144.