Does the May 2026 guidance change Annex III itself?

No. The guidelines clarify how the Commission expects providers and deployers to apply the existing Annex III categories to real systems. The categories themselves are unchanged. The change is in the interpretive criteria: the "intended purpose" test now reads against runtime behavior rather than documented intent, the "preparatory task" exemption narrows, and changes to the model or the policy gateway in front of the model can constitute substantial modification under Article 43(4).

When does enforcement of the high-risk obligations start?

August 2, 2026, for high-risk systems under Annex III. The penalty regime under Article 99 reaches €15 million or 3% of global annual turnover, whichever is higher, for non-compliance with Chapter III, Section 2 obligations. Member state market surveillance authorities carry out enforcement.

Can a fundamental rights impact assessment substitute for the Article 12 logs?

No. The fundamental rights impact assessment (FRIA) under Article 27 is a pre-deployment artifact that describes the expected impact of the system. The Article 12 logs are runtime evidence of how the system actually behaved. Both are required.

How do the new criteria affect open-source AI systems?

The open-source carve-outs under the EU AI Act are narrow. A system that uses an open-source foundation model but deploys it in an Annex III use case is still subject to the deployer obligations. The May guidelines do not expand the open-source exemption. They reinforce that the classification follows the use case, not the licensing of the underlying components.

What about general-purpose AI providers?

The GPAI guidelines published the same week set parallel obligations for general-purpose AI providers under Articles 53 and 55. A GPAI provider whose model is integrated into a high-risk Annex III use case shares responsibility with the deployer under the model the Commission has been building.

What the EU Commission's May 2026 High-Risk Classification Guidelines Change About Your AI Scope Assessment

What is the deployer's traceability obligation?

Under Article 26, deployers of high-risk systems must retain the automatically generated Article 12 logs, to the extent those logs are under their control, for at least six months unless a longer period is required by Union or national law. The records have to include the period of use, the input data, the reference database, the output, and the identification of natural persons involved in human oversight. The deployer's logs are independent of the provider's logs.
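The six-month floor can be expressed as a simple retention check. The sketch below is illustrative only: the function names are hypothetical, the retention period is counted in calendar months from the date the record was written (an assumption, since the text does not say how the period is counted), and any longer period required by other law is passed in by the caller.

```python
import calendar
from datetime import date

MIN_RETENTION_MONTHS = 6  # floor described in the text; longer legal periods override it


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`, clamped to month end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_purge_date(recorded_on: date, required_months: int = MIN_RETENTION_MONTHS) -> date:
    """Earliest date a log record may be deleted; never earlier than the six-month floor."""
    return add_months(recorded_on, max(required_months, MIN_RETENTION_MONTHS))


def may_purge(recorded_on: date, today: date, required_months: int = MIN_RETENTION_MONTHS) -> bool:
    """True once the retention period for this record has fully elapsed."""
    return today >= earliest_purge_date(recorded_on, required_months)
```

A purge job would call `may_purge` per record rather than deleting by a fixed day count, so month-length differences do not shorten the retention window.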

On May 19, 2026, the European Commission published draft guidelines for GPAI providers alongside companion guidance that sharpens the criteria used to classify an AI system as high-risk under Annex III of the EU AI Act. The guidelines land 75 days before the August 2, 2026 enforcement date for the high-risk obligations under Chapter III, Section 2. Three categories that most enterprises had previously assessed as out-of-scope move into scope under the tightened criteria: HR screening that filters before a human reviewer, clinical decision support that influences treatment selection, and fraud detection that triggers account restrictions.

The guidelines do not change Annex III. They change how the Commission expects providers and deployers to apply Annex III to real systems.

I want to walk through the operative changes, classify three production deployments under the new criteria, and identify the per-decision evidence each system has to produce before the Article 12 traceability obligation goes live on August 2.

What the May 2026 guidelines actually change

The May 19 guidelines sit inside the broader GPAI framework but reach into Annex III through three operative shifts.

The "intended purpose" test now reads against runtime behavior

Annex III lists use cases that trigger high-risk classification. Providers historically argued that a system fell outside Annex III if the documented "intended purpose" was narrower than the listed use case. The May guidelines push back on documentation-only carve-outs. A system whose documented intended purpose is "candidate ranking for HR teams" but whose runtime behavior filters candidates before any human reviewer sees the list operates as Annex III point 4(a) regardless of the documentation. The Commission's clarification is that the classification follows the operational role of the system in the decision flow, not the marketing description of its intent.
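One way to compare documented intent with runtime behavior is to measure, from the system's own decision records, how many outcomes became final without a human reviewer. The sketch below assumes a hypothetical `Decision` record with a `reviewer_id` field; it does not encode any threshold from the guidelines, it only produces the number a scope assessment would need to look at.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Decision:
    subject_id: str
    outcome: str                # e.g. "advance" or "reject"
    reviewer_id: Optional[str]  # None when no human saw the case before the outcome was final


def share_without_human_review(decisions: Iterable[Decision]) -> float:
    """Fraction of final outcomes that no human reviewer saw (0.0 for an empty log)."""
    items = list(decisions)
    if not items:
        return 0.0
    unreviewed = sum(1 for d in items if d.reviewer_id is None)
    return unreviewed / len(items)
```

A system documented as "candidate ranking" that returns a high share here is filtering, whatever the documentation says.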

The "preparatory task" exemption narrows

Article 6(3) provides an exemption when an AI system performs a narrow procedural task, improves the result of a previously completed human activity, or performs a preparatory task to an assessment relevant to an Annex III use case. The May guidelines narrow the exemption: an AI system that influences which records reach a human reviewer is no longer "preparatory" if its filtering meaningfully changes the human's decision distribution. A diagnostic support tool that surfaces three differential diagnoses to a clinician falls into Annex III point 5(a) under the tightened test because the surfaced options shape the clinician's subsequent decision.

Substantial modification triggers re-assessment

Article 43(4) requires re-assessment after a substantial modification. The May guidelines treat changes to the training data, the fine-tuning pipeline, or the policy gateway in front of the model as candidates for substantial modification. Swapping a fraud detection system's underlying model from a proprietary classifier to a vendor LLM counts as a substantial modification even if the application interface is unchanged.
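A change-detection step can flag these candidates before they reach production. The sketch below is a minimal illustration, assuming a hypothetical `DeploymentConfig` snapshot taken at the last conformity assessment; the field names are invented for the example, and the output is a list of items for legal review, not a determination that a modification is substantial.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class DeploymentConfig:
    model_id: str
    training_data_digest: str
    fine_tuning_pipeline_version: str
    policy_gateway_version: str
    api_version: str  # application interface; tracked, but not a trigger in this sketch


# Fields the text names as candidates for substantial modification, plus the model itself.
TRIGGER_FIELDS = (
    "model_id",
    "training_data_digest",
    "fine_tuning_pipeline_version",
    "policy_gateway_version",
)


def changed_trigger_fields(assessed: DeploymentConfig, current: DeploymentConfig) -> List[str]:
    """Names of trigger fields that differ from the last assessed configuration."""
    before, after = asdict(assessed), asdict(current)
    return [name for name in TRIGGER_FIELDS if before[name] != after[name]]
```

In the model-swap example, `api_version` stays the same while `model_id` changes, so the check still reports a candidate for re-assessment.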

Three deployments and how they classify under the new criteria

The classification consequences land more clearly in concrete cases.

Case 1: HR screening for a global retailer

A global retailer runs a screening pipeline that scores every inbound applicant. The pipeline ranks candidates and surfaces the top 30% to a recruiter, with the bottom 70% routed to a templated rejection. Documentation describes the system as "decision support for recruiters." Under the May guidelines, the system is high-risk under Annex III point 4(a). The 70% who never reach a recruiter are subject to an AI-mediated decision without the human review that the previous "decision support" framing implied. The retailer has to file the system in the EU database under Article 49, run a fundamental rights impact assessment under Article 27, and maintain Article 12 traceability records for every applicant scored from August 2 forward.

Case 2: Clinical decision support at a hospital network

A hospital network deploys an AI tool that reads incoming patient notes and surfaces three suggested differential diagnoses to the attending physician. The tool also flags potential drug interactions and recommends imaging orders. Documentation calls the tool "informational only" and notes that the physician is the decision-maker. Under the May guidelines, the tool is high-risk under Annex III point 5(a) because the surfaced diagnoses influence the physician's subsequent decision-making. The hospital is a deployer under Article 26 and inherits the obligations that follow: documented oversight procedures, traceability of every recommendation, and the ability to demonstrate that the system's behavior matches the technical documentation supplied by the provider.

Case 3: Fraud detection at a payments processor

A payments processor runs a fraud model that scores incoming transactions and either allows them, requires step-up authentication, or blocks them outright. The blocked transactions never reach a human reviewer. Documentation classifies the system as "operational risk management." Under the May guidelines, the system falls under Annex III point 5(b) (essential private services). The processor has to maintain Article 12 logs that include the policy version, the identity attributes used in the decision, and the model's confidence score for every blocked transaction.

The Article 12 traceability obligation that the guidelines reinforce

The May guidelines reinforce that traceability under Article 12 requires automatic recording of events over the lifetime of the system. The records have to include the period of use, the input data, the reference database (if any), the output, and identification of the natural persons involved in human oversight under Article 14. For each of the three deployments above, the obligation translates into a per-decision audit record that survives a market surveillance inspection.
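The fields listed above map naturally onto a structured record written once per decision. The sketch below is one possible shape, not a format prescribed by the Act or the guidelines: the class name, field names, and the choice to store a digest of the input rather than the input itself are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TraceabilityRecord:
    decision_id: str
    use_started_at: str                # period of use, ISO 8601 UTC
    use_ended_at: str
    input_digest: str                  # hash of, or pointer to, the stored input data
    reference_database: Optional[str]  # None when the system consults no reference database
    output: str
    overseer_ids: List[str] = field(default_factory=list)  # natural persons involved in oversight

    def to_json_line(self) -> str:
        """Serialize to one JSON line with stable key order for append-only storage."""
        return json.dumps(asdict(self), sort_keys=True)
```

Writing one such line per decision, at the moment of decision, is what separates an audit record from a log entry reconstructed after the fact.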

The compliance gap most providers and deployers carry into August 2 is that the existing application logs were never designed to satisfy this requirement. Application logs capture engineering events. They lack identity context, policy version, data classification, and the decision outcome at the granularity Article 12 expects. An application log that records "request processed in 240ms" fails the traceability test.
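A quick way to size the gap is to run existing log entries through a check for the context the paragraph above says they lack. This is a rough sketch: the key names in `REQUIRED_CONTEXT` are hypothetical and would need to match whatever schema a given logging pipeline actually uses.

```python
from typing import Any, Dict, List

# Context named in the text as missing from typical application logs (key names assumed).
REQUIRED_CONTEXT = ("identity", "policy_version", "data_classification", "decision_outcome")


def missing_context(entry: Dict[str, Any]) -> List[str]:
    """Return the required context keys that are absent or empty in a log entry."""
    return [key for key in REQUIRED_CONTEXT if entry.get(key) in (None, "")]
```

An entry such as `{"message": "request processed in 240ms"}` comes back with all four keys missing, which is the failure mode described above.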

What changes operationally before August 2

The runway is 75 days. For systems that classify as high-risk under the new criteria, the operational steps compress.

The provider has to revise the technical documentation under Article 11, file the system in the EU database under Article 49, run the conformity assessment under Article 43, and stand up the post-market monitoring pipeline under Article 72. The deployer has to run the fundamental rights impact assessment under Article 27, document the human oversight measures under Article 14, and maintain Article 12 records that the deployer (not the provider) controls.

The traceability records are where deployer and provider obligations collide. The provider's records describe the system. The deployer's records describe the use. Both have to exist, and both have to reconstruct a specific decision on demand from a market surveillance authority.

DeepInspect

This is the per-decision evidence layer DeepInspect produces. DeepInspect sits at the AI request boundary as a stateless proxy between authenticated users or agents and the LLM endpoints, enforces identity-bound policy on every request, and records a per-decision audit record that includes the identity, the policy version, the data classification, the decision outcome, and a tamper-evident signature.
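To make "tamper-evident" concrete, the sketch below shows one generic technique: an HMAC-SHA256 signature chained over consecutive records, so that altering any earlier record invalidates every signature after it. This is an illustration only and is not a description of how DeepInspect implements its signatures; the function names, the record shape, and the key handling are all assumptions.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def sign_record(record: Dict, prev_signature: str, key: bytes) -> str:
    """HMAC-SHA256 over the previous signature plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, prev_signature.encode("utf-8") + payload, hashlib.sha256).hexdigest()


def verify_chain(records: List[Dict], signatures: List[str], key: bytes, genesis: str = "") -> bool:
    """Recompute the chain from the start; any edited, dropped, or reordered record fails."""
    if len(records) != len(signatures):
        return False
    prev = genesis
    for record, signature in zip(records, signatures):
        expected = sign_record(record, prev, key)
        if not hmac.compare_digest(expected, signature):
            return False
        prev = signature
    return True
```

In a real deployment the key would live in a key management service, and the verifier would typically be a separate party from the writer.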

For the three deployments above, the per-decision records are the source data for Article 12 traceability. The retailer can reconstruct the screening decision for any applicant who challenges the result. The hospital can show which prompts and policy versions governed a given clinical recommendation. The payments processor can show that the fraud-block decision was bounded by an explicit policy and that the policy was the one in force at the moment of decision.
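Showing that a policy "was the one in force at the moment of decision" comes down to a lookup against the policy's version history. The sketch below assumes a hypothetical history of `(effective_from, version)` pairs sorted by time; it is a minimal illustration of the lookup, not a description of any product's storage model.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Tuple


def policy_in_force(history: List[Tuple[datetime, str]], at: datetime) -> str:
    """Return the policy version effective at `at`; `history` must be sorted by effective time."""
    effective_times = [effective_from for effective_from, _ in history]
    index = bisect_right(effective_times, at)
    if index == 0:
        raise LookupError("no policy was in force at the requested time")
    return history[index - 1][1]
```

Comparing this result with the policy version stored in the per-decision record gives a direct consistency check for any decision under challenge.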

If you are running an AI system that the May 2026 guidelines move into the high-risk classification and your traceability strategy depends on application logs, the August 2 enforcement date will surface the gap. Book a demo today.

Beyond the May guidelines

The May 19 guidelines fit into a longer Commission program. The GPAI guidelines published the same week set obligations for general-purpose AI providers, and the upcoming implementing acts under Article 43 and Article 72 will further specify the operational requirements. The pattern is consistent: the regulator expects per-decision evidence, structured by identity and policy version, retained over the system's lifetime, retrievable on demand.

The same evidence layer satisfies adjacent regimes. The NIST AI Risk Management Framework MEASURE function expects structured measurement of system behavior. ISO/IEC 42001 requires runtime evidence to support the AI management system (AIMS). The architecture that produces Article 12 records produces what each of these regimes asks for.