Does Article 15 apply if we use a third-party API like OpenAI or Anthropic?

Yes. Article 15 attaches to the deployer of a high-risk AI system, regardless of where the model runs. If the deployer is a financial institution using OpenAI's API in a credit-scoring workflow that falls under Annex III, the deployer carries the Article 15 obligation. The model provider's accuracy and resilience claims are evidence the deployer can reference, but they do not discharge the obligation. The deployer has to demonstrate accuracy on the deployer's task, resilience under the threats relevant to the deployer's environment, and cybersecurity controls at the deployer's deployment boundary.

What counts as "appropriate" accuracy under Article 15?

The regulation leaves the threshold to the deployer to justify against the intended purpose. The justification has to be documented in the Annex IV technical documentation. In practice, "appropriate" means a figure that a regulator can compare against the harm profile of the use case. A credit-scoring system with a higher false-rejection rate than human underwriters will face questions. A clinical decision support system with sensitivity below the standard of care will face questions. The threshold is contextual; the documented justification is mandatory.
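The credit-scoring comparison above can be made concrete with a small calculation. This is a minimal sketch, not a prescribed method: the `Outcomes` type, the `tolerance` parameter, and the counts in the usage note are hypothetical, and it assumes the deployer has ground truth on which applicants were in fact creditworthy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    """Decision counts over applicants later confirmed creditworthy (hypothetical type)."""
    approved: int
    rejected: int


def false_rejection_rate(o: Outcomes) -> float:
    """Share of creditworthy applicants that were rejected."""
    total = o.approved + o.rejected
    if total == 0:
        raise ValueError("no creditworthy applicants in the sample")
    return o.rejected / total


def exceeds_baseline(model: Outcomes, human: Outcomes, tolerance: float = 0.0) -> bool:
    """True when the model rejects creditworthy applicants more often than
    the human baseline, beyond an assumed tolerance."""
    return false_rejection_rate(model) > false_rejection_rate(human) + tolerance
```

With illustrative counts such as `Outcomes(approved=90, rejected=10)` for the model and `Outcomes(approved=95, rejected=5)` for human underwriters, `exceeds_baseline` returns `True`, which is the kind of result the documented justification would need to address.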

How does the cybersecurity obligation relate to general IT security?

Article 15's cybersecurity obligation covers the AI system as a software artifact, the training and operational data, the model, and the runtime path between the user and the model. General IT security controls on the surrounding infrastructure are necessary but not sufficient. The regulation expects controls that specifically address AI-relevant attacks: prompt injection, model extraction, training-data poisoning, adversarial examples, and unauthorized model access. The deployer's existing SOC 2 program covers part of the surface. The AI-specific surface needs its own controls and its own evidence.

What happens if accuracy degrades after deployment?

Article 9's continuous risk management obligation expects the deployer to detect and respond to accuracy degradation across the lifecycle. The post-market monitoring obligation in Article 72 reinforces it. The deployer has to maintain the evidence stream that supports both the steady-state accuracy claim and the response to a degradation event. Without that evidence, the deployer is unable to show the risk management system functioned.
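One way to detect degradation is a rolling check of recent outcomes against the declared minimum. The sketch below is an assumption-laden illustration: `AccuracyMonitor`, its window size, and the threshold are hypothetical, and it presumes that each production decision can eventually be labeled correct or incorrect.

```python
from collections import deque
from typing import Deque, Optional


class AccuracyMonitor:
    """Rolling accuracy over the most recent `window` labeled decisions (hypothetical helper)."""

    def __init__(self, declared_minimum: float, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.declared_minimum = declared_minimum
        self._outcomes: Deque[bool] = deque(maxlen=window)

    def record(self, correct: bool) -> None:
        """Add one labeled outcome; the oldest outcome drops out when the window is full."""
        self._outcomes.append(correct)

    def accuracy(self) -> Optional[float]:
        """Rolling accuracy, or None until the window has filled."""
        if len(self._outcomes) < self._outcomes.maxlen:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def degraded(self) -> bool:
        """True only when a full window falls below the declared minimum."""
        current = self.accuracy()
        return current is not None and current < self.declared_minimum
```

Waiting for a full window before flagging is a design choice that avoids alerts on a handful of early samples; a real deployment would also persist each reading so the evidence stream survives restarts.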

How does Article 15 differ from the NIST AI RMF?

The NIST AI RMF describes a voluntary framework of practices the deployer should adopt to manage AI risk. Article 15 imposes a binding legal obligation on the deployer of a high-risk AI system in the EU market. The two frameworks describe overlapping controls, but the consequences differ. NIST RMF non-conformance is a procurement and reputational matter. Article 15 non-conformance is a penalty matter under Article 99, with fines up to €15 million or 3% of global annual turnover. A deployer that adopts the NIST RMF will have done much of the work Article 15 requires, but the deployer still has to map the practice to

EU AI Act Article 15: What the Accuracy, Resilience, and Cybersecurity Obligation Requires

On August 2, 2026, the EU AI Act high-risk system requirements take effect. Article 15 sets the technical floor every covered system has to meet across three properties: accuracy, resilience to misuse, and cybersecurity. The text is short. The infrastructure to satisfy it runs across model selection, deployment topology, and runtime enforcement. Penalties reach €15 million or 3% of global annual turnover. Most enterprise AI deployments today have no documented accuracy claim, no adversarial test plan, and no runtime cybersecurity boundary between the application and the model.

I want to walk through what Article 15 actually mandates, where Articles 9 and 12 reinforce it, and the architectural pattern that produces the three properties together. The August deadline leaves little room to retrofit any of them as bolt-on controls.

Mandate

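Article 15 reads at one level of abstraction above implementation. Paragraph 1 requires that high-risk AI systems be designed and developed to achieve an appropriate level of accuracy, robustness (the property this article calls resilience), and cybersecurity, and to perform consistently in those respects throughout their lifecycle.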

Three architectural requirements collapse out of that text.

Accuracy declared and measured

The accuracy metric is declared in the technical documentation under Annex IV and has to be measurable in production. A free-form qualitative description fails the requirement. Most deployments today declare nothing. The accuracy figure has to bind to a defined task, a defined dataset, and a defined evaluation method.
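A declared figure that binds to a task, a dataset, and an evaluation method might be represented as follows. This is a sketch under stated assumptions: the `AccuracyClaim` fields and the exact-match metric are hypothetical choices, not something Annex IV prescribes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AccuracyClaim:
    """A declared accuracy figure tied to task, dataset, and method (hypothetical schema)."""
    task: str
    dataset_id: str
    method: str
    declared_minimum: float


def measure_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Exact-match accuracy: fraction of predictions equal to their labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def claim_holds(claim: AccuracyClaim, measured: float) -> bool:
    """True when the measured figure meets the declared minimum."""
    return measured >= claim.declared_minimum
```

The point of the structure is that the figure cannot be recorded without naming what it was measured on and how; a qualitative description has no place to go.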

Resilience under foreseeable misuse

The middle property in the Article 15 text covers system behavior under foreseeable misuse, distribution shift, and operational errors. The runtime architecture has to constrain inputs the system was not trained on. For LLM-based systems, that property includes resistance to prompt injection, jailbreaks, and adversarial role-play. Article 15 expects the deployer to demonstrate the system holds up.
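Demonstrating that the system holds up usually reduces to a measurable figure, such as the share of adversarial prompts stopped before they reach the model. The sketch below assumes a corpus of attack prompts and a callable enforcement check; `toy_filter` is a deliberately naive placeholder used only to make the example runnable, not a real defense.

```python
from typing import Callable, Iterable


def block_rate(corpus: Iterable[str], is_blocked: Callable[[str], bool]) -> float:
    """Fraction of adversarial prompts that the enforcement check blocks."""
    prompts = list(corpus)
    if not prompts:
        raise ValueError("adversarial corpus is empty")
    blocked = sum(1 for prompt in prompts if is_blocked(prompt))
    return blocked / len(prompts)


def toy_filter(prompt: str) -> bool:
    """Placeholder check (assumption): flags one well-known injection phrase."""
    return "ignore previous instructions" in prompt.lower()
```

Running `block_rate` against the same enforcement function that serves production traffic, rather than a test double, is what makes the resulting figure usable as evidence.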

Cybersecurity as a runtime property

Cybersecurity in Article 15 covers the AI system as a software artifact and the data and models it depends on. Attack surfaces include unauthorized model access, training-data poisoning, model extraction, model evasion, and adversarial examples. The deployer has to show the controls that mitigate each surface.

Compliance gap

Most enterprise AI deployments today fail one or more of the three properties in audit-defensible terms.

Accuracy is implicit and unmeasured

Accuracy is treated as a model-provider concern. The deployer takes the model card at face value and ships. When a regulator asks for the accuracy figure on the deployer's specific task and data, the deployer has no evaluation to produce. Article 15 expects a declared figure that the deployer measured on the deployer's task.

Resilience assumes the model's training carried it

System resilience under misuse is assumed to flow from the model provider's RLHF and refusal patterns. That assumption collapses under adversarial pressure. Stanford Trustworthy AI research and the AIUC-1 Consortium briefing (Help Net Security, March 2026) found that refusal behaviors of model-level guardrails degraded significantly under targeted fine-tuning and adversarial input. The deployer's runtime architecture has to add an enforcement layer that holds even when the model's behavior degrades.

Cybersecurity stops at the network boundary

Cybersecurity is treated as TLS termination and network ACLs. The prompt content, the response content, the identity of the natural person behind the request, and the data classification of the prompt are all invisible to the network-layer stack. Article 15's cybersecurity obligation expects the deployer to show controls at the AI request boundary, not just at the network layer.
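A control at the AI request boundary sees the fields the network layer cannot. As a minimal sketch, assuming the application supplies identity, role, route, and a data classification for each prompt, a policy check could look like this. The `Request` type, the policy shape, and the classification labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Request:
    """Request context visible at the AI boundary (hypothetical schema)."""
    user_id: str
    role: str
    route: str
    data_class: str  # assumed labels, e.g. "public", "internal", "restricted"


# (route, role) -> data classifications that role may send on that route
Policy = Dict[Tuple[str, str], FrozenSet[str]]


def evaluate(request: Request, policy: Policy) -> str:
    """Return "allow" or "deny"; anything without an explicit rule is denied."""
    allowed = policy.get((request.route, request.role))
    if allowed is None:
        return "deny"  # fail closed: no rule for this route and role
    return "allow" if request.data_class in allowed else "deny"
```

Denying by default means that a new route or role added without a matching rule is blocked rather than silently permitted.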

The three properties depend on the same evidence layer

Article 12's audit-log obligation and the Article 15 trio run on the same primitive. Without a per-decision record of what the system did and why, the deployer cannot demonstrate accuracy on real traffic, cannot prove resilience held under attack, and cannot show cybersecurity controls fired when they should have.

Mandate vs. Compliance

Article 15's text reads at one level of abstraction. The infrastructure to satisfy it operates several levels lower. The gap between the two is where most organizations are exposed.

Letter of Article 15

The letter expects appropriate accuracy, appropriate resilience, appropriate cybersecurity, declared in the technical documentation and consistent across the lifecycle. A reasonable reading might conclude that a Hugging Face model card plus a TLS-protected API endpoint and a model provider's SOC 2 attestation would satisfy the requirement. That reading holds until a regulator shows up and starts asking questions.

The questions a regulator will ask

The questions that follow a regulatory inquiry into a high-risk AI system under Article 15 are specific. What accuracy figure did the system achieve on your task, on your data, in the last quarter? When the system was probed with the prompt-injection corpus that your sector regulator uses, what proportion of attempts were blocked at the enforcement layer? Which authorized roles were permitted to access which model endpoints, and which requests were rejected by policy? Can you produce per-decision evidence for any of the above?

The application stack and the model provider's documentation rarely produce that evidence. The architecture has to.

What surviving a review actually requires

An architecture that satisfies Article 15 produces, for the system as a whole, four observable properties. A declared accuracy figure measured on the deployer's task. A resilience profile under a defined adversarial corpus. A runtime enforcement layer that constrains inputs and outputs by identity, role, and policy. A per-decision audit record that allows the deployer to demonstrate each of the above under regulatory inquiry.

The first two are work items the deployer schedules. The last two are architectural and have to exist before the deployment serves production traffic.

Beyond Article 15

Article 15 sits inside a broader compliance stack. The same architectural pattern satisfies the surrounding obligations.

Article 9 requires a continuous risk management system, which depends on the same per-decision evidence layer. Article 12 requires automatic logging across the system lifetime. Article 19 sets a six-month retention floor on the logs. The NIST AI RMF uses different vocabulary for the same control set. Fannie Mae LL-2026-04, effective August 6, 2026, expects the same evidence to be available on demand for mortgage AI.

The vocabulary changes across regimes. The infrastructure that produces the evidence is the same.

DeepInspect

This is the gap DeepInspect closes. DeepInspect sits at the AI request boundary as a stateless proxy between authenticated users or agents and any LLM. Every request is evaluated against per-route, per-role policies using identity context the application supplies. Prompt content is classified before the request reaches the model. Policy violations fail closed.

Every decision produces a signed, per-decision audit record containing identity, role, policy version, data sensitivity, outcome, and timestamp. The record is committed before the application receives the model's response. The same record stream supports the accuracy measurement, the resilience evaluation, and the cybersecurity evidence that Article 15 expects.
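To illustrate what a tamper-evident per-decision record with those fields could look like, here is a generic sketch. It is not DeepInspect's record format or signing scheme: the function names and field names are assumptions, and HMAC-SHA256 with a shared key stands in for whatever signature mechanism a real deployment uses (an asymmetric signature would let third parties verify without holding the key).

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Dict


def sign_record(record: Dict[str, str], key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_record(identity: str, role: str, policy_version: str,
                data_sensitivity: str, outcome: str, key: bytes) -> Dict[str, str]:
    """Build a per-decision record and attach its signature."""
    record = {
        "identity": identity,
        "role": role,
        "policy_version": policy_version,
        "data_sensitivity": data_sensitivity,
        "outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    record["signature"] = sign_record(record, key)
    return record


def verify_record(record: Dict[str, str], key: bytes) -> bool:
    """Recompute the signature over every field except the signature itself."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    expected = sign_record(unsigned, key)
    return hmac.compare_digest(expected, record.get("signature", ""))
```

Sorting keys before serialization keeps the signed payload stable regardless of field order, so a record verified months later produces the same bytes it was signed over.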

For Article 15, the pattern matters because the three properties stop being separate workstreams. Accuracy can be measured on real traffic. Resilience can be tested against the same enforcement layer that runs in production. Cybersecurity controls fire at the same point the audit record is produced.

If you are running AI in a regulated environment and your Article 15 readiness depends on the model provider's documentation, that readiness is incomplete. Book a demo today.