
Agentic AI Compliance: Where the Existing Frameworks Apply and Where They Fall Short

Agentic AI compliance is the application of the EU AI Act, NIST AI RMF, ISO 42001, and sector regulations to autonomous AI systems that take actions on behalf of users. The frameworks were written before agentic systems were widely deployed, but they still apply: the Article 12 logging obligation, the NIST identity and authorization framework, and the audit and disclosure obligations all reach agentic systems. The gap is that none of them names the action-level evidence requirement explicitly. This piece walks through where existing frameworks apply, where they fall short, and what the per-action evidence layer has to produce.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Tags: Problem-Aware, agentic-ai, ai-agents, compliance, eu-ai-act, nist-ai-rmf, ai-governance

Agentic AI systems take actions on behalf of users and other agents through API calls, file operations, and downstream service requests. Compliance regimes written for AI as a question-and-answer surface (an employee asks a model a question, the model produces an answer) translate imperfectly to agentic deployment. The EU AI Act Article 12 logging obligation applies. The NIST AI agent identity and authorization framework was written specifically for this surface. ISO 42001 management system requirements apply. The sector rules (HIPAA, GLBA, DORA, Fannie Mae LL-2026-04) apply.

The frameworks apply. The gap is that none of them name the per-action evidence requirement explicitly. The per-request audit record that a chatbot deployment produces is not the per-action record an agentic deployment needs.

I want to walk through where existing frameworks apply, where they fall short for agentic systems, and what the per-action evidence layer has to produce.

What agentic compliance is being asked to cover

An agentic AI system has three properties that compliance frameworks have to reach. The system authenticates as an agent against downstream services. The system takes actions: writes to systems of record, sends communications, moves money, changes configurations. The system operates with permissions that may exceed the originating user's permissions because the agent was granted access through static service credentials.

The compliance question is whether each action the agent takes is authorized for the specific data, in the specific moment, by the specific user on whose behalf the agent is acting. The chatbot question - was the request inspected for restricted data before it reached the model - is one layer of the agentic question. The deeper layer is whether the action that came out of the model was authorized for execution.

Where EU AI Act applies

Article 12 requires automatic recording of events over the lifetime of the system. For an agentic system, the events extend beyond the inbound user prompt to the outbound action the agent takes. The Article 12(3) specification of minimum log content (period of use, input data, identification of the natural persons involved), which the Act spells out for remote biometric identification systems, covers the user-facing surface. The downstream action surface is implied by the obligation but not specified at the action level.

For high-risk classifications under Annex III (credit scoring, employment screening, education access, biometric identification, critical infrastructure, law enforcement, border control), the Article 12 obligation applies regardless of whether the system is conversational or agentic. The August 2, 2026 deadline applies. The Article 99 penalty tier for breaching these obligations sits at up to 15 million EUR or 3% of global annual turnover, whichever is higher.
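As a worked example of how that penalty tier scales, here is a minimal sketch. It assumes the ceiling is the higher of the two figures, which is how Article 99 reads for undertakings that are not SMEs. The function name and the turnover figures are illustrative and not drawn from any real case.

```python
# Illustrative arithmetic only; the function name and inputs are hypothetical.
FIXED_CAP_EUR = 15_000_000   # fixed figure named in the penalty tier
TURNOVER_PERCENT = 3         # percentage of global annual turnover


def penalty_ceiling_eur(global_turnover_eur: int) -> int:
    """Return the higher of the fixed cap and 3% of turnover (assumed reading)."""
    turnover_based = global_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based)


# EUR 2 billion in turnover: 3% is EUR 60 million, above the fixed figure.
print(penalty_ceiling_eur(2_000_000_000))   # 60000000
# EUR 100 million in turnover: 3% is EUR 3 million, so the fixed figure governs.
print(penalty_ceiling_eur(100_000_000))     # 15000000
```

For larger enterprises the percentage term dominates, which is why the exposure grows with the size of the deployer rather than staying at the fixed figure.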

The Annex III categories cover a substantial share of the early agentic deployments I see in production. The hiring agent that summarizes resumes for a recruiter is in scope. The credit-decisioning agent at a lender is in scope. The customer-facing agent at a healthcare provider that handles intake is adjacent.

Where NIST AI RMF applies

The NIST AI agent identity and authorization framework, with the public comment window closed April 2, 2026, is the framework written specifically for this surface. Pillar 1 covers agent identity (the agent has a verifiable identity that can be distinguished from a human user and from other agents). Pillar 2 covers delegated authority (the agent's permissions are scoped to the user's authorization, not the agent's static credentials). Pillar 3 covers action lineage (the action the agent took is recorded with the policy that authorized it and the identity of the user on whose behalf it acted).

Pillar 1 is the application-architecture side: the agent's identity has to be established by the system that runs the agent. Pillars 2 and 3 are the enforcement-layer side: the per-request policy evaluation and the per-action record.
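One way to read the delegated-authority pillar is as a set intersection: the agent may do only what both its own grant and the originating user's permissions allow. The sketch below illustrates that reading. The permission strings, the function names, and the two example users are assumptions made for illustration, not part of the NIST text.

```python
from typing import FrozenSet, Iterable


def effective_scope(agent_grants: Iterable[str],
                    user_permissions: Iterable[str]) -> FrozenSet[str]:
    """Scope the agent to what BOTH the agent grant and the user allow."""
    return frozenset(agent_grants) & frozenset(user_permissions)


def is_permitted(action: str,
                 agent_grants: Iterable[str],
                 user_permissions: Iterable[str]) -> bool:
    """True only if the action falls inside the delegated scope."""
    return action in effective_scope(agent_grants, user_permissions)


# Hypothetical static grant held by the agent's service credential.
AGENT_GRANTS = {"crm:read", "crm:write", "payments:transfer"}
ALICE = {"crm:read", "crm:write"}   # hypothetical user permissions
BOB = {"crm:read"}

print(is_permitted("crm:write", AGENT_GRANTS, ALICE))          # True
print(is_permitted("crm:write", AGENT_GRANTS, BOB))            # False
print(is_permitted("payments:transfer", AGENT_GRANTS, ALICE))  # False
```

The last call is the case that matters: the static credential could move money, but the user could not, so the delegated scope excludes it.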

For an agentic system deployed inside a financial institution, the SR 11-7 model risk management obligation extends to the actions the agent takes. The board-level governance, the model inventory, the validation, and the ongoing monitoring all apply. The audit and challenge functions need access to the per-action evidence.

Where ISO 42001 applies

ISO 42001 requires an AI management system covering policies, controls, monitoring, and continual improvement. For agentic deployments, the management system has to extend to the agent inventory (which agents are deployed, what they can do, who owns each one), the agent risk assessment (the blast radius if the agent misbehaves), the control set (what prevents the agent from taking an unauthorized action), and the operating evidence (the per-action records the auditor samples against).

The certification audit tests the management system the same way it tests other ISO management systems: policy artifacts, control operation, evidence at the operating level. For agentic deployments, the evidence the auditor expects to see is per-action, not per-conversation.

Where the frameworks fall short

The frameworks apply at the policy and control level. They do not specify the per-action evidence shape that agentic deployments need. The gap shows up in three places.

The chatbot-shaped audit record - user prompted, model responded, log captured the prompt and response - does not cover the action surface. An agent that took ten actions in response to one user prompt produced ten policy-relevant events. The record has to capture each one.

The static-credential agent identity that most current deployments use does not satisfy NIST Pillar 1 in the framework's strong reading. The agent presents the same credentials regardless of which user the agent is acting on behalf of. The audit record cannot distinguish actions taken for User A from actions taken for User B.

The downstream-service authorization model is typically permissive: the agent has full API access through the static credential, and the per-action authorization is delegated to the agent's own decision-making. The post-authentication gap I have written about applies at the action layer. The agent is authenticated. The agent is authorized to act. Whether the specific action against the specific data is permitted is decided by the agent's reasoning, not by an external policy point.

What the per-action evidence layer has to produce

For each action an agent takes, the audit record needs to contain the verified identity of the user on whose behalf the agent is acting, the agent's own identity, the policy version in effect at the moment of the action, the data classification of the data the action touches, the action being taken (read, write, send, transfer), the decision outcome from the policy evaluation, and a timestamp.
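The fields listed above map naturally onto a small immutable record. The following is a minimal sketch of one possible shape, not a prescribed schema. The class name, field names, identifier formats, and the JSON-lines serialization are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a record should not change once created
class ActionAuditRecord:
    user_id: str              # verified identity of the user the agent acts for
    agent_id: str             # the agent's own identity
    policy_version: str       # policy version in effect at the moment of the action
    data_classification: str  # classification of the data the action touches
    action: str               # read, write, send, transfer
    decision: str             # outcome of the policy evaluation
    timestamp: str            # ISO 8601 timestamp in UTC


def make_record(user_id: str, agent_id: str, policy_version: str,
                data_classification: str, action: str, decision: str,
                now: Optional[datetime] = None) -> ActionAuditRecord:
    """Build a record, stamping it with the current UTC time unless one is given."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return ActionAuditRecord(user_id, agent_id, policy_version,
                             data_classification, action, decision,
                             moment.isoformat())


def to_json_line(record: ActionAuditRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record("user:alice", "agent:intake-bot", "policy-v12",
                     "restricted", "write", "deny")
print(to_json_line(record))
```

An agent that takes ten actions for one prompt would emit ten such lines, each carrying the same user identity and its own decision outcome.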

The action evaluation happens before the action executes against the downstream service. The record is committed before the action's effect is visible to the downstream system. The record is independent of the agent's own logs and of the downstream service's logs.
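The ordering described here (evaluate, commit the record, then execute) can be sketched as a small wrapper. This is an illustration under stated assumptions: the function names are hypothetical, the policy is a toy, and the in-memory list stands in for a durable store that would sit outside both the agent and the downstream service.

```python
from typing import Any, Callable, Dict, List


class ActionDenied(Exception):
    """Raised when the policy evaluation does not allow the action."""


def guarded_execute(action: Dict[str, Any],
                    evaluate: Callable[[Dict[str, Any]], str],
                    audit_log: List[Dict[str, Any]],
                    execute: Callable[[Dict[str, Any]], Any]) -> Any:
    decision = evaluate(action)                          # 1. evaluate before anything runs
    audit_log.append({**action, "decision": decision})   # 2. commit the record first
    if decision != "allow":
        raise ActionDenied(action["action"])             # 3. denied actions never execute
    return execute(action)                               # 4. only now touch the downstream service


def demo_policy(action: Dict[str, Any]) -> str:
    """Toy policy for the example: reads are allowed, everything else is denied."""
    return "allow" if action["action"] == "read" else "deny"


audit_log: List[Dict[str, Any]] = []
downstream_effects: List[Dict[str, Any]] = []

guarded_execute({"user_id": "user:alice", "action": "read"},
                demo_policy, audit_log, downstream_effects.append)
try:
    guarded_execute({"user_id": "user:alice", "action": "transfer"},
                    demo_policy, audit_log, downstream_effects.append)
except ActionDenied:
    pass

print(len(audit_log), len(downstream_effects))   # 2 1
```

Both attempts leave a record, but only the allowed one reaches the downstream side. That asymmetry is what lets the record answer for blocked actions as well as executed ones.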

The pattern is the same as the per-request pattern for chatbots, extended to the action surface. The architectural primitive is per-decision evaluation at the HTTP request boundary, applied both to the inbound user prompt and to the outbound action calls.

DeepInspect

This is the architecture agentic compliance depends on. DeepInspect sits at the AI request boundary as a stateless proxy. For an agentic deployment, the proxy operates on the inbound prompt to the model and on the outbound action calls from the agent to downstream services. Every action call is evaluated against the user's identity, the policy in effect, and the data classification of the action's target. The decision is recorded as a per-action audit record under the deployer's control.

For NIST Pillar 2 (delegated authority), the proxy enforces the user's authorization scope on the agent's actions rather than the agent's static credentials. For NIST Pillar 3 (action lineage), the per-action record produced by the proxy is the action lineage the framework calls for. For EU AI Act Article 12, the recording is automatic and structural to the system. For ISO 42001 audit, the records are the evidence the auditor samples against.

The frameworks were written before agentic systems were the dominant deployment pattern. The architecture that satisfies them is the per-decision evaluation at the request boundary, applied at the action layer.

If your agentic deployment is producing chatbot-shaped logs against an action-shaped compliance obligation, the gap will surface at the next audit. Let's talk today.

Frequently asked questions

Does the EU AI Act explicitly classify agentic systems differently from chatbots?

No. The Act classifies AI systems by use case and risk level, not by interaction modality. A chatbot used in a high-risk Annex III category is high-risk. An agentic system used in the same category is also high-risk. The Article 12 obligation applies the same way. The practical difference is that agentic systems produce more policy-relevant events per user prompt, so the volume of records the system has to produce is higher and the granularity is finer.

How does the NIST framework treat agents acting on behalf of other agents?

The framework allows for delegation chains as long as the identity and authorization context propagate at each step. The pattern in practice is that the originating user identity stays with the request as it moves through the chain, and the authorization at each step is bounded by the user's permissions. Static credentials at any link in the chain break the chain and put the deployment outside the framework's strong reading. The architectural enforcement of this property happens at the request boundary where the action calls are made.
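The chain property can be sketched as a fold over the hops: the scope starts at the user's permissions, narrows at each agent, and the chain is rejected as soon as a hop carries no user context. The types, names, and permission strings below are hypothetical, and a hop with no `on_behalf_of` value is used here to model a static credential.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


class BrokenDelegationChain(Exception):
    """Raised when a hop does not carry the originating user's identity."""


@dataclass(frozen=True)
class Hop:
    agent_id: str
    grants: FrozenSet[str]
    on_behalf_of: Optional[str]   # None models a static credential with no user context


def chain_scope(user_id: str,
                user_permissions: Iterable[str],
                hops: Iterable[Hop]) -> FrozenSet[str]:
    """Narrow the user's permissions through every hop in the chain."""
    scope = frozenset(user_permissions)
    for hop in hops:
        if hop.on_behalf_of != user_id:
            raise BrokenDelegationChain(hop.agent_id)
        scope = scope & hop.grants   # each step is bounded by the scope so far
    return scope


planner = Hop("agent:planner", frozenset({"crm:read", "crm:write"}), "user:alice")
mailer = Hop("agent:mailer", frozenset({"crm:read", "mail:send"}), "user:alice")
print(sorted(chain_scope("user:alice", {"crm:read", "mail:send"}, [planner, mailer])))
# ['crm:read']
```

Replacing either hop with one whose `on_behalf_of` is `None` raises `BrokenDelegationChain`, which is the code-level version of a static credential breaking the chain.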

Does the framework apply to internal agents that do not interact with users directly?

Yes. The NIST framework was written to cover agent-to-agent and machine-to-machine AI activity in addition to human-facing chatbots. An internal agent that takes actions based on a scheduled trigger or another agent's prompt still produces actions that need authorization and recording. The originating identity in that case is the scheduled-task identity or the upstream agent's identity, and the same per-action evidence requirements apply.

What does the audit pattern look like for an agentic system at SOC 2 testing?

The auditor samples actions across the audit period rather than user-facing requests. For each sampled action, the auditor expects to see the user identity on whose behalf the action was taken, the policy version in effect, the data classification of the action's target, and the decision outcome. The control test asks whether an unauthorized action would have been blocked at the policy evaluation point. The evidence layer has to support both the sampled record retrieval and the control test.
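The two things the evidence layer has to support, sampled retrieval and the control test, can be sketched as follows. This is an assumed shape rather than a SOC 2 procedure: the field names, the helper names, and the toy policy are hypothetical, and records are plain dictionaries for brevity.

```python
import random
from typing import Any, Callable, Dict, List, Sequence

# Fields the auditor expects on every sampled action (names are illustrative).
REQUIRED_FIELDS = ("user_id", "policy_version", "data_classification", "decision")


def sample_actions(records: Sequence[Dict[str, Any]], k: int, seed: int) -> List[Dict[str, Any]]:
    """Draw a reproducible random sample of action records from the audit period."""
    rng = random.Random(seed)
    return rng.sample(list(records), min(k, len(records)))


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """List the required fields that are absent or empty on a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def control_blocks(evaluate: Callable[[Dict[str, Any]], str],
                   unauthorized_action: Dict[str, Any]) -> bool:
    """Control test: does the policy point deny a known-unauthorized action?"""
    return evaluate(unauthorized_action) == "deny"


records = [
    {"user_id": f"user:{i}", "policy_version": "policy-v12",
     "data_classification": "internal", "decision": "allow"}
    for i in range(20)
]
records.append({"user_id": "", "policy_version": "policy-v12",
                "data_classification": "internal", "decision": "allow"})

for sampled in sample_actions(records, k=5, seed=7):
    print(sampled["user_id"] or "<missing>", missing_fields(sampled))
```

A record with an empty `user_id`, like the last one appended above, is the kind of exception a sample is meant to surface: it is the signature of an action logged under a static credential with no user attached.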

How does this change the procurement evaluation for agentic AI vendors?

The vendor's architecture either supports per-action evidence under the deployer's control or it does not. The procurement evaluation has to test for it. A vendor that produces logs only on its own infrastructure, in its own format, with its own retention, leaves the deployer dependent on the vendor for the disclosure obligation. A vendor that can route action calls through the deployer's enforcement layer, or that produces records the deployer can ingest and retain, supports the audit framework. The distinction is becoming a standard line item in AI vendor questionnaires.