← Blog

Agentic AI Workflows: Where Identity-Bound Enforcement Fails Today

Agentic AI workflows chain LLM calls across tools, data stores, and other agents. Most deployments authenticate the human at the front door and run the rest of the chain on shared service credentials. The audit trail collapses by the second hop.

ByParminder Singh· Founder & CEO, DeepInspect Inc.
Problem-Awareagentic-aiai-securityidentity-and-authorizationai-governancenist-ai-rmfinline-enforcement
Agentic AI Workflows: Where Identity-Bound Enforcement Fails Today

An agentic workflow is an LLM call that triggers other LLM calls. A user prompt enters the system, the orchestrator decides which tools to invoke, and the tools issue follow-on prompts to the same or other models. By the third hop, the request has touched four credentials, three data stores, and one approval workflow. The audit trail at most enterprises identifies the orchestrator's service account. It does not identify the human or agent that started the chain. NIST's AI agent identity and authorization framework calls this out as the structural failure of current deployments.

I want to walk through the architecture of an agentic workflow, where the identity context is lost, and what an enforcement layer at the AI request boundary records that the application logs miss.

How an agentic workflow chains AI calls

A typical agentic workflow starts with a human prompt to a planner model. The planner returns a plan with tool calls. An executor invokes each tool. Tools include retrieval against a vector store, calls to internal APIs, calls to other LLMs for sub-tasks, and writes to external systems. Each tool call may itself prompt a model with context the human never typed. The orchestrator stitches the responses back together and returns a final answer.

LangChain, LangGraph, CrewAI, AutoGen, and Anthropic's own MCP all express this pattern. The architectural shape is the same: a tree of LLM calls rooted in a single user prompt, with each branch executed by code that holds its own credentials.

Identity context dissolves at the first hop

The human authenticates to the orchestrator with their SSO identity. The orchestrator authenticates to each downstream model with a static service credential. The vector store sees the orchestrator's credential. The summarization model sees the orchestrator's credential. The external API tool sees the orchestrator's credential. The natural person behind the chain is invisible after the entry point.

The orchestrator may pass a user identifier in a metadata field, but that field is ignored by every downstream policy decision point. The model provider's rate limiter sees the credential. The DLP layer, if one exists, sees the credential. The audit log writer sees the credential. The chain executes as the orchestrator, not as the human who started it.

Three failure modes the application log will not catch

The application log records the orchestrator's view of the chain. Three classes of decision escape that view.

Sub-agent prompts the human never approved

A retrieval tool builds a prompt that includes the top ten documents from a corpus the human can read at the orchestrator level. If the corpus mixes classifications and the user's authorization at the corpus level is broader than their authorization for the AI use case, the sub-prompt contains data the user should not have placed in a model context. The application log records the tool call and the response. The data-classification decision that should have happened never happened.

Tool call outcomes attributed to the wrong identity

A write tool commits a change to a system of record. The change is attributable to the orchestrator service account. The actual decision was the human's. When a regulator asks who authorized the change, the log answers with a service account that authorizes everything.

Recursive prompt injection across hops

A document retrieved at hop two contains an instruction that hop three executes. The injection bypasses the planner's safety reasoning because the planner never saw the document. The application log shows a successful chain.

What identity-bound enforcement records

An enforcement layer at the AI request boundary records, for every hop in the chain, who started the chain, which role and authorization context applied, what data classification the prompt at that hop carried, what policy version governed the decision, and what outcome the layer permitted. The records are independent of the orchestrator and committed before the response returns to the application. The full lineage is reconstructable from the records alone.

Regulatory framing

EU AI Act Article 12 requires logs sufficient to reconstruct risk situations. Article 19 requires identification of the natural persons involved. Agentic workflows that lose identity at the first hop fail both. NIST's three-pillar framework calls out Pillar 2 (delegated authority) and Pillar 3 (action lineage) as the controls that have to be evaluated and recorded per request, not per session.

DeepInspect

This is the gap DeepInspect closes for agentic workflows. DeepInspect sits inline between the orchestrator and every model API the chain touches. Each request carries the identity context the application supplies at the front door. Each hop is evaluated against per-route, per-role policies and produces a signed, per-decision audit record bound to the originating human or agent identity.

The records reconstruct the full lineage: which human started the chain, which orchestrator handled the planning, which tools were invoked, what data classification applied at each hop, and which policy version governed each decision. The application never has custody of the write path.

Frequently asked questions

Why do shared service credentials break the audit trail?

A shared service credential identifies the application, not the human or agent acting through it. Every downstream policy decision and audit record attributes the request to the service account. When a regulator asks who made a specific decision, the answer is the service account that makes every decision. The fix is identity context that travels with the request as a verifiable claim, evaluated independently at the enforcement layer.

A regulator asks for the audit trail of a single agentic interaction. What does the response look like?

A compliant response identifies the human or agent that initiated the chain, the authorization context they held, every model call the chain produced, the data classification that applied at each hop, the policy version that governed each decision, and the outcome. Application logs at most enterprises produce the orchestrator's view of the chain and identify the service account behind every downstream call. That response fails the traceability test in Article 12 and the action-lineage requirement in NIST Pillar 3.

How does prompt injection propagate across hops in an agentic workflow?

A document retrieved at hop two contains an instruction that hop three executes. The orchestrator's planner never inspected the document because retrieval happened inside a tool. The downstream model treats the injected instruction as part of its operating context and follows it. Identity-bound enforcement at the request boundary inspects every prompt, including those built by tools, against the classification policy that applies to the human who started the chain. Sub-prompts that contain instructions outside the human's authorization scope are blocked at the boundary.

Does MCP solve the identity problem?

Anthropic's Model Context Protocol standardizes how tools expose capability to models. It does not, on its own, attach human identity to tool invocations or produce per-decision audit records bound to that identity. MCP is the transport. The enforcement layer that evaluates identity-bound policy and writes the audit record sits above it. The two compose: MCP carries the tool semantics, the enforcement layer carries the identity, policy, and audit.

Can application logs from LangChain or LangGraph satisfy NIST Pillar 3?

The frameworks emit structured traces of the chain. The traces identify tool calls, prompts, and responses. They do not, by default, include the verified identity of the human behind the request, the data classification that applied to each prompt, or the policy version in effect at the moment of evaluation. Pillar 3 (action lineage) requires all three. The frameworks are useful for debugging the chain. The action lineage record has to come from an external enforcement layer that holds the identity and policy context.