What counts as an agent for this discussion?

An agent in this discussion is an LLM-driven runtime that perceives the environment, plans an action, executes the action against tools or services, observes the outcome, and iterates. The definition covers single-agent runtimes (one planner, one action loop), multi-agent runtimes (a planner that orchestrates specialist agents), and human-in-the-loop runtimes where a human approves the planner's actions at defined gates. The architectural property the discussion depends on is the LLM call inside an action loop; the variation across runtime shapes does not change the enforcement surface.

How does the proxy interact with the agent runtime?

The agent runtime calls the LLM provider through the corporate gateway. The corporate gateway is the proxy. The agent runtime configures its LLM client to point at the gateway with the corporate-issued credential. The gateway authenticates the agent runtime, verifies the agent identity claim and the delegating principal claim, evaluates the per-request authorization, and forwards the call to the upstream provider. The agent runtime sees a normal LLM API response and continues the loop. The audit records the gateway writes are visible to the compliance reviewer, not to the agent runtime.

What about tool calls the agent runtime makes outside the LLM?

The tool boundary is the agent runtime's responsibility. The architectural answer is the same pattern: an enforcement point that reads the tool call, evaluates per-action authorization against the policy, and writes a per-decision record. Most deployments today either build the enforcement point themselves or add the gate through the runtime's plugin system. The audit records from the tool boundary reconcile with the records from the LLM request boundary on the task identifier and the agent identity.
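One way to add that gate without touching the tool bodies is a wrapper around each tool function. The sketch below assumes plain Python callables; `gated`, `is_allowed`, `write_record`, and the record fields are hypothetical names, not a specific runtime's plugin API. It shows the three steps in order: read the call, evaluate authorization, write a per-decision record keyed on the task identifier and agent identity.

```python
import functools
import time


class ToolDenied(Exception):
    """Raised when the tool-boundary check denies an action."""


def gated(tool_name, is_allowed, write_record):
    """Wrap a tool so every call is authorized and recorded before it runs.

    is_allowed(agent_id, tool_name) -> bool and write_record(dict) are
    hypothetical hooks standing in for the policy engine and the audit sink.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(ctx, *args, **kwargs):
            allowed = is_allowed(ctx["agent_id"], tool_name)
            # Record the decision whether or not the call proceeds.
            write_record({
                "task_id": ctx["task_id"],
                "agent_id": ctx["agent_id"],
                "boundary": "tool",
                "operation": tool_name,
                "outcome": "allowed" if allowed else "denied",
                "timestamp": time.time(),
            })
            if not allowed:
                raise ToolDenied(tool_name)
            return fn(ctx, *args, **kwargs)
        return wrapper
    return decorator


# Usage: the tool bodies are unchanged; the gate wraps them.
records = []

@gated("crm.read", lambda agent_id, operation: operation == "crm.read", records.append)
def crm_read(ctx, customer_id):
    return {"id": customer_id}

@gated("crm.delete", lambda agent_id, operation: operation == "crm.read", records.append)
def crm_delete(ctx, customer_id):
    return True

ctx = {"task_id": "task-1", "agent_id": "agent:support"}
print(crm_read(ctx, "c-42"))  # {'id': 'c-42'}
```

Recording before the allow/deny branch means denied attempts leave evidence too, which is what the reviewer needs when reconstructing a task.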

Does the proxy add unacceptable latency to multi-step agent tasks?

The proxy adds under 50 milliseconds of enforcement overhead per LLM call in internal DeepInspect testing. An agent task that calls the LLM 50 times pays that overhead 50 times, which sums to under 2.5 seconds across the task. The LLM inference latency for the same task is 25 to 250 seconds (500 ms to 5 s per call multiplied by 50 calls). The proxy overhead is therefore under 10% of inference time when calls return in 500 ms and under 1% when they take 5 s. The user-perceived latency is dominated by the model, not by the proxy.
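The arithmetic above can be checked in a few lines. This sketch assumes the agent makes its LLM calls sequentially, so per-call costs add up, and treats 50 ms as the worst case since the measured overhead is under that figure.

```python
from dataclasses import dataclass


@dataclass
class TaskLatency:
    proxy_s: float      # total proxy overhead across the task, in seconds
    inference_s: float  # total model inference time across the task, in seconds

    @property
    def overhead_ratio(self) -> float:
        # Proxy overhead as a fraction of model inference time, as in the text.
        return self.proxy_s / self.inference_s


def task_latency(calls: int, proxy_ms_per_call: float, inference_ms_per_call: float) -> TaskLatency:
    # Assumes sequential LLM calls, so per-call latencies sum across the task.
    return TaskLatency(
        proxy_s=calls * proxy_ms_per_call / 1000,
        inference_s=calls * inference_ms_per_call / 1000,
    )


fast = task_latency(calls=50, proxy_ms_per_call=50, inference_ms_per_call=500)
slow = task_latency(calls=50, proxy_ms_per_call=50, inference_ms_per_call=5000)
print(fast.proxy_s, fast.inference_s, round(fast.overhead_ratio, 3))  # 2.5 25.0 0.1
print(slow.proxy_s, slow.inference_s, round(slow.overhead_ratio, 3))  # 2.5 250.0 0.01
```

Because both totals scale with the number of calls, the ratio does not depend on task length: a 500-call task has the same overhead share as a 50-call task at the same per-call latencies.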

What if the agent runtime is open source and we cannot modify it?

The proxy operates at the LLM API call layer, which sits between the agent runtime and the LLM provider. The runtime does not need to be modified to point at a custom URL; the deployment pattern uses environment variables, corporate egress routing, or the runtime's standard configuration mechanism to direct the LLM client to the gateway. Open-source runtimes (LangChain, LlamaIndex, AutoGen, CrewAI, OpenAI Assistants SDK, Anthropic's tool-use SDK) all expose configuration for the LLM endpoint and the API key, which the deployment pattern uses to route through the corporate gateway.
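A minimal sketch of the environment-variable pattern, assuming the runtime builds its clients with the official OpenAI or Anthropic Python SDKs, which read `OPENAI_BASE_URL` and `ANTHROPIC_BASE_URL` (and the matching `*_API_KEY` variables) from the environment. The gateway URL is hypothetical, as is the assumption that the gateway exposes provider-compatible paths; frameworks that construct their own clients may need their own endpoint setting instead.

```python
from typing import Dict, Mapping

# Hypothetical gateway address; substitute the corporate gateway's real URL.
GATEWAY_URL = "https://llm-gateway.corp.example"


def gateway_env(base_env: Mapping[str, str], corporate_credential: str) -> Dict[str, str]:
    """Return a copy of base_env that points the SDK clients at the gateway.

    The unmodified runtime picks these up at client construction time.
    """
    env = dict(base_env)  # copy, so the caller's environment is untouched
    env["OPENAI_BASE_URL"] = f"{GATEWAY_URL}/v1"  # the OpenAI SDK default base URL includes /v1
    env["OPENAI_API_KEY"] = corporate_credential
    env["ANTHROPIC_BASE_URL"] = GATEWAY_URL       # the Anthropic SDK appends /v1/... itself
    env["ANTHROPIC_API_KEY"] = corporate_credential
    return env


# Usage: launch the runtime unchanged, with only its environment rewritten, e.g.
# subprocess.run(["python", "agent_main.py"], env=gateway_env(os.environ, token), check=True)
```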

Agentic AI Enterprise Deployment: The Identity and Audit Surface That Has to Be in Place First

Agentic AI in enterprise environments runs an LLM as the planner inside a loop that perceives, decides, and acts against the corporate environment. The agent authenticates once at the start of a session and operates across the session, often for hours, against many endpoints, executing many tool calls. The identity, authorization, and audit surface that has to be in place before the agent goes to production is broader than the surface a non-agentic LLM deployment needs. NIST closed the comment window on the AI agent identity and authorization framework on April 2, 2026, and the three-pillar model (agent identity, delegated authority, action lineage) is now the operating reference for enterprise deployments. Most agentic deployments today have Pillar 1 partially in place and rely on application logging for Pillars 2 and 3. That is the gap.

I want to walk through what each pillar requires at the implementation level, where the agent loop interacts with the LLM request boundary, and what the 2026 regulatory set expects from the deployment.

The agent loop and where it touches the LLM request boundary

The agent loop runs perceive, plan, act, observe in a continuous cycle. The plan step calls the LLM. The act step calls tools, APIs, databases, file stores, and external services. The observe step reads the outputs and feeds them back into the next iteration.

Each LLM call inside the loop crosses the LLM request boundary. The agent's planner prompts the model, the model returns a structured response (tool calls, action recommendations, or text), the agent's runtime acts on the response, and the loop continues. A multi-step task can call the LLM 5, 50, or 500 times depending on the complexity of the task and the depth of the reasoning chain.

Each LLM call is a separate enforcement point at the LLM request boundary. Each tool call is a separate action point with its own enforcement point at the tool boundary. The agent runtime composes the calls into a task. The compliance evidence the deployment produces has to cover each call and reconcile across the task.
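The loop and its two boundary crossings can be sketched as follows. `plan`, `act`, and `is_allowed` are hypothetical stand-ins for the model call, the tool call, and the enforcement points. The in-memory `records` list is a simplification for illustration: under the audit-independence property described for Pillar 3, the gateway and the tool-boundary enforcement point would write these records, not the runtime.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    task_id: str
    agent_id: str
    boundary: str   # "llm" for the plan step, "tool" for the act step
    operation: str
    allowed: bool


def run_task(agent_id, goal, plan, act, is_allowed, max_steps=10):
    """Perceive-plan-act-observe loop with a check at each boundary crossing.

    plan(observation) -> tool name or None stands in for the LLM call;
    act(tool) -> observation stands in for the tool call;
    is_allowed(agent_id, boundary, operation) stands in for the enforcement points.
    """
    task_id = str(uuid.uuid4())  # every decision in this task carries the same id
    records = []
    observation = goal
    for _ in range(max_steps):
        # Plan step: crosses the LLM request boundary.
        llm_ok = is_allowed(agent_id, "llm", "plan")
        records.append(Decision(task_id, agent_id, "llm", "plan", llm_ok))
        if not llm_ok:
            break
        tool = plan(observation)
        if tool is None:  # the planner reports the task is done
            break
        # Act step: crosses the tool boundary.
        tool_ok = is_allowed(agent_id, "tool", tool)
        records.append(Decision(task_id, agent_id, "tool", tool, tool_ok))
        if not tool_ok:
            break
        # Observe step: the tool output feeds the next iteration.
        observation = act(tool)
    return task_id, records


steps = iter(["search", "summarize", None])
task_id, records = run_task(
    "agent:research", "goal",
    plan=lambda obs: next(steps),
    act=lambda tool: f"output of {tool}",
    is_allowed=lambda agent, boundary, op: op != "delete",
)
print([r.boundary for r in records])  # ['llm', 'tool', 'llm', 'tool', 'llm']
```

Even this three-step task produces five decisions, which is why the record count grows quickly for long-running agents.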

Pillar 1: agent identity

The agent identity question is who is acting. NIST Pillar 1 expects a verified identity context attached to every operation the agent performs.

Three sub-questions land here. The first is the agent's own identity: a verified principal that the corporate IdP issued, with a defined role and a defined scope. The second is the delegating principal: the human user or the service account that initiated the task the agent is acting on behalf of. The third is the chain of delegation: when an agent calls another agent, the chain has to preserve the original principal and the delegation path.
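The three sub-questions map onto a small identity context that travels with each call. This is a sketch under assumed field names (`agent_id`, `principal`, `chain` are illustrative, not a NIST-defined schema). The point it shows is that delegation replaces the acting agent but never the original principal, and it appends to the path rather than overwriting it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IdentityContext:
    agent_id: str            # the agent's own IdP-issued principal
    principal: str           # the human or service account that started the task
    chain: Tuple[str, ...]   # delegation path, original principal first


def start_task(agent_id: str, principal: str) -> IdentityContext:
    return IdentityContext(agent_id=agent_id, principal=principal, chain=(principal, agent_id))


def delegate(ctx: IdentityContext, sub_agent_id: str) -> IdentityContext:
    # The sub-agent acts as itself, but the original principal and the
    # full delegation path are preserved.
    return IdentityContext(
        agent_id=sub_agent_id,
        principal=ctx.principal,
        chain=ctx.chain + (sub_agent_id,),
    )


ctx = start_task("agent:planner", "user:alice")
sub = delegate(ctx, "agent:retriever")
print(sub.chain)  # ('user:alice', 'agent:planner', 'agent:retriever')
```

Making the context immutable means a sub-agent cannot rewrite the chain it received; it can only produce a new, longer one.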

The corporate environment supplies identity through SSO assertions for human-delegated tasks, workload identity certificates for service-delegated tasks, and agent identity claims for the agents themselves. The agent runtime attaches the identity to the LLM call and to each tool call. The inspection point at the LLM request boundary verifies the identity claim and rejects calls that arrive without verified context.
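The reject-without-verified-context rule can be sketched with the standard library. HMAC over a JSON claim is a stand-in chosen so the example is self-contained; a real gateway would validate IdP-signed tokens or workload certificates against the IdP's published keys. `sign_claim`, `verify_claim`, and `DEMO_KEY` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Stand-in signing key. A real gateway would validate tokens or certificates
# against the IdP's published keys rather than share a secret.
DEMO_KEY = b"demo-key-not-for-production"


def sign_claim(claim: dict, key: bytes = DEMO_KEY) -> str:
    body = base64.urlsafe_b64encode(json.dumps(claim, sort_keys=True).encode()).decode()
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_claim(token: Optional[str], key: bytes = DEMO_KEY) -> dict:
    """Return the verified claim, or raise PermissionError so the call is rejected."""
    if not token or "." not in token:
        raise PermissionError("no identity context on the call")
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("identity claim signature does not verify")
    claim = json.loads(base64.urlsafe_b64decode(body))
    # A valid signature is not enough: the claim must name both principals.
    for name in ("agent_id", "principal"):
        if name not in claim:
            raise PermissionError(f"identity claim is missing {name}")
    return claim


token = sign_claim({"agent_id": "agent:planner", "principal": "user:alice"})
print(verify_claim(token)["principal"])  # user:alice
```

Every failure path raises rather than returning a partial claim, so the policy gate downstream only ever sees a verified decision input.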

This is the architectural property the rest of the stack depends on. Without Pillar 1, the inspection point has no decision input. Without a decision input, the policy gate has nothing to enforce.

Pillar 2: delegated authority

The delegated authority question is what the principal is permitted to do. NIST Pillar 2 expects per-request evaluation of whether the verified identity is authorized for the specific operation under the policy in effect.

The agentic case differs from the non-agentic case at three points. The first is temporal scope: the agent operates across a long session, and the authorization granted at the start of the session can be broader than the authorization the policy permits for individual operations later in the session. The second is task scope: the agent's task is decomposed into many sub-operations the planner did not enumerate at the start; the authorization has to evaluate at each sub-operation. The third is data scope: the agent's tool calls touch many data sources, and the authorization has to evaluate against each data classification the operation encounters.

The architectural answer is per-request authorization at the LLM request boundary and per-action authorization at the tool boundary. The LLM request boundary is where the planner's prompt is inspected. The tool boundary is where the action the agent takes is inspected. Both are enforcement points.
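A per-request check that covers task scope and data scope can be sketched as a default-deny lookup. The policy shape (role, then boundary, then operation, then permitted data classifications) and every name in it are hypothetical; real policy engines express this differently. Temporal scope is left to the credential lifetime rather than this function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    role: str
    boundary: str               # "llm" or "tool"
    operation: str              # e.g. "chat.completions" or "crm.read"
    classifications: frozenset  # data classifications the operation touches


# Hypothetical policy: role -> boundary -> operation -> classifications it may touch.
POLICY = {
    "support-agent": {
        "llm": {"chat.completions": {"public", "internal"}},
        "tool": {"crm.read": {"public", "internal", "confidential"}},
    },
}


def authorize(req: Request, policy: dict = POLICY) -> bool:
    """Evaluate one request against the policy in effect; anything unnamed is denied."""
    permitted = policy.get(req.role, {}).get(req.boundary, {}).get(req.operation)
    if permitted is None:
        return False  # sub-operations the policy does not name are denied
    # Every classification the operation encounters must be permitted, not just one.
    return req.classifications <= permitted


print(authorize(Request("support-agent", "tool", "crm.read", frozenset({"confidential"}))))  # True
print(authorize(Request("support-agent", "llm", "chat.completions", frozenset({"internal", "confidential"}))))  # False
```

The second call shows the data-scope case: the same role may read confidential CRM data at the tool boundary but may not send it to the model at the LLM request boundary.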

Static service credentials fail Pillar 2 by design. A static credential grants permanent access to the full model API and the full tool surface. Pillar 2 expects per-request, per-role, under-this-policy evaluation. The credential has to be dynamic enough to enable that evaluation. The implementation pattern is short-lived credentials issued per session and exchanged at each operation against the policy in effect.
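The session-then-exchange pattern can be sketched as two token types. The TTLs, class names, and `is_allowed` hook are illustrative assumptions, not a specific token service's API. The clock is injectable so the expiry behaviour can be exercised without waiting.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionCredential:
    token: str
    agent_id: str
    expires_at: float


@dataclass(frozen=True)
class OperationToken:
    token: str
    agent_id: str
    operation: str
    expires_at: float


def issue_session(agent_id, ttl_s=900, now=time.time):
    """Issue a short-lived session credential (the 15-minute default is an arbitrary example)."""
    return SessionCredential(secrets.token_urlsafe(16), agent_id, now() + ttl_s)


def exchange(session, operation, is_allowed, ttl_s=60, now=time.time):
    """Trade the session credential for a token scoped to one operation.

    is_allowed(agent_id, operation) is evaluated at exchange time, so a policy
    change mid-session applies to the next operation, not the next login.
    """
    t = now()
    if t >= session.expires_at:
        raise PermissionError("session credential has expired")
    if not is_allowed(session.agent_id, operation):
        raise PermissionError(f"{operation} is not permitted under the policy in effect")
    # The operation token never outlives the session that produced it.
    return OperationToken(secrets.token_urlsafe(16), session.agent_id, operation,
                          min(t + ttl_s, session.expires_at))


clock = [1000.0]
fake_now = lambda: clock[0]
session = issue_session("agent:planner", now=fake_now)
op_token = exchange(session, "crm.read", lambda agent, operation: operation == "crm.read", now=fake_now)
print(op_token.expires_at)  # 1060.0
```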

Pillar 3: action lineage

The action lineage question is what happened. NIST Pillar 3 expects a structured record of who authorized this, under which policy, at what moment, with what outcome.

The agentic case multiplies the record count. A non-agentic LLM deployment produces one record per LLM call. An agentic deployment produces records for each LLM call, each tool call, each policy decision at each boundary, and each composite decision the agent runtime makes. A task that runs for an hour can produce hundreds of records.

The records reconcile on the task identifier and the agent identity. The compliance reviewer reads the records and reconstructs the task: what the agent was asked to do, what the planner decided, what tools the agent called, what data it touched, what authorizations applied at each step, and what the final outcome was.
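Reconciliation is a merge keyed on those two fields. This sketch assumes each boundary emits records with a shared schema (the field names, including `policy_version`, are hypothetical) and that timestamps come from synchronized clocks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    task_id: str
    agent_id: str
    timestamp: float
    boundary: str        # "llm" or "tool"
    operation: str
    policy_version: str  # which policy was in effect for this decision
    outcome: str         # "allowed" or "denied"


def reconstruct(*sources):
    """Merge record streams from both boundaries into one ordered timeline per (task_id, agent_id)."""
    timelines = defaultdict(list)
    for source in sources:
        for record in source:
            timelines[(record.task_id, record.agent_id)].append(record)
    return {key: sorted(recs, key=lambda r: r.timestamp) for key, recs in timelines.items()}


llm_records = [
    AuditRecord("t1", "agent:a", 1.0, "llm", "plan", "p7", "allowed"),
    AuditRecord("t1", "agent:a", 3.0, "llm", "plan", "p7", "allowed"),
]
tool_records = [AuditRecord("t1", "agent:a", 2.0, "tool", "crm.read", "p7", "allowed")]
timeline = reconstruct(llm_records, tool_records)[("t1", "agent:a")]
print([r.operation for r in timeline])  # ['plan', 'crm.read', 'plan']
```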

The architectural property the records depend on is audit independence. The agent runtime cannot write the audit records itself, because that would be self-attestation. The records commit to a write path the agent runtime has no access to. The records carry tamper-evident signatures. The compliance reviewer trusts the records by trusting the signature chain.
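A signature chain can be sketched with the standard library: each record's signature covers the previous signature, so editing or removing an earlier record breaks every later link. HMAC is a self-contained stand-in here; it requires the verifier to hold the key, whereas an asymmetric scheme would let the reviewer verify without being able to sign. The key name and record fields are hypothetical.

```python
import hashlib
import hmac
import json


def append(log, record, key):
    """Append a record whose signature covers the previous entry's signature."""
    prev = log[-1]["sig"] if log else ""
    payload = json.dumps(record, sort_keys=True)
    sig = hmac.new(key, (prev + payload).encode(), hashlib.sha256).hexdigest()
    log.append({"record": record, "prev": prev, "sig": sig})


def verify(log, key):
    """Recompute every link; any edited, reordered, or removed entry breaks the chain."""
    prev = ""
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hmac.new(key, (prev + payload).encode(), hashlib.sha256).hexdigest()
        if entry["prev"] != prev or not hmac.compare_digest(entry["sig"], expected):
            return False
        prev = entry["sig"]
    return True


key = b"audit-writer-key"  # held by the audit write path, never by the agent runtime
log = []
for i, outcome in enumerate(["allowed", "allowed", "denied"]):
    append(log, {"seq": i, "task_id": "t1", "outcome": outcome}, key)
print(verify(log, key))  # True
log[1]["record"]["outcome"] = "denied"
print(verify(log, key))  # False
```

A chain on its own does not reveal truncation of the newest entries, so deployments typically also anchor the latest signature somewhere the writer cannot rewrite.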

Where most enterprise agentic deployments are exposed

The Cloud Radix finding that 86% of IT leaders are blind to AI interactions applies sharply to agentic deployments because the action surface is wider. The agent's tool calls reach into databases, file systems, APIs, and external services that network DLP and CASB tools do not catalog at the action level.

The Netwrix figure that 37% of organizations have any AI governance policy applies upstream. Without a policy that defines what the agent is permitted to do, the per-request authorization at Pillar 2 has nothing to evaluate against.

IBM's $670,000 incremental cost for shadow AI breaches applies to agentic deployments that operate outside the inspection points. The agent that runs in the corporate environment with no identity binding, no per-request authorization, and no action lineage record is the canonical case the figure covers. The 247-day detection window applies the same way: the breach surfaces only when the data shows up outside the boundary.

The architectural answer is the three pillars in place before the agent goes to production. Each pillar produces an enforcement point or a record. The combination produces the operational surface the compliance regimes expect.

What the 2026 regulatory set expects from agentic AI

The EU AI Act Article 9 risk management requirement applies to agentic deployments that fall under the Annex III high-risk classifications (credit scoring, employment screening, education access, biometric identification, critical infrastructure, law enforcement, migration, justice). The August 2, 2026 deadline applies. Article 9 expects the agent's operations to be inside the risk identification, estimation, evaluation, and treatment process, with evidence at each decision.

Article 12 logging applies per decision. For an agentic deployment, that means per LLM call and per tool call inside the agent's task. Article 12 specifies the log content (timestamps, input data, identification of natural persons), and Article 19 sets the retention floor (at least six months). The agent's task can run for hours and produce many decision records; each falls under the retention obligation.

Article 14 human oversight expects the deployer to retain control over the high-risk AI system. The agentic case puts pressure on human oversight because the agent operates autonomously across the task. The architectural answer is the enforcement points at the LLM request boundary and at the tool boundary, which the human operator configures through policy, and the audit records, which the human reviewer reads after the fact.

The Fannie Mae LL-2026-04 mandate, effective August 6, 2026 per the Cooley legal analysis, applies to mortgage lenders using agentic AI in origination or servicing. The disclosure-on-demand obligation expects the lender to produce records of how AI tools handled specific decisions. Texas TRAIGA, effective January 1, 2026, applies to operators and developers of AI systems used in consequential decisions, and the agentic case is one of the cases the law covers.

DeepInspect

This is the gap DeepInspect closes on the LLM request boundary side of the agentic stack. DeepInspect is a stateless proxy that sits between the agent runtime and the LLM provider. The proxy verifies the identity context the agent runtime supplies (Pillar 1 evidence), evaluates per-request authorization against the policy in effect (Pillar 2 enforcement at the LLM call), and writes a per-decision audit record (Pillar 3 evidence at the LLM call).

The tool boundary is the agent runtime's responsibility, and the same architectural pattern applies there: an enforcement point that reads the tool call, evaluates per-action authorization, and writes a per-decision record. The audit records from the LLM request boundary and the tool boundary reconcile on the task identifier and the agent identity.

Enforcement overhead on the LLM request boundary runs under 50 milliseconds in internal DeepInspect testing. The agent loop can iterate at the cadence the model returns responses (500 milliseconds to 5 seconds per LLM call), and the proxy overhead stays under 10% of each call's inference time at the fast end and under 1% at the slow end.

The proxy works in front of any HTTP-accessible LLM endpoint, which means the agent runtime can target OpenAI, Anthropic, Bedrock, Azure OpenAI, Vertex, or a self-hosted model and the same audit format applies. The compliance evidence the deployer produces does not depend on the provider choice.

If you are planning an agentic AI deployment and your Pillar 2 and Pillar 3 surface is application logging, book a technical deep dive at deepinspect.ai.