
Agentic AI Architecture Patterns: Where the Enforcement Layer Sits

Six agentic AI architecture patterns dominate production deployments today: ReAct, plan-and-execute, multi-agent crews, retrieval-augmented agents, code-executing agents, and tool-using single agents. The security properties differ for each pattern, but the enforcement layer always sits in the same place: the HTTP AI request boundary.

By Parminder Singh · Founder & CEO, DeepInspect Inc.

An agentic deployment is a process that issues prompts to one or more LLM endpoints, acts on the responses, and chains the next call based on the result. Underneath that description sit several distinct architectural patterns, each with different operational properties and different consequences for the security architecture. I want to walk through the six patterns I see most often in production deployments, what each does at the AI request layer, and where the enforcement architecture has to sit to satisfy the EU AI Act, Fannie Mae LL-2026-04, and the NIST framework.

The Mandiant M-Trends 2026 report measured median attack handoff at 22 seconds. Across every pattern below, asynchronous controls fail at that tempo. The enforcement decision has to happen inline, before the prompt reaches the model.

ReAct: reasoning and action interleaved

The ReAct pattern interleaves reasoning and action. The agent generates a "thought" that decides what to do next, issues a tool call or model call, observes the result, and feeds the observation back into the next thought. Each iteration is a single LLM call that includes the prior thoughts and observations in the prompt context.

Security properties

The prompt grows with each iteration. By the third or fourth ReAct step, the prompt may include sensitive data retrieved by an earlier tool call, intermediate reasoning that mentions regulated identifiers, and the latest user input. Classification has to apply to the full prompt at each step, because a classification made at step one does not carry forward: a field that was clean in step one may sit alongside sensitive content by step three.

The enforcement proxy sees each ReAct iteration as a separate request. Per-decision audit records reconstruct the full ReAct loop only if the application attaches a session identifier and a step counter to each call. Without those, the audit trail shows individual calls but not the reasoning chain.
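A minimal sketch of that metadata, assuming the proxy reads correlation headers from each request. The header names `X-Agent-Session-Id` and `X-Agent-Step` and the `ReActTrace` helper are hypothetical, chosen for illustration, and are not part of any particular proxy's API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ReActTrace:
    """Tracks one ReAct loop so every model call carries the same session id."""
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    step: int = 0

    def next_headers(self) -> dict[str, str]:
        # Increment first so the first call in the loop is step 1.
        self.step += 1
        return {
            "X-Agent-Session-Id": self.session_id,  # hypothetical header name
            "X-Agent-Step": str(self.step),         # hypothetical header name
        }


trace = ReActTrace()
first = trace.next_headers()   # attach to the first model call
second = trace.next_headers()  # attach to the second model call
```

With both values on every call, the audit records for one loop can be grouped by session and ordered by step.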

Plan-and-execute

The plan-and-execute pattern separates planning from execution. A planner LLM produces a list of steps. An executor LLM (or a non-LLM executor) runs each step. The planner may revise the plan based on execution results.

Security properties

The planner's output is the decision boundary. The plan itself constitutes a structured commitment that can be audited before any step is executed. The enforcement layer can apply policy to the planner's output, not just to the model call that generated it. A plan that violates policy (calls a forbidden tool, requests regulated data, exceeds a per-agent budget) can be rejected at the plan stage, before any step runs.

This pattern is the most straightforward to audit because the structured plan acts as a precommitment. The executor's calls produce per-step audit records that reference the plan identifier. Action lineage falls out of the architecture naturally.
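Plan-stage rejection might look like the sketch below. The `Step` and `PlanPolicy` shapes, the tool names, and the cost unit are assumptions made for illustration; a real planner's output format will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str
    estimated_cost: float  # hypothetical unit, e.g. dollars


@dataclass(frozen=True)
class PlanPolicy:
    allowed_tools: frozenset[str]
    budget: float


def check_plan(steps: list[Step], policy: PlanPolicy) -> list[str]:
    """Return a list of violations; an empty list means the plan may run."""
    violations = []
    for index, step in enumerate(steps, start=1):
        if step.tool not in policy.allowed_tools:
            violations.append(f"step {index}: tool '{step.tool}' is not allowed")
    total = sum(step.estimated_cost for step in steps)
    if total > policy.budget:
        violations.append(f"plan cost {total:.2f} exceeds budget {policy.budget:.2f}")
    return violations
```

Because the check runs over the whole plan, a violation in the last step stops the first step from running.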

Multi-agent crews

The multi-agent crew pattern uses several agents that communicate to solve a goal. A common configuration includes a planner, an executor, a critic, and sometimes a researcher. Each agent issues its own model calls and may invoke other agents as tools.

Security properties

Each agent is a separate identity from the policy decision point's perspective. The planner has one role, the executor has another, the critic has a third. Per-role policies attach naturally to this pattern: the critic can read certain data the executor cannot, the executor can call tools the researcher cannot.

The risk is identity collapse. If all agents in the crew share the same service credential or the same role identifier, the per-role policy benefit disappears. Production deployments routinely default to a shared credential because per-agent identity provisioning is operational work the application team has not yet completed. NIST Pillar 1 (verified identity per agent) is the architectural answer.
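To make the per-role idea and the identity-collapse risk concrete, here is a small sketch. The role names follow the crew described above; the policy table, tool names, and credential strings are hypothetical.

```python
# Hypothetical per-role policy table: which tools each crew role may call.
ROLE_POLICIES = {
    "planner": {"tools": set()},
    "executor": {"tools": {"run_query", "send_email"}},
    "critic": {"tools": set()},
    "researcher": {"tools": {"web_search"}},
}


def is_tool_allowed(agent_role: str, tool: str) -> bool:
    """Deny by default: unknown roles and unlisted tools are rejected."""
    policy = ROLE_POLICIES.get(agent_role)
    return policy is not None and tool in policy["tools"]


def shared_credentials(agent_credentials: dict[str, str]) -> dict[str, list[str]]:
    """Map each credential used by more than one agent to those agents."""
    by_credential: dict[str, list[str]] = {}
    for agent, credential in agent_credentials.items():
        by_credential.setdefault(credential, []).append(agent)
    return {cred: agents for cred, agents in by_credential.items() if len(agents) > 1}
```

A check like `shared_credentials` is one way to surface identity collapse before it silently flattens the policy table into a single role.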

Retrieval-augmented agents

The retrieval-augmented pattern combines an agent with a vector database or other retrieval system. The agent issues a query, retrieves relevant documents, and includes them in the prompt to the model.

Security properties

The retrieval step is a new attack surface. A query may retrieve regulated data that the agent then includes in the prompt. The model sees the regulated data regardless of whether the agent was permitted to access it. Authorization at the retrieval boundary is necessary but not sufficient: even if the agent is allowed to query the vector store, the regulated data it retrieves may not be permitted in the prompt that goes to the model.

The enforcement layer has to classify the prompt after retrieval has completed and before the model call is issued. PII, regulated data, source code, and pre-announcement financials are detected at the prompt level. The decision (pass, redact, block) is made on the full retrieval-augmented prompt, not on the agent's original query.
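The sketch below illustrates why the decision has to be made on the assembled prompt. The two regex detectors and the rule "an SSN blocks, an email address is redacted" are assumptions for illustration only; production classifiers cover far more than two patterns.

```python
import re

# Illustrative detectors only; real classifiers cover far more than two patterns.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble the retrieval-augmented prompt that would go to the model."""
    context = "\n".join(documents)
    return f"Context:\n{context}\n\nQuestion: {query}"


def enforce(prompt: str) -> tuple[str, str]:
    """Return (decision, prompt_to_send). Assumed policy: SSN blocks, email redacts."""
    if SSN.search(prompt):
        return "block", ""
    if EMAIL.search(prompt):
        return "redact", EMAIL.sub("[REDACTED_EMAIL]", prompt)
    return "pass", prompt
```

In this sketch a query such as "Who is the contact?" passes on its own, but the same query is redacted or blocked once a retrieved document brings an email address or an SSN into the prompt.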

Code-executing agents

Code-executing agents generate code, run it in a sandbox, observe the output, and feed the output back into the next prompt. The pattern is common for data analysis, software engineering, and automation tasks.

Security properties

The generated code is itself an attack vector. The agent may produce code that exfiltrates data to a third party, calls services with elevated permissions, or reads files outside the intended scope. Sandboxing the execution environment is necessary, and so is classifying the code the model generates; neither control substitutes for the other.

The enforcement layer can apply policy to the code-generation prompt and to the code itself before the sandbox executes it. For regulated environments, an additional control is applied to the output the sandbox returns to the model, since that output may contain regulated data that should not enter the next prompt.
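As one example of a pre-execution check on generated Python, the sketch below parses the code with the standard `ast` module and reports imports of denied modules. The denylist is hypothetical, and a static import scan is only one narrow classifier, not a complete code policy.

```python
import ast

# Hypothetical denylist; a real policy would be broader and likely allowlist-based.
FORBIDDEN_MODULES = {"socket", "subprocess", "requests"}


def forbidden_imports(source: str) -> set[str]:
    """Return forbidden top-level modules imported by the generated code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in FORBIDDEN_MODULES:
                found.add(root)
    return found
```

A scan like this can be evaded (for example through dynamic imports), which is why the sandbox remains necessary alongside it.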

Tool-using single agents

The simplest pattern is a single agent with a tool registry. The agent issues a prompt, the model returns a structured tool call, the application invokes the tool, and the result feeds back into the next prompt. No multi-agent communication, no plan-and-execute split.

Security properties

The tool registry is the policy boundary. Per-tool policies determine which tools the agent can call, with which parameters, against which data. The enforcement layer can apply policy at two points: at the prompt before the model selects a tool, and at the tool call before the application invokes it.

This pattern is the most common in early agentic deployments and the most straightforward to retrofit with an enforcement layer because the tool call format is structured and the policy decision points are well-defined.
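A per-tool, per-parameter check at the tool-call boundary could be sketched as follows. The registry contents and the call shape `{"name": ..., "arguments": {...}}` are assumptions; actual tool-call formats vary by model provider.

```python
import json

# Hypothetical registry: tool name -> allowed values per constrained parameter.
TOOL_POLICY = {
    "query_database": {"table": {"orders", "products"}},
    "get_weather": {},  # registered, with no parameter constraints
}


def authorize_tool_call(raw_call: str) -> tuple[bool, str]:
    """Check a JSON tool call of the assumed shape {"name": ..., "arguments": {...}}."""
    call = json.loads(raw_call)
    name, arguments = call["name"], call.get("arguments", {})
    if name not in TOOL_POLICY:
        return False, f"tool '{name}' is not registered"
    for param, allowed in TOOL_POLICY[name].items():
        if arguments.get(param) not in allowed:
            return False, f"parameter '{param}' has a disallowed value"
    return True, "ok"
```

The check runs after the model has selected a tool and before the application invokes it, which is the second of the two enforcement points described above.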

Where the enforcement layer sits

Across all six patterns, the enforcement layer sits at the same architectural position: the HTTP AI request boundary, between the agent process and the LLM endpoint. The proxy intercepts every call to OpenAI, Anthropic, Bedrock, Azure OpenAI, Vertex, or self-hosted inference. The policy decision point evaluates identity, role, prompt classification, and tool authorization. The audit record is signed and committed before the model response returns to the agent.

What differs by pattern is the granularity of the audit record. ReAct deployments need step-level identifiers. Plan-and-execute deployments need plan-level identifiers. Multi-agent crews need per-agent identifiers. Retrieval-augmented deployments need retrieval-source identifiers. Code-executing deployments need code-hash identifiers. Tool-using single agents need tool-call identifiers.

The proxy reads whatever metadata the application attaches. The application owns the work of attaching the right identifiers for the agent pattern in use.
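One way to picture a signed per-decision record that carries pattern-specific identifiers is the sketch below. The field names are hypothetical, and HMAC-SHA256 over canonical JSON is an assumed signing scheme chosen for illustration; the article does not specify how records are signed.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with an HMAC-SHA256 signature attached."""
    # Canonical JSON (sorted keys, fixed separators) so equal records sign identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {**record, "signature": signature}


def verify_record(signed: dict, key: bytes) -> bool:
    """Recompute the signature over every field except the signature itself."""
    record = {k: v for k, v in signed.items() if k != "signature"}
    expected = sign_record(record, key)["signature"]
    return hmac.compare_digest(expected, signed.get("signature", ""))


record = {
    "decision": "pass",
    "agent_id": "executor",  # multi-agent crews
    "session_id": "s-123",   # ReAct
    "step": 3,               # ReAct
    "plan_id": None,         # plan-and-execute (unused in this call)
    "tool_call_id": None,    # tool-using single agents (unused in this call)
}
signed = sign_record(record, key=b"demo-key")
```

Identifiers the application did not attach stay empty in the record, which reflects the division of work described above: the proxy can only record what it is given.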

Compliance angle

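EU AI Act Article 12 requires high-risk AI systems to support automatic recording of events (logs) over the system's lifetime. Article 12(3) sets minimum log contents for remote biometric identification systems: the period of each use, the reference database checked, the input data that led to a match, and the identification of the natural persons involved in verifying the results. Article 19 requires providers to keep the automatically generated logs under their control. For each agent pattern, the architectural requirement is the same: per-decision records committed independently of the application.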

The NIST AI agent identity and authorization framework codifies the same architectural requirements as three pillars. Pillar 1 (verified identity) is the application's job. Pillars 2 (delegated authority) and 3 (action lineage) live at the enforcement layer.

The Fannie Mae LL-2026-04 mandate, effective August 6, 2026, applies the same disclosure obligation to mortgage lenders running AI in origination and servicing. The pattern of the agent does not affect the disclosure obligation; the deployer owns it regardless.

DeepInspect

This is exactly what DeepInspect does. DeepInspect sits inline between the agent process and the LLM APIs the agent calls. For every request, regardless of the agent pattern, the proxy evaluates identity, data classification, model authorization, and policy, and makes a pass, redact, or block decision before the request reaches the model.

The per-decision audit record is signed and committed before the response returns to the application. The proxy is model-agnostic and pattern-agnostic. ReAct, plan-and-execute, multi-agent, retrieval-augmented, code-executing, and tool-using single-agent deployments share the same enforcement and audit infrastructure.

Frequently asked questions

Which agent pattern produces the cleanest audit trail?

Plan-and-execute is the architecturally cleanest pattern for audit because the planner's output is a structured precommitment that can be evaluated before any step runs. The audit record references the plan identifier, and each step's record references the plan plus the step number. ReAct is harder because the reasoning chain is interleaved with action and the audit record has to reconstruct the chain after the fact.

How do we apply per-agent policies in a multi-agent crew?

Per-agent policies require per-agent identity. The application provisions a distinct identity for each agent in the crew (planner, executor, critic, researcher) and attaches the agent identity to every request the agent issues. The enforcement layer evaluates the policy that matches the agent identity. Shared credentials across crew members collapse the policy to a single role.

Does retrieval-augmented generation need separate enforcement at the retrieval layer?

Authorization at the retrieval layer is necessary so the agent only queries data sources it is permitted to access. The enforcement layer at the model boundary is also necessary because retrieved data may include regulated content that should not enter the prompt. Both controls apply. The retrieval-layer control is about access. The model-boundary control is about disclosure.

What about code-executing agents that generate Python or SQL?

Code generation produces an additional artifact (the code) that has to be classified before execution. The enforcement layer applies policy to the code-generation prompt and to the code itself. The sandbox handles isolated execution. Output returned from the sandbox to the next prompt is classified again because it may contain regulated data the agent retrieved during execution.

Can we use the same DeepInspect deployment for all six patterns at once?

Yes. The proxy evaluates each request independently against the identity context, classification, and policy that apply. A single deployment handles ReAct, plan-and-execute, multi-agent, retrieval-augmented, code-executing, and tool-using single-agent traffic concurrently. The audit record schema captures whatever metadata the application attaches, which is what supports action lineage a