Frequently asked questions

Which agent framework is best for security?

Self-hosted frameworks (LangChain, LangGraph, AutoGen, CrewAI) keep the agent loop in the deployer's environment, which gives the enforcement layer visibility into every model call and every intermediate state. The framework choice within the self-hosted set is driven by the agent design, not by security alone. The enforcement architecture sits outside the framework and applies the same way regardless.

Can we attach identity context in LangChain without modifying the chain?

The RunnableConfig object accepts metadata that propagates through callbacks. The application sets the metadata at the entry point (user ID, role, policy context) and the framework carries it through the chain without changes to the chain itself. That metadata does not reach the model endpoint on its own: for the proxy to read it, the application copies it into request headers or into a structured policy-context field on the outgoing model call.

Does the OpenAI Assistants API let us audit intermediate steps?

The Assistants API exposes per-thread message logs and tool-call logs. The application can fetch them after a run completes. The auditor must trust the vendor's log for the intermediate state because the deployer's environment never saw it directly. For regulated environments where EU AI Act Article 12 or Fannie Mae LL-2026-04 disclosure is in scope, self-hosted agent loops with deployer-controlled records are the architecturally cleaner option.

How do multi-agent patterns affect the audit record?

Each agent in a multi-agent pattern issues its own model calls. The audit record needs to distinguish which agent made which call within which conversation. The application attaches the agent identifier and the conversation identifier to each request. The per-decision record then reconstructs the multi-agent flow, which is what NIST Pillar 3 action lineage requires.

What if we mix frameworks in the same deployment?

The enforcement layer sits at the AI request boundary regardless of which framework runs the agent. A LangChain deployment, a CrewAI deployment, and direct model API calls can all flow through the same proxy. Per-route policies apply based on the destination model endpoint. Per-role policies apply based on the identity context the application attaches. The mix is handled at the policy decision point, not at the framework integration layer.
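A minimal sketch of that decision, assuming a proxy that knows only the destination host and the role the application attached. The policy table, host names, and role names are hypothetical, and the function takes no framework argument because the framework does not enter the decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


# Hypothetical per-route policy: destination host -> roles allowed to reach it.
ROUTE_POLICY = {
    "api.openai.com": {"analyst", "reviewer"},
    "models.internal.example": {"analyst", "reviewer", "engineer"},
}


def decide(host: str, role: Optional[str]) -> Decision:
    """Evaluate one request at the boundary; deny whenever context is missing."""
    if role is None:
        return Decision(False, "no identity context attached")
    allowed_roles = ROUTE_POLICY.get(host)
    if allowed_roles is None:
        return Decision(False, f"no policy for route {host}")
    if role not in allowed_roles:
        return Decision(False, f"role {role} not permitted on {host}")
    return Decision(True, "allowed")


# The same call serves a LangChain agent, a CrewAI agent, or a direct API client.
print(decide("api.openai.com", "engineer"))
print(decide("models.internal.example", "engineer"))
```

Denying on a missing role or an unknown route is a design choice in this sketch: a request that arrives without context is treated as unauthorized rather than waved through.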

Agentic AI Frameworks: Security Properties Compared

An agent framework is a runtime that decomposes a goal into LLM calls, tool calls, and state updates. The frameworks in production use today (LangChain, LangGraph, AutoGen, CrewAI, and the OpenAI Assistants API) ship different agent loops and different ways of carrying identity, tool context, and intermediate state. I want to walk through what each framework does at the request layer and where the security properties diverge, because that is what determines what an enforcement layer can see.

The control plane has to sit in front of the model endpoint regardless of which framework runs the agent. The framework choice affects what identity context the proxy receives, what classification it can apply, and what evidence the audit record contains.

LangChain

LangChain is the most widely deployed agent framework. Its agent abstractions wrap a model call, a tool registry, and a chain of intermediate steps. The default identity behavior uses application-supplied credentials per LLM provider. Custom callback handlers and a RunnableConfig object let the application attach metadata (user IDs, run IDs, custom tags) to each invocation.

Security properties

Identity context is whatever the application supplies through the RunnableConfig metadata. The framework does not synthesize an identity. Tool calls are issued through the registered tool list, with a tool name and an input. The agent's intermediate scratchpad contains the chain of reasoning and may include partial outputs from prior calls. The HTTP call to the model endpoint carries the prompt, the tool call schema, and any custom headers the application configured for the request.

The proxy in front of the model endpoint can read the prompt body, the tool schema, and the headers. The proxy needs the application to attach the user identity and role context as headers or as policy-context fields. Without that attachment, the proxy sees the agent's request but not the natural person on whose behalf the agent runs.
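One way to picture the attachment step is a small helper that copies identity fields out of the invocation metadata and into HTTP headers. This is a sketch under assumptions: the header names and the `identity_headers` helper are hypothetical, and the `config` dict simply has the shape LangChain accepts as a RunnableConfig. How the headers are placed on the outgoing request depends on the model client and is not shown.

```python
from typing import Any, Dict, Mapping

# Hypothetical header names; use whatever the proxy is configured to read.
HEADER_FOR_KEY = {
    "user_id": "X-Policy-User-Id",
    "role": "X-Policy-Role",
    "run_id": "X-Policy-Run-Id",
}

REQUIRED_KEYS = ("user_id", "role")


def identity_headers(config: Mapping[str, Any]) -> Dict[str, str]:
    """Copy identity fields from a RunnableConfig-shaped dict into headers."""
    metadata = config.get("metadata", {})
    missing = [key for key in REQUIRED_KEYS if not metadata.get(key)]
    if missing:
        # Fail closed: do not send a request the proxy cannot attribute.
        raise ValueError(f"missing identity metadata: {missing}")
    return {
        header: str(metadata[key])
        for key, header in HEADER_FOR_KEY.items()
        if key in metadata
    }


config = {"metadata": {"user_id": "u-1842", "role": "loan-officer", "run_id": "r-77"}}
headers = identity_headers(config)
```

Raising on missing identity keeps the failure in the application, where it is easy to diagnose, instead of producing an anonymous request that the proxy has to reject later.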

LangGraph

LangGraph is the graph-based successor pattern from the LangChain team. The agent is modeled as a state graph with nodes that issue model calls, tool calls, or state transitions. The graph runtime carries a state object from node to node, and a checkpointer can persist that state between steps.

Security properties

The graph definition makes the agent's possible decision paths explicit, which is a security property in itself. The auditor can read the graph and know what the agent might do. Identity context still flows through application-supplied metadata. The graph state, which a checkpointer can persist, can carry the user identity and role across the graph, which simplifies attachment.

The proxy sees individual model calls as the graph executes. The graph's structure is not visible to the proxy unless the application attaches a graph identifier and step identifier to each call. With those attached, the per-decision audit record reconstructs the agent's full traversal, which is what NIST Pillar 3 action lineage requires.
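The reconstruction can be sketched as a sort over audit records, assuming each record carries a graph identifier, a run identifier, a step number, and the node name. The `CallRecord` fields and the sample values are hypothetical, not a LangGraph or proxy schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CallRecord:
    graph_id: str  # which graph definition was running (hypothetical field)
    run_id: str    # one execution of that graph
    step: int      # position of the model call within the run
    node: str      # graph node that issued the call


def traversal(records: List[CallRecord], run_id: str) -> List[str]:
    """Return the nodes visited in one run, ordered by step number."""
    in_run = [r for r in records if r.run_id == run_id]
    return [r.node for r in sorted(in_run, key=lambda r: r.step)]


# Records arrive in whatever order the proxy logged them.
records = [
    CallRecord("loan-review-v3", "run-1", 2, "summarize"),
    CallRecord("loan-review-v3", "run-2", 1, "classify"),
    CallRecord("loan-review-v3", "run-1", 1, "classify"),
    CallRecord("loan-review-v3", "run-1", 3, "decide"),
]

print(traversal(records, "run-1"))  # ['classify', 'summarize', 'decide']
```

Without the step identifier the records for a run are an unordered set of calls; with it, the path through the graph falls out of a sort.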

AutoGen

AutoGen, originally from Microsoft Research, models the agent as a set of conversational participants. Each participant is itself an LLM agent, and the conversation drives the workflow. Multi-agent patterns are first-class: a planner agent talks to an executor agent, which talks to a critic agent.

Security properties

The multi-agent pattern multiplies the identity problem. Each participant is an agent. Each agent issues its own model calls. The conversation between agents may itself contain regulated data. The proxy sees each call as a separate request. Without explicit identity attachment per participant, the proxy cannot distinguish the planner's calls from the executor's calls or apply different policies to each.

For an audit trail that satisfies action lineage, the application has to attach the participant identity and the conversation identifier to each request. The audit record then reconstructs which agent issued which call within which multi-agent conversation. Unlike LangGraph, the workflow structure is not declared up front; it is implicit in the conversation flow.
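As an illustration of what those two identifiers buy, the sketch below groups proxy records by conversation and then by participant. The record keys (`conversation_id`, `participant`, `request_id`) are assumptions about what the application attaches, not AutoGen fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def calls_by_participant(
    records: Iterable[Mapping[str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Group request IDs as {conversation_id: {participant: [request_id, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        grouped[rec["conversation_id"]][rec["participant"]].append(rec["request_id"])
    # Convert to plain dicts so the result compares and serializes cleanly.
    return {conv: dict(parts) for conv, parts in grouped.items()}


records = [
    {"conversation_id": "c-1", "participant": "planner", "request_id": "req-1"},
    {"conversation_id": "c-1", "participant": "executor", "request_id": "req-2"},
    {"conversation_id": "c-2", "participant": "planner", "request_id": "req-3"},
    {"conversation_id": "c-1", "participant": "planner", "request_id": "req-4"},
]

lineage = calls_by_participant(records)
print(lineage["c-1"]["planner"])  # ['req-1', 'req-4']
```

If either identifier is missing from the requests, the same four records collapse into an undifferentiated list and the planner's calls cannot be separated from the executor's.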

CrewAI

CrewAI is a role-based agent framework where each agent has a defined role (researcher, writer, reviewer) and a backstory. The "crew" orchestrates the agents toward a shared goal.

Security properties

The role definition is the framework's most security-relevant property. Each agent has an explicit role that maps naturally to the policy decision point's per-role evaluation. The application can attach the agent's role identifier to each request, and the proxy can apply role-specific policies (the reviewer can read certain data; the researcher can call certain tools).

The challenge in production CrewAI deployments is that the role identifiers must map to the deployer's actual policy taxonomy, not to the framework's default labels. The mapping is the application's responsibility and a routine source of misconfiguration.
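A sketch of that mapping, assuming the application keeps an explicit table from crew role labels to the deployer's policy roles. Every label and role name here is hypothetical. The point is the behavior on an unmapped role: the lookup fails instead of falling back to the framework's label.

```python
# Hypothetical mapping: crew role label -> role name in the deployer's policy taxonomy.
ROLE_MAP = {
    "Senior Researcher": "analyst-readonly",
    "Writer": "content-author",
    "Reviewer": "compliance-reviewer",
}


def policy_role(crew_role: str) -> str:
    """Translate a crew role label into a policy role, failing on unmapped labels."""
    label = crew_role.strip()
    if label not in ROLE_MAP:
        # An unmapped label is a configuration error, not a reason to guess.
        raise LookupError(f"no policy role mapped for crew role {crew_role!r}")
    return ROLE_MAP[label]


print(policy_role("Reviewer"))  # compliance-reviewer
```

Running a check like this at startup for every agent in the crew surfaces a missing mapping before the first request is sent.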

OpenAI Assistants API

The Assistants API is OpenAI's hosted agent runtime. Threads, runs, and tool calls are managed on OpenAI's infrastructure. The application sends messages to a thread, runs the assistant, and receives the result.

Security properties

Hosted agent runtimes shift the agent loop out of the deployer's environment. The intermediate state, the tool calls, and the reasoning live on the vendor's infrastructure. The deployer sees only the inputs and outputs of the thread. The proxy in front of the OpenAI endpoint sees the API calls to the Assistants endpoints, which carry the message content and the tool definitions but not the intermediate state.

This raises a structural disclosure problem. EU AI Act Article 12 record-keeping requires that logs enable the identification of situations that may result in the system presenting a risk. If the agent's intermediate reasoning lives on the vendor's infrastructure and the deployer never sees it, the deployer cannot produce a record of what the agent considered before responding. The Assistants API supports per-thread logging, but that log is the vendor's log, on the vendor's retention schedule.

Framework choice has architectural consequences

The frameworks differ in two security-relevant ways. The location of the agent loop (self-hosted or vendor-hosted) determines what state the deployer can see. The identity propagation mechanism (callbacks, metadata, role objects) determines what identity context reaches the enforcement layer.

Self-hosted frameworks (LangChain, LangGraph, AutoGen, CrewAI) keep the loop in the deployer's environment. The enforcement layer can sit in front of every model call. The audit record captures the intermediate state because the application is the one issuing each request.

Vendor-hosted runtimes (OpenAI Assistants API, similar hosted offerings from other providers) move the loop into the vendor's environment. The enforcement layer sees only the boundary calls. The intermediate state lives at the vendor.

For regulated environments under EU AI Act Article 12, Fannie Mae LL-2026-04, or HIPAA, self-hosted agent loops with an explicit enforcement layer at the model endpoint are the architecturally cleaner pattern. Vendor-hosted runtimes require contractual commitments from the vendor for log access on the deployer's retention schedule, which is harder to enforce in practice than the architectural answer.

Compliance angle

The frameworks themselves are agnostic to regulation. The deployer's choice of framework affects what evidence the deployer can produce when the regulator asks. Article 12 requires automatic recording of events. Article 12(3) sets minimum log contents for certain high-risk systems: the period of each use, the reference database checked, the input data that produced a match, and the identification of the natural persons involved in verifying the results. Article 19 covers how long providers must keep those logs.

A self-hosted LangChain or LangGraph deployment with an enforcement proxy in front of the model endpoint can record all four. A self-hosted CrewAI deployment with role identifiers attached can as well. A vendor-hosted Assistants API deployment records only the boundary calls, which means the deployer relies on the vendor's logs for everything else.

Fannie Mae LL-2026-04 reaches the same conclusion through different language. Disclosure on demand requires that the lender produce the AI tools in use, the data they touched, and the controls in place. A vendor-hosted agent loop pushes part of that disclosure onto the vendor.

DeepInspect

This is the gap DeepInspect closes. DeepInspect sits at the AI request boundary as an external enforcement layer between the agent framework and the model endpoint. The proxy evaluates each request against the identity context the framework supplies, applies per-route and per-role policies, and produces a per-decision audit record signed at the moment of evaluation.

The proxy is model-agnostic and framework-agnostic. LangChain, LangGraph, AutoGen, CrewAI, and direct model API calls all flow through the same proxy. The audit record captures whatever metadata the framework attaches, which means action lineage scales with the level of detail the framework provides.
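To make "signed at the moment of evaluation" concrete, here is a generic sketch of a per-decision record signed with HMAC-SHA256. It is not DeepInspect's record format or signing scheme; the field names, the key, and the use of a symmetric key are all assumptions made for brevity.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def _canonical(record: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with an HMAC-SHA256 signature attached."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {**record, "signature": signature}


def verify_record(signed: dict, key: bytes) -> bool:
    """Recompute the signature over every field except the signature itself."""
    record = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("signature", ""))


key = b"demo-key-not-for-production"  # hypothetical; real keys belong in a key store
signed = sign_record(
    {
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "user_id": "u-1842",
        "role": "loan-officer",
        "route": "api.openai.com",
        "decision": "allow",
    },
    key,
)
```

Changing any field after signing, for example flipping `decision` from `allow` to `deny`, makes `verify_record` return `False`, which is what lets an auditor treat the record as evidence of what was decided at the time.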
