Agent-to-Agent Authentication: How One Agent Verifies Another at the API Boundary
Multi-agent systems route work between agents that authenticate to one another. The pattern that worked for service-to-service traffic (mTLS plus a shared service account) under-attributes the action. Agent-to-agent authentication needs the workload identity of the calling agent plus the delegation chain back to the natural person, plus per-call records that capture the chain. This piece walks through the three properties an agent-to-agent auth model must support, the token-exchange pattern that satisfies them, and where the policy decision lands.

In a multi-agent system, the orchestrator agent invokes the research agent, which invokes the database agent, which invokes the model. The pattern that worked for service-to-service traffic (mTLS at the network layer plus a shared service account at the API layer) drops two pieces of information the records need. Per-call attribution of the action disappears at the first hop, and the delegation chain back to the natural person stops at whichever agent received the original user request. Multi-agent compliance with NIST Pillar 3 action lineage and EU AI Act Article 19 records requires a stronger primitive.
I want to walk through the three properties an agent-to-agent authentication model needs, the token-exchange pattern that satisfies them, and where the gateway sits in the topology.
Agent-to-agent authentication
Two agents communicating with one another need to answer three questions at each hop. Who is the calling agent? What authority does the calling agent hold for this specific call? On whose behalf did the chain originate?
The first question is workload identity. The second is the delegated authority for this call. The third is the delegation chain.
Property 1: Verifiable workload identity per agent
Each agent runs as a distinct workload with its own identity. The identity is short-lived, scoped to the agent's expected behavior, and issued by the corporate identity provider through a workload-identity standard or platform mechanism such as SPIFFE, AWS IAM Roles for Service Accounts, Azure Workload Identity Federation, or GCP Workload Identity.
Identity is verifiable by the receiving agent without out-of-band coordination. The token is signed; the signature can be verified against the identity provider's public keys.
Property 2: Delegated authority per call
The calling agent attaches a token that encodes the authority delegated for this specific call. The delegated authority is the minimum subset of the calling agent's authority needed for the downstream call, not the calling agent's full authority. The pattern is OAuth-style scoping at per-call granularity.
If the orchestrator's full authority is "read CRM, read database, call model," and the downstream call is a database read, the delegated authority on the downstream call is "read database, specific table, specific predicates," not the orchestrator's full set.
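The subset rule can be sketched as a small check. This is a minimal illustration, assuming scopes are modeled as plain strings; the scope names and the `attenuate` helper are hypothetical, and table- or predicate-level constraints would need additional fields.

```python
from typing import FrozenSet

# Hypothetical scope strings; real deployments define their own vocabulary.
ORCHESTRATOR_AUTHORITY: FrozenSet[str] = frozenset(
    {"crm:read", "db:read", "model:call"}
)


def attenuate(held: FrozenSet[str], requested: FrozenSet[str]) -> FrozenSet[str]:
    """Return the scopes for a downstream call, refusing any escalation."""
    extra = requested - held
    if extra:
        raise PermissionError(f"cannot delegate scopes not held: {sorted(extra)}")
    return requested


# The database read carries only the scope that call needs.
db_call_scope = attenuate(ORCHESTRATOR_AUTHORITY, frozenset({"db:read"}))
```

A request for a scope the orchestrator does not hold, such as a hypothetical `db:write`, raises `PermissionError` instead of silently widening the delegation.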
Property 3: Delegation chain back to the natural person
The delegation chain begins at the natural person who initiated the request. The chain travels through each hop in the agent-to-agent call graph. The receiving agent at any hop can verify that the chain originated with a real user and that each delegation in the chain was authorized.
The chain is a sequence of tokens, each signed by the identity provider, each containing the previous token's hash and the new delegation. The pattern is similar to OAuth's token exchange (RFC 8693) extended for multi-step delegation.
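A hash-linked chain of this kind might look like the sketch below. It is an illustration under stated assumptions, not a production token format: an HMAC with a demo key stands in for the identity provider's asymmetric signature, and the field names (`sub`, `actor`, `scope`, `prev_hash`) and helper functions are hypothetical.

```python
import hashlib
import hmac
import json
from typing import List, Optional

# Assumption: a shared demo key stands in for the IdP's private signing key.
IDP_KEY = b"demo-only-key"


def _canonical(body: dict) -> bytes:
    # Stable serialization so hashes and signatures are reproducible.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def _sign(body: dict) -> str:
    return hmac.new(IDP_KEY, _canonical(body), hashlib.sha256).hexdigest()


def issue_link(prev: Optional[dict], subject: str, actor: str,
               scope: List[str]) -> dict:
    """Issue one link: the new delegation plus the hash of the previous token."""
    prev_hash = hashlib.sha256(_canonical(prev)).hexdigest() if prev else None
    body = {"sub": subject, "actor": actor, "scope": scope, "prev_hash": prev_hash}
    return {**body, "sig": _sign(body)}


def verify_chain(chain: List[dict]) -> bool:
    """Check every signature and every back-pointer, root first."""
    prev = None
    for link in chain:
        body = {k: v for k, v in link.items() if k != "sig"}
        if not hmac.compare_digest(link["sig"], _sign(body)):
            return False
        expected = hashlib.sha256(_canonical(prev)).hexdigest() if prev else None
        if link["prev_hash"] != expected:
            return False
        prev = link
    return True


root = issue_link(None, "user:alice", "agent:orchestrator", ["crm:read", "db:read"])
hop = issue_link(root, "user:alice", "agent:research", ["db:read"])
```

Because each link commits to the hash of the one before it, altering a scope in any link or dropping the root link makes `verify_chain` return `False`.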
Why the service-account pattern fails
The pattern most multi-agent deployments use today is a shared service account that every agent in the deployment authenticates with. It fails in three identifiable ways.
The receiving agent cannot distinguish callers
Every inbound call carries the same service-account identity. The receiving agent cannot tell whether the call originated from agent A, agent B, or a leaked credential. The records attribute every action to the service account.
The natural-person identity is lost at the first hop
The orchestrator might know the user. The next agent in the chain does not. The static service account at the inter-agent boundary erases the user context. Article 19 records that must identify the natural person cannot be produced past the first hop.
Compromise of one agent compromises every agent
A vulnerability in any agent that uses the shared credential gives the attacker access to every API the credential authorizes. The blast radius is the full set of authorizations the deployment grants. The CVE-2026-39987 marimo incident, where attackers used a compromised credential to drive an LLM agent through the victim's AWS environment, is the cautionary case.
The token-exchange pattern
The working pattern across multi-agent deployments today has three steps.
Step 1: The orchestrator receives the user request and gets a delegation token
The user authenticates to the application. The application obtains a delegation token from the identity provider that binds the natural-person identity to the agent that will act on the user's behalf. The token is short-lived, scoped to the requested operation, and signed by the identity provider.
Step 2: The orchestrator exchanges its token for a downstream-call token
When the orchestrator needs to invoke another agent, it calls the identity provider's token-exchange endpoint with two arguments: its current token, and the scope it wants on the downstream call. The identity provider issues a new token that retains the natural-person identity, retains the chain back to the original delegation, and adds the new scope for the downstream call.
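As a rough sketch, the exchange request body could be built as follows. The parameter names and URN values come from RFC 8693; the token value, the scope string, the audience name, and the `build_exchange_request` helper are placeholders invented for illustration. The sketch only builds the form body and makes no network call.

```python
from urllib.parse import parse_qs, urlencode

# Grant type and token type identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_exchange_request(current_token: str, scope: str, audience: str) -> str:
    """Form-encode a token-exchange request for one downstream call."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": current_token,          # the orchestrator's current token
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "scope": scope,                          # scope wanted on the downstream call
        "audience": audience,                    # hypothetical name of the receiving agent
    })


body = build_exchange_request("current-token-placeholder", "db:read", "agent:database")
```

The resulting string would be sent as an `application/x-www-form-urlencoded` POST body to the identity provider's token endpoint; how the provider maps the request onto its delegation chain is deployment-specific.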
Step 3: The downstream agent verifies and acts
The downstream agent receives the new token, verifies the signature, extracts the natural-person identity and the chain, and acts under the delegated scope. The downstream agent records the call with the full chain so the lineage is reconstructable.
The pattern continues at each hop. Every receiving agent sees the identity of the calling agent, the natural-person identity at the root, and the scope for the current call. Every agent records the full chain.
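One way a receiving agent might extract the chain is sketched below. It assumes the token's signature has already been verified and that the chain is carried as nested `act` claims in the style of RFC 8693, which is only one possible encoding; the claim values and the `call_record` helper are hypothetical.

```python
from typing import Any, Dict, List


def actor_chain(claims: Dict[str, Any]) -> List[str]:
    """Walk nested `act` claims, most recent actor first."""
    chain: List[str] = []
    act = claims.get("act")
    while isinstance(act, dict):
        chain.append(act["sub"])
        act = act.get("act")
    return chain


def call_record(claims: Dict[str, Any], receiver: str) -> Dict[str, Any]:
    """Build the per-call record a receiving agent would store."""
    actors = actor_chain(claims)
    return {
        "natural_person": claims["sub"],            # root of the delegation
        "calling_agent": actors[0] if actors else None,
        "chain": list(reversed(actors)),            # oldest hop first
        "receiver": receiver,
        "scope": claims.get("scope", ""),
    }


# Already-verified claims for a call from the research agent to the database agent.
claims = {
    "sub": "user:alice",
    "scope": "db:read",
    "act": {"sub": "agent:research", "act": {"sub": "agent:orchestrator"}},
}
entry = call_record(claims, "agent:database")
```

Here `entry["chain"]` lists the orchestrator first and the research agent second, so the lineage reads in the order the delegations happened.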
Compliance gap
NIST Pillar 3 action lineage requires per-action records that support reconstruction of the action's authorization chain. EU AI Act Article 19 requires identification of natural persons involved. OWASP Top 10 for Agentic Applications includes identity spoofing as one of the ten categories. The three converge on the same architectural answer: workload identity per agent, delegation chain back to the natural person, per-call records.
The disclosure test
When a regulator opens an inquiry into an agent's behavior, the question is which user authorized which action through which agent chain. The shared-service-account pattern produces records that name the service account at every hop. The token-exchange pattern produces records that show the user at the root, the calling agent at each hop, and the delegated scope on each call.
Vendor liability
Multi-agent platforms ship with default authentication patterns that the deployer inherits. The deployer cannot transfer the records obligation. A deployer using a multi-agent framework that does not support per-agent identity and chain attribution leaves the inter-agent records gap in place. The remediation is to put a gateway in the path of agent-to-agent calls that enforces the token-exchange pattern.
Mandate vs. Compliance
The regulations and frameworks state their obligations at a high level of abstraction. The implementation has to satisfy them at the per-call boundary.
The disclosure test
A regulator asks "which user initiated the action that produced this decision, through which agent chain, under what authority at each hop." The records produced by the token-exchange pattern at the gateway answer the question at the per-call granularity the regulator needs.
Compliance gap
Most multi-agent deployments use the shared-service-account pattern because it ships by default in popular agent frameworks. The structural fix is to put a gateway in the path of inter-agent calls that requires the token-exchange pattern and records the chain.
DeepInspect
This is the architecture DeepInspect was built to provide. DeepInspect sits at the AI request boundary as a stateless proxy between any application and any LLM, and between agents and the tools and other agents they call. Per call, the gateway verifies the delegation token, extracts the calling agent's workload identity, extracts the natural-person identity at the root of the delegation chain, and records the full chain.
Every decision produces a per-decision audit record containing the natural-person identity, the calling agent identity, the receiving agent or tool, the delegated scope, the policy version, and the decision outcome. The record is signed and tamper-evident. The chain back to the natural person is recoverable from the records alone.
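To make the idea of a tamper-evident record concrete, here is a generic sketch. It is not DeepInspect's schema or implementation: the `AuditRecord` fields simply mirror the list above, and a hash chain across entries stands in for signing, which a real system would add on top.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class AuditRecord:
    natural_person: str
    calling_agent: str
    receiver: str
    scope: str
    policy_version: str
    outcome: str


def append(log: List[dict], record: AuditRecord) -> None:
    """Append a record whose digest covers the previous entry's digest."""
    prev = log[-1]["digest"] if log else ""
    payload = json.dumps(asdict(record), sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    log.append({"record": asdict(record), "prev": prev, "digest": digest})


def intact(log: List[dict]) -> bool:
    """Recompute every digest; any edited or reordered entry breaks the chain."""
    prev = ""
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        if entry["prev"] != prev:
            return False
        if hashlib.sha256((prev + payload).encode()).hexdigest() != entry["digest"]:
            return False
        prev = entry["digest"]
    return True
```

Editing any stored field after the fact changes the recomputed digest, so `intact` returns `False` for the modified log.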
For NIST Pillar 3, this is action lineage at per-call granularity in multi-agent topologies. For Article 19, the natural person is identified at every hop. For OWASP identity spoofing, the per-agent workload identity makes spoofing detectable at the receiving gateway.
Frequently asked questions
- How does this work when one of the agents is a third-party service?
The pattern extends to third-party agents through standardized token exchange. The calling agent presents its delegation token to the third-party service's token endpoint, which exchanges it for a third-party-issued token bound to the calling agent's identity and the delegated scope. The third party records its end of the call with the delegation chain. The calling agent records its end with the same chain. Cross-organization audit reconciliation is possible because the chain carries a signature from the original identity provider that both ends can verify.
- What happens to performance when each call adds a token-exchange round trip?
The token-exchange endpoint runs at the identity provider. The round-trip latency is typically 5 to 20 ms for a cached IdP. Agents that fire many inter-agent calls can hold a small pool of pre-exchanged tokens for common scopes, refreshed before expiry. The latency impact in steady-state operation is negligible compared to the LLM call latency the agent is also waiting for. The first call in a new session pays the cold-token cost; subsequent calls reuse the pool.
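A token pool along these lines could be sketched as follows. The `TokenPool` class, the `exchange` callback, and the 30-second refresh margin are assumptions made for illustration; because exchanged tokens carry the natural-person identity, the sketch assumes one pool per user session.

```python
import time
from typing import Callable, Dict, Tuple


class TokenPool:
    """Cache exchanged tokens per scope and refresh shortly before expiry."""

    def __init__(self,
                 exchange: Callable[[str], Tuple[str, float]],
                 refresh_margin: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._exchange = exchange        # returns (token, lifetime_seconds)
        self._margin = refresh_margin    # refresh this many seconds early
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def get(self, scope: str) -> str:
        now = self._clock()
        cached = self._cache.get(scope)
        if cached and now < cached[1] - self._margin:
            return cached[0]             # warm path: no round trip to the IdP
        token, lifetime = self._exchange(scope)
        self._cache[scope] = (token, now + lifetime)
        return token
```

The first `get` for a scope pays the exchange round trip; later calls return the cached token until it comes within the refresh margin of expiry.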
- Can two agents communicate without going through the gateway if they're in the same trust boundary?
The gateway-free path is an architectural choice with trade-offs. Removing the gateway from inter-agent calls inside a tight trust boundary saves the per-call evaluation overhead. The cost is losing the per-call records that satisfy the NIST and Article 19 obligations. Most regulated deployments find the records valuable enough to keep the gateway in the path even within trust boundaries. The gateway evaluation overhead is small relative to LLM and tool-call latency, and the records have evidentiary value beyond the operational case for the gateway.
- How does this interact with the OWASP excessive agency category?
The token-exchange pattern is one of the architectural answers to excessive agency. The calling agent's authority on a downstream call is the delegated scope from the token, not the calling agent's full authority. The gateway evaluates the policy against the scope and the receiving service. An agent that has been over-granted at the role level still has its per-call authority constrained by the scope. The records show what was authorized per call and what was used. Detection of excessive agency in production becomes a query against the records: where did the policy permit a call broader than the typical scope?
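Such a query might look like the sketch below. It assumes each record stores the granted scope as a space-delimited string and that a baseline of typical scopes per caller and receiver pair has been derived separately; the record fields and the `over_broad` helper are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]  # (calling_agent, receiver)


def over_broad(records: List[dict],
               typical: Dict[Pair, FrozenSet[str]]) -> List[dict]:
    """Return records whose granted scope exceeds the pair's typical scope."""
    flagged = []
    for rec in records:
        baseline = typical.get((rec["calling_agent"], rec["receiver"]), frozenset())
        excess = frozenset(rec["scope"].split()) - baseline
        if excess:
            flagged.append({**rec, "excess": sorted(excess)})
    return flagged


typical = {("agent:research", "agent:database"): frozenset({"db:read"})}
records = [
    {"calling_agent": "agent:research", "receiver": "agent:database",
     "scope": "db:read"},
    {"calling_agent": "agent:research", "receiver": "agent:database",
     "scope": "db:read db:write"},
]
flagged = over_broad(records, typical)
```

Only the second record is flagged, with `db:write` listed as the excess; pairs with no baseline are flagged in full, which errs on the side of review.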
- What's the relationship to mTLS at the network layer?
mTLS at the network layer authenticates the workload identity at the connection level. It is necessary but insufficient for agent-to-agent authentication. mTLS tells the receiving agent which workload identity is on the other end of the connection. It does not carry the delegation chain back to the natural person, the delegated scope for this specific call, or the per-call attribution. The token-exchange pattern runs at the application layer above mTLS. The two work together: mTLS at the connection, signed token at the call, gateway evaluation at the boundary.