AI Agent Secrets Handling: Why the Agent Process Should Never See an API Key
An AI agent that holds API keys in process memory is an exfiltration target. The architecture that survives keeps the keys at the gateway and exposes only short-lived, identity-bound tokens to the agent. This walkthrough covers the three patterns enterprises use today, the failure modes that surface under prompt injection or pre-auth RCE, and the broker pattern that closes the gap.

When an AI agent holds an OpenAI API key, an AWS access key, or a third-party tool credential in process memory, those secrets become the highest-value target inside the process. Prompt injection that produces remote code execution turns the agent into a credential harvester. CVE-2026-39987 showed the pattern in production: a pre-auth RCE in Marimo led to AWS key harvest and to LLM-driven Secrets Manager calls inside the victim environment. The fix is architectural. The agent process should never see the API key. The keys live at the gateway; the agent gets a short-lived, identity-bound token instead.
I want to walk through the three patterns enterprises use today, the failure modes each one carries, and the broker pattern that closes the gap.
Pattern 1: keys in environment variables
The most common pattern: the agent process reads its API keys from environment variables at startup and holds them in memory for the lifetime of the process. The variables are populated by the deployment system (Kubernetes Secret, ECS task definition, Vault sidecar).
The failure modes are well known. A core dump leaks the keys. A debug endpoint that prints the environment leaks the keys. A prompt-injection payload that gets the agent to read its own environment leaks the keys. The Marimo CVE showed the same end state in production: code execution inside the process, followed by key harvest.
This pattern is acceptable only when the keys it holds are scoped tightly enough that their theft causes bounded damage and when the agent process has no exposure to untrusted input. Neither condition holds for a production AI agent that ingests user prompts.
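The environment-read failure mode comes down to one fact: any code that runs inside the process can read the environment. A minimal sketch of that, assuming a hypothetical variable name `EXAMPLE_PROVIDER_API_KEY` and a placeholder value, neither of which comes from a real deployment:

```python
import os

# Hypothetical variable name and placeholder value, set here only so the
# sketch is self-contained. In Pattern 1 the deployment system sets it.
os.environ["EXAMPLE_PROVIDER_API_KEY"] = "sk-placeholder-not-a-real-key"


def harvest_environment(markers=("KEY", "TOKEN", "SECRET")):
    """Return every environment variable whose name looks like a credential.

    This stands in for what injected code can do once it runs inside the
    agent process: reading os.environ needs no extra privilege.
    """
    return {
        name: value
        for name, value in os.environ.items()
        if any(marker in name.upper() for marker in markers)
    }


leaked = harvest_environment()
```

Nothing in the sketch is specific to AI agents, which is the point: the key is as readable to injected code as it is to the agent's own code.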
Pattern 2: keys fetched from a secret manager per call
A more defensive pattern fetches the key from a secret manager at call time and discards it after use. The benefit is that the key spends less time in memory. The cost is the per-call latency to the secret manager and the explosion of secret-manager IAM permissions the agent identity now holds.
The deeper failure mode is that the agent's identity now has standing permission to read every secret it might need. An attacker who compromises the agent has the same permission. The agent's identity becomes the key to the kingdom, and the kingdom is the secret manager.
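The standing-permission problem can be sketched with an in-memory stand-in for a secret manager. `InMemorySecretManager`, the secret names, and the identity name are all hypothetical and do not model any real SDK:

```python
class InMemorySecretManager:
    """Hypothetical stand-in for a secret manager; not a real SDK."""

    def __init__(self, secrets, read_grants):
        self._secrets = secrets          # secret name -> value
        self._read_grants = read_grants  # identity -> set of readable names

    def get_secret(self, identity, name):
        if name not in self._read_grants.get(identity, set()):
            raise PermissionError(f"{identity} may not read {name}")
        return self._secrets[name]


manager = InMemorySecretManager(
    secrets={"model-provider-key": "placeholder-a", "tool-key": "placeholder-b"},
    read_grants={"agent-identity": {"model-provider-key", "tool-key"}},
)


def call_model(identity, prompt):
    """Pattern 2: fetch the key for one call, then drop the reference."""
    key = manager.get_secret(identity, "model-provider-key")
    try:
        # The upstream call that would use `key` is elided.
        return f"called model with {len(prompt)} chars"
    finally:
        del key  # drops the local reference; the read grant remains


def attacker_with_agent_identity(identity):
    """Code running as the agent can read every secret the identity can."""
    return {
        name: manager.get_secret(identity, name)
        for name in ("model-provider-key", "tool-key")
    }
```

Discarding the key after each call shortens its time in memory, but `attacker_with_agent_identity` shows that the read grant itself is what an attacker inherits.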
Pattern 3: workload identity for AWS or GCP, keys for everything else
When the agent runs on AWS or GCP, the platform supports workload identity (IRSA on EKS, instance roles on EC2, Workload Identity on GKE). The agent's calls to the cloud provider's services use the workload identity instead of a static key. This pattern is the correct one for the cloud-platform surface.
The pattern usually does not extend to third-party APIs. The agent still holds an OpenAI key, an Anthropic key, and a third-party tool key. The cloud-platform surface is covered; the third-party surface is back to Pattern 1 or Pattern 2.
The broker pattern
The pattern that survives is to push the secret out of the agent process entirely. The keys live at the gateway. The agent calls the gateway with its verified workload identity. The gateway authenticates to the upstream model or tool with the key it holds. The agent never sees the key.
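The call flow above can be sketched in a few lines. This is an illustration under stated assumptions, not a gateway implementation: `Gateway`, `Agent`, and every name and value are hypothetical, token verification is reduced to a dictionary lookup, the upstream request is elided, and the gateway is an in-process object only so the sketch runs on its own (in a real deployment it would be a separate service).

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical broker: holds provider keys, enforces per-identity policy."""
    provider_keys: dict   # destination -> provider key
    allowed: dict         # agent identity -> set of allowed destinations
    verified_tokens: dict = field(default_factory=dict)  # token -> identity

    def forward(self, token, destination, payload):
        identity = self.verified_tokens.get(token)
        if identity is None:
            raise PermissionError("unverified workload identity")
        if destination not in self.allowed.get(identity, set()):
            raise PermissionError(f"{identity} may not call {destination}")
        key = self.provider_keys[destination]
        # The upstream request is elided. The key is attached here and is
        # never part of what the agent gets back.
        upstream_headers = {"Authorization": f"Bearer {key}"}
        return {
            "destination": destination,
            "status": "forwarded",
            "echo": payload,
            "authenticated": bool(upstream_headers),
        }


class Agent:
    """The agent holds only its identity token, never a provider key."""

    def __init__(self, token, gateway):
        self.token = token
        self.gateway = gateway

    def call(self, destination, payload):
        return self.gateway.forward(self.token, destination, payload)


gateway = Gateway(
    provider_keys={"model-provider": "placeholder-provider-key"},
    allowed={"agent-a": {"model-provider"}},
    verified_tokens={"token-for-agent-a": "agent-a"},
)
agent = Agent("token-for-agent-a", gateway)
response = agent.call("model-provider", {"prompt": "hi"})
```

The response the agent receives carries no key material, and a call to any destination outside the identity's allowlist fails at the gateway rather than in the agent.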
The properties that fall out of the architecture matter at the threat model level.
A compromised agent process cannot exfiltrate provider keys because the keys are not in the process. The attacker who gets RCE on the agent has the agent's workload identity, which is bounded by the policy the gateway enforces. The blast radius is the set of model calls and tool calls the agent identity is allowed to make, rather than the union of every credential the agent ever loaded.
A rotated provider key changes in one place. The gateway picks up the new key from its vault and continues to forward; the agent process never restarts.
A per-decision audit record now carries the agent identity (the verified caller) and the provider being authenticated to (the destination), without ever needing the agent to know which key was used. The auditor sees the agent identity in the record, not the static key.
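The rotation property can be sketched the same way. The assumption here is a gateway that reads the current key from its vault at forward time; a real gateway might instead cache the key with a short TTL. `Vault` and `ForwardingGateway` are hypothetical names:

```python
class Vault:
    """Hypothetical key vault holding the current key per destination."""

    def __init__(self):
        self._current = {}

    def put(self, destination, key):
        self._current[destination] = key

    def get(self, destination):
        return self._current[destination]


class ForwardingGateway:
    """Reads the key from the vault at forward time rather than at startup."""

    def __init__(self, vault):
        self._vault = vault

    def auth_header_for(self, destination):
        return f"Bearer {self._vault.get(destination)}"


vault = Vault()
vault.put("model-provider", "placeholder-key-v1")
rotating_gateway = ForwardingGateway(vault)

before = rotating_gateway.auth_header_for("model-provider")
vault.put("model-provider", "placeholder-key-v2")  # rotation: one write, one place
after = rotating_gateway.auth_header_for("model-provider")
```

No agent object appears in the sketch because none is involved: rotation is a single write to the vault, and the next forwarded request uses the new key.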
The token the agent does see
The agent presents a workload identity token to the gateway on each call. The token can be one of three things depending on the deployment.
All three produce the same property: the agent holds a credential that is identity-bound, short-lived, and useless outside the gateway. An attacker who steals the token has at most the TTL window of usefulness and can only reach the gateway, where the policy applies.
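Whatever the token format, the three properties (identity-bound, short-lived, useless outside the gateway) can be illustrated with a generic signed token. This is a minimal sketch, assuming a shared HMAC secret between a token issuer and the gateway; production deployments often rely on platform-issued tokens with asymmetric signatures instead. `SIGNING_SECRET`, `GATEWAY_AUDIENCE`, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing secret shared by the token issuer and the gateway.
SIGNING_SECRET = b"placeholder-signing-secret"
GATEWAY_AUDIENCE = "ai-gateway"


def issue_token(identity, ttl_seconds=300, now=None):
    """Issue a signed token bound to one identity, one audience, and a TTL."""
    issued_at = time.time() if now is None else now
    claims = {"sub": identity, "aud": GATEWAY_AUDIENCE,
              "exp": issued_at + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SIGNING_SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def verify_token(token, now=None):
    """Return the identity if the token is authentic, unexpired, and ours."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SIGNING_SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["aud"] != GATEWAY_AUDIENCE:
        raise PermissionError("wrong audience")
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise PermissionError("token expired")
    return claims["sub"]
```

The `aud` check is what makes a stolen token useless against any service other than the gateway, and the `exp` check is what bounds its usefulness to the TTL window.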
How this maps to existing patterns
The broker pattern is the AI-specific answer to the secret zero problem that secret-management products have addressed for traditional services. Vault, AWS Secrets Manager, GCP Secret Manager, and Doppler all solve secret zero for human users and for traditional workloads. The AI gateway extends the pattern to AI agents by adding the policy decision and the per-decision audit record on the request path.
The broker pattern also aligns with the NIST AI RMF MANAGE function on running systems. The forthcoming COSAiS overlays for single-agent and multi-agent systems specify identity-bound credentials at the request layer as a measured control. The broker pattern is the natural implementation of that control.
DeepInspect
DeepInspect implements the broker pattern as the default. The gateway holds the provider keys in its key vault, authenticates the agent's verified workload identity on each call, applies the policy, calls the upstream provider, and writes the per-decision audit record with the agent identity and the natural-person identity attached.
The agent process holds no provider API keys. A compromise of the agent process exposes the workload identity; the workload identity is bounded by what the gateway policy permits, not by the union of credentials the process has accumulated. Key rotation changes the value in the gateway vault and propagates to every agent identity that uses it. Take the AI readiness self-assessment to see where your current secrets posture sits against the broker pattern.
Frequently asked questions
- What about agents that need to call dozens of third-party tools, each with its own key?
The broker pattern scales by adding the key to the gateway vault, by tagging it with the tool destination, and by mapping the agent identity's egress allowlist to the tool. The agent identity does not change; the gateway selects the right key based on the destination. The platform team manages the keys in one place rather than per-agent.
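A sketch of that key selection, assuming a vault modelled as a list of tagged entries and an allowlist modelled as a dictionary. All names, key identifiers, and values are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VaultEntry:
    key_id: str        # identifier that can appear in audit records
    key_value: str     # key material; stays inside the gateway
    destination: str   # tool or provider this key authenticates to


VAULT = [
    VaultEntry("key-001", "placeholder-search-key", "search-tool"),
    VaultEntry("key-002", "placeholder-crm-key", "crm-tool"),
]

# Egress allowlist: agent identity -> destinations it may reach.
EGRESS_ALLOWLIST = {
    "support-agent": {"search-tool", "crm-tool"},
    "research-agent": {"search-tool"},
}


def select_key(identity, destination):
    """Pick the vault entry for a destination if the identity may reach it."""
    if destination not in EGRESS_ALLOWLIST.get(identity, set()):
        raise PermissionError(f"{identity} may not reach {destination}")
    for entry in VAULT:
        if entry.destination == destination:
            return entry
    raise LookupError(f"no key registered for {destination}")


def onboard_tool(key_id, key_value, destination, identities):
    """Adding a tool touches the vault and the allowlist, not the agents."""
    VAULT.append(VaultEntry(key_id, key_value, destination))
    for identity in identities:
        EGRESS_ALLOWLIST.setdefault(identity, set()).add(destination)
```

Onboarding the thirtieth tool is the same two writes as onboarding the first, and no agent deployment changes.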
- Does the gateway become a single point of failure for keys?
The gateway is the centralized credential surface, which is the same posture as a centralized secret manager. The countermeasures are the same: high-availability deployment, key-vault encryption at rest and in transit, audit on key access, and disaster recovery on the vault. The single-point posture is the property that makes key rotation tractable; the alternative is keys scattered across every agent process.
- How does this interact with cloud-native workload identity?
For AWS, GCP, and Azure calls, the agent continues to use the cloud platform's workload identity directly (IRSA, GKE Workload Identity, Azure managed identities). The broker pattern applies to provider APIs that are not covered by the cloud platform's workload identity: OpenAI, Anthropic, third-party tools, internal SaaS. The two patterns coexist; the broker pattern fills the gap the cloud platform does not cover.
- What is the latency impact?
The gateway already terminates the connection to evaluate policy. Authenticating to the upstream with the held key adds no network hop beyond the gateway itself; the key is in the gateway's process memory and is applied during the existing forward request. End-to-end latency is that of a Pattern 1 call plus the policy-decision time, without the per-call secret-manager fetch that Pattern 2 pays.
- How does the audit record change?
The record carries the agent identity (verified at the gateway), the natural-person identity (verified at the user session), the destination, the data classification, the policy version, and the decision. The key used to authenticate to the upstream is recorded by identifier, not by value. An auditor can correlate the per-decision record back to the key in the vault without ever seeing the key material.
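The fields listed above map onto a small record type. A minimal sketch, assuming one JSON line per decision; the field names and sample values are hypothetical, not a documented schema:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    agent_identity: str       # verified at the gateway
    person_identity: str      # verified at the user session
    destination: str
    data_classification: str
    policy_version: str
    decision: str
    key_id: str               # identifier only; key material is never a field


record = DecisionRecord(
    agent_identity="support-agent",
    person_identity="user@example.com",
    destination="model-provider",
    data_classification="internal",
    policy_version="policy-v7",
    decision="allow",
    key_id="key-001",
)

# One JSON line per decision, suitable for an append-only log.
audit_line = json.dumps(asdict(record), sort_keys=True)
```

Because the record type has no field for the key value, serializing a record cannot leak key material by accident.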