
AI Incident Response Playbook: Detection, Containment, and Forensics for AI-Layer Compromises

Most enterprise incident response playbooks assume the compromise sits at the network, endpoint, or application layer. AI-layer incidents (prompt injection in production, agent tool-call escalation, model-extraction attempts, credential theft via LLM-operated post-exploitation, data exfiltration through prompts) require a different detection signal, a different containment action, and a different forensic timeline. This playbook walks through the AI-layer incident classes the SOC should recognize, the detection signals each class produces, the containment actions that work at the AI request boundary, the forensic evidence the post-mortem needs, and the integration points with the rest of the security operations stack.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Problem-Aware · incident-response · soc · ai-security · prompt-injection · forensics · containment

The classical incident response playbook assumes the compromise sits at the network, the endpoint, or the application. AI-layer incidents (prompt injection in production, agent tool-call escalation, model-extraction attempts, LLM-operated post-exploitation, prompt-based data exfiltration) sit at a different layer and produce a different signal. The Microsoft Prompts Become Shells disclosure from May 7, 2026 and the Marimo CVE-2026-39987 incident from May 10 both make the point: the SOC has to pick up an AI-layer playbook because the classical playbook does not detect or contain AI-layer incidents in time.

I want to walk through the AI-layer incident classes the SOC should recognize, the detection signals each class produces, the containment actions that work at the request boundary, the forensic evidence the post-mortem needs, and the integration points with the rest of the security operations stack.

The AI-layer incident classes

The classes recur across enterprise deployments. A playbook that names them and assigns the response is more useful than a generic "AI incident" category.

Prompt injection in production

A user-facing AI feature receives a prompt that overrides the system instructions. The model executes the overridden instructions, which often involve exfiltrating context data, revealing system prompts, or steering an agent toward an attacker-controlled tool. The detection signal is anomalous tool selection, anomalous response content, or anomalous output length relative to the input pattern.

Indirect prompt injection

The agent reads attacker-controlled content (a document, an email, a webpage) and the embedded instructions reshape the agent's behavior. The user did not type the instruction; the agent encountered it in its retrieval pipeline. The detection signal is a divergence between the user's stated request and the agent's actions, plus the presence of attacker-controllable content in the agent's context.

Agent tool-call escalation

The agent selects a tool with attacker-controlled arguments. The tool runs. The host executes attacker-controlled code. The detection signal is anomalous tool-invocation patterns: unexpected tool selection, unusual argument shapes, or invocations of high-privilege tools (shell, code interpreter, file system) outside the normal usage pattern.
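The tool-invocation signal can be sketched in a few lines. This is a minimal illustration, not any gateway's actual API: the `ToolCall` shape, the `HIGH_PRIVILEGE_TOOLS` set, and the per-agent baseline of previously observed tools are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical tool names for illustration; a real deployment would load
# these from its own tool registry.
HIGH_PRIVILEGE_TOOLS = {"shell", "code_interpreter", "file_system"}


@dataclass(frozen=True)
class ToolCall:
    agent_id: str
    tool: str
    arguments: dict


def tool_call_findings(call: ToolCall, baseline: dict[str, set[str]]) -> list[str]:
    """Return the reasons a tool call looks anomalous (empty list if none)."""
    findings = []
    known_tools = baseline.get(call.agent_id, set())
    if call.tool not in known_tools:
        findings.append("tool_outside_baseline")
        if call.tool in HIGH_PRIVILEGE_TOOLS:
            findings.append("high_privilege_tool_outside_baseline")
    return findings


# Assumed baseline: the tools this agent has used in normal operation.
baseline = {"support-agent": {"search_kb", "create_ticket"}}
routine = ToolCall("support-agent", "search_kb", {"query": "refund policy"})
suspicious = ToolCall("support-agent", "shell", {"cmd": "id"})

print(tool_call_findings(routine, baseline))
print(tool_call_findings(suspicious, baseline))
```

The sketch checks tool selection only. Argument-shape checks would need a per-tool schema, which is outside what this example assumes.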

Model extraction and reconstruction

An attacker queries the model repeatedly with carefully crafted inputs to extract its training data, system prompts, or proprietary fine-tuning. The detection signal is high-volume request patterns with unusual prompt structures, often from a single identity or a coordinated set of identities.
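The volume half of that signal can be approximated with a sliding-window counter per identity. The sketch below is an assumption-laden illustration: the class name, the window length, and the request limit are hypothetical, and it does not attempt to score prompt structure or to link coordinated identities.

```python
from collections import defaultdict, deque


class RequestVolumeMonitor:
    """Counts requests per identity inside a sliding time window."""

    def __init__(self, window_seconds: float, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._timestamps = defaultdict(deque)

    def record(self, identity: str, timestamp: float) -> bool:
        """Record one request; return True if the identity exceeds the limit."""
        window = self._timestamps[identity]
        window.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests


# Illustrative values: at most 3 requests per 60 seconds per identity.
monitor = RequestVolumeMonitor(window_seconds=60, max_requests=3)
flags = [monitor.record("svc-a", t) for t in (0, 10, 20, 30)]
later = monitor.record("svc-a", 100)
```

In this run the fourth request inside the window is flagged, and the request at `t=100` is not, because the earlier timestamps have aged out.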

LLM-driven post-exploitation

The attacker compromises a system through a non-AI vector (a CVE in a service, a stolen credential, a supply-chain attack), then uses an LLM agent inside the victim environment to operate. The Marimo CVE-2026-39987 incident is the canonical example: pre-auth RCE in a service, harvested AWS keys, then LLM-driven Secrets Manager calls. The detection signal is unusual LLM activity from inside the environment, especially LLM calls that drive cloud provider APIs.

Prompt-based data exfiltration

A user with legitimate access to sensitive data uses an AI feature to exfiltrate the data. The pattern is "summarize the customer list," "translate this contract into [language the attacker controls]," or "encode this PHI in base64 and return it." The detection signal is the pairing of sensitive data classification in the prompt with an unusual output channel.

Detection signals the AI request boundary produces

The detection signals that work for AI-layer incidents differ from network or endpoint signals. They emerge at the AI request boundary.

Identity-anchored anomaly detection

Each request carries identity context (the user or agent making the request) and policy context (the policy in force for the route). An anomaly detector at the request boundary that baselines per-identity request patterns surfaces requests that fall outside the baseline. The signal is more precise than a network-layer or endpoint-layer signal because the identity is known at the request layer.
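One simple way to express a per-identity baseline is a z-score against that identity's own history. This is a sketch under stated assumptions: the metric (requests per hour), the threshold of three standard deviations, and the function name are illustrative choices, not a description of any particular detector.

```python
from statistics import mean, pstdev


def is_anomalous(history: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag `observed` when it deviates strongly from the identity's own history."""
    if len(history) < 2:
        return False  # Not enough data to form a baseline.
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical requests-per-hour history, keyed by identity.
baselines = {"alice": [10, 12, 11, 9, 10]}

normal_hour = is_anomalous(baselines["alice"], 11)
burst_hour = is_anomalous(baselines["alice"], 40)
```

Because the baseline is keyed by identity, a volume that is routine for a batch service account can still be flagged for an interactive user.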

Policy-decision tracking

The pass-or-block decisions the policy gateway makes are a high-value signal. A sudden spike in policy blocks for a given identity, route, or data classification indicates either an attack in progress or a policy misconfiguration. Both warrant investigation.
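As a rough stand-in for spike detection, the block rate can be computed per identity and route and compared against a fixed limit. The event fields (`identity`, `route`, `outcome`) and both thresholds are assumptions for the example; a production rule would more likely compare against a trailing baseline than a constant.

```python
from collections import Counter


def block_spikes(decisions, min_requests=5, max_block_rate=0.5):
    """Return (identity, route) keys whose block rate exceeds the limit."""
    totals, blocks = Counter(), Counter()
    for decision in decisions:
        key = (decision["identity"], decision["route"])
        totals[key] += 1
        if decision["outcome"] == "block":
            blocks[key] += 1
    return sorted(
        key
        for key, total in totals.items()
        if total >= min_requests and blocks[key] / total > max_block_rate
    )


# Made-up decision stream: bob is blocked 4 times out of 6, alice never.
decisions = (
    [{"identity": "bob", "route": "/chat", "outcome": "block"}] * 4
    + [{"identity": "bob", "route": "/chat", "outcome": "pass"}] * 2
    + [{"identity": "alice", "route": "/chat", "outcome": "pass"}] * 5
)
spiking = block_spikes(decisions)
```

The `min_requests` floor keeps a single blocked request from an otherwise quiet identity from registering as a spike.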

Content classification signals

The classifiers that evaluate prompt content (PII detection, PHI detection, prompt-injection detection, credential pattern detection) produce signals the SOC can subscribe to. A prompt that contains a credential pattern and is being sent to an external model from a non-secrets-management identity is a high-priority signal.
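A credential-pattern classifier can start as a small set of regular expressions. The sketch below assumes two patterns (AWS access key ID prefixes and PEM private-key headers) and a hypothetical `secrets-management` role name; it illustrates the priority rule described above and is far from a complete detector.

```python
import re

# Illustrative patterns only; real classifiers cover many more formats.
CREDENTIAL_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def credential_findings(prompt: str) -> list[str]:
    """Return the names of credential patterns found in the prompt."""
    return [name for name, pattern in CREDENTIAL_PATTERNS.items() if pattern.search(prompt)]


def is_high_priority(prompt: str, destination_is_external: bool, identity_role: str) -> bool:
    """Credential in prompt + external model + non-secrets identity."""
    return (
        bool(credential_findings(prompt))
        and destination_is_external
        and identity_role != "secrets-management"  # hypothetical role name
    )


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
prompt = "Why does AKIAIOSFODNN7EXAMPLE get AccessDenied on this bucket?"
findings = credential_findings(prompt)
priority = is_high_priority(prompt, destination_is_external=True, identity_role="developer")
```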

Cross-tier correlation

Combining the AI request boundary signal with the endpoint or network signal produces a stronger correlation than either alone. An anomalous LLM call from a host that also shows credential-theft indicators in the endpoint telemetry points to a different, more serious incident than either signal would on its own.
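A minimal version of that correlation is a join on host within a time window. The event shapes, signal names, host names, and the ten-minute window below are all made up for illustration; in practice this logic would normally live in SIEM correlation rules rather than application code.

```python
from datetime import datetime, timedelta


def correlate(ai_events, edr_events, window=timedelta(minutes=10)):
    """Pair AI-boundary events with EDR events on the same host within `window`."""
    pairs = []
    for ai in ai_events:
        for edr in edr_events:
            same_host = ai["host"] == edr["host"]
            close_in_time = abs(ai["time"] - edr["time"]) <= window
            if same_host and close_in_time:
                pairs.append((ai["host"], ai["signal"], edr["signal"]))
    return pairs


# Sample events with invented hosts and timestamps.
ai_events = [
    {"host": "nb-01", "time": datetime(2026, 1, 1, 12, 0), "signal": "llm_call_to_cloud_api"},
    {"host": "nb-02", "time": datetime(2026, 1, 1, 12, 0), "signal": "llm_call_to_cloud_api"},
]
edr_events = [
    {"host": "nb-01", "time": datetime(2026, 1, 1, 11, 55), "signal": "credential_file_read"},
    {"host": "nb-02", "time": datetime(2026, 1, 1, 9, 0), "signal": "credential_file_read"},
]
correlated = correlate(ai_events, edr_events)
```

Only `nb-01` is paired here: the `nb-02` events share a host but sit three hours apart, outside the window.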

Containment actions at the request boundary

When an incident is detected, containment has to happen at the AI request boundary, because the AI request is the active attack surface.

Identity-level policy escalation

Revoke or downgrade the affected identity's AI access at the gateway. The identity may retain access to other systems while AI access is held until the investigation completes. The granularity matters: a per-route, per-role policy escalation is less disruptive than a wholesale identity disable.

Route-level fail-closed posture

For an AI route under active attack, escalate the route to fail-closed at the gateway. All requests on the route are denied until the incident is resolved. The fail-closed posture is deterministic; it does not depend on the application's behavior or the model's reasoning.

Tool-binding suspension

For an agent under tool-call escalation, suspend the affected tool binding at the gateway. The agent retains its other tools and continues to operate in a degraded mode while the suspended tool is reviewed. The suspension is a configuration change at the gateway, not a code change at the framework.

Data classification escalation

For a route showing sensitive-data exfiltration patterns, escalate the data classification policy. Prompts that previously passed because the classification was at one tier are now blocked because the classification threshold for the route is tightened.
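The four containment actions above can be modeled as state held at the gateway and consulted on every request. The sketch below is a toy model under assumptions: the `GatewayPolicy` class, its field names, the numeric classification tiers, and the order of checks are all hypothetical and do not describe a specific product's configuration.

```python
from dataclasses import dataclass, field


@dataclass
class GatewayPolicy:
    """Toy containment state; every field name here is an assumption."""
    revoked: set = field(default_factory=set)             # (identity, route) pairs
    fail_closed_routes: set = field(default_factory=set)  # routes denying everything
    suspended_tools: set = field(default_factory=set)     # (agent, tool) pairs
    max_classification: dict = field(default_factory=dict)  # route -> highest tier allowed

    def decide(self, identity, route, tool=None, classification=0) -> str:
        if route in self.fail_closed_routes:
            return "deny:route_fail_closed"
        if (identity, route) in self.revoked:
            return "deny:identity_revoked"
        if tool is not None and (identity, tool) in self.suspended_tools:
            return "deny:tool_suspended"
        # Routes without a configured limit accept any classification tier.
        if classification > self.max_classification.get(route, classification):
            return "deny:classification_above_route_limit"
        return "allow"


policy = GatewayPolicy()
policy.revoked.add(("alice", "/chat"))           # identity-level, per-route
policy.suspended_tools.add(("agent-7", "shell"))  # tool-binding suspension
policy.max_classification["/summarize"] = 1       # tightened classification limit
```

Each action is a data change rather than a code change, which is what lets the agent in this model keep its other tools and lets `alice` keep access to other routes.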

Forensic evidence the post-mortem needs

The post-mortem requires evidence at a granularity that classical incident response often does not capture.

Per-decision audit records at the request boundary

For each request involved in the incident, the audit record needs to show the identity, the policy version in force, the data classification, the decision outcome, and the timestamp. Without per-decision records, the post-mortem operates on aggregated counts, which obscure the specific attack pattern.
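A per-decision record with those fields might look like the following. The field names, the added `request_id`, and the JSON-lines serialization are assumptions for the sketch; the source only specifies which facts the record has to carry.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    request_id: str           # hypothetical correlation key, not named in the text
    identity: str
    policy_version: str
    data_classification: str
    decision: str             # "pass" or "block"
    timestamp: str            # ISO 8601, UTC


def to_json_line(record: DecisionRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


# Sample values are invented for illustration.
record = DecisionRecord(
    request_id="req-0001",
    identity="alice",
    policy_version="v42",
    data_classification="PHI",
    decision="block",
    timestamp="2026-01-01T12:00:00+00:00",
)
line = to_json_line(record)
```

One line per decision keeps the attack sequence reconstructable, where aggregated counts would not.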

Policy-state history

The post-mortem needs the policy state at the moment of each decision, not just the current policy state. A policy that was changed during the incident produces different decisions before and after the change. The post-mortem has to reconstruct the policy timeline.
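Reconstructing the policy timeline amounts to a point-in-time lookup over a change history. A minimal sketch, assuming the history is a list of `(effective_from, version)` pairs sorted by time; the function name and the numeric timestamps are illustrative.

```python
from bisect import bisect_right


def policy_version_at(history, timestamp):
    """Return the policy version in force at `timestamp`.

    `history` is a list of (effective_from, version) pairs sorted by
    effective_from. Returns None if no version was in force yet.
    """
    times = [effective_from for effective_from, _ in history]
    index = bisect_right(times, timestamp)
    if index == 0:
        return None
    return history[index - 1][1]


# Invented timeline: v2 replaced v1 at t=200, mid-incident.
history = [(100, "v1"), (200, "v2")]
before_change = policy_version_at(history, 199)
after_change = policy_version_at(history, 200)
```

Storing the policy version on each audit record avoids this lookup entirely; the lookup is the fallback when only the change log survives.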

Content classification history

The classifier outputs at the moment of each decision matter for the post-mortem. A prompt that the classifier flagged but the policy passed is a different signal than a prompt that the classifier missed. The first points to a policy gap and the second to a classifier gap, so each calls for a different remediation path.

Identity context

The identity context for each request (user, role, session, source IP, authentication method) anchors the forensic timeline. Without identity context, the post-mortem cannot distinguish coordinated activity across identities from concentrated activity from a single identity.

Integration with the rest of the security operations stack

The AI-layer playbook does not replace the classical playbook. It integrates with it.

SIEM ingestion of AI request boundary events

The per-decision audit records and the detection signals from the AI request boundary feed the SIEM in the same way the network and endpoint events do. The SIEM correlation rules then operate across AI and non-AI signals.

SOAR playbook triggers

The detection signals at the AI request boundary trigger the SOAR playbooks the SOC already runs. An identity-anomaly signal triggers the user-investigation playbook. A policy-block spike triggers the policy-review playbook. A tool-binding anomaly triggers the agent-investigation playbook.
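The signal-to-playbook mapping described above is small enough to express as a lookup table. The signal and playbook identifiers below are hypothetical spellings of the names in the text, and the `manual-triage` fallback is an assumption added for the example.

```python
# Identifiers are illustrative; a SOAR platform will have its own naming.
PLAYBOOK_BY_SIGNAL = {
    "identity_anomaly": "user-investigation",
    "policy_block_spike": "policy-review",
    "tool_binding_anomaly": "agent-investigation",
}


def route_signal(signal_type: str, default: str = "manual-triage") -> str:
    """Pick the playbook for a signal, falling back to manual triage."""
    return PLAYBOOK_BY_SIGNAL.get(signal_type, default)


chosen = route_signal("policy_block_spike")
fallback = route_signal("unrecognized_signal")
```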

EDR cross-correlation

For the LLM-driven post-exploitation class (the Marimo CVE-2026-39987 pattern), the AI request boundary signal correlates with EDR signals on the host running the agent. The correlation tightens the investigation timeline. The Marimo incident described earlier surfaced because the security team correlated unusual LLM activity with the host telemetry, not because either signal alone triggered an alarm.

Compliance evidence

The per-decision audit records also serve the compliance evidence requirements under EU AI Act Article 12, HIPAA, Fannie Mae LL-2026-04, and adjacent regimes. The incident post-mortem and the regulatory evidence draw from the same source data.

DeepInspect

This is the AI request boundary the playbook above operates against. DeepInspect sits as a stateless proxy between authenticated users or agents and the LLM endpoints, enforces identity-bound policy on every request, and writes per-decision audit records with policy version, identity context, data classification, and decision outcome.

For the SOC's incident response, the gateway produces the detection signals (identity anomaly, policy blocks, content classifier signals), supports the containment actions (per-identity, per-route, per-tool policy escalation), and retains the forensic evidence (per-decision audit records with full context). The records integrate with the SIEM and SOAR pipelines the SOC already runs.

If you are standing up an AI incident response playbook and the AI request boundary in your environment does not produce per-decision records, the next incident will run without the evidence the post-mortem needs. Book a demo today.

Beyond AI-specific response

The AI-layer playbook fits inside the broader incident response framework. NIST SP 800-61 Revision 2 sets the four-phase incident response model: preparation; detection and analysis; containment, eradication, and recovery; and post-incident activity. The AI-layer playbook specializes each phase for AI-layer incidents. Because it integrates with the SIEM, SOAR, EDR, and compliance evidence pipelines, the playbook does not stand outside classical security operations practice.

Frequently asked questions

What is the difference between an AI-layer incident and a classical IT incident?

The compromise sits at the AI request layer rather than the network, endpoint, or application layer. The detection signals emerge from prompt content, policy decisions, and identity-anchored anomalies. The containment actions operate at the request boundary. The forensic evidence is per-decision audit records. The classical playbook addresses the host, the network, and the application; the AI-layer playbook addresses the request boundary.

How does the playbook handle indirect prompt injection?

Indirect prompt injection is detected by divergence between the user's stated request and the agent's actions, plus the presence of attacker-controllable content in the agent's context. Containment escalates the data classification on the route, which blocks subsequent indirect injection vectors. The forensic evidence includes the retrieved content the agent processed, so the post-mortem can identify the source of the embedded instructions.

What about model-extraction attempts?

Model extraction shows up as high-volume requests with unusual prompt structures from a single identity or a coordinated set. The gateway applies rate limits per identity and per route. The audit records capture the request structures, which feed the model-extraction investigation. The containment action is identity-level access revocation while the investigation runs.

How does the playbook integrate with the SIEM?

The per-decision audit records and the detection signals from the gateway feed the SIEM as a structured event stream. The SIEM correlation rules operate across AI events and the classical network, endpoint, and application events. The integration is the same pattern the SOC uses for other infrastructure event streams.

What about retention of the AI-layer audit records?

Retention is set by the compliance regime that applies to the deployment. EU AI Act Article 26 requires deployers to keep logs for at least six months unless Union or national law requires longer. HIPAA's six-year documentation retention requirement is commonly applied to audit records of PHI access. Financial services regimes typically require seven years. The retention period is a configuration parameter at the audit record store, not a property of the gateway itself.
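As a configuration sketch, the retention periods named above can be held in a table, with the store applying the longest period among the regimes in scope. Taking the maximum is an assumption of this example, as are the regime keys and the day-count approximations; the actual obligation should be confirmed for each deployment.

```python
# Day counts approximate the periods named in the text.
RETENTION_DAYS = {
    "eu_ai_act_art_26": 183,        # "at least six months", approximated
    "hipaa": 6 * 365,               # six years
    "financial_services": 7 * 365,  # "typically" seven years; regime-specific
}


def required_retention_days(regimes: list[str]) -> int:
    """Return the longest retention period among the applicable regimes."""
    if not regimes:
        raise ValueError("at least one compliance regime is required")
    return max(RETENTION_DAYS[regime] for regime in regimes)


healthcare_in_eu = required_retention_days(["eu_ai_act_art_26", "hipaa"])
```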

Can the playbook handle agent-on-agent attacks in multi-agent deployments?

Yes. The detection signal in a multi-agent deployment is the cross-agent communication pattern: which agent invoked which other agent, with which arguments, under which policy. The gateway records the cross-agent traffic, and the playbook's containment actions extend to the multi-agent context. A misbehaving agent can be isolated from the other agents at the gateway, while the other agents continue to oper