
Anthropic API Gateway Setup: An Implementation Walkthrough for Enterprise Claude Deployments

Direct integrations with api.anthropic.com terminate TLS at Anthropic's edge, which leaves the deployer with no inspection point and no audit record. This guide walks through the gateway architecture that sits between the application and Anthropic's API, with attention to Claude-specific patterns: system prompts, tool use, prompt caching, and the Messages API streaming format. Code samples for the Anthropic Python SDK are included.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Platform & Architecture · anthropic · claude · ai-gateway · implementation-guide · enterprise-ai · api-proxy

A direct Anthropic integration sends an authenticated POST to https://api.anthropic.com/v1/messages with an x-api-key header, a model ID, and a messages array. The cleartext prompt is visible only to the application and to Anthropic. For an enterprise deployer carrying EU AI Act Article 12 obligations, NIST AI RMF MANAGE 1.3, or HIPAA, that topology does not produce a compliant audit record and does not let the deployer enforce content policy on the wire.

The gateway sits between the application and api.anthropic.com, where it can inspect, authorize, and log every request. The sections below cover the implementation for each Claude-specific pattern: the system prompt, tool use, prompt caching, and the streaming format.

Request path

[application] --(TLS A)--> [gateway] --(TLS B)--> api.anthropic.com
                               |
                               v
                  [audit + policy + identity]

The application's Anthropic SDK points at the gateway URL:

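import anthropic

# Point the SDK at the gateway instead of api.anthropic.com.
client = anthropic.Anthropic(
    base_url="https://ai-gateway.internal.example.com/v1/anthropic",
    api_key="<gateway-issued-token>",  # gateway token, not the Anthropic vendor key
)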

The gateway-issued token resolves to a verified caller. The Anthropic vendor key is held by the gateway, scoped to the outbound calls.
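The credential split described above can be sketched as a header-rewrite step inside the gateway. This is a minimal illustration, not DeepInspect's implementation: `Caller`, `TOKEN_REGISTRY`, `resolve_caller`, and `build_upstream_headers` are hypothetical names, and the sketch assumes header names have already been normalized to lowercase. The only real API detail relied on is that the Anthropic SDK sends its `api_key` in the `x-api-key` header.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    subject: str  # verified human or service principal
    team: str


# Hypothetical in-memory registry; a real gateway would back this with an
# identity provider or token service.
TOKEN_REGISTRY = {
    "gw-token-123": Caller(subject="alice@example.com", team="claims"),
}


def resolve_caller(headers: dict[str, str]) -> Caller:
    """Map the gateway-issued token (sent as x-api-key) to a verified caller."""
    caller = TOKEN_REGISTRY.get(headers.get("x-api-key", ""))
    if caller is None:
        raise PermissionError("unknown or revoked gateway token")
    return caller


def build_upstream_headers(headers: dict[str, str], vendor_key: str) -> dict[str, str]:
    """Swap the gateway token for the vendor key on the outbound call."""
    upstream = {k: v for k, v in headers.items() if k != "x-api-key"}
    upstream["x-api-key"] = vendor_key  # the application never sees this value
    return upstream
```

With this shape, the vendor key exists only in the gateway's process, and revoking a caller is a registry change rather than a key rotation.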

Anthropic Messages API surface

The gateway implements the v1 surface that production Claude deployments use:

| Endpoint | Inspection profile |
|---|---|
| POST /v1/messages | Full classification on system, messages[].content, tool definitions |
| POST /v1/messages (stream) | Per-delta inspection on the SSE stream |
| POST /v1/messages/count_tokens | Light classification (read-only intent), still produces an audit record |
| POST /v1/files | Content scan on uploaded payloads |

The streaming case is where Claude-specific handling matters. Anthropic's SSE format emits message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop events, plus periodic ping events. The gateway buffers content_block_delta events through the inspection window, applies the completion-side policy, and forwards approved chunks. A policy violation midstream truncates the stream: the gateway emits a message_delta carrying stop_reason: policy_block, followed by a final message_stop. policy_block is a gateway-defined value, not one of Anthropic's own stop reasons, so client code has to handle it explicitly.
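A rough sketch of that buffering loop follows, assuming the SSE payloads have already been parsed into dicts. `inspect_stream` and the `violates` callback are hypothetical names, and the window here counts delta chunks rather than tokens, which is a simplification. A production gateway would also need to close any open content block before ending the stream.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def inspect_stream(
    events: Iterable[dict],
    violates: Callable[[str], bool],
    window_size: int = 64,
) -> Iterator[dict]:
    """Forward events, checking each text delta against a sliding window."""
    window: deque = deque(maxlen=window_size)  # most recent text chunks
    for event in events:
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                window.append(delta.get("text", ""))
                if violates("".join(window)):
                    # Truncate: the offending delta is never forwarded.
                    yield {
                        "type": "message_delta",
                        "delta": {"stop_reason": "policy_block", "stop_sequence": None},
                    }
                    yield {"type": "message_stop"}
                    return
        yield event
```

Checking the joined window rather than the single delta matters because a sensitive string can be split across two chunks.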

System-prompt handling

Anthropic's API separates the system parameter from the messages array. The gateway inspects both. The system prompt frequently carries the most sensitive content (deployer-specific instructions, retrieved context, account-scoped data). A system prompt that carries PHI without a BAA-routed destination fails closed regardless of how clean the user message looks.

Specifically, the gateway runs the full classification chain on the system parameter before evaluating policy. The audit record names the data class of the system prompt separately from the user messages:

{
  ...
  "data_class": "phi",
  "data_class_components": {
    "system": "phi",
    "messages": "none"
  },
  ...
}

This separation matters in forensic review. An incident where PHI reached a non-BAA route by way of a leaky system prompt has a different remediation than one where a user pasted PHI into a chat.
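One way to derive the two fields is to classify each component separately and take the most sensitive verdict as the overall class. The sketch below assumes a simple severity ordering; the `"pii"` level, the `SEVERITY` table, and both function names are illustrative, since the record above only shows `"phi"` and `"none"`.

```python
# Hypothetical sensitivity ordering; higher means more sensitive.
SEVERITY = {"none": 0, "pii": 1, "phi": 2}


def classify_request(system_class: str, messages_class: str) -> dict:
    """Build the data-class audit fields from per-component verdicts."""
    components = {"system": system_class, "messages": messages_class}
    overall = max(components.values(), key=SEVERITY.__getitem__)
    return {"data_class": overall, "data_class_components": components}


def route_allowed(record: dict, destination_has_baa: bool) -> bool:
    """Fail closed: PHI anywhere in the request requires a BAA-covered route."""
    return record["data_class"] != "phi" or destination_has_baa
```

Because the overall class is the maximum, a clean user message cannot mask a PHI-carrying system prompt.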

Tool use

Claude's tool-use loop produces request/response pairs where the model issues tool_use blocks and the application returns tool_result blocks in the next message. Each tool-use block is a potential exfiltration vector: a model can invoke a function with arguments that name PII or PHI.

The gateway inspects the tool-use block on the outbound (model-to-application) side and the tool-result block on the inbound (application-to-model) side. The policy can:

  • Block a tool call whose arguments name a customer the caller is not authorized to query.
  • Redact the tool result before it returns to the model, so the model never sees fields outside the caller's scope.
  • Refuse to forward a tool result whose content exceeds the prompt-side data class.

The audit record names each tool call with its arguments hash, the policy verdict, and the result hash:

{
  "tool_invocations": [
    {
      "tool_name": "lookup_customer_record",
      "arguments_hash": "sha256:8e2f0a...",
      "decision": "pass",
      "result_hash": "sha256:c19f3d..."
    }
  ]
}
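A sketch of how such an entry could be produced, assuming the hashes are SHA-256 over a canonical JSON encoding. The article does not specify the canonicalization, so the sorted-key encoding and both function names here are assumptions.

```python
import hashlib
import json


def canonical_hash(value) -> str:
    """Hash a JSON-serializable value independently of key order."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


def tool_invocation_record(tool_name: str, arguments: dict, decision: str, result=None) -> dict:
    """Build one tool_invocations entry; blocked calls have no result to hash."""
    record = {
        "tool_name": tool_name,
        "arguments_hash": canonical_hash(arguments),
        "decision": decision,
    }
    if result is not None:
        record["result_hash"] = canonical_hash(result)
    return record
```

Recording hashes rather than raw arguments keeps sensitive values out of the audit log while still letting a reviewer prove which payload was evaluated.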

Prompt caching

Anthropic's prompt-caching feature lets the deployer mark sections of the prompt as cacheable. The cached prefix is reused across requests at lower cost and lower latency. The gateway inspects both the cached and the uncached portions on every request and leaves the cache_control breakpoints intact, so inspection does not invalidate the cache.

The audit record names the cached portion's hash separately so the forensic reader can confirm whether a request's exposure came from the cached prefix (often the system prompt) or the live extension.

The policy can refuse to cache a prefix that carries a sensitive data class. A PHI-carrying system prompt that the application marks with cache_control: {"type": "ephemeral"} is permitted only to BAA-covered destinations, and the cache is scoped to those destinations.
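The cache rule can be sketched as a check on the request body before forwarding. This simplified version looks only at system blocks (cache_control markers can also appear on tools and message content), and the function names and decision strings are hypothetical.

```python
def has_cache_breakpoint(request_body: dict) -> bool:
    """True if any system block carries a cache_control marker."""
    system = request_body.get("system")
    blocks = system if isinstance(system, list) else []  # a plain string has no markers
    return any(isinstance(b, dict) and "cache_control" in b for b in blocks)


def cache_decision(request_body: dict, system_class: str, destination_has_baa: bool) -> str:
    """Decide whether a cache request may proceed to the chosen destination."""
    if not has_cache_breakpoint(request_body):
        return "no-cache-requested"
    if system_class == "phi" and not destination_has_baa:
        return "reject"
    return "allow"
```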

Streaming inspection

The SSE stream's content_block_delta events accumulate over the response. The gateway maintains an inspection window of N tokens (default 64) and applies the completion-side classifier across the window. The window slides forward as new deltas arrive. A policy violation on the windowed content produces:

event: message_delta
data: {"delta": {"stop_reason": "policy_block", "stop_sequence": null}, ...}

event: message_stop
data: {"type": "message_stop", "policy_reason_code": "completion.pii.detected"}

The application's streaming handler treats stop_reason: policy_block as a recoverable error and surfaces the policy reason code to the operator.
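On the application side, a handler along these lines would surface the block. It is a sketch over already-parsed event dicts rather than the SDK's stream objects, it assumes each parsed event carries a `type` field, and `PolicyBlocked` and `collect_text` are hypothetical names.

```python
class PolicyBlocked(Exception):
    """Raised when the gateway ends a stream with stop_reason: policy_block."""

    def __init__(self, reason_code: str, partial_text: str):
        super().__init__(reason_code)
        self.reason_code = reason_code
        self.partial_text = partial_text  # text received before the block


def collect_text(events) -> str:
    """Accumulate streamed text; raise if the gateway blocked the completion."""
    parts = []
    blocked = False
    for event in events:
        kind = event.get("type")
        if kind == "content_block_delta" and event["delta"].get("type") == "text_delta":
            parts.append(event["delta"]["text"])
        elif kind == "message_delta":
            blocked = event.get("delta", {}).get("stop_reason") == "policy_block"
        elif kind == "message_stop" and blocked:
            raise PolicyBlocked(event.get("policy_reason_code", "unknown"), "".join(parts))
    return "".join(parts)
```

Catching `PolicyBlocked` separately from transport errors lets the application log the reason code and decide whether the partial text is safe to show.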

Identity model for agentic patterns

Anthropic's tool-use loop is the most common path enterprise deployers use for agentic workflows. The gateway's identity model carries both the human principal and the agent identity through the loop:

  • Outbound user-to-Claude request: subject = human principal, agent = none.
  • Outbound Claude-to-tool call: subject = human principal, agent = the agent identity scoped to this conversation.
  • Inbound tool-result to Claude: subject = human principal, agent = the agent identity.

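Article 26 of the EU AI Act sets out the obligations of deployers of high-risk AI systems, including human oversight and retention of automatically generated logs. Under that framing, actions an agent takes on a human's behalf remain the deployer's responsibility. The gateway records the principal and the agent on each call so the action lineage holds up under regulator review.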
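The three hops can be modeled as a small record attached to each audit entry. The hop labels, the `agent:<conversation>` naming scheme, and the type itself are illustrative assumptions, not the gateway's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallIdentity:
    hop: str               # which leg of the tool-use loop this call belongs to
    subject: str           # the human principal, constant across the loop
    agent: Optional[str]   # None until the model starts acting for the user


def identities_for_loop(subject: str, conversation_id: str) -> list[CallIdentity]:
    """Return the identity attached to each hop of one tool-use round trip."""
    agent = f"agent:{conversation_id}"  # hypothetical conversation-scoped agent ID
    return [
        CallIdentity("user_to_claude", subject, None),
        CallIdentity("claude_to_tool", subject, agent),
        CallIdentity("tool_result_to_claude", subject, agent),
    ]
```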

Performance budget

The gateway adds 50 ms of latency at P99 across the inspection chain, the same envelope as the OpenAI gateway path. The Anthropic API's TTFT (time to first token) on Claude Sonnet 4.6 runs 350 to 800 ms, so the gateway's 50 ms fits inside the existing budget without a user-visible regression.
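To put the figures above in relative terms, the worst-case gateway overhead can be expressed as a share of TTFT. The arithmetic below uses only the numbers quoted in this section; comparing a P99 overhead against both ends of the TTFT range is a deliberately pessimistic framing.

```python
GATEWAY_P99_MS = 50
TTFT_RANGE_MS = (350, 800)  # fastest and slowest TTFT quoted above


def overhead_pct(added_ms: float, baseline_ms: float) -> float:
    """Added latency as a percentage of the baseline."""
    return 100 * added_ms / baseline_ms


fast_case = overhead_pct(GATEWAY_P99_MS, TTFT_RANGE_MS[0])  # about 14.3 percent
slow_case = overhead_pct(GATEWAY_P99_MS, TTFT_RANGE_MS[1])  # 6.25 percent
```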

Failure modes

  • Anthropic 529 (overloaded). The gateway returns the 529 to the application after writing an audit record with outcome: upstream-overloaded.
  • Policy fail-closed. Same behavior as the OpenAI path. The request returns 503, the audit record captures the reason.
  • Streaming cut. The gateway terminates the stream and writes the partial audit record.
  • Token-count drift. The gateway records both the locally counted token count and the Anthropic-reported token count for billing reconciliation.
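The failure modes above can be summarized as a mapping from what went wrong to the status the application sees and the outcome written to the audit record. Only `upstream-overloaded` comes from the text; the other outcome strings, the field names, and the HTTP 200 for a stream that had already started are assumptions.

```python
from typing import Optional


def audit_outcome(
    upstream_status: Optional[int] = None,
    policy_error: bool = False,
    stream_cut: bool = False,
) -> dict:
    """Pick the returned status and audit outcome for one request."""
    if policy_error:
        # Fail closed: the request never reaches Anthropic.
        return {"http_status": 503, "outcome": "policy-fail-closed"}
    if upstream_status == 529:
        return {"http_status": 529, "outcome": "upstream-overloaded"}
    if stream_cut:
        return {"http_status": 200, "outcome": "stream-cut", "partial": True}
    return {"http_status": upstream_status, "outcome": "ok"}


def token_count_fields(local_count: int, reported_count: int) -> dict:
    """Record both counts so billing reconciliation can see the drift."""
    return {
        "tokens_local": local_count,
        "tokens_reported": reported_count,
        "tokens_drift": reported_count - local_count,
    }
```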

DeepInspect

DeepInspect's gateway is Anthropic-compatible at the v1 Messages level. Streaming inspection, tool-use policy, prompt-cache awareness, system-prompt data-class separation, and the chained audit format are implemented. Application teams change one line (the base_url) and the Anthropic SDK calls route through the inspection chain.

We have working integrations in production for healthcare deployers running Claude under BAA, finance deployers running Claude for back-office workflows, and federal contractors running Claude under AI RMF MANAGE 1.3 obligations.

Book a technical deep dive at deepinspect.ai.

Frequently asked questions

Does the gateway support Anthropic's Bedrock and Vertex deployments as well as direct API?

Yes. The gateway accepts the Anthropic REST surface, the Bedrock Anthropic surface, and the Vertex Anthropic surface. The deployer chooses the destination per route. A request can pass through the gateway and be forwarded to Anthropic direct, to Bedrock, or to Vertex based on the policy's allowed-destination set.
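Destination selection against an allowed-destination set could be as simple as the following. The destination labels and the preference-order input are hypothetical; the article does not describe the gateway's route configuration format.

```python
def choose_destination(preferred_order: list[str], allowed: set[str]) -> str:
    """Return the first preferred destination the policy allows."""
    for destination in preferred_order:
        if destination in allowed:
            return destination
    # Fail closed when no configured destination is permitted.
    raise PermissionError("no allowed destination for this request")
```

For example, a PHI-classified request might be evaluated with `allowed={"bedrock"}` if that is the only BAA-covered route, regardless of the route's preferred order.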

How does the gateway interact with Anthropic's prompt caching for BAA-covered deployments?

PHI-carrying cached prefixes are scoped to BAA-covered destinations only. The policy evaluator reads the cache-control markers in the request and rejects a cache request whose destination does not carry a BAA. The audit record states whether the request was a cache hit or a miss and whether the cached prefix contributed to the request's data class.

What is the gateway's behavior on Claude's vision (image) inputs?

Image inputs in messages[].content blocks are inspected through an image classifier (OCR + content classification). The classifier runs against the decoded image, emits a data-class verdict (PHI in a clinical image, faces, license plates, document content), and feeds the verdict into the policy decision. The image hash is recorded in the audit field.
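The hashing step can be sketched for base64-encoded image blocks, which is one of the source types the Messages API accepts. The audit field names are assumptions, the classifier itself is out of scope here, and URL-sourced images are skipped because they would require a fetch.

```python
import base64
import hashlib


def image_audit_entries(content_blocks: list[dict]) -> list[dict]:
    """Hash the decoded bytes of each base64 image block in a message."""
    entries = []
    for block in content_blocks:
        if block.get("type") != "image":
            continue
        source = block.get("source", {})
        if source.get("type") != "base64":
            continue  # other source types are not handled in this sketch
        raw = base64.b64decode(source["data"])
        entries.append({
            "media_type": source.get("media_type"),
            "image_hash": "sha256:" + hashlib.sha256(raw).hexdigest(),
        })
    return entries
```

Hashing the decoded bytes rather than the base64 string keeps the hash stable across different encodings of the same image.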

How does the gateway handle Anthropic's Computer Use API?

Computer Use sessions are agentic by definition: Claude issues tool calls that move a mouse, type text, or take screenshots. The gateway inspects each tool call against the policy bundle. A Computer Use session that tries to navigate to a domain the policy does not allow is blocked at the tool-call layer, with the full action lineage recorded.
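The domain check at the heart of that rule can be sketched with the standard library. How the gateway extracts a target URL from a Computer Use tool call is not covered here; `domain_allowed` is a hypothetical helper that assumes the URL is already in hand.

```python
from urllib.parse import urlsplit


def domain_allowed(url: str, allowed_domains: set[str]) -> bool:
    """True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    # Match on a dot boundary so "evil-example.com" does not pass as "example.com".
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```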