How does the gateway handle SigV4 signing?

The application signs its request to the gateway with its own AWS credentials. The gateway verifies the signature, runs the inspection pipeline, and signs the upstream call to Bedrock with the gateway's IAM credentials. The application can use a least-privilege role with no direct Bedrock permission; only the gateway holds the Bedrock invocation permission.

Does the gateway support cross-account Bedrock calls?

The gateway can assume a role in the Bedrock-host account before signing the upstream call. The pattern fits deployers who centralize Bedrock usage in one AWS account and route calls from multiple application accounts.

Can the gateway run alongside Bedrock Guardrails?

Yes. Guardrails operate at the model inference layer inside Bedrock. The gateway operates on the AWS Bedrock traffic path. The two layers compose. Guardrails apply on AWS-hosted models. The gateway adds the identity context, the per-decision audit record, and the consistent inspection point across AWS-hosted and non-AWS endpoints.

How does the gateway treat Bedrock Knowledge Bases retrievals?

The RetrieveAndGenerate flow retrieves documents from a configured knowledge base before the model sees the augmented prompt. The gateway inspects the user query at the request boundary and the retrieved documents at the augmentation boundary. The per-decision record captures which documents were retrieved and what classification labels apply to them.

How does the gateway handle the Bedrock Agents action sequence?

An agent invocation triggers a sequence of internal steps. The gateway inspects each step: model invocations, action-group calls, knowledge-base retrievals, step transitions. The per-decision record links all steps under one agent invocation ID, so the auditor reads the full agent action lineage as one record set.

Bedrock API Gateway: Inspection at the AWS Bedrock Runtime Boundary

A Bedrock API gateway is the inspection point traffic to the AWS Bedrock runtime passes through before it reaches the model. The gateway attaches identity context the application supplies, runs prompt-level classification, evaluates the policy in effect at the moment of decision, and writes a per-decision audit record. The architecture sits between callers and the AWS Bedrock surface: InvokeModel, InvokeModelWithResponseStream, Converse, ConverseStream, RetrieveAndGenerate, and the agents APIs. The gateway terminates the AWS SigV4-signed call from the application, runs the inspection pipeline, and re-signs the upstream call to Bedrock with the gateway's own AWS credentials.

I want to walk through the inspection points across the Bedrock surfaces, how the gateway interacts with AWS Bedrock Guardrails, and what the deployment trade-offs look like inside AWS networking.

What the gateway intercepts

The gateway intercepts the HTTP request to the Bedrock runtime endpoint, the request body (which carries the prompt content in a model-family-specific shape), the streaming response for the streaming variants, and the agent action calls for the agent runtime.

The request body shape varies by model family. Anthropic Claude on Bedrock uses the Anthropic message schema. Meta Llama on Bedrock uses a Llama-specific prompt envelope. Cohere on Bedrock uses Cohere's schema. The Converse and ConverseStream APIs unify these into a single shape, which simplifies the gateway's normalization step. The gateway extracts the prompt content in a model-agnostic representation before running classification.

The headers carry the AWS SigV4 signature, the model identifier, and any custom identity-bearing headers the application includes. The gateway extracts the corporate identity from the application-supplied headers and attaches it to the per-decision record. The SigV4 signature is verified at the gateway and replaced with a new SigV4 signature using the gateway's IAM credentials for the upstream call.

How Bedrock's API surfaces map to inspection points

The Bedrock runtime exposes four categories of API surface the gateway inspects.

The first is InvokeModel and InvokeModelWithResponseStream. These are the low-level surfaces that accept model-family-specific payloads. Inspection runs against the model-specific body shape.

The second is Converse and ConverseStream. These are the unified surfaces AWS introduced to give a consistent message schema across model families. Inspection runs against the unified body shape, which simplifies policy authoring.

The third is RetrieveAndGenerate (the Knowledge Bases retrieval-augmented generation surface). The request submits a query and the runtime retrieves matching documents from a configured knowledge base before passing the augmented prompt to the model. The gateway inspects the query at the request boundary and the retrieved documents at the augmentation boundary, then the synthesized response on the return path.

The fourth is the Bedrock Agents runtime. An agent execution consists of a sequence of actions: model invocations, tool calls into action groups, knowledge base retrievals, and step transitions. The gateway inspects each step. The per-decision record links the steps under one agent invocation ID, so the audit trail captures the agent's full action lineage.

How identity context attaches at the gateway

The Bedrock runtime authenticates callers via AWS IAM. The IAM principal identifies the role or user assumed at call time. The role often resolves to an application-level identity (e.g. arn:aws:iam::123456789012:role/app-bedrock-caller) rather than the natural person on whose behalf the application is calling.

The gateway sits in front of Bedrock and treats the IAM principal as one input to the decision. The application also supplies a corporate identity context via headers, which the gateway extracts and attaches to the decision record. The combined view shows which IAM principal made the call and which corporate identity authorized it.

In some deployments, the gateway uses an IAM role that has Bedrock permissions while the application's own IAM role has no direct Bedrock permission. The application can only reach Bedrock through the gateway, which guarantees every call passes through the inspection point.

How the gateway interacts with Bedrock Guardrails

AWS Bedrock Guardrails is AWS's own content-filtering layer on top of model invocations. Guardrails apply at the model inference layer and cover AWS-hosted endpoints.

A Bedrock gateway and Bedrock Guardrails sit at different layers. Guardrails enforce content rules during inference inside AWS. The gateway enforces identity-bound policy and produces audit records on the AWS Bedrock traffic path. The two are complementary on AWS-hosted models. The gateway adds the layer Guardrails does not cover by design: identity context, per-decision audit records that are independent of the application, and the same policy applied uniformly to non-AWS endpoints in the same deployment.

For a deployer that runs Bedrock for some workloads and OpenAI or Anthropic for others, the gateway is the single inspection point across all of them. Guardrails apply on the AWS-hosted subset. The structured audit record is uniform across the full set.

What policy looks like at the Bedrock gateway

A representative policy at the Bedrock gateway:

The agent-action rule fires inside the Bedrock Agents flow. The gateway buffers the agent's action call, evaluates the policy, and either permits or blocks the action before the runtime executes it.

Deployment trade-offs inside AWS networking

Three deployment shapes cover the typical Bedrock gateway architecture inside AWS.

VPC endpoint plus gateway in the same VPC. The deployer creates a VPC endpoint for the Bedrock runtime in the deployer's VPC. The gateway runs as ECS Fargate or EKS workloads in the same VPC. Application traffic stays inside AWS. Latency is low (sub-millisecond between the gateway and the Bedrock endpoint).

PrivateLink to a managed gateway service. The deployer uses PrivateLink to connect to a gateway service hosted in the gateway vendor's AWS account. Traffic remains on AWS infrastructure but crosses an account boundary. The pattern fits deployers who prefer the gateway as a managed service.

Cross-region inspection. Bedrock runs in specific AWS regions, and large enterprises serve callers from multiple regions. The gateway can sit per-region with a shared policy artifact, or sit centrally with region-aware routing. The deployer picks based on the data residency requirements and the latency budget per workload.

Performance characteristics

Bedrock's own model invocation latency dominates the request timing (typically hundreds of milliseconds to several seconds). The gateway adds inspection overhead in the sub-50 ms range on internal DeepInspect benchmarks. Streaming responses see per-event overhead on the order of milliseconds, dominated by the classifier cost on each emitted content block.

The audit record write path is the dominant cost at high volume. The deployer typically writes the records to an append-only store with synchronous acknowledgement before the response returns to the caller, which preserves the integrity property the audit relies on.

DeepInspect

This is the Bedrock API gateway DeepInspect was built to provide. DeepInspect sits inline between authenticated users or agents and any Bedrock runtime endpoint. Every InvokeModel, Converse, RetrieveAndGenerate, and agent action passes through one inspection point. Identity attaches at the request layer. Prompt and response classification fire at the boundary. Policy decides per request. The signed per-decision record commits before the response returns to the caller.

The same inspection point sits in front of OpenAI, Anthropic, Azure OpenAI, Vertex, and self-hosted endpoints with the same policy and audit semantics. Bedrock is one entry point among several. The architecture is model-agnostic.

If you are evaluating a Bedrock API gateway for a regulated AWS deployment, book a demo today.