Amazon Bedrock Gateway Patterns: How To Front Bedrock with Inline Enforcement
An Amazon Bedrock gateway sits between calling applications and the Bedrock runtime endpoints, attaches identity context to every InvokeModel and InvokeModelWithResponseStream call, evaluates a per-request policy, and commits a per-decision audit record before the request reaches Anthropic, Mistral, Meta, Cohere, AI21, or Amazon Titan. The gateway pattern complements Bedrock Guardrails by adding identity-bound policy enforcement and a per-decision audit record format that satisfies EU AI Act Article 12 and the Fannie Mae LL-2026-04 lender record requirement. This piece walks through the AWS SigV4 handling, the model-agnostic policy, and the audit record format.

With an Amazon Bedrock gateway in place, application traffic passes through an inspection layer before it reaches the Bedrock runtime endpoints. The application either calls the inspection layer's endpoint directly, or keeps calling bedrock-runtime.us-east-1.amazonaws.com while the inspection layer intercepts at the network or SDK level. In both cases the inspection layer sees the InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream calls. The gateway authenticates the caller's identity, attaches identity context to the request, evaluates a policy bundle, commits a per-decision audit record, and forwards the request to Bedrock on a controlled AWS role. The same architecture covers every foundation model served on Bedrock: Anthropic Claude, Mistral, Meta Llama, Cohere, AI21 Jamba, and Amazon Titan.
I want to walk through how the inspection layer integrates with AWS SigV4 authentication, how the policy bundle handles the multi-model fan-out across Bedrock, how the inspection layer composes with Bedrock Guardrails, and the audit record format that holds up under EU AI Act Article 12 review.
AWS SigV4 and identity attribution
Bedrock authenticates every request through an AWS SigV4 signature made with the calling principal's credentials (an IAM role, a user, or a federated identity). The Bedrock CloudTrail log records the IAM principal that signed the request. A direct integration that uses a shared application role therefore produces CloudTrail entries that attribute every model call to the application role, with no identity for the natural person whose action triggered the request. The post-authentication gap appears here in the same shape as it does for direct OpenAI and direct Anthropic integrations.
The inspection layer closes the gap by extracting the natural-person identity from the inbound request (the application's JWT or OIDC token) and attaching it to the audit record on the gateway side. The Bedrock upstream call goes out on the inspection layer's IAM role, signed with SigV4 against that role's credentials. The CloudTrail record attributes the call to the inspection layer's role, and the inspection layer's audit record attributes the call to the natural person. The composed record series satisfies the EU AI Act Article 12 and the Fannie Mae LL-2026-04 traceability tests.
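The dual attribution described above can be sketched in a few lines. This is a minimal illustration, not DeepInspect's implementation: the `Attribution` type, the `attribute` helper, and the `tenant` claim name are hypothetical, and the sketch assumes the token's signature, issuer, audience, and expiry were already verified by an earlier step.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    natural_person: str    # who triggered the call (from the inbound token)
    tenant: str            # hypothetical "tenant" claim, for illustration only
    signing_role_arn: str  # the IAM role CloudTrail will show for the upstream call


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying it.

    Assumption: verification happened before this function is called.
    Never trust claims from an unverified token in a real gateway.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-segment JWT")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def attribute(token: str, gateway_role_arn: str) -> Attribution:
    """Pair the natural-person identity with the role that signs upstream."""
    claims = decode_jwt_claims(token)
    return Attribution(
        natural_person=claims["sub"],
        tenant=claims.get("tenant", "unknown"),
        signing_role_arn=gateway_role_arn,
    )
```

The gateway's audit record would carry both fields, so a reviewer can join it to the CloudTrail entry on the signing role and still recover the person behind the request.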
The integration pattern works in two shapes. The application can address the inspection layer directly (the inspection layer publishes an HTTP endpoint that the application calls in place of the AWS SDK), in which case the inspection layer signs the upstream Bedrock call. Or the inspection layer can sit as a network-level proxy intercepting the AWS SDK's outbound calls, in which case the inspection layer re-signs the request with its own role. Most production deployments choose the first pattern because it gives a cleaner integration point and the application's existing identity primitive (JWT, OIDC, service account) becomes the inspection-layer credential.
Multi-model fan-out across Bedrock
Bedrock fans out across foundation models with different request and response shapes. Anthropic Claude on Bedrock uses the Anthropic message format. Mistral on Bedrock uses the Mistral chat format. Meta Llama uses the Llama prompt format. Amazon Titan uses Amazon's own format. The Converse API normalized the request shape across most models in 2024, but the underlying differences in capability, in tool support, in image handling, and in streaming behavior still matter for the policy.
The inspection layer abstracts the multi-model fan-out at the policy level. The policy bundle for a route specifies which Bedrock models the route is permitted to call. The inspection layer normalizes the request to a canonical form for policy evaluation (identity, route, prompt content, tool list, model selection) and applies the policy regardless of which Bedrock model the route addresses. The audit record stamps the model and the model version (anthropic.claude-sonnet-4-20250514-v1:0, meta.llama3-1-405b-instruct-v1:0, amazon.titan-text-premier-v1:0) so the auditor can reconstruct which foundation model served the request.
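A rough sketch of that canonical form and a per-route model allowlist follows. The `CanonicalRequest` and `RoutePolicy` types, the field names, and the `evaluate` function are assumptions made for illustration; a real policy bundle would carry more conditions than the two checked here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalRequest:
    identity: str            # natural-person identity
    route: str               # route identifier
    model_id: str            # Bedrock model identifier, including version
    prompt: str              # prompt content after normalization
    tools: tuple = ()        # names of the tools offered to the model


@dataclass(frozen=True)
class RoutePolicy:
    version: str
    allowed_models: frozenset
    allowed_tools: frozenset = frozenset()


def evaluate(req: CanonicalRequest, bundle: dict) -> tuple:
    """Return (decision, reason). The check is the same for every model family."""
    policy = bundle.get(req.route)
    if policy is None:
        return "block", "no policy for route " + req.route
    if req.model_id not in policy.allowed_models:
        return "block", "model " + req.model_id + " not permitted on route " + req.route
    denied = sorted(t for t in req.tools if t not in policy.allowed_tools)
    if denied:
        return "block", "tools not permitted: " + ", ".join(denied)
    return "pass", "policy " + policy.version


# Hypothetical bundle: one route that may call Claude or Titan, nothing else.
BUNDLE = {
    "analysis": RoutePolicy(
        version="v1",
        allowed_models=frozenset({
            "anthropic.claude-sonnet-4-20250514-v1:0",
            "amazon.titan-text-premier-v1:0",
        }),
        allowed_tools=frozenset({"lookup_rate"}),
    )
}
```

Because the policy sees only the canonical form, adding a new model family means writing a new normalizer rather than a new policy.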
The multi-model abstraction matters for regulated workloads. A financial-services workload that calls multiple Bedrock models for different tasks (Titan for embeddings, Claude for analysis, Mistral for summarization) runs under a single policy bundle and produces a single audit record series. The auditor reading the records does not have to traverse three different model-provider record formats to reconstruct the workload.
Composition with Bedrock Guardrails
Bedrock Guardrails is AWS's model-side guardrails feature. It attaches a guardrail configuration to a Bedrock model invocation and applies content filtering, denied-topic filtering, sensitive-information filtering, and contextual grounding checks inside the Bedrock managed service. The guardrail is configured in the AWS console or via API and runs on the Bedrock side.
The inspection layer composes with Bedrock Guardrails rather than replacing it. The architectural pattern is layered enforcement: the inspection layer applies the identity-bound policy and the per-decision audit record on the gateway side; Bedrock Guardrails applies the model-side filtering on the AWS side. The two layers handle different concerns. The inspection layer answers "who is calling, with what data, under what policy, and how is the decision recorded for audit." Bedrock Guardrails answers "what model output content matches the AWS-managed filter."
Bedrock Guardrails operates inside the AWS-managed inference layer and covers only Bedrock-hosted endpoints. The configuration sits in AWS and applies to AWS-served models. Deployments that span Bedrock and non-AWS LLM endpoints (direct OpenAI, direct Anthropic, Azure OpenAI, Vertex, self-hosted) need an enforcement layer that operates at the request boundary regardless of the model provider. The inspection layer provides that layer and treats Bedrock Guardrails as an additional inspection point on the AWS-served path.
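One way to picture the layered outcome is a small function that combines the gateway's own decision with the guardrail result carried on a Converse response. The sketch below operates on plain dictionaries shaped like a Converse response (`stopReason` set to `guardrail_intervened`, and an optional `trace.guardrail` object when tracing is enabled on the request). The `compose` function and its output keys are hypothetical.

```python
from typing import Optional


def guardrail_outcome(converse_response: dict) -> dict:
    """Summarize the guardrail result carried on a Converse response dict."""
    return {
        "intervened": converse_response.get("stopReason") == "guardrail_intervened",
        # None unless guardrail tracing was enabled on the request.
        "trace": converse_response.get("trace", {}).get("guardrail"),
    }


def compose(gateway_decision: str, converse_response: Optional[dict]) -> dict:
    """Combine the gateway-side decision with the AWS-side guardrail result."""
    if gateway_decision == "block":
        # The request was never forwarded, so the guardrail did not run.
        return {
            "gateway": "block",
            "guardrail": "not_evaluated",
            "guardrail_trace": None,
            "final": "block",
        }
    outcome = guardrail_outcome(converse_response or {})
    return {
        "gateway": gateway_decision,
        "guardrail": "intervened" if outcome["intervened"] else "passed",
        "guardrail_trace": outcome["trace"],
        # Either layer can stop the exchange; otherwise the gateway decision stands.
        "final": "block" if outcome["intervened"] else gateway_decision,
    }
```

Keeping the two outcomes as separate fields, rather than collapsing them into one verdict, is what lets a reviewer tell which layer acted.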
The Converse API pattern
The Bedrock Converse API normalizes the request and response across most foundation models. The inspection layer handles Converse with the same pattern as InvokeModel: address the inspection layer, authenticate, evaluate policy, audit, forward. The Converse API's tool-use loop is the same shape as Anthropic's tool-use loop on the direct API. The inspection layer evaluates the tool list at request time, evaluates proposed tool calls in the response, and evaluates tool results on the follow-up request. The audit record format is identical to the direct-API audit record.
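The three evaluation points in the tool-use loop can be sketched against Converse-shaped dictionaries: `toolConfig.tools[].toolSpec` on the request, `toolUse` content blocks on the response, and `toolResult` content blocks on the follow-up request. The helper names and the allowlist check are illustrative assumptions, not a description of any particular product's policy engine.

```python
def offered_tools(request: dict) -> list:
    """Tool names the application offers to the model at request time."""
    tools = request.get("toolConfig", {}).get("tools", [])
    return [t["toolSpec"]["name"] for t in tools if "toolSpec" in t]


def proposed_calls(response: dict) -> list:
    """toolUse blocks the model proposes in its response message."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    return [block["toolUse"] for block in content if "toolUse" in block]


def returned_results(followup_request: dict) -> list:
    """toolResult blocks in the last message of the follow-up request."""
    messages = followup_request.get("messages", [])
    if not messages:
        return []
    return [b["toolResult"] for b in messages[-1].get("content", []) if "toolResult" in b]


def denied(names, allowed: set) -> list:
    """Names that the route's policy does not permit."""
    return sorted(n for n in names if n not in allowed)


def orphan_results(calls: list, results: list) -> list:
    """toolUseIds returned by the application that the model never proposed."""
    known = {c["toolUseId"] for c in calls}
    return [r["toolUseId"] for r in results if r["toolUseId"] not in known]
```

The `toolUseId` is a natural candidate for the correlation identifier that ties the three audit records of one loop together.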
The ConverseStream variant returns its response as an AWS event stream (the application/vnd.amazon.eventstream binary framing that the SDKs decode into typed events) rather than as Server-Sent Events. The inspection layer handles the stream as a pass-through with response-time policy hooks on the streamed chunks. The chunk format is normalized across models, which makes response-time evaluation simpler than evaluating direct-API streaming responses across multiple providers.
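A pass-through with a response-time hook might look like the generator below. It assumes the stream has already been decoded into dictionaries shaped like ConverseStream events (`messageStart`, `contentBlockDelta` with a `delta.text` field, `messageStop`, `metadata`). The `gatewayBlocked` event is a hypothetical marker invented for this sketch and is not part of the Bedrock API.

```python
from typing import Callable, Iterable, Iterator


def inspect_stream(events: Iterable[dict], hook: Callable[[str], bool]) -> Iterator[dict]:
    """Forward events unchanged while `hook` approves the text seen so far.

    `hook` receives the accumulated response text and returns True to keep
    streaming or False to cut the stream.
    """
    seen = ""
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            seen += delta["text"]
            if not hook(seen):
                # Withhold the offending chunk and end the stream with a
                # gateway-defined marker (hypothetical event name).
                yield {"gatewayBlocked": {"reason": "response-time policy"}}
                return
        yield event
```

Evaluating the accumulated text rather than each chunk in isolation matters because a sensitive string can straddle a chunk boundary.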
Audit record format for Bedrock
A per-decision audit record from a Bedrock gateway carries: the request identifier and the correlation identifier for tool-use loops; the natural-person identity, the tenant, the role; the AWS account ID and the region where the Bedrock model was served; the model identifier and version; the route identifier and the policy version; the request and response fingerprints; the data classification outcome (PII detected and redacted, source code detected and blocked); the Bedrock Guardrails outcome if the route uses a guardrail; the decision outcome from the inspection layer (pass, block, modified); the token counts and the timestamp; and the cryptographic integrity signature.
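As an illustration, the fields listed above map onto a flat record type, with the integrity signature computed over a canonical serialization. Everything here is an assumption for the sake of the sketch: the field names, the use of SHA-256 for fingerprints, and the use of HMAC-SHA256 as the signature scheme (the text does not specify one, and a deployment may prefer asymmetric signatures or a hash chain).

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    correlation_id: str          # ties together the steps of a tool-use loop
    identity: str                # natural person
    tenant: str
    role: str
    aws_account_id: str
    region: str
    model_id: str                # includes the model version
    route_id: str
    policy_version: str
    request_fingerprint: str
    response_fingerprint: str
    classification_outcome: str  # e.g. "pii_redacted", "source_code_blocked"
    guardrail_outcome: str       # empty when the route uses no guardrail
    decision: str                # "pass", "block", or "modified"
    input_tokens: int
    output_tokens: int
    timestamp: str               # ISO 8601, UTC


def fingerprint(payload: bytes) -> str:
    """Content fingerprint, so the record need not store the payload itself."""
    return hashlib.sha256(payload).hexdigest()


def sign(record: AuditRecord, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON form (sorted keys, no extra whitespace)."""
    canonical = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(record: AuditRecord, key: bytes, signature: str) -> bool:
    """Constant-time check that the record has not changed since signing."""
    return hmac.compare_digest(sign(record, key), signature)
```

Serializing with sorted keys keeps the signature stable regardless of field order in whatever store holds the record.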
The record format satisfies the EU AI Act Article 12 traceability requirement, the Fannie Mae LL-2026-04 lender record requirement, the NIST AI agent identity and authorization Pillar 3 action lineage requirement, and the FedRAMP and StateRAMP audit requirements that apply to federal and state government workloads running on Bedrock.
DeepInspect
This is the gateway pattern DeepInspect runs in front of Amazon Bedrock and the direct foundation model endpoints alongside it. DeepInspect addresses the Bedrock runtime endpoints across regions and handles the SigV4 signing on the upstream call. The application points its AWS SDK or its Bedrock-runtime client at the inspection layer, the inspection layer authenticates the natural-person identity, evaluates the policy bundle, commits the per-decision audit record, and forwards to Bedrock on a controlled IAM role.
The inspection layer composes with Bedrock Guardrails on the AWS side and extends the same architecture across direct OpenAI, direct Anthropic, Azure OpenAI, Vertex, and self-hosted endpoints. The audit record format is identical across providers, which means an organization running a multi-provider AI footprint has one audit pipeline regardless of which foundation model served any given request. EU AI Act Article 12, Fannie Mae LL-2026-04, FedRAMP, and DORA Article 6 reviews all consume the same record series.
If you are running a multi-model Bedrock deployment and the audit pipeline pulls from CloudTrail plus a custom application log, let's talk about a single record format.
Frequently asked questions
- How does the gateway interact with AWS PrivateLink for Bedrock?
The inspection layer can run inside the VPC that holds the PrivateLink endpoint for Bedrock, which keeps all traffic on the AWS network and inside the customer's account. The application calls the inspection layer over PrivateLink, the inspection layer calls Bedrock over the PrivateLink endpoint, and the data path never crosses the public internet. The inspection layer's IAM role is the only role that needs Bedrock invocation permissions on the customer account, which simplifies the IAM policy. The audit record notes that the call traversed PrivateLink, which is evidence that FedRAMP and IL5 reviews will expect to see.
- Does the gateway support cross-account Bedrock invocations?
Yes, through standard AWS cross-account IAM. The inspection layer's IAM role assumes a Bedrock invocation role in the destination account before calling Bedrock there. The audit record stamps the source account, the destination account, and the assumed role identifier so the auditor can reconstruct the cross-account call path. Production deployments that have a shared central inspection layer serving multiple AWS accounts use the cross-account assume-role pattern to keep the inspection layer in one account while the model invocations land in the per-team or per-product accounts.
- How does the gateway record the Bedrock Guardrails outcome?
When the route configuration includes a Bedrock Guardrails identifier, the inspection layer attaches the guardrail to the upstream Bedrock call. The Bedrock response includes the guardrail outcome (action, trace, outputs). The inspection layer extracts the outcome from the response and stamps it on the audit record alongside the inspection layer's own decision. An auditor reading the record sees both the gateway-side policy outcome and the AWS-managed guardrail outcome, which lets them reconstruct the full layered enforcement chain.
- What if the application uses the AWS SDK directly without going through a gateway endpoint?
The inspection layer can sit as an HTTPS-intercepting forward proxy at the VPC egress. The AWS SDK calls Bedrock through the proxy, which intercepts the SigV4 request, evaluates the policy, re-signs with the inspection layer's role, and forwards to Bedrock. The pattern requires VPC routing to direct AWS-SDK traffic through the proxy. Most production deployments prefer the addressable endpoint pattern over the forward-proxy pattern because the addressable endpoint is explicit in the application code and produces less operational drift over time. The forward-proxy pattern is the fallback for legacy applications that cannot be modified to change the SDK base URL.
- Does the gateway support the Bedrock batch inference and knowledge bases APIs?
Yes. The batch inference submission and the knowledge base query both go through the gateway, which evaluates whether the caller is permitted to submit the batch on the referenced data or query the knowledge base. The audit record captures the batch identifier and the knowledge base identifier, which lets the auditor trace every job back to the request that submitted it. The inspection layer does not need to be on the data path of the long-running batch job because the job runs on the Bedrock side; by that point the inspection layer has already recorded the submission decision and the knowledge base query decision.
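For the cross-account question above, the IAM side of the assume-role pattern can be sketched as two policy documents plus the fields the audit record would stamp. The helper names and the shape of the stamp are hypothetical; the policy documents follow the standard IAM JSON grammar, and the account IDs and ARNs in any call are placeholders.

```python
import json


def trust_policy(gateway_role_arn: str) -> dict:
    """Trust policy for the destination-account role: only the gateway role may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": gateway_role_arn},
            "Action": "sts:AssumeRole",
        }],
    }


def invocation_policy(model_arns: list) -> dict:
    """Permissions for the destination-account role, limited to the listed models."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": model_arns,
        }],
    }


def cross_account_stamp(source_account: str, destination_account: str,
                        assumed_role_arn: str) -> dict:
    """Fields (hypothetical names) that let an auditor rebuild the call path."""
    return {
        "source_account": source_account,
        "destination_account": destination_account,
        "assumed_role": assumed_role_arn,
    }


def to_json(policy: dict) -> str:
    """Render a policy document the way it would be attached to a role."""
    return json.dumps(policy, indent=2)
```

Scoping `Resource` to specific foundation-model ARNs means the route policy and the IAM policy both have to agree before a model can be called, which gives a second check if either is misconfigured.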