AI Gateway Multi-Cloud: The Single Control Plane Across OpenAI, Anthropic, Bedrock, and Vertex
Enterprise AI traffic now spans OpenAI direct, Anthropic direct, AWS Bedrock, Azure OpenAI, and Google Vertex in the same week, often in the same application. Each provider has its own auth, its own request shape, its own error semantics, and its own audit emission. A multi-cloud AI gateway is the single control plane that normalizes identity, classification, policy, and audit across all of them. This walkthrough covers the normalization layer, the per-provider adapters, and the audit record that survives the regulator regardless of which provider the request hit.

A platform team that runs OpenAI for the chatbot, Bedrock Claude for the support summarizer, and Vertex Gemini for the document analyzer has three control planes. Each provider has its own SDK, its own auth, its own per-route rate limits, and its own audit emission. The combinatorics of policy across providers are what break the single-vendor posture.
I want to walk through the normalization layer that produces one control plane across providers, the per-provider adapters that translate it, and the audit record that survives the regulator regardless of which provider the request hit.
What "single control plane" means in this context
A single control plane means one identity binding, one classification scheme, one policy version, and one audit pipeline, regardless of which provider the request reaches. The platform team writes a rule once; the rule applies whether the underlying call goes to api.openai.com, bedrock-runtime.us-east-1.amazonaws.com, or aiplatform.googleapis.com.
The provider stays a deployment-time concern. The policy plane stays a governance-time concern. The audit record is the same shape across providers and the regulator's question about a specific decision resolves against the same fields.
The normalized request shape
The gateway accepts a normalized request shape from the application. The normalized shape carries the identity envelope, the model selection (provider-agnostic name), the message list, the tool catalog, and the routing hint. The gateway's adapter layer translates the normalized shape into the provider-specific format on egress.
The application interacts only with the normalized shape. Provider migrations become routing changes in the gateway, not application code changes.
Per-provider adapters
Each provider needs an adapter that maps the normalized shape into the wire format the provider expects.
The OpenAI adapter translates messages into chat/completions, tools into tools[], and identity into the user field. The adapter handles streaming, function-calling, and the tool_choice parameter.
The Anthropic adapter translates messages into messages.create, system into the top-level system field, and tools into Anthropic's tools[]. The adapter handles streaming and tool_use.
The Bedrock adapter wraps either the Anthropic or the Meta family request inside Bedrock's InvokeModel envelope. The adapter handles AWS SigV4 signing and the per-model variations inside the Bedrock surface.
The Vertex adapter translates into Gemini's generateContent, with Google's Tool and FunctionDeclaration shapes. The adapter handles Vertex's per-region endpoints.
The Azure OpenAI adapter looks similar to the OpenAI adapter but uses the deployment name in place of the model name and uses Microsoft Entra tokens instead of OpenAI bearer tokens.
Each adapter is responsible for its provider's quirks. The policy plane stays uniform.
Provider-aware routing
Routing combines the application's hint, the per-route policy, and the live state of each provider. A simple routing rule:
The residency constraint is what the EU AI Act, GDPR, and DORA expect to see. A request from an EU-authenticated identity follows the EU residency path; the gateway refuses a US-only failover for that identity even when the EU path is degraded.
Normalized audit across providers
The audit row across providers carries the same fields with provider-specific provenance.
provider_used and provider_request_id are the trail back into the provider's own logs when needed. The auditor's question about which provider answered the request resolves against the same field across all providers.
Cost and rate budgets across providers
A per-provider rate budget catches the case where a failover storm against one provider would burn through the budget on a peer. The gateway tracks tokens-in and tokens-out per provider, per route, per identity, per minute. Budgets fire before the next provider error reaches the application.
The same accounting captures cost. A model-routing rule that selects Bedrock Claude for general traffic and Anthropic-direct Claude for sensitive routes can be expressed in policy and the audit record carries the rule that picked the provider.
What this looks like for the EU AI Act
EU AI Act Article 12 requires automatic recording over the lifetime of the system. The system spans providers; the audit record is the artifact that crosses providers in a single trail. Article 19's six-month retention applies to the gateway's audit pipeline; provider-side logs are a secondary record the gateway can reach through provider_request_id when the regulator's question demands a deeper trace.
DeepInspect
DeepInspect is the normalized control plane across OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, and Google Vertex. The application sends a normalized request; the gateway resolves identity, classification, policy version, and the routing rule; the adapter layer translates into the provider's wire format. The audit record is the same shape across providers and the regulator's question resolves against a single trail.
The gateway runs in-line with sub-50ms p95 enforcement overhead from internal DeepInspect testing. Provider migrations become policy-plane changes; application code is unaffected. Book a technical deep dive at deepinspect.ai to walk through the multi-cloud posture against your current provider mix.
Frequently asked questions
- Does multi-cloud routing degrade latency?
The adapter overhead is bounded by the gateway's enforcement overhead. The dominant latency contributor is the LLM call itself, which is the same regardless of whether the application or the gateway makes it. The gateway's worker-to-provider hop adds a small fixed amount that the routing benefit absorbs.
- How does the gateway handle provider-specific features?
Provider-specific features that the normalized shape does not yet cover travel through an
extensionsmap that the adapter consumes. The application sets the extension; the audit row carries the extension; the policy plane can constrain which identities are allowed to use which extensions.- What about model-specific safety features?
Bedrock Guardrails, Azure Content Safety, and provider-native content filters continue to operate at the provider boundary. The gateway records their verdicts in the audit row but does not depend on them; the gateway's own policy decisions are the enforced control.
- How does this affect existing OpenAI-SDK applications?
The application points its base URL at the gateway and continues to use the OpenAI SDK. The gateway accepts the OpenAI-shaped request as the normalized shape's canonical wire form and translates to other providers as the rule dictates.
- What is the rollout path?
The rollout typically starts with a single route, a single provider, and a single rule. The application points that route at the gateway; the audit pipeline begins emitting rows. As confidence grows, additional routes and providers are added. The full multi-cloud posture is a target state, not a day-one requirement.