Does the gateway require code changes in the application?

The application changes one configuration line: the OpenAI SDK's base_url. The SDK methods (client.chat.completions.create, client.embeddings.create, streaming handlers) are unchanged. The application's authentication switches from the OpenAI vendor key to a gateway-issued token, but the SDK call surface is identical.

How does the gateway handle OpenAI's organization and project headers?

OpenAI's OpenAI-Organization and OpenAI-Project headers pass through the gateway transparently. The gateway records the values in the audit metadata field for traceability. Outbound to api.openai.com, the gateway uses the deployer's organization-level key, which the application no longer needs to hold.

What is the gateway's behavior on function calling and tool use?

Function-calling completions are inspected on both sides. The gateway classifies the prompt and the tool-call response before returning either to the application. For agentic patterns where the model issues a tool call, the gateway evaluates the function arguments against the policy bundle (a destructive function call with PII arguments may trigger a block).

Can the gateway run in front of Azure OpenAI as well as OpenAI?

The gateway accepts both Azure OpenAI's REST surface and OpenAI's REST surface. Routing is per-tenant. A deployer that uses Azure OpenAI for HIPAA-covered workloads and OpenAI direct for non-PHI workloads can run both through one gateway with separate policy bundles per route.

OpenAI API Gateway Setup: An Implementation Walkthrough for Enterprise Deployments

A direct OpenAI integration sends an authenticated POST to https://api.openai.com/v1/chat/completions with a bearer token, a model ID, and a prompt body. The application's TLS terminates at OpenAI's edge. The cleartext prompt is visible only to the application and to OpenAI. No third control point sees the request. That topology fails three obligations every enterprise deployment carries: EU AI Act Article 12 (the deployer has to produce a per-decision audit record), NIST AI RMF MANAGE 1.3 (action lineage), and HIPAA (PHI does not cross to a vendor without a BAA and a redaction policy on the wire).

The architectural answer is a gateway between the application and api.openai.com. This guide walks through the implementation.

Request path

The gateway sits between the application and OpenAI:

TLS A terminates at the gateway, not at OpenAI's edge. The gateway decrypts the prompt body, runs the inspection chain, and re-encrypts to OpenAI over TLS B. The application configures its OpenAI SDK to point at the gateway URL instead of api.openai.com:

The gateway-issued token carries the verified caller identity. The application no longer holds the OpenAI vendor key. The vendor key is held by the gateway, scoped to the gateway's outbound calls.

Identity model

The gateway expects a JWT or a session token with a verified sub claim. The token resolves to one of three subject types:

User. A natural person acting through an authenticated application session. Maps to MAP 1.5 of AI RMF.
Agent-on-behalf. An autonomous agent acting on behalf of a verified principal. The token carries both the agent's identity and the principal's identity. Maps to the agent identity model in Article 26 of the EU AI Act.
Service. A backend job acting under a service principal. Service requests carry the principal that scheduled the job, not the job runtime.

The identity context is the input to the policy decision. A request without a resolved subject claim fails closed.

Classification

The decrypted prompt body runs through four classification stages. Each stage emits a verdict that the policy engine reads.

1. PII detection

Regex passes for SSN, credit card, phone, email. Named-entity recognition for person names and addresses. A nine-digit number near the literal token SSN is a stronger signal than a nine-digit invoice number, so the classifier carries lightweight context windows for each pattern.

2. PHI detection

The 18 HIPAA identifiers (45 CFR 164.514(b)). MRN, account numbers, full-face photos, dates of birth, geographic subdivisions smaller than a state. PHI detection is separate from PII because PHI carries HIPAA-specific routing rules (route only to BAA-covered endpoints, redact otherwise).

3. Secret detection

API keys, AWS access keys, GitHub tokens, OAuth bearer tokens, private keys (PEM, OpenSSH, PKCS). The detector runs against the prompt body and the system message. A request that carries a secret in the prompt is blocked unconditionally; secrets do not belong in vendor-bound payloads.

4. Prompt-injection detection

Direct and indirect injection patterns (OWASP LLM01). The classifier reads the prompt and any retrieved context the application attached. A high-confidence injection verdict triggers a block or a sanitization rewrite depending on policy.

Policy lookup

The policy engine reads the request context (subject, route, classification verdicts) and returns a decision plus a reason code. The decision is one of:

Pass. Forward the request to api.openai.com.
Block. Return a 403 to the application with the policy reason code.
Redact. Rewrite the prompt to remove the offending field, then forward.
Route-only. Forward only if the destination route matches the policy's allowed-destination set (used for PHI to BAA-covered endpoints).

The policy bundle is versioned. Each request's audit record stamps the SHA-256 hash of the policy bundle in effect at decision time.

OpenAI route handling

The gateway implements the OpenAI v1 API surface (chat completions, completions, embeddings, audio, image, files). Each endpoint carries a different inspection profile:

| Endpoint | Inspection | |---|---| | /v1/chat/completions | full classification chain on messages[*].content | | /v1/embeddings | classification on input, lighter policy (read-only intent) | | /v1/files | content scan on the uploaded body, route to allowlisted purposes only | | /v1/images/generations | prompt classification only, output policy on completion | | /v1/audio/transcriptions | classification on the transcribed text after transcription |

Streaming responses (stream: true) require completion-side inspection that runs against each delta chunk. The gateway buffers chunks for the inspection window, applies the policy, and forwards approved chunks to the application.

Audit-record format

Each request produces one record:

The record is committed synchronously before the gateway returns the OpenAI response to the application. The HMAC chain produces tamper-evidence on the record set. Retention defaults to six months (EU AI Act Article 19 baseline) and extends to six years for HIPAA-covered deployments.

Failure modes the gateway has to handle

OpenAI 429. Upstream rate limit. The gateway returns the 429 to the application after writing the audit record with decision: pass, outcome: upstream-rate-limit.
Policy lookup timeout. The decision point times out. The gateway fails closed; the request returns 503 with reason_code: policy.fail-closed.lookup-timeout.
Audit-writer error. The audit store cannot accept the write. The gateway fails closed for that request and surfaces an SRE alert. Forwarding a request whose audit cannot be persisted produces the same evidentiary gap as no audit at all.
Identity verification failure. The JWT is expired or the signature does not validate. The gateway returns 401 with reason_code: identity.fail.verify.

Performance budget

The OpenAI inference call itself runs 500 ms to 5 seconds. The gateway's added latency budget is 50 ms (P99) across the full inspection chain. The split:

TLS termination and re-encryption: 4 ms
Identity verification (JWT validate): 2 ms
PII classification: 8 ms
PHI classification: 6 ms
Secret detection: 3 ms
Prompt-injection detection: 18 ms
Policy lookup: 4 ms
Audit write (synchronous): 5 ms

Total P99: 50 ms. The application's user-facing latency is dominated by the OpenAI call; the gateway's overhead is in the noise floor.

DeepInspect

DeepInspect is an OpenAI-compatible gateway. Application teams change one configuration line (the base_url) and the existing OpenAI SDK calls route through the inspection chain. Identity verification, the four-stage classification, the versioned policy bundle, the chained audit-record format, and the streaming-response inspection are all implemented. The reference deployment runs the full chain under 50 ms P99 on the deployer's own infrastructure.

We have a working integration in production for OpenAI deployments at finance, healthcare, and federal-government deployers. The reference architecture is documented end to end.

Book a technical deep dive at deepinspect.ai.