Azure AI Content Safety Architecture Deep Dive: Where the Inspection Sits and What It Cannot See

Azure AI Content Safety runs inside the Azure-hosted classification path. The product covers text, image, prompt-shield, groundedness, and protected-material checks the deployer composes through the Content Safety endpoint. This piece walks through the request path, the API surfaces, the policy categories, the audit records the deployer receives through Azure Monitor and the Foundry observability stack, and the deployment patterns the Azure-only customer and the multi-cloud customer should each consider.

By Parminder Singh · Founder & CEO, DeepInspect Inc.

Azure AI Content Safety is the classification layer Microsoft ships alongside Azure OpenAI Service and Azure AI Foundry. The product evolved from the original Content Moderator and now covers text categories, image categories, prompt shields, groundedness detection, and protected-material checks. The service runs inside the Azure region where the customer's resource sits and exposes a classification API that the application invokes either inline against Azure OpenAI traffic or standalone against an arbitrary text or image payload. That position has consequences for the Azure-only customer that wants to rely on Microsoft-authored policy and for the multi-cloud customer that needs per-decision evidence across non-Azure endpoints.

I want to walk through the Content Safety request path, the API surfaces, the policy categories, the audit records the deployer receives through Azure Monitor and the Foundry observability stack, and the deployment patterns the Azure-only and the multi-cloud customer should each consider.

The Content Safety request path

A Content Safety call goes to the regional endpoint of the customer's Content Safety resource. The application supplies the text or image payload and the categories it wants the service to evaluate. The service runs the classifier and returns a per-category severity level, which the application's policy logic compares against its own thresholds.
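The comparison between returned severities and the application's thresholds can be sketched as follows. This is a minimal illustration, assuming the response carries a `categoriesAnalysis` list of `category` and `severity` pairs as the text analysis REST API documents; the threshold values and the function name `categories_over_threshold` are my own assumptions, not service defaults.

```python
from typing import Dict, List

# Hypothetical per-category thresholds chosen by the application.
# A severity at or above the threshold counts as a violation here.
THRESHOLDS: Dict[str, int] = {"Hate": 4, "Sexual": 4, "SelfHarm": 2, "Violence": 4}


def categories_over_threshold(response: dict, thresholds: Dict[str, int]) -> List[str]:
    """Return the categories whose severity meets or exceeds the threshold."""
    flagged = []
    for item in response.get("categoriesAnalysis", []):
        limit = thresholds.get(item["category"])
        if limit is not None and item["severity"] >= limit:
            flagged.append(item["category"])
    return flagged


# Sample response in the assumed shape; the values are made up for illustration.
sample = {
    "categoriesAnalysis": [
        {"category": "Hate", "severity": 0},
        {"category": "SelfHarm", "severity": 2},
        {"category": "Violence", "severity": 4},
    ]
}
print(categories_over_threshold(sample, THRESHOLDS))
```

Keeping the thresholds in application data rather than in the call keeps the policy decision visibly on the deployer's side of the boundary.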

The standalone classification call is the building block. The application composes the call into the request path. A common composition runs the call as a pre-check before the Azure OpenAI invocation: if the prompt's harmful-content score exceeds the threshold, the application blocks the invocation. A second composition runs the call as a post-check on the model response: if the response's harmful-content score exceeds the threshold, the application redacts or blocks the response before it reaches the user.
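A rough sketch of the pre-check and post-check composition, with the classifier and the model passed in as plain callables so the example runs without network access. The names `guarded_completion`, `fake_classify`, and `fake_model` are hypothetical, and treating a severity at or above the threshold as a violation is an assumption of this sketch.

```python
from typing import Callable

Classifier = Callable[[str], int]  # returns the highest category severity for a text
Model = Callable[[str], str]       # returns the model response for a prompt


def guarded_completion(prompt: str, classify: Classifier, invoke_model: Model,
                       threshold: int = 4) -> dict:
    """Run a pre-check on the prompt and a post-check on the response."""
    if classify(prompt) >= threshold:
        # Pre-check failed: the model is never invoked.
        return {"status": "blocked_prompt", "output": None}
    response = invoke_model(prompt)
    if classify(response) >= threshold:
        # Post-check failed: the response is withheld from the user.
        return {"status": "blocked_response", "output": None}
    return {"status": "allowed", "output": response}


# Stand-in callables; a real deployment would call Content Safety and Azure OpenAI.
def fake_classify(text: str) -> int:
    return 6 if "attack" in text else 0


def fake_model(prompt: str) -> str:
    return "echo: " + prompt


print(guarded_completion("hello", fake_classify, fake_model)["status"])
print(guarded_completion("plan an attack", fake_classify, fake_model)["status"])
```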

Azure OpenAI Service includes a separate content filtering layer that runs inside the OpenAI deployment by default. The default filter handles the four standard categories (hate, sexual, self-harm, violence) at a service-default severity threshold (medium, in Microsoft's documented defaults). The customer can configure the severity thresholds through the deployment's content filter configuration in Foundry. The default filter and the standalone Content Safety API are related products with overlapping classifiers but separate configuration surfaces.

The Prompt Shields API targets prompt-injection detection. The application supplies the user input and any retrieved documents the prompt context includes. The shield evaluates the user input for direct jailbreak attempts and the retrieved documents for indirect injection patterns. The shield returns separate verdicts for each, which the application's policy logic reads.
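The request and the two verdicts can be sketched like this. The field names (`userPrompt`, `documents`, `userPromptAnalysis`, `documentsAnalysis`, `attackDetected`) follow my reading of the Prompt Shields REST documentation and should be checked against the API version in use; the helper names are hypothetical.

```python
from typing import List


def build_shield_request(user_prompt: str, documents: List[str]) -> dict:
    """Build the request body: the user input plus any retrieved documents."""
    return {"userPrompt": user_prompt, "documents": documents}


def read_shield_verdicts(response: dict) -> dict:
    """Split the response into a direct verdict and per-document verdicts."""
    direct = response.get("userPromptAnalysis", {}).get("attackDetected", False)
    per_document = [
        doc.get("attackDetected", False)
        for doc in response.get("documentsAnalysis", [])
    ]
    return {
        "direct_attack": direct,
        "indirect_attack": any(per_document),
        "per_document": per_document,
    }


# Sample response in the assumed shape; the values are made up for illustration.
sample_response = {
    "userPromptAnalysis": {"attackDetected": False},
    "documentsAnalysis": [{"attackDetected": False}, {"attackDetected": True}],
}
print(read_shield_verdicts(sample_response))
```

Keeping the per-document list lets the application drop only the offending retrieved document instead of rejecting the whole request.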

The Groundedness Detection API targets RAG-style applications. The application supplies the response, the grounding source documents, and an optional reasoning flag. The API evaluates whether the response is grounded in the source and returns a verdict the application's policy reads.
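A sketch of the request body and the verdict read for a summarization-style check. The field names (`domain`, `task`, `text`, `groundingSources`, `reasoning`, `ungroundedDetected`) reflect my understanding of the preview REST API and are assumptions to verify against the current reference; the helper names are hypothetical.

```python
from typing import List


def build_groundedness_request(response_text: str, sources: List[str],
                               reasoning: bool = False) -> dict:
    """Build the request body: the response, its grounding sources, and the reasoning flag."""
    return {
        "domain": "Generic",
        "task": "Summarization",
        "text": response_text,
        "groundingSources": sources,
        "reasoning": reasoning,
    }


def is_grounded(result: dict) -> bool:
    """Treat the response as grounded unless the service flags ungrounded content."""
    return not result.get("ungroundedDetected", False)


body = build_groundedness_request(
    "The contract renews every year.",
    ["The contract term is twelve months and renews annually."],
)
print(body["task"], is_grounded({"ungroundedDetected": False}))
```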

The Protected Material Detection API targets the model's output reproducing copyrighted text or code. The API evaluates the response against a corpus of known protected material and returns a verdict.

The five API surfaces

The first surface is text content categories. The classifier covers hate, sexual, self-harm, and violence and returns a severity level for each. The deployer applies severity thresholds in application code against the standalone API's output or sets them in the Azure OpenAI deployment's content filter configuration.

The second surface is image content categories. Same four categories, same severity model, against image payloads.

The third surface is prompt shields. Direct jailbreak detection on user input and indirect injection detection on retrieved documents.

The fourth surface is groundedness detection. RAG response evaluation against grounding sources.

The fifth surface is protected material detection. Response evaluation against the protected material corpus.

The five surfaces compose a content-safety policy the deployer chains in the request path. The deployer's application code orchestrates the chain and reads the verdicts.
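One way to picture the chain is an ordered list of named checks that stops at the first failure. This is an illustrative sketch only: `run_chain` and the stand-in checks are hypothetical, and each check here reads a verdict already stored in the context rather than calling the service.

```python
from typing import Callable, List, Tuple

Check = Callable[[dict], bool]  # returns True when the check passes


def run_chain(context: dict, checks: List[Tuple[str, Check]]) -> dict:
    """Run the checks in order and stop at the first one that fails."""
    passed = []
    for name, check in checks:
        if not check(context):
            return {"allowed": False, "failed_check": name, "passed": passed}
        passed.append(name)
    return {"allowed": True, "failed_check": None, "passed": passed}


# Stand-in checks over precomputed verdicts; real checks would wrap the API calls.
CHECKS: List[Tuple[str, Check]] = [
    ("text_categories", lambda ctx: ctx["max_severity"] < 4),
    ("prompt_shields", lambda ctx: not ctx["attack_detected"]),
    ("groundedness", lambda ctx: not ctx["ungrounded"]),
    ("protected_material", lambda ctx: not ctx["protected_material"]),
]

context = {
    "max_severity": 0,
    "attack_detected": False,
    "ungrounded": True,
    "protected_material": False,
}
print(run_chain(context, CHECKS))
```

The order matters for cost and latency: cheap input-side checks can run before the model call, while groundedness and protected-material checks need the response.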

Where the inspection actually runs

The classifier runs inside the Azure region the customer's Content Safety resource is provisioned in. The application calls the regional endpoint over HTTPS. The classification happens server-side at Azure. The deployer's policy decision (block, redact, modify) runs in the deployer's application code based on the verdict the service returns.

The position has three consequences: one for classifier coverage, one for the audit record, and one for non-Azure endpoints. On coverage, the classifier covers what Microsoft publishes. A category outside the five published surfaces requires a separate classifier the deployer runs. A per-user policy that depends on the natural-person identity, a per-tenant policy that depends on customer attribution, or a per-region policy that depends on data residency runs in the deployer's application logic, not in the Content Safety service.
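To make the split concrete, here is a sketch of application-side policy that combines a classifier severity with identity context. Every rule, role name, tenant name, and region in it is a hypothetical example; nothing here is part of the Content Safety API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    user_role: str
    tenant_id: str
    region: str


# Hypothetical deployer-authored policy data.
TENANT_THRESHOLDS = {"tenant-a": 2, "tenant-b": 4}
DEFAULT_THRESHOLD = 4
ALLOWED_REGIONS = {"tenant-a": {"westeurope"}}
ELEVATED_ROLES = {"trust-and-safety-analyst"}


def decide(max_severity: int, ctx: RequestContext) -> str:
    """Return 'allow', 'redact', or 'block' from the verdict plus identity context."""
    allowed = ALLOWED_REGIONS.get(ctx.tenant_id)
    if allowed is not None and ctx.region not in allowed:
        return "block"  # per-region rule: data residency
    threshold = TENANT_THRESHOLDS.get(ctx.tenant_id, DEFAULT_THRESHOLD)
    if max_severity < threshold:
        return "allow"  # per-tenant rule: severity below the tenant's threshold
    if ctx.user_role in ELEVATED_ROLES:
        return "redact"  # per-user rule: elevated roles see a redacted result
    return "block"


print(decide(2, RequestContext("employee", "tenant-b", "eastus")))
```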

The audit record is what the deployer composes from the Content Safety API responses and the Azure Monitor traces. The records the Foundry observability stack writes carry the model invocation outcome and the content filter intervention. The deployer's application logs (when configured) carry the Content Safety verdict alongside the application's own context.
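A minimal sketch of an application-side record that places the verdict next to the application's own context. The field names are assumptions, and storing a SHA-256 digest of the prompt instead of the raw text is a design choice of this sketch, not something the service prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def build_audit_record(request_id: str, user_id: str, prompt: str,
                       verdict: dict, decision: str,
                       now: Optional[datetime] = None) -> dict:
    """Compose one per-decision record from the verdict and application context."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "request_id": request_id,
        "timestamp": timestamp,
        "user_id": user_id,  # identifier the application resolved for the caller
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "verdict": verdict,
        "decision": decision,
    }


record = build_audit_record(
    request_id="req-001",
    user_id="user-42",
    prompt="Summarize the attached contract.",
    verdict={"Hate": 0, "Violence": 0},
    decision="allow",
)
print(json.dumps(record, sort_keys=True))
```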

The non-Azure endpoints are not covered by default. A multi-model deployment that uses Azure OpenAI for one workload and Anthropic for another can call Content Safety standalone against the Anthropic prompt, but the policy then depends on the deployer's orchestration code rather than on a single inspection layer.

The audit records the deployer receives

Azure Monitor captures the Content Safety API call with the request identifier, the timestamp, the resource identifier, and the response status. Azure OpenAI invocation logs (when diagnostic logging is enabled) capture the prompt, the response, the content filter intervention, and the model identifier.

The records serve Microsoft's internal audit needs and back the SOC 2 evidence Microsoft provides for its own controls. They satisfy the deployer's external audit needs only in part. EU AI Act Article 12 record-keeping can extend to identification of the natural persons involved in the decision; when the application calls Azure OpenAI with an application service principal, the records carry the application's Azure AD (now Microsoft Entra ID) identity rather than the natural-person identity. The deployer that needs the natural-person identity in the records configures Azure AD Single Sign-On for the application or runs an external inspection layer that captures the identity at the application boundary.

HIPAA-covered deployments running on Azure OpenAI work under the Microsoft HIPAA BAA. The audit records the BAA covers are the Microsoft-side records. The deployer's audit records (the application-side composition of Content Safety verdicts, the policy decisions, and the per-decision evidence) are the deployer's responsibility under the shared responsibility model.

The deployment patterns

The Azure-only customer with a single-model deployment composes Content Safety into the Azure OpenAI request path. The configuration covers the published content categories and the deployer's application code handles the policy decisions. The Azure Monitor and Foundry observability stack hold the records.

The Azure-only customer with a multi-model deployment (Azure OpenAI for one workload, a self-hosted model for another) calls Content Safety standalone against the self-hosted model's traffic. The deployer's application code orchestrates the standalone calls. The pattern works when the deployer's application code is the right layer to author the orchestration.

The multi-cloud customer with Azure OpenAI alongside Bedrock, Anthropic, or OpenAI's direct API faces policy fragmentation. The Azure Content Safety classifier covers the Azure-routed traffic by default. The other routes either run uncovered, run their own provider-side filter (each with a different configuration surface), or run an external inspection layer that covers all routes from a single policy surface. The external layer is the pattern the multi-cloud customer most often lands on for the per-decision evidence.
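The single-policy-surface idea can be sketched as one inspection function wrapped around every route. This is a toy illustration under stated assumptions: `make_inspected_caller`, the route names, and the stand-in endpoints are hypothetical, and a real layer would sit in the network path rather than in a Python closure.

```python
from typing import Callable, Dict

Endpoint = Callable[[str], str]    # sends a prompt to one provider, returns the response
Inspector = Callable[[str], bool]  # returns True when the text passes policy


def make_inspected_caller(routes: Dict[str, Endpoint], inspect: Inspector):
    """Wrap every route with the same inspection function."""
    def call(route: str, prompt: str) -> dict:
        if route not in routes:
            raise KeyError(f"unknown route: {route}")
        if not inspect(prompt):
            return {"route": route, "status": "blocked_prompt", "output": None}
        output = routes[route](prompt)
        if not inspect(output):
            return {"route": route, "status": "blocked_response", "output": None}
        return {"route": route, "status": "allowed", "output": output}
    return call


# Stand-in endpoints and policy so the sketch runs offline.
routes = {
    "azure-openai": lambda p: "azure: " + p,
    "anthropic": lambda p: "anthropic: " + p,
}
call = make_inspected_caller(routes, inspect=lambda text: "forbidden" not in text)

print(call("azure-openai", "hello")["status"])
print(call("anthropic", "a forbidden request")["status"])
```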

The deployer that needs per-user policy based on directory roles, per-tenant policy based on customer attribution, or per-region policy based on data residency runs the policy in the application code or in an external layer because the Content Safety configuration surface focuses on content categories.

DeepInspect

This is the architectural context DeepInspect deploys into for Azure customers. DeepInspect sits inline between the application and each AI endpoint the application calls. The inspection layer reads the prompt, the retrieved context, the response, and the identity the application propagates. The layer evaluates identity-aware policy (per-user, per-tenant, per-region, per-route) and commits per-decision audit records bound to the verified natural-person identifier.

For an Azure-only deployment that needs per-decision evidence at the natural-person level, DeepInspect runs alongside Azure Content Safety. The Azure surface handles the Microsoft-authored content categories. DeepInspect's records carry the natural-person identity, the deployer-authored policy decisions, and the integrity metadata an external auditor reads. The two record series compose into the full per-decision evidence the regulator and the customer auditor consume.

For a multi-cloud deployment that includes Azure OpenAI and non-Azure endpoints, DeepInspect covers all routes from a single policy surface. The deployer authors one policy that applies across Azure OpenAI, Bedrock, Anthropic, and any on-prem model the deployment runs. Book a technical deep dive at deepinspect.ai.

Frequently asked questions

Can Azure Content Safety enforce a per-user policy based on directory roles?

The Content Safety API surfaces the content classifier verdicts. The deployer's application code reads the verdicts and applies policy decisions. The policy logic that varies the decision based on the directory role lives in the application code. A deployer that wants the per-user policy outside the application code runs an external inspection layer that reads the verified identity and the classifier verdict at the request boundary.

How does the deployer get the natural-person identity into the Azure OpenAI invocation logs?

The application configures Azure AD Single Sign-On so the user's identity propagates into the Azure OpenAI session. The Foundry observability stack then captures the user identifier alongside the model invocation. The pattern works for Azure-native applications. A SaaS multi-tenant application that uses its own auth system needs an external mechanism to propagate the identity.

What is the difference between Prompt Shields and a general prompt-injection classifier?

Prompt Shields targets two specific patterns: direct jailbreak attempts in user input and indirect injection through retrieved documents. The API returns separate verdicts for each. A general prompt-injection classifier covers a broader pattern surface and may surface additional verdicts (encoded payloads, multi-step persuasion, tool-poisoning attempts). The deployer that wants broader coverage runs Prompt Shields plus an external classifier for the additional patterns.

How does Groundedness Detection compare to a deployer's own RAG evaluation?

Groundedness Detection evaluates the response against the grounding source the deployer supplies. The API returns a verdict that the deployer's policy reads. A deployer's own RAG evaluation may run different metrics (retrieval recall, answer relevance) and produce different verdicts. The two compose: the Microsoft service for the standard groundedness check, the deployer's own evaluation for the workload-specific metrics.

Does Azure Content Safety satisfy EU AI Act Article 12 on its own?

Article 12 requires high-risk AI systems to support automatic logging of events over their lifetime. Its most specific minimum list, which the Act ties to remote biometric identification systems, covers the period of each use, the input data, and the identification of the natural persons involved in verifying the results. The Azure Monitor and Foundry observability stack records cover the period of use and the input data when diagnostic logging is enabled. The natural-person identification reaches the records when the application propagates the identity through Azure AD or an external identity broker. The deployer that operates outside that pattern needs an external layer to capture the natural-person identity at the application boundary.