Does the inspection layer break streaming responses?

Streaming responses are accumulated into a full response object before Pass 2 evaluates them. The client still receives tokens as they stream, but the layer holds back a small buffer so that Pass 2 can evaluate the complete response before the final chunk is released. For applications where the token-by-token stream is the product experience, the buffer is a tradeoff, and the inspection layer supports policies that allow token-level streaming with a post-stream validation pass used for audit purposes only.
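The hold-back buffer can be sketched as a generator. This is a minimal illustration, not DeepInspect's implementation: `buffered_stream`, the `evaluate` callback standing in for Pass 2, the `hold_back` size, and the error marker are all hypothetical names chosen for the example.

```python
from typing import Callable, Iterable, Iterator, List


def buffered_stream(
    chunks: Iterable[str],
    evaluate: Callable[[str], bool],
    hold_back: int = 1,
) -> Iterator[str]:
    """Forward streamed chunks while holding back the last `hold_back` chunks.

    When the upstream stream ends, `evaluate` (a stand-in for Pass 2) sees the
    complete response. If it passes, the held chunks are released; otherwise a
    hypothetical error marker is emitted in their place.
    """
    seen: List[str] = []
    held: List[str] = []
    for chunk in chunks:
        seen.append(chunk)
        held.append(chunk)
        if len(held) > hold_back:
            yield held.pop(0)  # release the oldest chunk to the client
    full_response = "".join(seen)
    if evaluate(full_response):
        yield from held  # Pass 2 succeeded: close the stream normally
    else:
        yield "[blocked by inspection layer]"
```

Chunks released before the stream ends are already with the client, so a failing Pass 2 can only withhold the tail. That is why the buffer is a tradeoff, and why the audit-only policy exists for applications that cannot afford it.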

How does the layer handle responses that fail Pass 1?

A response that fails Pass 1 is returned to the caller as an error. The audit record captures the model's raw response, the schema that was declared, and the specific validation failure. Applications can implement a retry loop that calls the model again and reruns Pass 1 on each new response. Deployments with strict SLAs cap the retry count and surface the failure to the caller once the cap is reached.
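A capped retry loop on the application side might look like the sketch below. `generate_with_retries`, `Pass1Failure`, and the default of three attempts are illustrative assumptions; `generate` stands in for one model call and `pass1` for a schema check that returns a list of problems.

```python
import json
from typing import Callable, List


class Pass1Failure(Exception):
    """Raised when every attempt fails Pass 1; carries one error string per attempt."""


def generate_with_retries(
    generate: Callable[[], str],
    pass1: Callable[[object], List[str]],
    max_attempts: int = 3,
) -> object:
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        raw = generate()  # one call to the model
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"attempt {attempt}: not valid JSON ({exc.msg})")
            continue
        problems = pass1(parsed)  # an empty list means the response passed
        if not problems:
            return parsed
        errors.append(f"attempt {attempt}: " + "; ".join(problems))
    # Cap reached: surface the accumulated failures to the caller.
    raise Pass1Failure(errors)
```

Keeping one error string per attempt gives the caller the same detail the audit record holds, which helps when deciding whether the failure is a prompt problem or a provider problem.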

Are the semantic rules declarative or code?

Both patterns are supported. Declarative rules cover the common cases (data classification, authorization scope, simple business logic) and are written as YAML declarations. Complex cases that require custom evaluation logic run as sandboxed code with a defined input contract (the parsed response, the caller identity, the request context) and a defined output contract (allow, block, redact, or route to approval).
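The input and output contracts for a code rule can be sketched with dataclasses. This is an assumed shape, not DeepInspect's actual rule API: `RuleInput`, `RuleOutput`, `Action`, and the example `refund_limit_rule` (with its `refund_amount` and `remaining_daily_limit` fields) are hypothetical, and sandboxing is out of scope here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REDACT = "redact"
    APPROVE = "approve"  # route the response to an approval queue


@dataclass(frozen=True)
class RuleInput:
    response: Dict[str, Any]  # the parsed response (Pass 1 already succeeded)
    caller_id: str  # the caller identity
    context: Dict[str, Any] = field(default_factory=dict)  # the request context


@dataclass(frozen=True)
class RuleOutput:
    action: Action
    reason: str = ""
    redacted: Optional[Dict[str, Any]] = None  # replacement body when action is REDACT


def refund_limit_rule(inp: RuleInput) -> RuleOutput:
    """Hypothetical rule: refunds above the caller's remaining daily limit need approval."""
    amount = inp.response.get("refund_amount", 0)
    remaining = inp.context.get("remaining_daily_limit", 0)
    if amount > remaining:
        return RuleOutput(Action.APPROVE, f"refund {amount} exceeds remaining limit {remaining}")
    return RuleOutput(Action.ALLOW)
```

A rule like this needs request context (the remaining limit) that a static declaration cannot see, which is the kind of case the code path exists for.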

How does semantic validation interact with model fine-tuning?

Fine-tuning changes what the model tends to produce, not what the enterprise's policy allows. A fine-tuned model that is unlikely to violate a semantic rule still runs through the Pass 2 evaluation, because the enforcement layer is where the compliance artifact is produced regardless of the model's training. The inspection layer is the point of record.

Does the layer work with providers that do not offer structured output?

Yes. Providers that do not offer schema-conditioned responses return output that may or may not parse as valid JSON. Pass 1 first parses the output and then validates the parsed content against the schema the caller declared; output that fails to parse or fails validation returns an error to the caller. The inspection layer treats the provider's structured-output feature as a hint, not a guarantee, and validates independently.

What does the audit record show for a redacted response?

The audit record shows the original response, the rule that triggered the redaction, and the redacted response as delivered. An auditor reading the record can reconstruct the original content, the enforcement action, and the delivered content. That reconstruction is the artifact the EU AI Act Article 12 record-keeping mandate expects.

LLM Response Schema Validation: When JSON Mode Is Not Enough

An LLM that returns JSON through OpenAI's Structured Outputs, Anthropic's tool use, or Google's response schema is constrained to produce a document that satisfies the schema at the syntactic level. The document parses. The required fields are present. The types match. The syntactic guarantee is what these features deliver, and the guarantee is real.

The syntactic guarantee is not a semantic guarantee. A JSON document that satisfies the schema can still contain values that violate business policy, personal data that violates data-classification policy, or tool-call arguments that violate authorization scope. The response validates against the JSON schema. The response does not validate against the enterprise's policy. The two validations run at different layers.

I want to walk through what JSON mode actually covers, the semantic-validation gap it leaves, and the inspection-layer architecture that runs schema validation and semantic validation on the same response path.

What JSON mode covers

The schema-conditioned response modes across providers deliver a common set of guarantees.

The response parses as valid JSON. There is no trailing prose, no markdown fencing, no malformed structure. Applications that consume the response can call JSON.parse without wrapping the call in defensive text extraction.

The response satisfies the JSON Schema declaration the caller sent. All required fields are present. All types match the schema declaration. All enum values fall inside the enum list. Nested objects satisfy their nested schemas.

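The response respects the field-level constraints the schema declares, where the provider supports them: minimum and maximum values on numbers, minimum and maximum length on strings, and regex pattern constraints on string values. Support for these keywords varies by provider, which is one more reason to validate independently.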

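The response structure is consistent across retries. Two calls with the same prompt and schema produce responses with the same shape, even when the values differ. The content itself is not guaranteed to repeat: it will usually vary at temperature above 0, and providers do not promise identical output even at temperature 0.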

The semantic-validation gap

The syntactic guarantees do not cover the four categories of validation an enterprise needs on an LLM response.

Business-rule validation

A schema declares that discount_percent is an integer between 0 and 100. The business rule is that discounts above 25 require manager approval. The schema-conditioned response can return discount_percent: 50 and satisfy the schema. The business rule fails, and the application processes the discount unless something else catches it.
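A minimal sketch of the gap, assuming the response has already been parsed into a Python dict. `schema_ok` reproduces only the schema constraint from the example; `discount_rule` and `APPROVAL_THRESHOLD` are hypothetical names for the business rule, using the threshold of 25 from the text.

```python
from typing import Any, Dict


def schema_ok(doc: Dict[str, Any]) -> bool:
    """Pass 1 view: discount_percent is an integer between 0 and 100."""
    value = doc.get("discount_percent")
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 100


APPROVAL_THRESHOLD = 25  # discounts above this need manager approval


def discount_rule(doc: Dict[str, Any]) -> str:
    """Pass 2 view: return 'allow' or 'approve'."""
    return "approve" if doc["discount_percent"] > APPROVAL_THRESHOLD else "allow"
```

For `{"discount_percent": 50}`, `schema_ok` returns `True` while `discount_rule` returns `"approve"`: the document is schema-valid and still needs a human decision.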

Data-classification validation

A schema declares that customer_response is a string. The data-classification policy says customer responses cannot include personal information for accounts in the EU without additional consent. The schema-conditioned response can return a customer response that contains the customer's home address. The schema is satisfied. The classification policy is violated, and the response ships to a channel that has not passed the consent test.
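A data-classification rule with a redact action could be sketched as below. The regex detectors are deliberately naive placeholders that only catch email addresses and phone numbers (not street addresses); a real deployment would use a proper PII classifier. `classify_and_redact`, the `"EU"` region label, and the `has_consent` flag are assumptions for the example.

```python
import re
from typing import List, Tuple

# Deliberately naive placeholder detectors, not a production PII classifier.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def classify_and_redact(text: str, account_region: str, has_consent: bool) -> Tuple[str, List[str]]:
    """Redact PII from customer_response for EU accounts without consent.

    Returns the (possibly redacted) text and the list of PII types found.
    """
    if account_region != "EU" or has_consent:
        return text, []  # the policy does not apply to this account
    found: List[str] = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text, found
```

Returning the list of detected types alongside the text gives the audit record the "rule that fired" detail without a second scan.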

Authorization-scope validation

A schema declares that target_account_id is a UUID. The authorization scope for the current caller is limited to accounts the caller is assigned to. The schema-conditioned response can return any valid UUID, including UUIDs of accounts the caller has no authorization to read or modify. The schema is satisfied. The authorization scope is violated.
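The two checks separate cleanly in code. In this sketch, `check_target_account` is a hypothetical helper; it assumes the caller's assigned accounts are available as a set of lowercase, hyphenated UUID strings (the form Python's `str(uuid.UUID(...))` produces).

```python
import uuid
from typing import Any, Dict, Set, Tuple


def check_target_account(args: Dict[str, Any], assigned_accounts: Set[str]) -> Tuple[bool, str]:
    raw = args.get("target_account_id")
    try:
        parsed = uuid.UUID(str(raw))  # schema-level check: is it a UUID at all?
    except ValueError:
        return False, "schema: target_account_id is not a UUID"
    # Scope-level check: a well-formed UUID can still belong to someone else's account.
    if str(parsed) not in assigned_accounts:
        return False, "scope: caller is not assigned to this account"
    return True, "ok"
```

Any UUID passes the first check; only the second consults the caller's identity, which the schema has no way to see.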

Cross-field consistency validation

A schema declares several fields independently. The business rule is that those fields have to satisfy a joint constraint: order_total = sum(line_items) + tax - discount. The schema declares each field but cannot express the arithmetic relationship between them. The schema-conditioned response can return a document where the arithmetic fails.
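A cross-field rule for the order-total constraint might look like this sketch. `order_total_consistent` is a hypothetical name, and it assumes `line_items` is a plain list of amounts. Values go through `str()` into `Decimal` so binary floating-point error (0.1 + 0.2 != 0.3 as floats) does not produce false failures; a real rule might also apply a rounding tolerance.

```python
from decimal import Decimal
from typing import Any, Dict


def order_total_consistent(doc: Dict[str, Any]) -> bool:
    """Check order_total == sum(line_items) + tax - discount using exact decimals."""

    def d(x: Any) -> Decimal:
        return Decimal(str(x))

    items = sum((d(x) for x in doc["line_items"]), Decimal("0"))
    expected = items + d(doc["tax"]) - d(doc["discount"])
    return d(doc["order_total"]) == expected
```

Each field would pass its own schema check in both of the tested documents; only the joint rule distinguishes the consistent order from the inconsistent one.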

Inspection-layer architecture

The four categories of validation run at the inspection layer on the response, before the response returns to the caller. The layer runs two passes over the response.

Pass 1: schema validation

The inspection layer runs the JSON Schema validation the caller declared. The pass catches responses that do not satisfy the declared schema even when the provider's structured-output mode is enabled. Providers occasionally miss the schema constraint (mid-stream errors, output truncated at the token limit, provider-side format drift), and the inspection layer's pass is the safety check.
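Pass 1 can be sketched as "parse, then validate independently". The validator below covers only a small subset of JSON Schema (`type`, `required`, `properties`, `enum`, `minimum`, `maximum`, `minLength`, `maxLength`, `pattern`); a production layer would use a complete JSON Schema validator. `validate` and `pass1` are illustrative names, not a library API.

```python
import json
import re
from typing import Any, Dict, List

_TYPES = {
    "object": dict, "array": list, "string": str,
    "integer": int, "number": (int, float), "boolean": bool,
}


def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Validate value against a small JSON Schema subset; return error strings."""
    errors: List[str] = []
    expected = schema.get("type")
    if expected:
        # bool is a subclass of int in Python, so reject it for numeric types
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if not isinstance(value, _TYPES[expected]) or bool_as_number:
            return [f"{path}: expected {expected}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: not in enum {schema['enum']}")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{path}: above maximum {schema['maximum']}")
    if isinstance(value, str):
        if "minLength" in schema and len(value) < schema["minLength"]:
            errors.append(f"{path}: shorter than {schema['minLength']}")
        if "maxLength" in schema and len(value) > schema["maxLength"]:
            errors.append(f"{path}: longer than {schema['maxLength']}")
        if "pattern" in schema and not re.search(schema["pattern"], value):
            errors.append(f"{path}: does not match pattern")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required field missing")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    return errors


def pass1(raw: str, schema: Dict[str, Any]) -> List[str]:
    """Parse the raw provider output, then validate it regardless of provider claims."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:  # e.g. output truncated at the token limit
        return [f"$: not valid JSON ({exc.msg})"]
    return validate(doc, schema)
```

Because `pass1` starts from the raw text, the same code path serves providers with and without structured-output support.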

Pass 2: semantic validation

The inspection layer runs the enterprise's semantic validation rules over the parsed response. The rules include business logic, data classification, authorization scope, and cross-field consistency. The rules are declared once at the inspection layer and apply to every response that flows through, regardless of which model produced it.

The rules run on the response after Pass 1 succeeds. A rule that triggers a block returns an error to the caller without delivering the response. A rule that triggers redaction modifies the response before delivery. A rule that triggers approval routes the response to an approval queue.
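The three enforcement actions can be sketched as a small dispatcher. `Decision`, `Outcome`, and `run_pass2` are hypothetical, and the precedence is an assumption the text does not specify: a block stops evaluation immediately, redactions apply in rule order (later rules see the redacted form), and any approval request holds the final response in the queue instead of delivering it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    action: str  # "allow", "block", "redact", or "approve"
    rule: str  # identifier of the rule that produced this decision
    redacted: Optional[dict] = None  # replacement response for "redact"


@dataclass
class Outcome:
    status: str  # "delivered", "blocked", or "pending_approval"
    response: Optional[dict]  # what the caller receives now (None if not delivered)
    fired: List[str]  # rules that took a non-allow action


Rule = Callable[[dict], Decision]


def run_pass2(response: dict, rules: List[Rule], approval_queue: List[dict]) -> Outcome:
    current = response
    fired: List[str] = []
    needs_approval = False
    for rule in rules:
        decision = rule(current)
        if decision.action == "allow":
            continue
        fired.append(decision.rule)
        if decision.action == "block":
            return Outcome("blocked", None, fired)  # nothing reaches the caller
        if decision.action == "redact" and decision.redacted is not None:
            current = decision.redacted  # later rules see the redacted form
        elif decision.action == "approve":
            needs_approval = True
    if needs_approval:
        approval_queue.append(current)
        return Outcome("pending_approval", None, fired)
    return Outcome("delivered", current, fired)
```

The `fired` list and the pre/post forms are exactly what the audit record needs, so a real layer would write them out at this point.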

The tool-call variant

The same architecture applies to tool calls the model requests. A model that returns a tool_calls array is asking the caller to execute the listed tools. The tool-call arguments are the response equivalent of the JSON body.

The inspection layer runs Pass 1 (schema validation of the tool-call arguments against the tool's declared schema) and Pass 2 (semantic validation of the tool-call arguments against the caller's authorization scope, the target resource's data classification, and the business rules that apply). A tool call that satisfies the schema but violates the scope blocks at Pass 2 before the tool executes.
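A sketch of tool-call inspection before execution. The dict shape mirrors OpenAI's Chat Completions `tool_calls` entries (a function `name` plus a JSON-encoded `arguments` string); other providers shape tool calls differently, so treat the shape as an assumption. `inspect_tool_calls` and the per-tool check registries are hypothetical.

```python
import json
from typing import Callable, Dict, List

SchemaCheck = Callable[[object], List[str]]  # Pass 1: arguments only
ScopeCheck = Callable[[dict, str], List[str]]  # Pass 2: arguments plus caller identity


def inspect_tool_calls(
    tool_calls: List[dict],
    caller_id: str,
    schema_checks: Dict[str, SchemaCheck],
    scope_checks: Dict[str, ScopeCheck],
) -> List[str]:
    """Return a list of problems; an empty list means every call may execute."""
    problems: List[str] = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            # In this assumed shape the arguments arrive as a JSON-encoded string.
            args = json.loads(call["function"]["arguments"])
        except json.JSONDecodeError:
            problems.append(f"{name}: arguments are not valid JSON")
            continue
        if name not in schema_checks:
            problems.append(f"{name}: no declared schema for this tool")
            continue
        errors = schema_checks[name](args)  # Pass 1
        if not errors and name in scope_checks:
            errors = scope_checks[name](args, caller_id)  # Pass 2 runs only after Pass 1
        problems.extend(f"{name}: {e}" for e in errors)
    return problems
```

Only when `inspect_tool_calls` returns an empty list would the caller go on to execute the tools; any problem blocks the batch before a side effect happens.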

Compliance implications

The semantic validation layer produces the artifacts multiple compliance regimes require.

The EU AI Act Article 12 record-keeping mandate requires the log to reconstruct what the AI system did. A response that was validated and modified by the inspection layer carries both the pre-validation and post-validation form in the audit record. The auditor can reconstruct the original response, the validation rules that fired, and the modification the inspection layer applied.
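An audit record that keeps both forms could be as simple as the JSON document built below. The field names and `build_audit_record` are assumptions for illustration, not DeepInspect's record format or a schema prescribed by Article 12.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def build_audit_record(
    request_id: str,
    original: dict,
    rules_fired: List[str],
    action: str,
    delivered: Optional[dict],
) -> str:
    record = {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "original_response": original,  # pre-validation form
        "rules_fired": rules_fired,  # which rules took action
        "action": action,  # e.g. "redact", "block", "approve"
        "delivered_response": delivered,  # post-validation form (None if blocked)
    }
    return json.dumps(record, sort_keys=True)
```

Storing the pre- and post-validation bodies side by side is what lets an auditor replay the enforcement decision without rerunning the model.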

The OWASP AISVS 1.0 requirements V3.2 (output validation) and V3.3 (output filtering) map directly to the two-pass architecture. The Pass 1 validation satisfies V3.2. The Pass 2 semantic validation satisfies V3.3.

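The GDPR data-minimization principle (Article 5(1)(c)) requires that personal data be limited to what is necessary for the purpose of the processing. The Pass 2 data-classification validation is the enforcement mechanism on the response path: it blocks or redacts personal data that the response's purpose and recipient do not require.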

Performance profile

The two-pass validation adds latency to every response. The added latency is dominated by the Pass 2 semantic evaluation, which depends on the complexity of the rules. From internal DeepInspect testing, the added latency is under 10 ms for a typical rule set (10 to 30 rules) on a response under 4 KB. The LLM inference latency itself is 500 ms to 5 seconds, so the added validation latency is under 2% of the total end-to-end response time in the common case.
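The 2% figure follows from the quoted numbers. The validation share of end-to-end time is largest when validation is slowest and inference is fastest, so with validation under 10 ms and inference of at least 500 ms:

```latex
\frac{t_{\mathrm{val}}}{t_{\mathrm{inf}} + t_{\mathrm{val}}}
  < \frac{10\,\mathrm{ms}}{500\,\mathrm{ms} + 10\,\mathrm{ms}} \approx 1.96\%
```

At the 5-second end of the inference range, the same bound drops to about 0.2%.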

DeepInspect

This is exactly what DeepInspect does. DeepInspect sits inline between your users or agents and the LLM APIs they call. Every response runs through Pass 1 schema validation and Pass 2 semantic validation before the response returns to the caller.

The semantic rules are declared once at the inspection layer and apply to every response. Multiple providers, multiple models, and multiple applications all pass through the same rule set. The audit record captures the pre-validation response, the rules that fired, and the post-validation response the caller received.

Book a demo today.