Platform & Architecture

94 posts on platform & architecture.

July 4, 2026

AI tool-use authorization: what the caller can invoke, what the model is allowed to attempt, and where the line sits

AI tool-use authorization decides which tools an LLM caller can invoke, which arguments the caller can pass, and which tool calls the model is allowed to attempt on the caller's behalf. Production deployments enforce three layers: caller-role authorization (what the identity is entitled to use), argument-value authorization (which values fall inside the caller's scope), and model-behavior authorization (which tool-call sequences the deployer permits). This piece walks through the three layers, the failure modes each one catches, and the evidence each layer produces on the per-decision audit record.

tool-calling, ai-agent, authorization, ai-gateway, agent-security

Read post →
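The three layers can be sketched as checks that run in order, with the audit record naming the first layer that denies. This is a minimal sketch under stated assumptions. The in-memory tables ROLE_TOOLS, ARG_SCOPES, and FORBIDDEN_SEQUENCES and the tool names are hypothetical stand-ins for a real policy store, and the sketch is not the post's implementation.

```python
from dataclasses import dataclass

# Hypothetical policy tables; a real deployment would load these from a policy store.
# Layer 1: which tools each role is entitled to invoke.
ROLE_TOOLS = {
    "support-agent": {"lookup_order", "read_customer_pii", "send_email"},
    "viewer": {"lookup_order"},
}
# Layer 2: per-caller scope on argument values (here, which accounts a caller may touch).
ARG_SCOPES = {"alice": {"account_id": {"acct-1", "acct-2"}}}
# Layer 3: tool-call pairs the deployer does not let the model attempt back to back.
FORBIDDEN_SEQUENCES = {("read_customer_pii", "send_email")}


@dataclass
class Decision:
    allowed: bool
    layer: str   # which layer decided; recorded on the audit record
    reason: str


def authorize(caller: str, role: str, tool: str, args: dict, history: list) -> Decision:
    # Layer 1: caller-role authorization.
    if tool not in ROLE_TOOLS.get(role, set()):
        return Decision(False, "caller-role", f"role {role!r} not entitled to {tool!r}")
    # Layer 2: argument-value authorization; arguments without a scope entry pass.
    scope = ARG_SCOPES.get(caller, {})
    for name, value in args.items():
        if name in scope and value not in scope[name]:
            return Decision(False, "argument-value", f"{name}={value!r} outside caller scope")
    # Layer 3: model-behavior authorization on the previous call plus this one.
    if history and (history[-1], tool) in FORBIDDEN_SEQUENCES:
        return Decision(False, "model-behavior", f"{history[-1]} -> {tool} not permitted")
    return Decision(True, "all", "permitted")
```

Because the checks short-circuit, a viewer calling send_email is denied at the caller-role layer before its arguments are ever inspected.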

July 4, 2026

AI response tool-call validation: the five checks that run before a tool call reaches the executor

When an LLM response contains a tool call, that call sits between the model output and a side effect in a real system. A tool call that passes through unchecked executes whatever the model produced, including hallucinated tools, malformed arguments, and unauthorized parameters. Production deployments run five checks at the gateway before the tool call reaches the executor: schema validation, tool-allowlist check, argument authorization, idempotency-key attachment, and audit-record production. This piece walks through each check, the failure modes it catches, and how the checks compose across the OpenAI, Anthropic, and Bedrock tool-call formats.

tool-calling, ai-agent, ai-gateway, agent-security, llm-response

Read post →
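The five checks can be sketched as a single gateway function. This is a minimal sketch under several assumptions. The tool call has already been normalized to a name plus a parsed arguments dict. TOOL_SCHEMAS, ALLOWLIST, the per-tool caller_scope shape, and the Rejected exception are hypothetical. The idempotency key is a SHA-256 hash over the request ID and the canonical call, one possible scheme among several.

```python
import hashlib
import json

# Hypothetical registry: tool name -> required argument names and exact types.
TOOL_SCHEMAS = {
    "create_ticket": {"title": str, "priority": int},
    "delete_user": {"user_id": str},
}
ALLOWLIST = {"create_ticket"}  # tools this deployment lets reach the executor


class Rejected(Exception):
    pass


def validate_tool_call(call: dict, request_id: str, caller_scope: dict, audit_log: list) -> dict:
    name, args = call.get("name"), call.get("arguments", {})
    record = {"request_id": request_id, "tool": name}
    try:
        # 1. Schema validation: catches hallucinated tools and malformed arguments.
        schema = TOOL_SCHEMAS.get(name)
        if schema is None:
            raise Rejected(f"unknown tool {name!r}")
        if (not isinstance(args, dict) or set(args) != set(schema)
                or any(type(args[k]) is not t for k, t in schema.items())):
            raise Rejected("arguments do not match schema")
        # 2. Tool-allowlist check.
        if name not in ALLOWLIST:
            raise Rejected(f"tool {name!r} not on allowlist")
        # 3. Argument authorization: each scoped argument must be in the caller's allowed set.
        for arg, allowed in caller_scope.get(name, {}).items():
            if args[arg] not in allowed:
                raise Rejected(f"{arg}={args[arg]!r} outside caller scope")
        # 4. Idempotency key: deterministic, so a retried call carries the same key.
        canonical = json.dumps({"request_id": request_id, "tool": name, "args": args}, sort_keys=True)
        key = hashlib.sha256(canonical.encode()).hexdigest()
        record.update(outcome="accepted", idempotency_key=key)
        return {"name": name, "arguments": args, "idempotency_key": key}
    except Rejected as exc:
        record.update(outcome="rejected", reason=str(exc))
        raise
    finally:
        # 5. Audit record: written for accepted and rejected calls alike.
        audit_log.append(record)
```

Writing the audit record in the finally clause means a rejected call leaves the same evidence trail as an accepted one.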

July 4, 2026

LLM multi-model routing: the invariants that hold when you serve traffic from more than one provider

LLM multi-model routing spreads traffic across two or more model providers so a single-vendor outage, price change, or policy shift does not stop production. The pattern is simple in principle and complicated in practice because different providers have different token formats, streaming semantics, tool-call schemas, and safety-refusal patterns. This piece walks through the six invariants that hold regardless of provider (identity resolution, classification, policy, audit, idempotency, and response normalization) and the three variances that do not (token accounting, streaming chunking, and tool-call format).

llm-router, multi-model, ai-architecture, ai-gateway, anthropic, openai, bedrock

Read post →
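Tool-call format is one of the three variances, and response normalization is one of the six invariants. The sketch below maps each provider's tool calls into one shape so that downstream checks see a single format. The field names follow the OpenAI Chat Completions, Anthropic Messages, and Bedrock Converse response shapes as commonly documented, so verify them against current provider docs. The normalized shape itself, {"id", "name", "arguments"}, is a hypothetical choice.

```python
import json


def normalize_tool_calls(provider: str, message: dict) -> list:
    """Return tool calls as [{"id", "name", "arguments": dict}] regardless of provider."""
    if provider == "openai":
        # Chat Completions: function arguments arrive as a JSON-encoded string.
        return [
            {"id": tc["id"], "name": tc["function"]["name"],
             "arguments": json.loads(tc["function"]["arguments"] or "{}")}
            for tc in message.get("tool_calls") or []
        ]
    if provider == "anthropic":
        # Messages API: tool calls are content blocks of type "tool_use" with a parsed "input".
        return [
            {"id": b["id"], "name": b["name"], "arguments": b["input"]}
            for b in message.get("content", []) if b.get("type") == "tool_use"
        ]
    if provider == "bedrock":
        # Converse API: each tool call is a content block wrapped in a "toolUse" key.
        return [
            {"id": b["toolUse"]["toolUseId"], "name": b["toolUse"]["name"],
             "arguments": b["toolUse"]["input"]}
            for b in message.get("content", []) if "toolUse" in b
        ]
    raise ValueError(f"unsupported provider {provider!r}")
```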

July 4, 2026

LLM fallback routing: the retry chain that survives provider outages without leaking policy

LLM fallback routing chains a primary model to a secondary and tertiary so provider outages, rate-limit errors, and quality regressions do not cause user-visible failures. The failures usually originate not in the fallback logic itself but at the boundary between the fallback chain and the policy decision that authorized the request. This piece walks through the four common triggers for fallback, the retry semantics per trigger, the authorized-endpoint constraint, and the idempotency requirements for tool-calling workloads.

llm-router, llm-fallback, ai-architecture, ai-gateway, reliability

Read post →
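A fallback chain that respects the authorized-endpoint constraint can be sketched as follows, assuming the policy decision hands the router a set of authorized endpoint names. The exception classes RateLimited, ProviderDown, and BadRequest are hypothetical stand-ins for provider errors. They do not map to the post's four triggers. The injected call function is also an assumption.

```python
class RateLimited(Exception):
    pass


class ProviderDown(Exception):
    pass


class BadRequest(Exception):
    """Caller error: the request would fail the same way on every endpoint."""


def call_with_fallback(chain, authorized, request, idempotency_key, call):
    """Try endpoints in order, never routing to one the policy decision did not authorize."""
    attempts = []
    for endpoint in chain:
        if endpoint not in authorized:
            attempts.append((endpoint, "skipped: not authorized"))
            continue
        try:
            # Every attempt carries the same idempotency key, so a downstream executor
            # can deduplicate a tool call that half-succeeded on an earlier endpoint.
            result = call(endpoint, request, idempotency_key)
            attempts.append((endpoint, "ok"))
            return result, attempts
        except (RateLimited, ProviderDown) as exc:
            # Retryable elsewhere: record the trigger and move to the next endpoint.
            attempts.append((endpoint, f"fallback: {type(exc).__name__}"))
        # BadRequest is deliberately not caught, so it propagates to the caller.
    raise RuntimeError(f"all authorized endpoints failed: {attempts}")
```

The attempts list is returned so the audit record can show which endpoints were skipped and why, not only which one answered.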

July 4, 2026

LLM routing strategies: five patterns for production, and where the policy decision constrains each one

LLM routing strategies decide which model, provider, or endpoint handles a given request. Five patterns cover most production deployments: static routing, cost-optimized routing, quality-tiered routing, latency-budgeted routing, and fallback routing. Each pattern operates on request metadata after the policy decision at the gateway has authorized the request and produced the audit record. This piece walks through the five patterns, what each optimizes for, and the constraints the gateway places on all of them.

llm-router, llm-routing, ai-architecture, ai-gateway, multi-model

Read post →
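Static, cost-optimized, quality-tiered, and latency-budgeted routing can be sketched as different selection rules over the same candidate list, which is already filtered to the endpoints the gateway authorized. Fallback routing is left out because it walks a chain rather than picking one endpoint. The Endpoint fields and numbers are illustrative, not real prices or latencies. Reading "quality-tiered" as "cheapest endpoint at or above a tier" and "latency-budgeted" as "cheapest endpoint within the budget" are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    cost_per_1k_tokens: float  # illustrative, not a real price
    p95_latency_ms: int
    quality_tier: int          # deployer-assigned label; higher is better


def _cheapest(eligible):
    if not eligible:
        raise LookupError("no authorized endpoint satisfies the constraint")
    return min(eligible, key=lambda e: e.cost_per_1k_tokens)


def route(endpoints, authorized, strategy, *, static=None, min_tier=0, latency_budget_ms=None):
    # The router only ever chooses among endpoints the policy decision authorized.
    candidates = [e for e in endpoints if e.name in authorized]
    if strategy == "static":
        return _cheapest([e for e in candidates if e.name == static])
    if strategy == "cost":
        return _cheapest(candidates)
    if strategy == "quality":
        return _cheapest([e for e in candidates if e.quality_tier >= min_tier])
    if strategy == "latency":
        if latency_budget_ms is None:
            raise ValueError("latency strategy needs latency_budget_ms")
        return _cheapest([e for e in candidates if e.p95_latency_ms <= latency_budget_ms])
    raise ValueError(f"unknown strategy {strategy!r}")
```

Filtering before the strategy runs means no strategy can select an unauthorized endpoint, even when that endpoint would be cheaper or faster.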

July 4, 2026

The LLM inference gateway: what sits between authenticated callers and the model, and what belongs somewhere else

The LLM inference gateway is the identity-aware policy enforcement point between authenticated users or agents and any model endpoint. It is the layer where authorization, data classification, and audit-record production live. This piece defines the term, walks through the four fields the gateway resolves per request, contrasts it with the inference server, model router, and API gateway it is often confused with, and shows why the audit-write path must be isolated from the caller. Applies to any deployment running an OpenAI-compatible or provider-native LLM API in production.

llm-gateway, llm-inference, ai-architecture, ai-control-plane, policy-enforcement

Read post →

July 4, 2026

LLM gateway vs LLM router: what each component does and why the enforcement layer sits in only one of them

The LLM gateway and the LLM router occupy different layers in an AI stack even when a vendor bundles them under one label. The gateway is the identity-aware policy enforcement point that sits between authenticated users and any LLM. The router is the traffic-shaping component that decides which model handles a given request. Confusing the two produces predictable failure modes at audit time. This piece walks through the two components, the fields each layer records, and how the policy decision at the gateway constrains what the router is allowed to route to.

llm-gateway, llm-router, ai-architecture, ai-policy-enforcement, ai-control-plane

Read post →

July 3, 2026

Agent-to-Agent TLS: Mutual Authentication Between AI Agents in a Multi-Agent Workflow

A multi-agent workflow chains AI agents where each agent calls the next over an HTTP transport. The security posture of the chain depends on the mutual authentication between the agents at each hop. This piece walks through the mTLS pattern for agent-to-agent authentication, the certificate lifecycle, and the inspection-layer architecture that binds every agent-to-agent call to a verified identity pair.

agent-to-agent, mtls, ai-agent-security, multi-agent, ai-engineering

Read post →
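The mTLS pattern can be sketched with Python's standard ssl module. Each agent's server-side context requires a client certificate. After the handshake, the peer's certificate (as returned by SSLSocket.getpeercert()) yields an identity that can be bound to the call. This is a minimal sketch. The certificate paths are optional placeholders. Preferring a URI subject alternative name, such as a SPIFFE-style ID, over the subject common name is an assumption about how agent certificates are issued.

```python
import ssl


def agent_server_context(cert_file=None, key_file=None, ca_file=None) -> ssl.SSLContext:
    """Context for an agent that only accepts callers presenting a CA-issued client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails if the peer sends no certificate
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # this agent's identity
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that issues agent certificates
    return ctx


def agent_client_context(cert_file=None, key_file=None, ca_file=None) -> ssl.SSLContext:
    """Context for an agent calling the next hop; verifies the server cert and hostname."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # CERT_REQUIRED and check_hostname by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # presented to the server
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


def peer_identity(cert: dict) -> str:
    """Extract an agent identity from a getpeercert()-style dict."""
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "URI":
            return value
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    raise ValueError("peer certificate carries no usable identity")
```

Recording the local agent's identity alongside peer_identity(...) for each call gives the verified identity pair the post describes.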

July 3, 2026

AI Audit Log Immutability: Object Lock, WORM Storage, and the Storage-Layer Contract a Regulator Accepts

The reconstruction test a regulator applies during an AI audit assumes the log record has not been rewritten. The assumption fails when the log lives in a storage layer that permits modification by the same operator who runs the AI application. This piece walks through the immutability contract at the storage layer, S3 Object Lock and Azure Blob immutability policies as implementations, and the audit-record shape that verifies immutability by construction.

ai-audit-logs, immutability, object-lock, compliance, ai-engineering

Read post →
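One common way to make an audit record verifiable by construction is hash chaining. Each record commits to the hash of the record before it, so rewriting any earlier record breaks verification. This technique is an assumption here, and the post's own record shape may differ. Hash chaining complements storage-layer controls such as S3 Object Lock rather than replacing them, because an operator who can rewrite storage could otherwise recompute the whole chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev: str, payload: dict) -> str:
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_record(chain: list, payload: dict) -> dict:
    """Append a record whose hash covers both its payload and its predecessor's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    record = {"prev": prev, "payload": payload, "hash": _digest(prev, payload)}
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited payload or reordered record fails verification."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["payload"]):
            return False
        prev = record["hash"]
    return True
```

Writing the latest chain hash to WORM storage on a schedule anchors the chain, so a regulator can check it against a value the operator could not modify.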

July 3, 2026

AI Red Teaming Workflow: The Test-Fix-Prove Loop for Enterprise AI Deployments

AI red teaming discovers vulnerabilities in prompt handling, tool-call authorization, and response classification. The finding is one artifact. The fix is another. The evidence that the fix works is a third. This piece walks through a red-teaming workflow that produces all three artifacts inside the enterprise control boundary, and the inspection-layer architecture that turns findings into policy the enforcement layer executes.

red-teaming, ai-security, penetration-testing, ai-engineering, ai-governance

Read post →
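The test-fix-prove loop can be sketched as replaying each recorded finding against a policy version and emitting one evidence record per finding. A finding that fails against the old policy and passes against the new one is the proof artifact. The Finding shape, the evidence fields, and make_policy's phrase blocklist are hypothetical. The blocklist is only a stand-in for a real fix, not a recommended defense.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    finding_id: str
    prompt: str
    expected: str  # the verdict the enforcement layer should return once fixed


def make_policy(blocked_phrases):
    """Hypothetical policy: block prompts containing any listed phrase."""
    def policy(prompt: str) -> str:
        lowered = prompt.lower()
        return "block" if any(p in lowered for p in blocked_phrases) else "allow"
    return policy


def prove(findings, policy, policy_version: str) -> list:
    """Replay every finding and return one evidence record per finding."""
    evidence = []
    for f in findings:
        verdict = policy(f.prompt)
        evidence.append({
            "finding_id": f.finding_id,
            "policy_version": policy_version,
            "verdict": verdict,
            "passed": verdict == f.expected,
            # Hash rather than raw prompt, so evidence can be shared without the payload.
            "prompt_sha256": hashlib.sha256(f.prompt.encode()).hexdigest(),
        })
    return evidence
```

Keeping the findings as a replayable suite means every later policy change can be re-proved against the same evidence format.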

July 3, 2026

LLM Response Schema Validation: When JSON Mode Is Not Enough

JSON mode and structured output constrain the LLM to produce valid JSON, but the JSON can still contain values that violate business policy, personal data that violates data-classification policy, or tool-call arguments that violate authorization scope. This piece walks through what JSON mode covers, the semantic-validation gap it leaves, and the inspection-layer architecture that runs schema validation and semantic validation on the same response path.

llm-engineering, json-schema, ai-engineering, structured-output, ai-security

Read post →
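The gap between JSON mode and semantic validation can be sketched as two passes over the same response. The first pass checks structure and types, which is roughly what JSON mode and structured output aim to guarantee. The second pass checks values against business, data-classification, and authorization policy. The refund contract, the MAX_REFUND ceiling, and the email regex are hypothetical, and the regex is deliberately simple rather than a complete PII detector.

```python
import json
import re

# Hypothetical response contract for a refund decision.
REQUIRED = {"customer_id": str, "refund_amount": (int, float), "note": str}
MAX_REFUND = 500  # illustrative business-policy ceiling
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # simple stand-in for a PII classifier


def validate_response(raw: str, caller_customers: set) -> list:
    """Return a list of violations; an empty list means the response passes both passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    # Pass 1: structure and types.
    errors = [f"{k}: missing or wrong type" for k, t in REQUIRED.items()
              if not isinstance(data.get(k), t)]
    if errors:
        return errors
    # Pass 2: semantics. Well-formed JSON can still violate policy.
    if data["refund_amount"] > MAX_REFUND:
        errors.append(f"refund_amount: {data['refund_amount']} exceeds ceiling {MAX_REFUND}")
    if EMAIL.search(data["note"]):
        errors.append("note: contains an email address")
    if data["customer_id"] not in caller_customers:
        errors.append("customer_id: outside the caller's authorization scope")
    return errors
```

A response like {"customer_id": "cust-9", "refund_amount": 900, "note": "ok"} is valid JSON that would pass JSON mode, yet it fails two semantic checks here.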

July 3, 2026

AI Agent OAuth Consent: The Permission Screen Users Never Read and the Blast Radius It Grants

An AI agent that authenticates to a SaaS application via OAuth requests a consent scope from the user. The scope grants the agent standing authorization to call APIs on the user's behalf. Users grant scopes they do not read, and the standing authorization outlasts the interaction that produced it. This piece walks through the OAuth consent mechanism, the blast radius it creates, and the inspection-layer controls that constrain the scope after grant.

oauth, ai-agent-security, consent, non-human-identity, ai-engineering

Read post →
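Constraining the scope after grant can be sketched as an inspection-layer check. The standing grant is intersected with the scopes the current task's tools actually need, and each outbound API call is checked against that narrower set. The grant at the authorization server is unchanged. Only what the inspection layer lets through shrinks. The tool names and scope strings are hypothetical, not any provider's real scopes.

```python
# Hypothetical mapping from agent tools to the OAuth scopes each one needs.
TOOL_REQUIRED_SCOPES = {
    "read_calendar": {"calendar.read"},
    "summarize_inbox": {"mail.read"},
    "send_email": {"mail.send"},
}


def effective_scopes(granted: set, task_tools: set) -> set:
    """Narrow the standing grant to the scopes this task's tools need."""
    needed = set().union(*(TOOL_REQUIRED_SCOPES[t] for t in task_tools))
    return granted & needed


def authorize_api_call(scope: str, granted: set, task_tools: set) -> bool:
    """Allow an outbound API call only if its scope is granted and needed by the task."""
    return scope in effective_scopes(granted, task_tools)
```

With a broad grant of {"calendar.read", "mail.read", "mail.send", "files.write"}, a summarize-inbox task can use only mail.read, so a prompt-injected attempt to send mail is refused even though the user consented to mail.send.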