Platform & Architecture

56 posts on platform & architecture.

Model Routing for Cost: What to Actually Measure Before Switching a Workload from GPT-4 to Haiku

Most "use the cheaper model" posts skip the rigor. Real model routing decisions have four layers: token cost, quality regression on an eval set, latency impact, and governance risk. This article walks through each layer with the questions a platform engineer should answer before flipping a workload from a frontier model to a smaller one, plus an example routing rule expressed at the gateway layer. The gateway is the right place to enforce routing because it has identity and policy context the application does not.

ai-security, policy-enforcement, architecture, llm, llm-security, devsecops

Mapping a Zero-Trust AI Gateway to NIST's Upcoming COSAiS Single-Agent and Multi-Agent Overlays

NIST is developing the Control Overlays for Securing AI Systems (COSAiS), which include a Single-Agent overlay and a Multi-Agent overlay, alongside an AI RMF Profile for Critical Infrastructure. Federal contractors and critical infrastructure operators will be measured against these. Mapping early pays off because federal procurement reviews already reference the work in progress. This article walks through the overlay structure, where a zero-trust AI gateway maps to each control family, and the evidence artifact each control consumes.

nist, nist-ai-rmf, zero-trust, ai-security, compliance, architecture

Mapping the OWASP Top 10 for Agentic Applications 2026 to Control Points a Policy Gateway Enforces

OWASP GenAI published the Top 10 for Agentic Applications 2026 as a separate framework from the LLM Top 10. The framework adds the "agentic skills" intermediate behavior layer as a new vulnerable component and reorders the threat list around tool invocation, plan corruption, and identity propagation. This article maps each of the 10 categories to specific control points that a policy gateway at the AI request boundary actually enforces, with example policy rules and the audit fields each control writes.

ai-security, agentic-ai, llm-security, policy-enforcement, architecture, audit

AI Red Team Methodology: A Six-Phase Framework for Adversarial Testing of LLM Applications

Most AI red team engagements run as ad-hoc prompt-injection tests against a chat interface and call the result a red team. A defensible methodology runs through six phases: scope and threat modeling, identity-context attacks, content-vector attacks, agent-layer escalation, multi-turn and persistence attacks, and post-engagement reporting against a remediation owner. This article walks through each phase, the techniques each phase deploys, the evidence the red team should capture, the remediation owner each finding routes to, and the integration points with the rest of the security program.

ai-red-team, adversarial-testing, prompt-injection, agent-security, security-testing

AI Gateway Multi-Tenant Isolation: Identity, Policy, and Audit at the Tenant Boundary

Multi-tenant AI deployments share infrastructure across tenants and have to enforce isolation at the request boundary. Tenant context attached at authentication time has to flow through every policy decision, every tool invocation, every retrieval call, and every audit record. A gateway that maintains the tenant boundary at all four touch points is the architectural pattern that keeps multi-tenant AI safe under load. This piece walks through where the tenant context has to land and what the audit record looks like when isolation holds.

ai-gateway, multi-tenant, ai-security, isolation, policy-enforcement, audit

AI Gateway Redaction for RAG Contexts: Stopping Cross-Tenant Data Leakage

A retrieval-augmented generation pipeline fetches documents from a vector store, concatenates them into the prompt context, and sends the assembled prompt to the LLM. The fetched chunks can carry data the requesting user is not authorized to see. The model has no way to distinguish authorized content from leaked content. An AI gateway that redacts at the context-assembly boundary, with identity-bound policy on each retrieved chunk, is the architectural pattern that stops cross-tenant data leakage in RAG.

ai-gateway, rag, redaction, ai-security, policy-enforcement, data-leakage

AI Gateway Policies for Tool Use: Authorizing Function Calls at the Request Boundary

Tool use turns an LLM call into a sequence of function invocations against the application backend, the file system, third-party APIs, and other tools the model is allowed to call. Each function call has its own authorization scope and its own audit shape. An AI gateway that enforces policy on the model request alone leaves the tool invocations unauthorized. This piece walks through the architecture for authorizing tool use at the request boundary, the per-tool policy shape, and the audit record that captures the full tool-use trace.

ai-gateway, tool-use, function-calling, ai-security, policy-enforcement, audit

AI Gateway Architecture for Streaming LLM Responses: Policy, Audit, Backpressure

Streaming LLM responses arrive as server-sent events or chunked HTTP, token by token, over a connection that may stay open for seconds or minutes. An AI gateway built for request-response patterns cannot enforce policy, redact sensitive content, or produce per-decision audit records on streaming traffic without re-architecting the proxy. This piece walks through the architectural changes streaming requires, the enforcement model that holds at chunk granularity, and the audit record shape that survives the inspection.

ai-gateway, streaming, ai-security, policy-enforcement, audit, architecture

AI Gateway High Availability: The Failure Modes That Matter and the Topology That Survives Them

An AI gateway sits inline between the user and the LLM. When the gateway fails, the AI traffic either stops (fail closed) or bypasses the gateway (fail open). Both choices have costs. This article walks through the failure modes that matter in production, the topology patterns that survive them, and the architectural trade-offs around fail-closed vs fail-open under regulatory pressure.

ai-gateway, architecture, ai-security, inline-enforcement, cloud-security

Per-Route AI Policies: Attaching Policy to the URL Path, Not the Application

Per-route AI policies attach the policy decision to the API route the request is calling, not to the application that initiated it. Different LLM endpoints carry different risk profiles. The chat-completion endpoint, the embeddings endpoint, the file-upload endpoint, the batch endpoint, the audio endpoint, and the agent action surfaces each warrant their own rules. I walk through what per-route policy looks like in practice, how route patterns express AI-specific constraints, and how the architecture composes with per-role policy and prompt-level classification at the inspection point.

policy-enforcement, ai-security, architecture, inline-enforcement, llm

Bedrock API Gateway: Inspection at the AWS Bedrock Runtime Boundary

A Bedrock API gateway is the inspection point that traffic to the AWS Bedrock runtime passes through before it reaches the model. The gateway attaches the identity context the application supplies, runs prompt-level classification, evaluates policy, and writes a per-decision audit record. The architecture sits between callers and the InvokeModel, Converse, RetrieveAndGenerate, and agent APIs that Bedrock exposes. I walk through the inspection points across each surface, how the gateway interacts with Bedrock Guardrails, and what the deployment trade-offs look like inside AWS networking.

ai-security, llm, policy-enforcement, inline-enforcement, architecture, cloud-security

The Anthropic API Gateway: Where the Inspection Point Sits Between Your Workforce and api.anthropic.com

An Anthropic API gateway is the inspection point that HTTP traffic to api.anthropic.com passes through before it reaches Claude. The gateway attaches identity context, classifies prompt content, evaluates policy, and writes a per-decision audit record. The architecture sits between authenticated users or agents and the Anthropic endpoints (messages, batch, files, computer-use beta, prompt caching). I walk through the inspection points across each API surface, how identity attaches on top of static Anthropic API keys, and how policy enforces against Claude-specific patterns like prompt caching and the computer-use tool.

ai-security, llm, policy-enforcement, inline-enforcement, architecture