Platform & Architecture

41 posts on platform & architecture.

AI Red Team Methodology: A Six-Phase Framework for Adversarial Testing of LLM Applications

Most AI red team engagements amount to ad-hoc prompt-injection tests against a chat interface, with the result labeled a red team. A defensible methodology runs through six phases: scope and threat modeling, identity-context attacks, content-vector attacks, agent-layer escalation, multi-turn and persistence attacks, and post-engagement reporting that routes each finding to a remediation owner. This article walks through each phase: the techniques it deploys, the evidence the red team should capture, the remediation owner each finding routes to, and the integration points with the rest of the security program.

ai-red-team, adversarial-testing, prompt-injection, agent-security, security-testing
AI Gateway Multi-Tenant Isolation: Identity, Policy, and Audit at the Tenant Boundary

Multi-tenant AI deployments share infrastructure across tenants and have to enforce isolation at the request boundary. Tenant context attached at authentication time has to flow through every policy decision, every tool invocation, every retrieval call, and every audit record. A gateway that maintains the tenant boundary at all four touch points is the architectural pattern that keeps multi-tenant AI safe under load. This piece walks through where the tenant context has to land and what the audit record looks like when isolation holds.

ai-gateway, multi-tenant, ai-security, isolation, policy-enforcement, audit
AI Gateway Redaction for RAG Contexts: Stopping Cross-Tenant Data Leakage

A retrieval-augmented generation pipeline fetches documents from a vector store, concatenates them into the prompt context, and sends the assembled prompt to the LLM. The fetched chunks can carry data the requesting user is not authorized to see. The model has no way to distinguish authorized content from leaked content. An AI gateway that redacts at the context-assembly boundary, with identity-bound policy on each retrieved chunk, is the architectural pattern that stops cross-tenant data leakage in RAG.

ai-gateway, rag, redaction, ai-security, policy-enforcement, data-leakage
AI Gateway Policies for Tool Use: Authorizing Function Calls at the Request Boundary

Tool use turns an LLM call into a sequence of function invocations against the application backend, the file system, third-party APIs, and other tools the model is allowed to call. Each function call has its own authorization scope and its own audit shape. An AI gateway that enforces policy on the model request alone leaves the tool invocations unauthorized. This piece walks through the architecture for authorizing tool use at the request boundary, the per-tool policy shape, and the audit record that captures the full tool-use trace.

ai-gateway, tool-use, function-calling, ai-security, policy-enforcement, audit
AI Gateway Architecture for Streaming LLM Responses: Policy, Audit, Backpressure

Streaming LLM responses arrive as server-sent events or chunked HTTP, token by token, over a connection that may stay open for seconds or minutes. An AI gateway built for request-response patterns cannot enforce policy, redact sensitive content, or produce per-decision audit records on streaming traffic without re-architecting the proxy. This piece walks through the architectural changes streaming requires, the enforcement model that holds at chunk granularity, and the audit record shape that survives the inspection.

ai-gateway, streaming, ai-security, policy-enforcement, audit, architecture
AI Gateway High Availability: The Failure Modes That Matter and the Topology That Survives Them

An AI gateway sits inline between the user and the LLM. When the gateway fails, the AI traffic either stops (fail closed) or bypasses the gateway (fail open). Both choices have costs. This article walks through the failure modes that matter in production, the topology patterns that survive them, and the architectural trade-offs around fail-closed vs fail-open under regulatory pressure.

ai-gateway, architecture, ai-security, inline-enforcement, cloud-security
Per-Route AI Policies: Attaching Policy to the URL Path, Not the Application

Per-route AI policies attach the policy decision to the API route the request is calling, not to the application that initiated it. Different LLM endpoints carry different risk profiles. The chat-completion endpoint, the embeddings endpoint, the file-upload endpoint, the batch endpoint, the audio endpoint, and the agent action surfaces each warrant their own rules. I walk through what per-route policy looks like in practice, how route patterns express AI-specific constraints, and how the architecture composes with per-role policy and prompt-level classification at the inspection point.

policy-enforcement, ai-security, architecture, inline-enforcement, llm
Bedrock API Gateway: Inspection at the AWS Bedrock Runtime Boundary

A Bedrock API gateway is the inspection point that traffic to the AWS Bedrock runtime passes through before it reaches the model. The gateway attaches the identity context the application supplies, runs prompt-level classification, evaluates policy, and writes a per-decision audit record. The architecture sits between callers and the InvokeModel, Converse, RetrieveAndGenerate, and agents APIs that Bedrock exposes. I walk through the inspection points across each surface, how the gateway interacts with Bedrock Guardrails, and what the deployment trade-offs look like inside AWS networking.

ai-security, llm, policy-enforcement, inline-enforcement, architecture, cloud-security
The Anthropic API Gateway: Where the Inspection Point Sits Between Your Workforce and api.anthropic.com

An Anthropic API gateway is the inspection point that HTTP traffic to api.anthropic.com passes through before it reaches Claude. The gateway attaches identity context, classifies prompt content, evaluates policy, and writes a per-decision audit record. The architecture sits between authenticated users or agents and the Anthropic endpoints (messages, batch, files, computer-use beta, prompt caching). I walk through the inspection points across each API surface, how identity attaches on top of static Anthropic API keys, and how policy enforces against Claude-specific patterns like prompt caching and the computer-use tool.

ai-security, llm, policy-enforcement, inline-enforcement, architecture
The OpenAI API Gateway: Where the Inspection Point Sits Between Your Workforce and api.openai.com

An OpenAI API gateway is the inspection point that your traffic to api.openai.com passes through before it reaches the model. The gateway attaches identity context, runs prompt-level classification, evaluates policy, and produces a per-decision audit record. The architecture sits between authenticated users or agents and the OpenAI endpoints (chat completions, responses, embeddings, audio, batch, assistants). I walk through what the gateway intercepts, how the API surfaces map to the inspection points, and what the trade-offs are between deploying it as a SaaS-hosted proxy, a VPC-isolated proxy, or a sidecar.

ai-security, llm, policy-enforcement, inline-enforcement, architecture
AI Policy as Code: The Declarative Pattern That Makes Enforcement Auditable

AI policy as code expresses the rules that govern AI usage in a declarative configuration format checked into version control, evaluated at the AI request boundary, and versioned per decision in the audit record. The pattern differs from policy as documents at three points: machine-readable expression that the gate evaluates directly, version control that ties each decision to the policy in effect at the moment, and code review that captures the change history. I walk through what the policy actually contains, how the gate evaluates it, and how the audit record references it.

ai-policy, policy-as-code, ai-security, enforcement, engineering, compliance
AI Gateway TLS Termination: Why the Inspection Point Has to Decrypt the Request Body

An AI gateway terminates the client's outbound TLS session bound for the LLM provider so the inspection point can read the JSON request body in plaintext, classify the prompt content, evaluate identity-aware policy, and write a per-decision audit record. The architectural choice differs from a pass-through proxy at three points: control of the certificate chain, decryption authority over the prompt body, and re-encryption to the upstream provider with the gateway-managed identity. I walk through how the termination works, what it costs, and what the 2026 compliance set requires from the inspection point.

ai-gateway, tls, engineering, ai-security, enforcement, architecture