
AI Tenant Isolation: How Multi-Tenant SaaS Enforces Per-Customer Boundaries on LLM Traffic

Multi-tenant SaaS applications that add LLM features carry a new isolation obligation on top of the database and storage isolation the platform already enforces. Prompts flow through the LLM provider carrying tenant-specific data. Retrieval-augmented generation queries the vector store where tenant data lives. Agent tools call downstream systems that hold tenant data. Each of these paths introduces a way for tenant A's data to reach tenant B's context without a database join between them. This piece walks through the four isolation domains (prompt, retrieval, tool call, response), the enforcement patterns at the AI gateway, and the audit records that demonstrate the isolation held across the audit period.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Platform & Architecture · multi-tenant · ai-security · tenant-isolation · saas · rag · ai-gateway

Multi-tenant SaaS applications that add LLM features carry a new isolation obligation on top of the database and storage isolation the platform already enforces. The pre-LLM isolation model held: rows have a tenant_id, queries filter by tenant_id, storage objects sit under tenant-prefixed paths. The model is well understood and testable. LLM features introduce four new paths that carry tenant data outside the database's tenant filter: the prompt, the retrieval context, the agent tool call, and the response. Each path is a way for tenant A's data to reach tenant B's context without a database join between them. According to Zscaler's ThreatLabz 2026 AI Threat Report, enterprises moved 18,033 TB of data into AI tools in the past year, up 93% year over year. In multi-tenant SaaS, the boundary between tenants is where that volume of data puts isolation at risk.

I want to walk through the four isolation domains, the enforcement patterns at the AI gateway, and the audit records that demonstrate the isolation held across the audit period.

The four isolation domains

Prompt isolation covers the request body the application sends to the LLM provider. The prompt contains data from the tenant's context: their customer records, their conversation history, their business rules, and the tenant configuration that travels with the system prompt. The isolation obligation is that the prompt for a request executing under tenant A's identity contains no data from tenant B's context.

Retrieval isolation covers the RAG pipeline. The vector store holds embeddings from documents each tenant contributed. A retrieval query executed for tenant A has to return only vectors from tenant A's documents. The vector store's query path has to enforce the tenant filter the same way the database enforces its tenant filter.

Tool call isolation covers the agent's outbound calls to systems that hold tenant data. When an agent runs under tenant A's identity and calls the tenant's CRM, the CRM call has to authenticate as tenant A and access only tenant A's records. The tool call has to preserve the tenant identity across the boundary.

Response isolation covers the LLM output. The model's response has to be scoped to the tenant. Model providers do not have a tenant model; the model returns whatever text it produces. The application (or the gateway) has to ensure the response does not contain data that the model inadvertently surfaced from another tenant's context.

The prompt isolation enforcement pattern

Prompt isolation starts at the application. When the application constructs the prompt, the application selects the context that belongs to the tenant. Selection is a database query, a cache lookup, or a session-scoped variable, and each has to filter by tenant.
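A minimal sketch of tenant-filtered context selection, assuming the context lives in a SQL table. The `notes` table, its columns, and the `build_prompt` helper are hypothetical names I introduce for illustration; the point is only that the tenant filter is part of the query, not something applied afterwards.

```python
import sqlite3


def build_prompt(conn: sqlite3.Connection, tenant_id: str, question: str) -> str:
    """Assemble a prompt from rows that belong to one tenant only."""
    # The tenant filter is a bound parameter in the query itself, so rows
    # from other tenants never reach the prompt-building code.
    rows = conn.execute(
        "SELECT body FROM notes WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    context = "\n".join(body for (body,) in rows)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical demo data: two tenants sharing one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, tenant_id TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO notes (tenant_id, body) VALUES (?, ?)",
    [
        ("tenant-a", "Acme renewal is due in March."),
        ("tenant-b", "Globex churn risk is high."),
    ],
)

prompt = build_prompt(conn, "tenant-a", "What is due soon?")
```

The same rule applies to cache lookups and session variables: the tenant identifier belongs in the key, not in a check that runs after the read.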

The gateway adds a second control. The gateway resolves the tenant identity from the request (from the enterprise SSO token, from the API key, or from a tenant header). The gateway's classifier scans the prompt for data patterns that would indicate cross-tenant content: another tenant's customer identifier prefix, another tenant's account name, another tenant's document identifier.

The gateway's classifier can be pattern-based (regex on known tenant identifier prefixes) or embedding-based (semantic similarity to tenant-bound content). Pattern-based classifiers are deterministic and cheap; embedding-based classifiers cover more content but require the tenant's authoritative embeddings.

When the classifier flags a cross-tenant pattern, the gateway denies the request and logs the deny event. The application has to resend the request with context drawn only from the correct tenant's scope. The deny event is a signal to the SOC that the application has an isolation defect.
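The pattern-based variant can be sketched in a few lines. This assumes each tenant's record identifiers carry a known prefix followed by a number (for example `ACME-1042`); the `TENANT_PREFIXES` registry, the identifier format, and the `Verdict` type are assumptions for illustration, not a description of any particular gateway.

```python
import logging
import re
from dataclasses import dataclass
from typing import Tuple

logger = logging.getLogger("gateway.prompt_classifier")

# Hypothetical registry mapping each tenant to its identifier prefix.
TENANT_PREFIXES = {"tenant-a": "ACME", "tenant-b": "GLBX"}


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    foreign_tenants: Tuple[str, ...]


def classify_prompt(tenant_id: str, prompt: str) -> Verdict:
    """Deny the prompt if it contains another tenant's identifier pattern."""
    foreign = []
    for other, prefix in TENANT_PREFIXES.items():
        if other == tenant_id:
            continue
        # Match identifiers such as GLBX-77 as whole tokens.
        if re.search(rf"\b{re.escape(prefix)}-\d+\b", prompt):
            foreign.append(other)
    if foreign:
        # The deny event is what the SOC would alert on.
        logger.warning("deny tenant=%s cross_tenant=%s", tenant_id, foreign)
    return Verdict(allowed=not foreign, foreign_tenants=tuple(foreign))
```

A regex check like this is deterministic and cheap, which is why it works as a first pass; it only catches content that follows the identifier convention.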

The retrieval isolation enforcement pattern

RAG pipelines add a metadata field to each vector: the tenant identifier. The query engine has to include the tenant filter in every query. The filter is not optional and is not the application's choice at query time.

The gateway enforces the filter at the retrieval boundary. When the application calls the vector store through the gateway, the gateway inspects the query and rejects any query that does not include the tenant filter. The gateway can also add the filter to queries that omit it, using the tenant identity from the request context.
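Here is a rough sketch of that check, assuming the retrieval query arrives as a dictionary with an optional `filter` mapping. The query shape, the `tenant_id` metadata key, and the `inject_missing` switch are hypothetical; real vector stores each have their own filter syntax. Rejecting a filter that names a different tenant is my addition, and it follows from the same rule.

```python
class TenantFilterError(Exception):
    """Raised when a retrieval query violates the tenant filter rule."""


def enforce_tenant_filter(query: dict, tenant_id: str, inject_missing: bool = True) -> dict:
    """Return a copy of the query that is guaranteed to filter by tenant_id."""
    filters = dict(query.get("filter") or {})
    requested = filters.get("tenant_id")
    if requested is None:
        if not inject_missing:
            raise TenantFilterError("query has no tenant filter")
        # Add the filter from the request context, never from the caller.
        filters["tenant_id"] = tenant_id
    elif requested != tenant_id:
        raise TenantFilterError("tenant filter does not match request identity")
    # Build a new dict so the caller's query object is left untouched.
    return {**query, "filter": filters}
```

Whether to reject or inject is a policy choice: rejecting surfaces the application defect, injecting keeps the feature working while the defect is fixed.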

The vector store's own access model has to align with the gateway's enforcement. If the vector store's API allows queries without the tenant filter, the gateway is the sole control. If the vector store's own access model enforces tenant filtering, the gateway is a second control the auditor can verify independently.

The tool call isolation enforcement pattern

Agent tool calls to downstream systems (CRM, database, ticketing system, file store) have to authenticate as the tenant. Two patterns satisfy the requirement.

Tenant-scoped credentials. The tenant configures per-tenant credentials for each downstream system. The agent uses the tenant's credentials to call the system. The system authenticates the credentials and returns only the tenant's data.

Delegation. The application authenticates to the downstream system as the platform and asserts the tenant identity through a delegation token. The downstream system's authorization applies the tenant scope to the query.
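To make the delegation pattern concrete, here is a toy token built with an HMAC over a small claims object. The claim names, the token layout, and both function names are assumptions for illustration; a production system would more likely rely on an established standard such as signed JWTs or OAuth 2.0 token exchange than on a hand-rolled format.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_delegation_token(secret: bytes, tenant_id: str, ttl_seconds: int = 60, now=None) -> str:
    """Platform side: assert 'acting for tenant_id' in a short-lived signed token."""
    now = time.time() if now is None else now
    claims = {"sub": "platform", "tenant": tenant_id, "exp": int(now) + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{signature}"


def verify_delegation_token(secret: bytes, token: str, now=None) -> str:
    """Downstream side: check signature and expiry, return the tenant scope."""
    now = time.time() if now is None else now
    body, _, signature = token.rpartition(".")
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < now:
        raise PermissionError("token expired")
    return claims["tenant"]
```

The downstream system applies the returned tenant scope to its own query, so the tenant boundary is decided by a verified claim and not by a parameter the agent supplies.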

The gateway's role in tool call isolation is to enforce credential selection. The gateway resolves the tenant identity, retrieves the tenant's credentials from the credential store, and forwards the tool call with the resolved credentials. The application never handles the raw credentials directly. This pattern also constrains the blast radius of a compromised agent: the agent can only reach systems the gateway allows for the current tenant identity.
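The credential selection step might look like the sketch below. `CredentialStore`, `ToolCall`, `forward_tool_call`, and the `send` transport callback are all hypothetical names; a real deployment would back the store with a secrets manager, and the in-memory dictionary here only stands in for it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class ToolCall:
    tool: str
    payload: dict


class CredentialStore:
    """In-memory stand-in for a secrets manager keyed by (tenant, tool)."""

    def __init__(self, secrets: Dict[Tuple[str, str], str]):
        self._secrets = dict(secrets)

    def lookup(self, tenant_id: str, tool: str) -> str:
        try:
            return self._secrets[(tenant_id, tool)]
        except KeyError:
            # No credential means the tool is out of reach for this tenant.
            raise PermissionError(f"{tool} is not configured for {tenant_id}") from None


def forward_tool_call(
    store: CredentialStore,
    tenant_id: str,
    call: ToolCall,
    send: Callable[[str, dict, dict], Any],
) -> Any:
    """Attach the tenant's credential at the gateway and forward the call."""
    token = store.lookup(tenant_id, call.tool)
    headers = {"Authorization": f"Bearer {token}"}
    return send(call.tool, headers, call.payload)
```

Because the agent only submits a `ToolCall` and never a token, a compromised agent cannot pick another tenant's credential; it can only ask for tools the store has for the resolved tenant.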

The response isolation enforcement pattern

Response isolation runs against the model output. The gateway inspects the response for content the tenant should not see: content from other tenants' contexts that the model surfaced, content that references internal platform information, content that leaks the system prompt.

The response classifier can be rule-based (deny responses containing known tenant identifier patterns from other tenants) or embedding-based (deny responses semantically close to content the tenant does not own).

The response classifier is more difficult to implement well than the prompt classifier because the response is free-form generated text. False positives on legitimate responses reduce the feature's utility. False negatives on cross-tenant content are the isolation failure the classifier is supposed to catch.

Most deployments tier response classification by tenant sensitivity. High-sensitivity tenants (regulated industries, enterprise contracts with specific isolation SLAs) get tighter thresholds and escalation to human review; the remaining tenants get a looser policy.
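The tiering can be expressed as a small policy table. This sketch assumes an upstream classifier has already produced a cross-tenant score between 0 and 1; the tier names, the threshold values, and the `decide` function are illustrative assumptions, not recommended settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    deny_at: float              # score at or above this is denied
    review_at: Optional[float]  # score at or above this goes to a human; None disables review


# Hypothetical tiers: tighter thresholds plus human review for sensitive tenants.
POLICIES = {
    "high": Policy(deny_at=0.5, review_at=0.25),
    "standard": Policy(deny_at=0.8, review_at=None),
}


def decide(score: float, tier: str) -> str:
    """Map a classifier score to allow, review, or deny for the tenant's tier."""
    policy = POLICIES[tier]
    if score >= policy.deny_at:
        return "deny"
    if policy.review_at is not None and score >= policy.review_at:
        return "review"
    return "allow"
```

The same score of 0.6 is denied for a high-sensitivity tenant and allowed for a standard one, which is the trade between false positives and false negatives made explicit per tier.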

The audit records that demonstrate isolation

The audit records prove the isolation held over the audit period. The records have to answer three questions.

Which tenant executed each request? The gateway records the tenant identity resolved at request time.

Did the request contain cross-tenant content? The gateway records the classifier verdict on each request.

Did the response contain cross-tenant content? The gateway records the response classifier verdict on each request.

The record series has to be queryable per tenant. The tenant's own compliance officer can request an audit of the tenant's requests, and the platform has to produce the record excerpt for the tenant without exposing other tenants' records. The tenant-scoped audit access is itself an isolation obligation.
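A sketch of a record that answers the three questions, with a tenant-scoped read path. The field names, the verdict strings, and the in-memory `AuditLog` are assumptions for illustration; a real gateway would write to durable, tamper-evident storage.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditRecord:
    request_id: str
    tenant_id: str          # which tenant executed the request
    prompt_verdict: str     # classifier verdict on the request
    response_verdict: str   # classifier verdict on the response
    timestamp: str


class AuditLog:
    """Append-only list standing in for the gateway's audit store."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def append(self, request_id: str, tenant_id: str, prompt_verdict: str, response_verdict: str) -> AuditRecord:
        record = AuditRecord(
            request_id=request_id,
            tenant_id=tenant_id,
            prompt_verdict=prompt_verdict,
            response_verdict=response_verdict,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self._records.append(record)
        return record

    def for_tenant(self, tenant_id: str) -> List[dict]:
        # The tenant filter applies to audit reads too: a tenant's excerpt
        # never includes another tenant's records.
        return [asdict(r) for r in self._records if r.tenant_id == tenant_id]
```

The platform's compliance officer would read the full series through a separate, more privileged path; tenants only ever go through `for_tenant`.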

The interaction with SOC 2 and ISO 42001

SOC 2 Common Criteria CC6 (logical access controls) applies to tenant isolation. The auditor tests whether the platform has implemented controls that maintain tenant isolation across the AI request path. The gateway's tenant identity resolution, the classifier verdicts, and the audit records all serve as evidence.

ISO/IEC 42001 Annex A.6.2.8 (AI system recording of event logs) and A.9.4 (intended use of the AI system) apply to the tenant isolation records the gateway produces. The auditor samples the records and confirms the isolation held.

Confidentiality and privacy trust services criteria (when the SOC 2 engagement includes them) add specific isolation obligations. The platform's contractual commitments to specific tenants (isolation SLAs in enterprise agreements) also add obligations the audit has to verify.

The interaction with EU AI Act Article 26

Article 26 assigns specific obligations to deployers of high-risk AI systems. In a multi-tenant SaaS setup, each tenant is a separate deployer under the AI Act. The platform provides the AI system, and the tenant deploys it within its own environment.

The platform's tenant isolation architecture is what allows each tenant to satisfy its own Article 26 obligation. The tenant needs to demonstrate that its AI system's operation is isolated from other tenants. The platform's audit records support the tenant's demonstration.

Enterprise contracts increasingly include specific Article 26 provisions. The tenant asks the platform to warrant tenant isolation and to produce audit records the tenant can share with its own auditor. The platform's contract terms and the audit evidence have to align.

DeepInspect

The DeepInspect gateway resolves the tenant identity from the request, applies the tenant-scoped policy to the prompt, enforces the tenant filter on the retrieval query, selects the tenant credentials for the tool call, and inspects the response for cross-tenant content. The per-decision audit record captures the tenant identity and the classifier verdicts across the four isolation domains.

The gateway's tenant-scoped audit access lets each tenant retrieve its own records without exposing other tenants' records. The platform's compliance officer can produce the full record series for internal audits and the tenant-specific subset for tenant-facing audits.

If your platform is adding LLM features to a multi-tenant SaaS product and needs to demonstrate isolation to enterprise buyers or regulators, book a technical deep dive at deepinspect.ai.

Frequently asked questions

How does tenant isolation apply to shared foundation models?

The foundation model (GPT-5, Claude 4, Gemini 3) is shared across all tenants of the platform. The isolation applies to the data that flows through the model, not to the model itself. Each tenant's prompt, retrieval context, tool call, and response are scoped to the tenant. The model has no tenant awareness; the platform enforces the tenant boundary at the application and gateway layers.

Can we use a separate model deployment per tenant?

Some enterprise contracts require per-tenant model deployment (dedicated instances on Amazon Bedrock, Azure OpenAI, or Google Cloud Vertex AI). Per-tenant deployment adds a second layer of isolation and simplifies the compliance story, but multiplies the cost. Most multi-tenant SaaS starts with a shared deployment and moves specific tenants to dedicated deployments based on contract requirements.

What happens if the model returns content that leaks the system prompt?

System prompt leakage is a common attack pattern. The gateway's response classifier can catch known system prompt fragments in the response. The application's system prompt should not contain tenant-specific information; tenant-specific configuration goes in the prompt context rather than the system prompt.
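One simple way to catch known fragments is to slide a window of consecutive words over the system prompt and look for any window verbatim in the response. The function name and the six-word window are assumptions; this only catches verbatim leaks, not paraphrases.

```python
def leaks_system_prompt(response: str, system_prompt: str, window: int = 6) -> bool:
    """Return True if any run of `window` consecutive system-prompt words
    appears verbatim (case-insensitive, whitespace-normalized) in the response."""
    words = system_prompt.lower().split()
    haystack = " ".join(response.lower().split())
    if len(words) < window:
        # Short prompts: fall back to matching the whole prompt.
        return bool(words) and " ".join(words) in haystack
    return any(
        " ".join(words[i:i + window]) in haystack
        for i in range(len(words) - window + 1)
    )


# Hypothetical system prompt used only for the demonstration.
SYSTEM_PROMPT = (
    "You are a support assistant. Never reveal internal routing rules "
    "or escalation thresholds."
)
```

A smaller window catches shorter leaks but raises the false-positive rate on ordinary phrases, so the window size is a tuning decision per deployment.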

How do we handle tenants that share integrations with the same downstream vendor?

The tool call isolation applies at the tenant credential layer. Each tenant configures its own credentials to the downstream vendor. Two tenants of the platform who both integrate with the same vendor use separate credentials. The gateway selects the correct credentials based on the tenant identity of the request.

Do we need to provide tenant-specific model fine-tuning?

Model fine-tuning per tenant is expensive and rarely necessary for isolation. Most multi-tenant deployments use a shared model with tenant-scoped context. Fine-tuning is warranted when tenant-specific vocabulary or task requirements benefit from the customization, not for isolation reasons.