LLM Vector Store Access Control: The Filters That Have to Run on Every RAG Query

The vector store holds embeddings the enterprise's users, tenants, and documents contributed. Every retrieval-augmented generation query has to run access-control filters against the store before the retrieval reaches the LLM context. This piece walks through the filter design that survives multi-tenant SaaS, cross-department access, and time-bounded document lifecycle: per-vector metadata, query-time filter injection, retrieval-response inspection, and the audit records that prove the filter held on every query.

By Parminder Singh · Founder & CEO, DeepInspect Inc.
Platform & Architecture · rag, vector-store, access-control, ai-security, ai-gateway, llm-dlp
The Zscaler ThreatLabz 2026 AI Threat Report, published June 17, 2026, tracked 18,033 TB of data moving into AI tools over the year, a 93% year-over-year increase. A significant share of that data landed in vector stores that feed retrieval-augmented generation. The vector store holds embeddings the enterprise's users, tenants, and documents contributed. Every RAG query has to run access-control filters against the store before the retrieval reaches the LLM context. I want to walk through the filter design that survives multi-tenant SaaS, cross-department access, and time-bounded document lifecycle.

The four vector-store access failures

Four failure modes appear on production vector stores that carry enterprise data.

Cross-tenant retrieval. A query executed under tenant A's identity returns a vector from tenant B's document. The store's query engine did not include the tenant filter, and the retrieval populated tenant A's prompt with tenant B's data.

Cross-department retrieval. A query from a marketing user returns a vector from an HR-restricted document. The store had no department metadata, or the query did not include the department filter, or the department policy at the store did not match the enterprise's policy.

Expired-document retrieval. A query returns a vector from a document the enterprise's retention policy already required deleted. The store kept the vector after the source document was purged from the primary system.

Access-revoked retrieval. A query returns a vector from a document the requesting user was previously authorized to see but is no longer. The store's authorization snapshot was stale.

Each failure produces a specific incident category. Cross-tenant retrieval is the multi-tenant SaaS incident that triggers customer contract clauses. Cross-department retrieval is the internal-policy incident that triggers HR or legal. Expired-document retrieval is the retention-policy incident that triggers a regulatory review. Access-revoked retrieval is the offboarding gap that triggers a security incident.

The per-vector metadata schema

The metadata attached to each vector is the substrate the filters run against. A minimal enterprise schema carries four fields.

Tenant identifier. The tenant the document belongs to. Required for multi-tenant SaaS.

Access-control list. The user or group identifiers authorized to see the document. Can be a list or a reference to an ACL table the enterprise's identity system maintains.

Classification level. The document's classification (public, internal, confidential, restricted). Maps to the enterprise's data classification policy.

Retention timestamp. The date after which the document has to be deleted. The retention policy attaches the timestamp when the document lands in the store.

Additional fields depend on the enterprise. Department, project, geography, and regulatory scope all show up in production schemas.
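
As a rough illustration, the four fields could be modeled as a small typed record. This is a minimal sketch, not a schema any particular vector store requires: the names `VectorMetadata`, `Classification`, `retain_until`, and `extra` are hypothetical, and the integer ordering of the classification levels is an assumption made so that levels can be compared.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Classification(IntEnum):
    # Assumed ordering: a higher value means a more restricted document.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class VectorMetadata:
    tenant_id: str                   # tenant the source document belongs to
    acl: frozenset[str]              # user or group identifiers allowed to read
    classification: Classification   # level from the data classification policy
    retain_until: datetime           # timezone-aware deletion deadline
    # Enterprise-specific fields such as department, project, or geography.
    extra: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject records that filters could not evaluate reliably.
        if not self.tenant_id:
            raise ValueError("tenant_id is required")
        if self.retain_until.tzinfo is None:
            raise ValueError("retain_until must be timezone-aware")


meta = VectorMetadata(
    tenant_id="tenant-a",
    acl=frozenset({"group:hr"}),
    classification=Classification.CONFIDENTIAL,
    retain_until=datetime(2030, 1, 1, tzinfo=timezone.utc),
    extra={"department": "hr"},
)
```

Validating the record at write time means a vector without a tenant or a usable retention date never lands in the store, so the query-time filters always have something to match against.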

The query-time filter injection pattern

The filter injection pattern requires every query to carry the identity context of the caller. The filter is derived from that context and added to the query before the store's query engine executes it.

The gateway between the application and the vector store resolves the caller's identity from the request, computes the filter (tenant, ACL, classification level, retention date), and injects the filter into the query. The store returns only vectors that match the filter.

The pattern removes the burden from the application code to construct the correct filter on every query. The application declares the query's intent (which document type, which similarity threshold), and the gateway attaches the identity-driven filter.

The pattern also prevents the application from bypassing the filter by mistake. An application bug that omits the tenant filter cannot reach the store, because the gateway attaches it.
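
A gateway-side injection step might look like the sketch below. The filter grammar (`eq`, `any_of`, `lte`, `gt`, `and`) is invented for illustration and is not the query language of any specific vector store; `Caller`, `build_identity_filter`, and `inject_filter` are likewise hypothetical names. A real gateway would translate the same clauses into the store's own filter syntax.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Caller:
    user_id: str
    tenant_id: str
    groups: frozenset[str]
    clearance: int  # highest classification level the caller may read (0 = public)


def build_identity_filter(caller: Caller, now: datetime) -> dict:
    """Return the filter derived from identity alone, never from application input."""
    principals = sorted({f"user:{caller.user_id}", *caller.groups})
    return {
        "tenant_id": {"eq": caller.tenant_id},
        "acl": {"any_of": principals},
        "classification": {"lte": caller.clearance},
        "retain_until": {"gt": now.isoformat()},
    }


def inject_filter(app_query: dict, caller: Caller, now: datetime) -> dict:
    """Combine the application's intent with the identity-driven filter.

    A "filter" supplied by the application is kept only as an extra AND
    clause, so it can narrow the result set but cannot widen it.
    """
    clauses = [build_identity_filter(caller, now)]
    if "filter" in app_query:
        clauses.append(app_query["filter"])
    return {**app_query, "filter": {"and": clauses}}


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
caller = Caller("u1", "tenant-a", frozenset({"group:marketing"}), clearance=1)
query = inject_filter({"text": "q3 campaign", "top_k": 5}, caller, now)
```

Because the identity clause is always the first member of the `and` list, an application that sends no filter at all, or one that names the wrong tenant, still ends up constrained to the caller's own tenant, ACL, clearance, and retention window.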

The retrieval-response inspection layer

The inspection layer runs against the retrieval results before they reach the LLM's context. The layer applies a second check that the returned vectors match the caller's authorization.

The pattern is a defense in depth against the query-time filter being incorrect. The inspection compares the returned vectors' metadata against the caller's identity independently, and rejects the retrieval if the metadata does not match.

The pattern also catches the case where the store's index is stale relative to the enterprise's authorization state. A user whose access was revoked five minutes ago might still match the pre-revocation authorization recorded in the store's index. The inspection layer runs against a fresher authorization state and catches the discrepancy.

The audit record captures the retrieval count, the metadata distribution of the returned vectors, and any rejections the inspection layer applied.
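
The second check can be sketched as a pure function over the returned metadata. The `Hit` type, the `inspect` function, and the reason strings are hypothetical. This version drops the offending vectors and reports them; a stricter gateway could instead fail the whole retrieval as soon as one vector is rejected.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Hit:
    vector_id: str
    tenant_id: str
    acl: frozenset[str]
    classification: int
    retain_until: datetime


def inspect(hits, tenant_id, principals, clearance, now):
    """Split hits into (allowed, rejected) using only metadata and caller identity."""
    allowed, rejected = [], []
    for hit in hits:
        reasons = []
        if hit.tenant_id != tenant_id:
            reasons.append("tenant_mismatch")
        if not (hit.acl & principals):
            reasons.append("not_in_acl")
        if hit.classification > clearance:
            reasons.append("classification_too_high")
        if hit.retain_until <= now:
            reasons.append("past_retention")
        if reasons:
            rejected.append((hit.vector_id, reasons))
        else:
            allowed.append(hit)
    return allowed, rejected


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
hits = [
    Hit("v1", "tenant-a", frozenset({"group:marketing"}), 1,
        datetime(2030, 1, 1, tzinfo=timezone.utc)),
    Hit("v2", "tenant-b", frozenset({"group:hr"}), 3,
        datetime(2025, 1, 1, tzinfo=timezone.utc)),
]
principals = frozenset({"user:u1", "group:marketing"})
allowed, rejected = inspect(hits, "tenant-a", principals, 1, now)
```

The `rejected` list, with its per-vector reasons, is the natural input for the rejection entries in the audit record.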

The access-revocation propagation pattern

The revocation propagation pattern moves an access change from the identity provider to the vector store's filter behavior in a bounded time window.

Option one: eventual propagation. The store subscribes to the identity provider's revocation stream and updates its authorization snapshot on each revocation event. The propagation window is on the order of seconds.

Option two: query-time re-check. The store defers the authorization to query time and asks the identity provider for the caller's current authorization on each query. The pattern adds latency but removes the propagation window.

Option three: gateway-side check. The gateway between the application and the store performs the authorization check independently of the store's snapshot. The store's snapshot serves as a coarse filter, and the gateway's check serves as the fine-grained filter. The pattern combines the store's index performance with the gateway's fresh authorization state.

Enterprise deployments typically run option three because it produces the tightest revocation window achievable without the per-query identity provider round trip of option two.
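
One way to picture option three is a gateway that keeps its own record of revocations and applies it to whatever the store's coarse filter let through. This sketch assumes the gateway learns about revocations from an event stream and holds them as a deny-set; that mechanism, and the names `GatewayAuthz`, `on_revocation`, and `fine_filter`, are illustrative assumptions rather than a described implementation.

```python
class GatewayAuthz:
    """Gateway-held authorization state, updated from revocation events."""

    def __init__(self) -> None:
        # Pairs of (principal, document_id) whose access has been revoked.
        self._revoked: set[tuple[str, str]] = set()

    def on_revocation(self, principal: str, document_id: str) -> None:
        # Called once per revocation event received by the gateway.
        self._revoked.add((principal, document_id))

    def is_allowed(self, principal: str, document_id: str) -> bool:
        return (principal, document_id) not in self._revoked


def fine_filter(authz: GatewayAuthz, principal: str, coarse_hits: list[str]) -> list[str]:
    """Apply the gateway's fresher state to hits that passed the store's coarse filter."""
    return [doc_id for doc_id in coarse_hits if authz.is_allowed(principal, doc_id)]


authz = GatewayAuthz()
coarse_hits = ["doc-1", "doc-2"]  # document ids the store's stale snapshot still allows
before = fine_filter(authz, "user:u1", coarse_hits)
authz.on_revocation("user:u1", "doc-2")
after = fine_filter(authz, "user:u1", coarse_hits)
```

The store never has to be told about the revocation for the result to be correct; its snapshot can catch up later while the gateway already withholds `doc-2`.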

The retention-driven purge pattern

The retention purge pattern deletes vectors from the store when the source document's retention date passes.

The pattern runs as a background job that scans the store for vectors past their retention timestamp. The job deletes the vectors and records the deletion in the audit log. The retention policy operator can query the log to confirm the deletion fired.

For regulated categories (GDPR Article 17 right to erasure, retention rules that apply to HIPAA-covered PHI), the window between the deletion obligation arising and the vector leaving the store is shorter than for the enterprise's general retention. The purge job runs at a higher frequency for the regulated categories, and the audit log carries the regulation's identifier for the erasure record.

The pattern interacts with the store's own index maintenance. Deleting a vector removes it from the similarity index, and the store rebuilds the index against the smaller vector set. During the rebuild there is a small window in which a query might still return the deleted vector, and the retrieval-response inspection layer catches that residual.
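
The purge job itself is simple enough to sketch against an in-memory stand-in. Here `store` is a plain dictionary keyed by vector id, which is an assumption for illustration; a real job would call the store's delete API. The function name `purge_expired` and the shape of the audit entries are hypothetical as well.

```python
from datetime import datetime, timezone
from typing import Optional


def purge_expired(store: dict, audit_log: list, now: datetime,
                  regulation: Optional[str] = None) -> int:
    """Delete vectors whose retention timestamp has passed and log each deletion.

    `store` maps vector_id -> {"retain_until": datetime, ...}.
    """
    expired = [vid for vid, meta in store.items() if meta["retain_until"] <= now]
    for vid in expired:
        del store[vid]
        audit_log.append({
            "event": "retention_purge",
            "vector_id": vid,
            "deleted_at": now.isoformat(),
            "regulation": regulation,  # identifier of the rule behind the erasure, if any
        })
    return len(expired)


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
store = {
    "old": {"retain_until": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    "current": {"retain_until": datetime(2030, 1, 1, tzinfo=timezone.utc)},
}
audit_log: list = []
deleted = purge_expired(store, audit_log, now, regulation="GDPR-Art-17")
```

Running the same function on a tighter schedule for regulated categories, with the `regulation` argument set, gives the operator a log that can be queried per regulation.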

The audit records that prove the filter held

The audit records answer three questions the reviewer asks about each RAG query.

Which caller made the query. The record captures the caller identity, the session identifier, and the request identifier.

Which filter was applied. The record captures the tenant, ACL, classification, and retention filters the gateway attached to the query. The reviewer can verify the filter matches the caller's authorization.

Which vectors the store returned. The record captures the metadata (not the content) of each returned vector. The reviewer can verify the metadata matches the filter and that no cross-tenant or cross-classification vector reached the caller's context.

The record set is per-query, tied to the LLM request that consumed the retrieval, and stored in a tamper-evident log.
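
One common way to make a log tamper-evident is to chain each entry to the hash of the previous one. The sketch below uses SHA-256 over a canonical JSON encoding; the record fields and the function names `append_record` and `verify_chain` are illustrative assumptions, not a prescribed format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list, record: dict) -> dict:
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list = []
append_record(log, {
    "caller": "user:u1", "session": "s-1", "request": "r-1",
    "filter": {"tenant_id": "tenant-a", "clearance": 1},
    "returned": [{"vector_id": "v1", "tenant_id": "tenant-a", "classification": 1}],
})
append_record(log, {"caller": "user:u2", "session": "s-2", "request": "r-2",
                    "filter": {"tenant_id": "tenant-a", "clearance": 0}, "returned": []})
intact = verify_chain(log)
```

The record stores metadata about the returned vectors rather than their content, which keeps the log reviewable without turning it into a second copy of the sensitive data.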

The interaction with EU AI Act and HIPAA

EU AI Act Article 12 requires automatic recording of events over the lifetime of the system to support traceability. The per-query audit record is the artifact that satisfies the traceability requirement for the RAG layer of a high-risk AI system.

HIPAA's Security Rule requires audit controls that record and examine activity in systems that contain or use electronic PHI. A RAG deployment that carries PHI vectors produces per-query audit records tied to the caller identity and the accessed documents. The record set supports the audit control requirement, and the retention-driven purge pattern supports the retention and disposal policy the organization applies to PHI.

DeepInspect

This is the gap DeepInspect closes at the RAG layer. DeepInspect sits inline between the application and the vector store, and between the vector store and the LLM. The gateway attaches identity-driven filters at query time, runs the retrieval-response inspection layer, and records the per-query audit tuple.

The gateway performs the gateway-side authorization check independently of the store's snapshot, which produces the tight revocation window enterprise deployments defend. The audit records land in a hash-chained log the reviewer can query per caller, per document, or per regulatory scope.

Book a demo today.

Frequently asked questions

What are the four vector-store access failure modes?

Cross-tenant retrieval (a query returns vectors from another tenant). Cross-department retrieval (a query returns vectors from a restricted department). Expired-document retrieval (a query returns vectors past the retention date). Access-revoked retrieval (a query returns vectors from a document the caller lost access to). Each failure produces a specific incident category the enterprise's response team has to handle.

Why does the query-time filter injection pattern beat application-attached filters?

The pattern removes the application's burden to construct the correct filter on every query. An application bug that omits the filter cannot reach the store because the gateway attaches the filter. The pattern also produces a consistent audit record because the filter is applied at the gateway boundary rather than at each application call site.

How does the retrieval-response inspection layer work?

The layer runs against the retrieval results before they reach the LLM context. The layer compares the returned vectors' metadata against the caller's identity independently. The pattern is defense in depth against a stale index or an incorrect query-time filter. The audit record captures any rejections the layer applied.

What is the tightest access-revocation window?

The gateway-side check pattern produces the tightest window because the gateway performs the authorization independently of the store's snapshot. The store's snapshot serves as a coarse filter for index performance. The gateway's fresh authorization state serves as the fine-grained filter. The pattern combines the two without the query-time latency of a per-query identity provider round trip.

How does the retention-driven purge pattern work?

A background job scans the store for vectors past their retention timestamp and deletes them. The audit log records each deletion for the retention policy operator. For regulated categories (GDPR erasure, HIPAA PHI), the job runs at a higher frequency and the log carries the regulation's identifier. The retrieval-response inspection layer catches residual reads during the index rebuild window.

What audit records satisfy Article 12 for the RAG layer?

The per-query record captures the caller identity, the filter applied, and the returned vectors' metadata. The record satisfies Article 12's traceability requirement for the RAG layer of a high-risk AI system. The record set is stored in a tamper-evident log the enterprise can produce for the market surveillance authority when requested.