Document-Level Security in GPT-RAG (Azure AI Search ACLs) #416

**Why are we doing this?**
To ensure GPT-RAG enforces **document-level access control** at query time, so users only retrieve content they are explicitly authorized to see. This is critical for enterprise, regulated, and compliance-driven deployments where data exposure must follow identity, group membership, and resource-level permissions.

By integrating Azure AI Search’s native **ACL and RBAC-based security filtering**, GPT-RAG can provide secure multi-tenant and enterprise-grade search experiences without duplicating authorization logic in the application layer. This reduces the risk of data leakage, simplifies compliance, and aligns GPT-RAG with Microsoft-recommended security patterns.

**What does it do?**

* **Native Azure AI Search ACL enforcement** – Uses built-in query-time access control based on `userIds`, `groupIds`, and/or `rbacScope` metadata.
* **Permission-aware indexing** – Ensures permission metadata is ingested alongside content (via indexers or push APIs).
* **Automatic security filters** – Relies on Azure AI Search to dynamically append internal security filters at query time.
* **Identity propagation** – Passes the user token through GPT-RAG to Azure AI Search using `x-ms-query-source-authorization`.
* **Multi-source compatibility** – Supports ADLS Gen2, Blob Storage, and SharePoint permission models.
* **Safe-by-default behavior** – Prevents unauthorized results from being returned even when the query is authenticated with a service API key.
* **Debug and troubleshooting support** – Enables elevated-read mode for administrators to diagnose permission-related issues.
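
The identity-propagation and elevated-read items above can be sketched as a small request builder. This is a minimal illustration, not GPT-RAG's actual orchestrator code: `build_search_request`, its parameters, the endpoint, the index name, and the API version placeholder are all assumptions made for the example. Only the two header names come from this issue. The token is forwarded unchanged, and the exact value format the service expects should be checked against the reference linked below.

```python
import json

# Header names taken from the guidelines in this issue.
QUERY_SOURCE_AUTH_HEADER = "x-ms-query-source-authorization"
ELEVATED_READ_HEADER = "x-ms-enable-elevated-read"


def build_search_request(endpoint, index_name, search_text, user_token,
                         api_version, api_key=None, elevated_read=False):
    """Return (url, headers, body) for a query that carries the end user's token.

    Hypothetical helper: the real orchestrator may be structured differently.
    """
    token = (user_token or "").strip()
    if not token:
        # Conservative choice for this sketch: refuse to query without a user identity.
        raise ValueError("An end-user token is required for ACL-protected queries")

    url = (
        f"{endpoint.rstrip('/')}/indexes/{index_name}"
        f"/docs/search?api-version={api_version}"
    )
    headers = {
        "Content-Type": "application/json",
        # Forwarded unchanged; verify the expected format in the reference docs.
        QUERY_SOURCE_AUTH_HEADER: token,
    }
    if api_key:
        headers["api-key"] = api_key
    if elevated_read:
        # Debugging only; the caller is assumed to hold the dedicated custom role.
        headers[ELEVATED_READ_HEADER] = "true"

    # No ACL filter is built here: the search service derives it from the token.
    body = json.dumps({"search": search_text})
    return url, headers, body


url, headers, body = build_search_request(
    endpoint="https://example.search.windows.net",  # placeholder endpoint
    index_name="documents",                         # placeholder index name
    search_text="quarterly report",
    user_token="<end-user-access-token>",           # placeholder token
    api_version="<api-version>",                    # placeholder, must support ACLs
)
```

The builder deliberately adds no `filter` clause of its own, which keeps authorization inside Azure AI Search. Raising on a missing token is one possible policy for this sketch; a deployment that serves public content to anonymous users would need a different rule.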

**Technical Guidelines**

* Permission metadata must be stored in **filterable string fields** in the index.
* GPT-RAG must propagate the **end-user identity token** to Azure AI Search using `x-ms-query-source-authorization`.
* The Orchestrator must not implement custom ACL filtering logic; authorization must be enforced by Azure AI Search.
* Public content must be explicitly modeled (for example, “Everyone” or equivalent).
* Queries without a valid user token must not return ACL-protected content.
* Elevated-read mode (`x-ms-enable-elevated-read: true`) must be used only for debugging and must require a dedicated custom role.
* Indexers or ingestion pipelines must normalize and validate ACL metadata format.
* The solution must support both:

  * POSIX-style ACLs (user/group permissions)
  * RBAC scopes (container-level access)
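
The guideline on normalizing and validating ACL metadata could look roughly like the sketch below. It assumes that user and group identifiers are GUID-style object IDs, that the index fields are named `userIds`, `groupIds`, and `rbacScope`, and that public content is marked with a caller-supplied special value. The function names and the `allowed_special` parameter are hypothetical, and the real ingestion pipeline may apply different rules.

```python
import uuid


def _normalize_ids(values, allowed_special=frozenset()):
    """Trim, validate, canonicalize, and de-duplicate a list of identifiers."""
    seen = set()
    result = []
    for raw in values or []:
        value = str(raw).strip()
        if not value:
            continue  # drop empty entries instead of indexing them
        if value not in allowed_special:
            # Assumption: IDs are GUIDs. uuid.UUID raises ValueError on
            # malformed input, and str() yields the lowercase hyphenated form.
            value = str(uuid.UUID(value))
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


def normalize_acl(user_ids=None, group_ids=None, rbac_scope=None,
                  allowed_special=frozenset()):
    """Build the permission fields for one document before it is indexed."""
    scope = (rbac_scope or "").strip()
    return {
        "userIds": _normalize_ids(user_ids, allowed_special),
        "groupIds": _normalize_ids(group_ids, allowed_special),
        "rbacScope": scope or None,
    }


acl = normalize_acl(
    user_ids=[
        " 6F9619FF-8B86-D011-B42D-00C04FC964FF ",
        "6f9619ff-8b86-d011-b42d-00c04fc964ff",  # same ID, different casing
    ],
    group_ids=["everyone"],                      # hypothetical public marker
    allowed_special={"everyone"},
)
```

Rejecting malformed identifiers at ingestion time, rather than silently dropping them, makes permission gaps visible before a document reaches the index.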


**References**

* https://learn.microsoft.com/en-us/azure/search/search-query-access-control-rbac-enforcement
