
Document-Level Security in GPT-RAG (Azure AI Search ACLs) #416

@placerda


Why are we doing this?
To ensure that GPT-RAG enforces document-level access control at query time, so that users retrieve only content they are explicitly authorized to see. This is critical for enterprise, regulated, and compliance-driven deployments, where data exposure must follow identity, group membership, and resource-level permissions.

By integrating Azure AI Search’s native ACL and RBAC-based security filtering, GPT-RAG can provide secure multi-tenant and enterprise-grade search experiences without duplicating authorization logic in the application layer. This reduces the risk of data leakage, simplifies compliance, and aligns GPT-RAG with Microsoft-recommended security patterns.

What does it do?

  • Native Azure AI Search ACL enforcement – Uses built-in query-time access control based on userIds, groupIds, and/or rbacScope metadata.
  • Permission-aware indexing – Ensures permission metadata is ingested alongside content (via indexers or push APIs).
  • Automatic security filters – Relies on Azure AI Search to dynamically append internal security filters at query time.
  • Identity propagation – Passes the user token through GPT-RAG to Azure AI Search in the x-ms-query-source-authorization request header.
  • Multi-source compatibility – Supports ADLS Gen2, Blob Storage, and SharePoint permission models.
  • Safe-by-default behavior – Prevents unauthorized results from being returned even when service keys are used.
  • Debug and troubleshooting support – Enables elevated-read mode for administrators to diagnose permission-related issues.
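A minimal sketch of the identity-propagation step, assuming the orchestrator calls the Azure AI Search REST API directly and already holds the signed-in user's access token. The two header names come from this issue; the function `build_search_headers`, the `MissingUserTokenError` type, and forwarding the token value unchanged are illustrative assumptions, so the exact token format should be checked against the (preview) REST API version in use.

```python
from typing import Dict, Optional

# Header names are taken from the issue; their value formats are assumptions.
QUERY_SOURCE_AUTH_HEADER = "x-ms-query-source-authorization"
ELEVATED_READ_HEADER = "x-ms-enable-elevated-read"


class MissingUserTokenError(ValueError):
    """Hypothetical error: a query would run without the end user's identity."""


def build_search_headers(
    service_auth: Dict[str, str],
    user_token: Optional[str],
    elevated_read: bool = False,
) -> Dict[str, str]:
    """Build headers for an Azure AI Search query that carries the end-user identity.

    service_auth: credentials for the search service itself, for example
        {"api-key": "..."} or {"Authorization": "Bearer ..."}.
    user_token: the end user's access token, forwarded unchanged so that
        Azure AI Search, not the orchestrator, evaluates document ACLs.
    elevated_read: debug-only switch; the caller must hold the dedicated role.
    """
    if user_token is None or not user_token.strip():
        # Fail closed: do not fall back to a service-credential-only query.
        raise MissingUserTokenError("end-user token required for ACL-filtered search")
    headers = {"Content-Type": "application/json", **service_auth}
    headers[QUERY_SOURCE_AUTH_HEADER] = user_token.strip()
    if elevated_read:
        # Only for administrators diagnosing permission issues.
        headers[ELEVATED_READ_HEADER] = "true"
    return headers
```

Refusing to build a request when the user token is missing is a fail-closed choice made in the orchestrator; it does not filter any documents itself, so ACL evaluation still happens entirely in Azure AI Search. A deployment that intentionally serves public content to anonymous users could relax this check.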

Technical Guidelines

  • Permission metadata must be stored in filterable string fields in the index.

  • GPT-RAG must propagate the end-user identity token to Azure AI Search in the x-ms-query-source-authorization request header.

  • The Orchestrator must not implement custom ACL filtering logic; authorization must be enforced by Azure AI Search.

  • Public content must be explicitly modeled (for example, with an “Everyone” group or equivalent).

  • Queries without a valid user token must not return ACL-protected content.

  • Elevated-read mode (x-ms-enable-elevated-read: true) must be used only for debugging and must require a dedicated custom role.

  • Indexers or ingestion pipelines must normalize and validate ACL metadata format.

  • The solution must support both:

    • POSIX-style ACLs (user/group permissions)
    • RBAC scopes (container-level access)
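A minimal ingestion-side sketch of ACL normalization and validation covering both permission models, assuming user and group identifiers are Microsoft Entra object IDs (GUIDs) and that the index stores them in fields named `userIds`, `groupIds`, and `rbacScope`. Those field names, the `normalize_acl` and `to_index_fields` helpers, and the `everyone` public-group identifier are hypothetical placeholders, not part of the GPT-RAG or Azure AI Search API.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Union

# Hypothetical identifier for explicitly public content ("Everyone" or equivalent).
PUBLIC_GROUP_ID = "everyone"

# Assumption: user and group IDs are Entra object IDs in GUID format.
_GUID = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")

RawIds = Union[None, str, Iterable[str]]


@dataclass
class AclMetadata:
    user_ids: List[str] = field(default_factory=list)   # POSIX-style user entries
    group_ids: List[str] = field(default_factory=list)  # POSIX-style group entries
    rbac_scope: Optional[str] = None                     # container-level RBAC scope


def _normalize_ids(raw: RawIds, label: str, allow_public: bool) -> List[str]:
    """Accept a list or a comma/semicolon-separated string; return sorted, unique, lowercase IDs."""
    if raw is None:
        return []
    items = re.split(r"[,;]", raw) if isinstance(raw, str) else list(raw)
    ids = set()
    for item in items:
        value = str(item).strip().lower()
        if not value:
            continue  # skip empty entries left by trailing separators
        if not (_GUID.match(value) or (allow_public and value == PUBLIC_GROUP_ID)):
            raise ValueError(f"invalid {label} entry: {item!r}")
        ids.add(value)
    return sorted(ids)


def normalize_acl(user_ids: RawIds = None, group_ids: RawIds = None,
                  rbac_scope: Optional[str] = None, public: bool = False) -> AclMetadata:
    """Validate and normalize raw permission metadata before it is indexed."""
    acl = AclMetadata(
        user_ids=_normalize_ids(user_ids, "userIds", allow_public=False),
        group_ids=_normalize_ids(group_ids, "groupIds", allow_public=True),
        rbac_scope=rbac_scope.strip() if rbac_scope and rbac_scope.strip() else None,
    )
    if public and PUBLIC_GROUP_ID not in acl.group_ids:
        acl.group_ids = sorted(acl.group_ids + [PUBLIC_GROUP_ID])
    if not (acl.user_ids or acl.group_ids or acl.rbac_scope):
        # Ambiguous: refuse rather than silently treating the document as public.
        raise ValueError("no ACL metadata; mark the document public explicitly")
    return acl


def to_index_fields(acl: AclMetadata) -> dict:
    """Map to hypothetical index field names for a push-API upload or indexer output."""
    return {"userIds": acl.user_ids, "groupIds": acl.group_ids, "rbacScope": acl.rbac_scope}
```

Rejecting documents that carry no permission metadata at all, instead of defaulting them to public, keeps public exposure an explicit decision, in line with the guideline on modeling public content.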
