Skip to content

Conversation

@TCeason
Copy link
Collaborator

@TCeason TCeason commented Nov 3, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Prevents dropping security policies (masking and row access policies) that are still in use by tables, with robust concurrency protection.

Changes

  1. Prevent dropping in-use policies: Add constraint checks and dedicated error types (MaskingPolicyError, RowAccessPolicyError) to block dropping policies still bound to tables

  2. Move reference management to table operations:

  • drop_table: delete table-policy references when dropping
  • undrop_table: verify policies exist, cleanup stale references, restore valid ones
  • drop_policy: simplified from incremental cleanup (N/100+1 txns) to single check (1 txn)
  1. Concurrent safety: Add txn_cond_eq_keys_with_prefix conditions to guard against race conditions during policy drop

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@TCeason TCeason marked this pull request as draft November 3, 2025 03:42
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Nov 3, 2025
- raise constraint errors when masking/row access policies remain bound to tables
- introduce dedicated masking/row-access policy error types and surface them via APIs
- extend SQL logic tests to cover drop failures for active policies
@TCeason TCeason force-pushed the drop_policy branch 8 times, most recently from a94a180 to 920f807 Compare November 4, 2025 01:27
  - Add a KeysWithPrefix transaction condition to masking/row policy drops so freshly added bindings abort the txn instead of leaving dangling references.
  - Replace the duplicated usage structs/functions with a shared collect_policy_usage, reusing the gathered table updates and avoiding the extra clones.
  - Introduce txn_cond_eq_keys_with_prefix helper for prefix cardinality checks we can reuse elsewhere.
@TCeason TCeason marked this pull request as ready for review November 4, 2025 02:50
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AI has no idea what it's actually building. You're still the one who needs to understand the problem and verify the solution makes sense.

@drmingdrmer reviewed 3 of 13 files at r1, all commit messages.
Reviewable status: 3 of 13 files reviewed, all discussions resolved (waiting on @dantengsky and @KKould)

@TCeason TCeason marked this pull request as draft November 4, 2025 05:26
@TCeason TCeason force-pushed the drop_policy branch 2 times, most recently from a0a7c91 to 7ab5725 Compare November 4, 2025 06:25
@TCeason TCeason force-pushed the drop_policy branch 2 times, most recently from 645e77c to 5de5f1f Compare November 4, 2025 07:08
@TCeason TCeason requested a review from drmingdrmer November 4, 2025 07:09
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drmingdrmer reviewed 1 of 13 files at r1, 4 of 11 files at r2, all commit messages.
Reviewable status: 7 of 15 files reviewed, 3 unresolved discussions (waiting on @dantengsky and @KKould)


src/meta/api/src/data_mask_api_impl.rs line 354 at r2 (raw file):

) -> Result<MaskPolicyUsage, MetaTxnError> {
    collect_policy_usage(kv_api, tenant, policy_id).await
}

This wrapper function is weird. Why do we need that?

Code quote:

async fn collect_mask_policy_usage(
    kv_api: &(impl kvapi::KVApi<Error = MetaError> + ?Sized),
    tenant: &Tenant,
    policy_id: u64,
) -> Result<MaskPolicyUsage, MetaTxnError> {
    collect_policy_usage(kv_api, tenant, policy_id).await
}

src/meta/api/src/security_policy_usage.rs line 177 at r2 (raw file):

    K: PolicyBinding + Clone + Send + Sync + 'static,
    K::ValueType: FromToProto + Send,
    E: From<MetaError>,

This function is used only once. Why is the generic type parameter is necessary?

Code quote:

pub(crate) async fn collect_policy_usage<K, E>(
    kv_api: &(impl kvapi::KVApi<Error = MetaError> + ?Sized),
    tenant: &Tenant,
    policy_id: u64,
) -> Result<PolicyUsage<K>, E>
where
    K: PolicyBinding + Clone + Send + Sync + 'static,
    K::ValueType: FromToProto + Send,
    E: From<MetaError>,

src/query/ee/src/row_access_policy/row_access_policy_handler.rs line 56 at r2 (raw file):

        req: DropRowAccessPolicyReq,
    ) -> Result<()> {
        let dropped = (meta_api.drop_row_access(&req.name).await?)?;

The braces is weird. Why do we need that?

Copy link
Collaborator Author

@TCeason TCeason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 15 files reviewed, 3 unresolved discussions (waiting on @dantengsky, @drmingdrmer, and @KKould)


src/meta/api/src/data_mask_api_impl.rs line 354 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

This wrapper function is weird. Why do we need that?

If not , call collect_policy_usage in mask will write as collect_policy_usage::<MaskPolicyTableIdIdent, MetaTxnError>(self, tenant, policy_id).await?. This enables automatic type deduction. It can be changed to manually specify the type.


src/meta/api/src/security_policy_usage.rs line 177 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

This function is used only once. Why is the generic type parameter is necessary?

This function is used by:

  • Mask policies via MaskPolicyTableIdIdent binding type
  • Row access policies via RowAccessPolicyTableIdIdent binding type

src/query/ee/src/row_access_policy/row_access_policy_handler.rs line 56 at r2 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

The braces is weird. Why do we need that?

will modify as wait??

@TCeason TCeason marked this pull request as ready for review November 4, 2025 08:28
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be critical of AI-generated code. This PR is a perfect example—it's full of unclear concepts that add no value and should be removed.

@drmingdrmer reviewed all commit messages.
Reviewable status: 5 of 15 files reviewed, 5 unresolved discussions (waiting on @dantengsky and @KKould)


src/meta/api/src/security_policy_usage.rs line 33 at r3 (raw file):

/// Represents a binding record that links a security policy to a table.
#[derive(Clone)]
pub(crate) struct PolicyBindingEntry<I> {

and is PolicyBindingEntry really needed? it seems like just a duplication of SeqV.

Code quote:

PolicyBindingEntry

src/meta/api/src/security_policy_usage.rs line 175 at r3 (raw file):

) -> Result<PolicyUsage<K>, MetaError>
where
    K: PolicyBinding + Clone + Send + Sync + 'static,

And why is PolicyBinding required? it is used only once.

Copy link
Collaborator Author

@TCeason TCeason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 15 files reviewed, 5 unresolved discussions (waiting on @dantengsky, @drmingdrmer, and @KKould)


src/meta/api/src/security_policy_usage.rs line 33 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

and is PolicyBindingEntry really needed? it seems like just a duplication of SeqV.

Yes we can directly use for (key, seq) in bindings replace it. No need to create a new struct


src/meta/api/src/security_policy_usage.rs line 175 at r3 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

And why is PolicyBinding required? it is used only once.

twice in drop_data_mask and collect_policy_usage

Move the table-policy reference lifecycle management from drop policy operations
to drop/undrop table operations.  This simplifies the drop policy logic and improves
performance.

Key changes:
- drop_table: delete all table-policy references when marking table as dropped
- undrop_table: verify policies still exist, cleanup stale references, restore valid ones
- drop_policy: simplified from incremental cleanup (N/100+1 txns) to single existence check (1 txn)

All operations are protected by seq conditions for concurrent safety using optimistic
locking pattern.
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drmingdrmer reviewed 1 of 11 files at r2, 2 of 6 files at r3, 1 of 4 files at r4, 4 of 6 files at r5, 7 of 13 files at r6.
Reviewable status: 14 of 22 files reviewed, 6 unresolved discussions (waiting on @dantengsky and @KKould)


src/meta/api/src/data_mask_api_impl.rs line 187 at r5 (raw file):

            // Ensure no new references were created
            txn.condition
                .push(txn_cond_eq_keys_with_prefix(&table_policy_ref_prefix, 0));

much cleaner! great move!

Code quote:

            let tenant = name_ident.tenant();
            let table_policy_ref_prefix = DirName::new(MaskPolicyTableIdIdent::new_generic(
                tenant.clone(),
                MaskPolicyIdTableId {
                    policy_id,
                    table_id: 0,
                },
            ));

            // List all table-policy references
            let table_policy_refs = self.list_pb_vec(&table_policy_ref_prefix).await?;

            // Policy is in use - cannot drop
            if !table_policy_refs.is_empty() {
                return Ok(Err(MaskingPolicyError::policy_in_use(
                    tenant.tenant_name().to_string(),
                    name_ident.data_mask_name().to_string(),
                )));
            }

            // No references - drop the policy
            let id_ident = seq_id.data.into_t_ident(tenant);
            let mut txn = TxnRequest::default();

            // Ensure no new references were created
            txn.condition
                .push(txn_cond_eq_keys_with_prefix(&table_policy_ref_prefix, 0));

src/meta/api/src/schema_api.rs line 256 at r5 (raw file):

                    table_id,
                },
            )));

There may be duplicated entries, need to de-dup the entries to delete

Code quote:

    for policy_map in tb_meta.column_mask_policy_columns_ids.values() {
        txn.if_then
            .push(txn_op_del(&MaskPolicyTableIdIdent::new_generic(
                tenant.clone(),
                MaskPolicyIdTableId {
                    policy_id: policy_map.policy_id,
                    table_id,
                },
            )));

src/meta/api/src/schema_api.rs line 597 at r5 (raw file):

}

async fn cleanup_missing_policies_on_undrop(

this function should be removed right?

Copy link
Collaborator Author

@TCeason TCeason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 14 of 22 files reviewed, 6 unresolved discussions (waiting on @dantengsky, @drmingdrmer, and @KKould)


src/meta/api/src/schema_api.rs line 256 at r5 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

There may be duplicated entries, need to de-dup the entries to delete

Good optimize. I will fix it.


src/meta/api/src/schema_api.rs line 597 at r5 (raw file):

Previously, drmingdrmer (张炎泼) wrote…

this function should be removed right?

cleanup_missing_policies_on_undrop is still required. When a table is dropped we delete the policy→table reference keys (MaskPolicyTableIdIdent / RowAccessPolicyTableIdIdent). During undrop the table meta we revive still lists those policy IDs, so we must recreate any surviving
references (and prune ones whose policies were removed meanwhile). That reconciliation happens exclusively inside cleanup_missing_policies_on_undrop (src/meta/api/src/schema_api. rs:597), and it also wires the necessary txn conditions/ops for the undrop transaction (src/meta/api/src/
schema_api.rs:520). Removing it would leave undropped tables with stale policy metadata and missing reverse-reference keys.

Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drmingdrmer reviewed 1 of 13 files at r1, 1 of 4 files at r4, 4 of 13 files at r6, 2 of 3 files at r7, 1 of 1 files at r10, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @dantengsky and @KKould)

@TCeason TCeason merged commit 0efebe5 into databendlabs:main Nov 6, 2025
250 of 254 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature this PR introduces a new feature to the codebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants