improvement doc #387
Walkthrough
A new GroveDB improvement proposal document was added detailing hierarchical probabilistic subtree filters for private query planning. It introduces the merK_with_filters subtree type with Bloom-filter-based union-composable filters, client protocol specifications, and implementation guidance including security and deployment considerations.
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
This is an informational architectural proposal document without code implementation. Review effort focuses on validating technical soundness, design clarity, and completeness of the specification rather than functional correctness.
Actionable comments posted: 2
🧹 Nitpick comments (3)
docs/grovedb-improvement-proposals/subtree-filters.md (3)
78-78: Reduce verbosity in filter invariant statement.
Line 78 uses "exactly the same" which is slightly wordy. Consider tightening:
- Bitwise OR must produce exactly the same filter as inserting all child elements into an empty Bloom filter with the same (`m`,`k`,`hash family`).
+ Bitwise OR must produce the same filter as inserting all child elements into an empty Bloom filter with identical (`m`,`k`,`hash family`).
This maintains clarity while reducing redundancy.
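The invariant quoted in this suggestion can be checked with a small experiment. The following is a minimal sketch, not the proposal's implementation: the filter size `M`, the hash count `K`, and the SHA-256-based double hashing in `positions` are all assumptions chosen for illustration. It shows that OR-ing two child filters yields the same bits as inserting every child element into one empty filter, provided both sides share the same (`m`, `k`, hash family).

```python
import hashlib

M = 256  # filter size in bits (illustrative assumption)
K = 4    # number of hash functions (illustrative assumption)


def positions(item: bytes, m: int = M, k: int = K) -> list[int]:
    """Derive k bit positions by double hashing: (h1 + i * h2) mod m."""
    digest = hashlib.sha256(item).digest()
    h1 = int.from_bytes(digest[:8], "big")
    # Force h2 odd so the stride is never zero modulo the power-of-two m.
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


def bloom(items) -> int:
    """Build a Bloom filter, stored as an integer bitset, from the items."""
    bits = 0
    for item in items:
        for pos in positions(item):
            bits |= 1 << pos
    return bits


def might_contain(bits: int, item: bytes) -> bool:
    """Return True if every bit position for the item is set."""
    return all((bits >> pos) & 1 for pos in positions(item))


left_keys = [b"alice", b"bob"]
right_keys = [b"carol", b"dave"]

# Parent filter computed two ways: union of children vs. direct insertion.
parent_by_or = bloom(left_keys) | bloom(right_keys)
parent_by_insert = bloom(left_keys + right_keys)
```

Because insertion only ever sets bits, the two constructions agree regardless of how the keys are split between children, which is the property the rotations discussion relies on.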
139-139: Reduce repetition of "exactly" in rotations discussion.
Line 139 repeats "exactly" within a short span. Consider rephrasing for conciseness:
- (This is exactly the hierarchical approach used by Bloofi.)
+ (This mirrors the hierarchical approach used by Bloofi.)
Or simply:
- (This is exactly the hierarchical approach used by Bloofi.)
+ (Identical to the hierarchical approach used by Bloofi.)
54-90: Terminology and filter construction sections are dense; consider expanded explanation.Sections 4 and 5.2–5.3 introduce core concepts (cut heights, filter ladders, OR-invariance, hash commitments, double hashing) in a compressed form. For readers unfamiliar with Bloom filter hierarchies, a brief intuitive explanation or diagram reference before diving into formulas (lines 82–84, 86–87) could improve accessibility. This is particularly relevant given the document's scope as an informational proposal meant to guide implementation.
For example, before line 80, consider adding 1–2 sentences explaining why this particular parameterization is chosen (e.g., "To minimize false positives while keeping filter size manageable...").
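One way to motivate the parameterization is to show how filter size and hash count follow from a target false-positive rate. The sketch below uses the standard Bloom filter sizing formulas; the function names and the example inputs (1,000 keys, 1% target) are hypothetical and are not taken from the proposal's worked example.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Return (m, k): bits and hash count for n elements at target rate p.

    Standard formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    """
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Approximate false-positive rate after inserting n elements."""
    return (1 - math.exp(-k * n / m)) ** k


# Hypothetical inputs: a node whose subtree holds 1,000 keys, 1% target rate.
m_bits, k_hashes = bloom_parameters(1000, 0.01)
achieved = false_positive_rate(1000, m_bits, k_hashes)
```

Since a parent filter is the union of its children, `n` for a node at a given cut height is the number of keys in its whole subtree, so a fixed `m` gives higher false-positive rates closer to the root.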
📜 Review details
📒 Files selected for processing (1)
docs/grovedb-improvement-proposals/subtree-filters.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/grovedb-improvement-proposals/subtree-filters.md
[style] ~78-~78: ‘exactly the same’ might be wordy. Consider a shorter alternative.
Context: ...OR BF(right(v)) Bitwise OR must produce exactly the same filter as inserting all child elements ...
(EN_WORDINESS_PREMIUM_EXACTLY_THE_SAME)
[style] ~139-~139: Consider an alternative for the overused word “exactly”.
Context: ...iant** guarantees correctness. (This is exactly the hierarchical approach used by Bloof...
(EXACTLY_PRECISELY)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Code Coverage
- GitHub Check: Compilation errors
- GitHub Check: Tests
- GitHub Check: Linting
🔇 Additional comments (3)
docs/grovedb-improvement-proposals/subtree-filters.md (3)
34-51: Well‑structured motivation and prior work.
The Abstract, Motivation, and Prior Work sections clearly establish the problem (privacy in GroveDB queries), the solution (probabilistic filters), and the relevant prior art (DIP‑0016, Bloofi, BIP157/158). The rationale for adapting these techniques to index layers is compelling.
63-149: Technical specification is comprehensive and sound.
The Specification sections (5.1–5.7) cover all essential elements: subtree type definition, filter construction with the OR-invariant, hash commitments, client protocol, parameter sizing with a concrete worked example, API additions, and update/rotation handling. The choice of Bloom filters for their OR-composability, the union-invariant property, and the connection to AVL rotation correctness is technically justified. The worked example (lines 116–123) effectively demonstrates bandwidth and parameter trade-offs.
152-181: Privacy, security, and backwards compatibility are adequately addressed.
Sections 7–10 cover privacy considerations (including per-epoch salting and cover traffic), security concerns (filter integrity, DoS mitigation, false-positive bounds), backwards compatibility (opt-in per-subtree, no consensus changes), and implementation notes. These are appropriate for an informational proposal and show thoughtful risk assessment.
@@ -0,0 +1,197 @@
# XX — Hierarchical Probabilistic Subtree Filters for Private Query Planning in GroveDB
Replace placeholder "XX" with an actual proposal number.
Line 1 contains XX — which appears to be a placeholder. Assign a concrete proposal number (e.g., GIP-001 or similar) following your project's numbering convention.
🤖 Prompt for AI Agents
In docs/grovedb-improvement-proposals/subtree-filters.md around lines 1 to 1,
the title starts with the placeholder "XX —"; replace "XX" with the correct
proposal identifier following the repository convention (for example "GIP-001"
or the next available GIP number). Update only the leading token in the title so
it reads e.g. "GIP-001 — Hierarchical Probabilistic Subtree Filters for Private
Query Planning in GroveDB", and ensure the chosen number is consistent with your
project's numbering registry (or create/confirm the new GIP number in the
proposals index).
## 1. Abstract

This proposal introduces a **specialized Merk‑AVL subtree type** in GroveDB whose internal nodes at selected levels carry **union‑composable approximate‑membership filters**. By choosing a filter whose **parent equals the bitwise union of its children**, the structure remains correct across inserts, deletes, and **AVL rotations**. Clients can first download and verify a **whole “cut” layer** of filters, privately test their own keys locally, and then selectively descend only into subtrees that **might** contain matches. The technique provides **predictable bandwidth** and **reduced information leakage** compared to querying subtrees directly. GroveDB’s design already supports authenticated subtrees and **cross‑tree references**; this DIP adds a verifiable, privacy‑preserving **map‑before‑fetch** capability. [oai_citation:0‡GitHub](https://github.com/dashpay/grovedb)
Remove or restructure oai_citation artifacts.
The document contains multiple oai_citation tags (e.g., [oai_citation:0‡GitHub], [oai_citation:1‡Dash Documentation], etc.) scattered throughout. These appear to be AI-generated artifacts that should be either removed or converted to a standard markdown reference format (e.g., [1] with a numbered reference list at the end, or inline [link text](url)). The current format is non-standard and reduces readability.
If you want to keep citations, consolidate them into a numbered footnote system at the end of each section or in the References section using standard markdown syntax.
Example cleanup for the Abstract (lines 34–36):
- This proposal introduces a **specialized Merk‑AVL subtree type** in GroveDB whose internal nodes at selected levels carry **union‑composable approximate‑membership filters**. By choosing a filter whose **parent equals the bitwise union of its children**, the structure remains correct across inserts, deletes, and **AVL rotations**. Clients can first download and verify a **whole "cut" layer** of filters, privately test their own keys locally, and then selectively descend only into subtrees that **might** contain matches. The technique provides **predictable bandwidth** and **reduced information leakage** compared to querying subtrees directly. GroveDB's design already supports authenticated subtrees and **cross‑tree references**; this DIP adds a verifiable, privacy‑preserving **map‑before‑fetch** capability. [oai_citation:0‡GitHub](https://github.com/dashpay/grovedb)
+ This proposal introduces a **specialized Merk‑AVL subtree type** in GroveDB whose internal nodes at selected levels carry **union‑composable approximate‑membership filters**. By choosing a filter whose **parent equals the bitwise union of its children**, the structure remains correct across inserts, deletes, and **AVL rotations**. Clients can first download and verify a **whole "cut" layer** of filters, privately test their own keys locally, and then selectively descend only into subtrees that **might** contain matches. The technique provides **predictable bandwidth** and **reduced information leakage** compared to querying subtrees directly. GroveDB's design already supports authenticated subtrees and **cross‑tree references**; this DIP adds a verifiable, privacy‑preserving **map‑before‑fetch** capability.[1]
Then add a References section entry:
+ [1] GroveDB: https://github.com/dashpay/grovedb
Apply this pattern consistently throughout the document.
Also applies to: 42-42, 48-48, 49-49, 50-50, 56-56, 59-59, 60-60, 79-79, 88-88, 106-106, 140-140, 146-146, 172-172, 180-180, 186-186, 192-197
🤖 Prompt for AI Agents
In docs/grovedb-improvement-proposals/subtree-filters.md (approx lines 34–197),
the document contains nonstandard `oai_citation` artifacts scattered through the
text; remove or replace each `oai_citation` token with standard Markdown links
or numeric footnote references, consolidate all citations into a single numbered
References section at the end of the document, update every inline occurrence to
use either an inline link format [link text](URL) or a numeric marker [1], [2],
etc., and ensure the References list includes the corresponding full URLs and
labels in order so all replaced markers resolve correctly and consistently
throughout the file.
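The cleanup described above could be scripted rather than done by hand. The following is a rough sketch under stated assumptions: it assumes every artifact has the shape `[oai_citation:N‡Label](URL)` seen in the Abstract, and the `convert_citations` helper, the `## References` heading, and the reuse of the marker's label in the reference list are all hypothetical choices, not part of the repository.

```python
import re

# Matches artifacts like "[oai_citation:0‡GitHub](https://example.org)",
# including any spaces or tabs that precede the marker.
CITATION = re.compile(r"[ \t]*\[oai_citation:\d+‡([^\]]+)\]\(([^)\s]+)\)")


def convert_citations(text: str) -> str:
    """Replace oai_citation markers with [n] and append a reference list."""
    refs: dict[str, tuple[int, str]] = {}  # url -> (number, label)

    def replace(match: re.Match) -> str:
        label, url = match.group(1), match.group(2)
        if url not in refs:
            # Number references in order of first appearance.
            refs[url] = (len(refs) + 1, label)
        return f"[{refs[url][0]}]"

    body = CITATION.sub(replace, text)
    if not refs:
        return body
    entries = [f"[{num}] {label}: {url}" for url, (num, label) in refs.items()]
    return body.rstrip() + "\n\n## References\n\n" + "\n".join(entries) + "\n"


sample = (
    "adds a map-before-fetch capability. "
    "[oai_citation:0‡GitHub](https://github.com/dashpay/grovedb)\n"
    "GroveDB supports references. "
    "[oai_citation:3‡GitHub](https://github.com/dashpay/grovedb)\n"
)
cleaned = convert_citations(sample)
```

Repeated URLs share one number, so the reference list stays short; the labels taken from the markers (for example `GitHub`) would likely still need manual editing into descriptive titles.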