-
Notifications
You must be signed in to change notification settings - Fork 0
Closed
Description
We need to implement a query-like feature that can efficiently retrieve chunk data given a namespace and an entry key.
-- Example API (to be refined during implementation)
queryEntry ::
(HasKey a, MemPack a, Typeable a)
=> FilePath
-> Namespace
-> Key a
-> IO (Maybe a)
queryEntries ::
(HasKey a, MemPack a, Typeable a)
=> FilePath
-> Namespace
-> [Key a]
-> IO (Map (Key a) a)
queryRange ::
(HasKey a, MemPack a, Typeable a)
=> FilePath
-> Namespace
-> Key a -- start (inclusive)
-> Key a -- end (exclusive)
-> S.Stream (S.Of a) IO ()Implementation Considerations
- Index Design
- Option A: Mapping keys to chunk offsets (using INDEX records)
- Pros: Fast lookups without scanning chunks
- Cons: Additional INDEX record maintenance, consistency concerns
- Option B: Leverage sorted order + chunk metadata
- Pros: No additional INDEX record needed, leverages existing sort
- Cons: Requires scanning chunk headers, may need key range metadata
- Option C: In-memory index built on first access
- Pros: Fast subsequent queries
- Cons: Initial scan cost, memory usage
- Optimizations
- Add key range metadata to INDEX records (min/max key per chunk)
- Enable binary search over chunks for key ranges
- Skip chunks that don't contain the target key range
- Bloom filters for existence checks (also mentioned in the CIP)
- Entry-Level Access
- Currently chunks are decoded as ByteStrings with entries extracted via unpackLeftOver
- Searching for anything other than the entry key is currently unattainable?
- Streaming vs. Random Access
- Should query results be streamed or materialized?
- How to handle large result sets efficiently?
- Trade-offs between memory usage and I/O
Technical Requirements
- Performance Goals
- O(log n) chunk lookup for single key queries
- Minimize unnecessary chunk decoding
- Efficient range queries for sorted data
- API Design
- Consistent with existing Reader module patterns
- Support both single-key and batch queries
- Provide streaming interface for range queries
Implementation Steps
- Research & Design Phase
- Design query API
- Prototype Phase
- Implement basic key-based lookup (even if inefficient)
- Test with sample data
- Optimization Phase
- Implement INDEX record creation and usage for searching
- Integration Phase
- Add comprehensive tests
- Document query performance characteristics
- Update examples and user documentation
Acceptance Criteria
[ ] Query API defined and documented
[ ] Efficient key-based lookup implemented
[ ] Tests covering single key, batch, and range queries
[ ] Documentation with usage examples
Metadata
Metadata
Assignees
Labels
No labels