Description
Goal
Convert the existing PoC to a working MVP of Aruna V3: a P2P distributed scientific data management system supporting single-realm federation with S3-compatible storage, CRDT-based metadata, and role-based authorization.
Core capabilities:
- Full federation where any node can serve as entry point and provide comprehensive answers by coordinating with peers
- Data replication and redundancy across a configurable number of nodes (N), so data stays available when individual nodes go offline
- Content-addressed immutable data storage with S3-compatible interface (PUT/GET/LIST/multipart)
- Collaborative metadata editing using Automerge CRDTs with RO-Crate JSON-LD format
- Full-text and structured search across distributed metadata with bitmap-based authorization filtering
- Path-based role permissions with wildcard support and deny-overrides-allow semantics
- OIDC-based authentication with realm-signed tokens
Federation behavior:
- A user connecting to Node A can discover and access data stored on Node B (subject to authorization)
- Search queries return results from across the realm, deduplicated and authorization-filtered
- Metadata updates propagate to all nodes holding replicas via Automerge sync
- Data objects replicate according to configured replication factor
- Node discovery via DHT allows dynamic cluster membership
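The federated search behavior above (merge, deduplicate, authorization-filter) can be sketched as follows. This is a minimal illustration, not the planned implementation: `SearchHit`, `merge_hits`, and the `readable` set are hypothetical names, and a plain `HashSet` of resource IDs stands in for the bitmap-based filter described in the plan.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical search hit as returned by one peer.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub resource_id: String,
    pub score: f32,
    pub node: String,
}

/// Merge per-node result lists: drop hits the caller may not read,
/// keep the best-scoring copy of each resource, and sort by score.
pub fn merge_hits(per_node: Vec<Vec<SearchHit>>, readable: &HashSet<String>) -> Vec<SearchHit> {
    let mut best: HashMap<String, SearchHit> = HashMap::new();
    for hit in per_node.into_iter().flatten() {
        if !readable.contains(&hit.resource_id) {
            continue; // authorization filter (stand-in for the bitmap check)
        }
        let replace = best
            .get(&hit.resource_id)
            .map_or(true, |existing| hit.score > existing.score);
        if replace {
            best.insert(hit.resource_id.clone(), hit);
        }
    }
    let mut merged: Vec<SearchHit> = best.into_values().collect();
    // Highest score first; ties broken by resource ID for a stable order.
    merged.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then_with(|| a.resource_id.cmp(&b.resource_id))
    });
    merged
}
```

Keeping the highest-scoring copy is only one possible deduplication policy; the entry node could equally prefer the replica with the newest commit hash.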
Workplan
aruna-network - P2P networking foundation ([feat] aruna-network: Iroh P2P network implementation #221)
- Iroh-based connectivity and connection management
- Gossip protocol for lightweight message broadcast
- DHT integration for node and resource discovery
- Automerge sync service for CRDT document synchronization
- Bao sync service for verified streaming of content-addressed data
- Trait abstractions enabling mock injection for testing
aruna-auth - Authentication and authorization
- OIDC token validation against configurable providers
- Permission engine using globset for path pattern matching
- Heed/LMDB storage for permission rules
- Roaring bitmap generation and maintenance for search authorization
- Path-based wildcard support (middle * for single segment, trailing ** for recursive)
- Deny-overrides-allow rule evaluation
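The wildcard and deny-overrides-allow rules above can be illustrated with a small hand-rolled matcher. This is a sketch under stated assumptions, not the globset-based engine named in the plan: `Rule`, `Effect`, `matches`, and `is_allowed` are hypothetical names, a trailing `**` is assumed to match zero or more remaining segments, and a path that matches no rule is assumed to be denied.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effect {
    Allow,
    Deny,
}

pub struct Rule {
    pub pattern: &'static str,
    pub effect: Effect,
}

/// Segment-wise match: `*` matches exactly one segment,
/// a trailing `**` matches everything that remains (assumption: including nothing).
pub fn matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').filter(|s| !s.is_empty()).collect();
    let segs: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    for (i, p) in pat.iter().enumerate() {
        if *p == "**" && i == pat.len() - 1 {
            return true; // trailing `**` swallows the rest of the path
        }
        match segs.get(i) {
            Some(s) if *p == "*" || p == s => {}
            _ => return false,
        }
    }
    pat.len() == segs.len()
}

/// Deny-overrides-allow: any matching Deny rule wins; otherwise at least
/// one matching Allow rule is required (assumption: default deny).
pub fn is_allowed(rules: &[Rule], path: &str) -> bool {
    let mut allowed = false;
    for rule in rules.iter().filter(|r| matches(r.pattern, path)) {
        match rule.effect {
            Effect::Deny => return false,
            Effect::Allow => allowed = true,
        }
    }
    allowed
}
```

With the rules `projects/*/data/**` (allow) and `projects/secret/**` (deny), a path under `projects/p1/data/` is readable while the same path under `projects/secret/` is not, regardless of rule order.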
aruna-orga - Organizational structure management
- Realm, group, user, and node entities as Automerge documents
- CRDT merge semantics for concurrent organizational changes
- Integration with aruna-network for document sync
- Callbacks to aruna-auth when group/role changes require bitmap rebuilds
- User-to-group membership management
- Role assignment within groups
aruna-data - Content-addressed data storage
- S3 interface implementation using s3s library
- BLAKE3 content hashing for all stored objects
- OpenDAL backend abstraction for pluggable storage
- Multipart upload handling with node-local state
- S3 versioning tracking hash sequences per key
- Integration with aruna-network's Bao sync for verified replication
- Replication coordination based on configured factor
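To make "content-addressed storage with S3 versioning as a hash sequence per key" concrete, here is a minimal in-memory sketch. All names (`Store`, `put`, `history`) are hypothetical, and the standard library's `DefaultHasher` is used purely as a stand-in for BLAKE3 so the example has no dependencies; it is not a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a BLAKE3 digest (assumption: a real build would use 32 bytes).
pub type ContentHash = u64;

fn content_hash(bytes: &[u8]) -> ContentHash {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
pub struct Store {
    /// Immutable blobs, addressed by their content hash.
    blobs: HashMap<ContentHash, Vec<u8>>,
    /// Per-key version history: the ordered sequence of content hashes.
    versions: HashMap<String, Vec<ContentHash>>,
}

impl Store {
    /// Store the bytes once per distinct content and append a new version for `key`.
    pub fn put(&mut self, key: &str, bytes: &[u8]) -> ContentHash {
        let hash = content_hash(bytes);
        self.blobs.entry(hash).or_insert_with(|| bytes.to_vec());
        self.versions.entry(key.to_string()).or_default().push(hash);
        hash
    }

    /// Return the latest version of `key`, if any.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        let hash = self.versions.get(key)?.last()?;
        self.blobs.get(hash).map(Vec::as_slice)
    }

    /// Return all version hashes of `key`, oldest first.
    pub fn history(&self, key: &str) -> &[ContentHash] {
        self.versions.get(key).map(Vec::as_slice).unwrap_or(&[])
    }

    /// Number of distinct blobs held, independent of how many keys reference them.
    pub fn blob_count(&self) -> usize {
        self.blobs.len()
    }
}
```

Because blobs are keyed by content, two keys with identical bytes share one stored object, which is also what lets a replica be verified against its hash after transfer.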
aruna-metadata - Scientific metadata management
- RO-Crate JSON-LD document structure
- Automerge CRDT storage for collaborative editing
- Tantivy index for full-text and structured search
- Bitmap integration for authorization-filtered search results
- DHT publication of resource ULID to commit hash mappings
- Integration with aruna-network's Automerge sync
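The bitmap integration for authorization-filtered search can be pictured as: union the bitmaps of the caller's roles, then keep only hits whose document ID is set. The sketch below uses a naive `Vec<u64>` bitmap as a stand-in for Roaring bitmaps; `Bitmap` and `filter_hits` are hypothetical names, and the assumption that each role maps to one bitmap of readable document IDs is ours, not stated in the plan.

```rust
/// Naive uncompressed bitmap (stand-in for a Roaring bitmap).
#[derive(Default, Clone)]
pub struct Bitmap {
    words: Vec<u64>,
}

impl Bitmap {
    /// Mark `doc_id` as readable, growing the bitmap if needed.
    pub fn insert(&mut self, doc_id: u32) {
        let (word, bit) = ((doc_id / 64) as usize, doc_id % 64);
        if self.words.len() <= word {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << bit;
    }

    pub fn contains(&self, doc_id: u32) -> bool {
        let (word, bit) = ((doc_id / 64) as usize, doc_id % 64);
        self.words.get(word).map_or(false, |w| ((*w >> bit) & 1) == 1)
    }

    /// Bitwise OR of two bitmaps of possibly different lengths.
    pub fn union(&self, other: &Bitmap) -> Bitmap {
        let len = self.words.len().max(other.words.len());
        let words = (0..len)
            .map(|i| {
                self.words.get(i).copied().unwrap_or(0) | other.words.get(i).copied().unwrap_or(0)
            })
            .collect();
        Bitmap { words }
    }
}

/// Keep only the hits that at least one of the caller's roles may read.
pub fn filter_hits(hits: &[u32], role_bitmaps: &[Bitmap]) -> Vec<u32> {
    let readable = role_bitmaps
        .iter()
        .fold(Bitmap::default(), |acc, b| acc.union(b));
    hits.iter().copied().filter(|id| readable.contains(*id)).collect()
}
```

A caller with no roles gets an empty union and therefore no results, which matches a default-deny reading of the permission model.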
aruna - Application composition
- Axum HTTP API exposing metadata and management operations
- S3 endpoint routing to aruna-data
- OpenAPI documentation via utoipa
- Configuration management (realm keys, OIDC providers, replication settings)
- Startup orchestration and graceful shutdown
- Health checks and basic observability
Integration and testing
- Multi-node test harness running 3-5 instances in a single binary
- End-to-end scenarios covering federation, replication, and failure recovery
- Performance baseline measurements
Definition of Done
- All sub-issues for aruna-network completed and tested ([feat] aruna-network: Iroh P2P network implementation #221)
- All sub-issues for aruna-auth completed and tested
- All sub-issues for aruna-orga completed and tested
- All sub-issues for aruna-data completed and tested
- All sub-issues for aruna-metadata completed and tested
- All sub-issues for aruna main crate completed and tested
- Multi-node cluster (3-5 nodes) forms, replicates data, and serves federated queries
- End-to-end test suite passes consistently
- Fuzz test corpus established for protocol and parsing code
Test Concept
Unit testing (per crate):
Each crate maintains unit tests covering its internal logic. Network and storage dependencies are abstracted behind traits, which allows mock injection. Target coverage is above 80% for core logic paths.
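As an illustration of trait-based mock injection, the sketch below hides storage behind a trait and tests domain logic against an in-memory mock. `BlobStore`, `MockStore`, and `copy_object` are hypothetical names chosen for the example; the actual trait boundaries in the crates are not specified here.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction; a production impl would wrap the real backend.
pub trait BlobStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory mock that also records how often `put` was called.
#[derive(Default)]
pub struct MockStore {
    objects: HashMap<String, Vec<u8>>,
    pub put_calls: usize,
}

impl BlobStore for MockStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>) {
        self.put_calls += 1;
        self.objects.insert(key.to_string(), bytes);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.get(key).cloned()
    }
}

/// Domain logic written against the trait, so tests can inject the mock.
/// Returns false if the source object does not exist.
pub fn copy_object<S: BlobStore>(store: &mut S, from: &str, to: &str) -> bool {
    match store.get(from) {
        Some(bytes) => {
            store.put(to, bytes);
            true
        }
        None => false,
    }
}
```

The call counter on the mock lets a unit test assert on side effects (for example, that a failed copy performs no write) without touching disk or network.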
Fuzz testing (per crate where applicable):
- aruna-network: Fuzz protocol message deserialization to catch panics and undefined behavior on malformed input. Test gossip message handling with arbitrary payloads. ([feat] aruna-network: Iroh P2P network implementation #221)
- aruna-auth: Fuzz path pattern matching with arbitrary path strings and patterns. Fuzz permission rule combinations to find evaluation edge cases.
- aruna-orga: Fuzz Automerge document merging with arbitrary change sequences. Test handling of corrupted or truncated sync messages.
- aruna-data: Fuzz S3 request parsing (headers, query parameters, XML bodies). Fuzz multipart upload sequences (out-of-order parts, duplicate parts, missing parts).
- aruna-metadata: Fuzz RO-Crate JSON-LD parsing with malformed documents. Fuzz Tantivy query parsing with arbitrary search strings.
Fuzz tests use cargo-fuzz or the arbitrary crate. The goal is to discover unhappy paths and crash conditions before they occur in production. The fuzz corpus is maintained in the repository.
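The kind of property a fuzz target for multipart sequences might assert can be sketched without any fuzzing dependency. The example below is not a cargo-fuzz target and is not coverage-guided: it is a std-only randomized smoke loop driven by a xorshift generator. `check_completion`, `PartError`, and `smoke_fuzz` are hypothetical names, and the validation rules (strictly ascending part list, every listed part uploaded) are assumptions for illustration.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum PartError {
    Empty,
    OutOfOrder,
    Duplicate,
    Missing(u16),
}

/// Validate the part list of a completion request against the uploaded parts.
pub fn check_completion(uploaded: &BTreeSet<u16>, requested: &[u16]) -> Result<(), PartError> {
    if requested.is_empty() {
        return Err(PartError::Empty);
    }
    for pair in requested.windows(2) {
        if pair[0] == pair[1] {
            return Err(PartError::Duplicate);
        }
        if pair[0] > pair[1] {
            return Err(PartError::OutOfOrder);
        }
    }
    match requested.iter().find(|p| !uploaded.contains(*p)) {
        Some(p) => Err(PartError::Missing(*p)),
        None => Ok(()),
    }
}

/// Minimal xorshift64 step; good enough for generating test inputs.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Feed random part sequences to the validator and check its invariants.
/// Returns how many sequences were accepted.
pub fn smoke_fuzz(iterations: u32, seed: u64) -> u32 {
    let mut state = seed | 1; // xorshift must not start at zero
    let mut accepted = 0;
    for _ in 0..iterations {
        let n_uploaded = xorshift(&mut state) % 6;
        let uploaded: BTreeSet<u16> = (0..n_uploaded)
            .map(|_| (xorshift(&mut state) % 8) as u16)
            .collect();
        let n_requested = xorshift(&mut state) % 6;
        let requested: Vec<u16> = (0..n_requested)
            .map(|_| (xorshift(&mut state) % 8) as u16)
            .collect();
        if check_completion(&uploaded, &requested).is_ok() {
            // Invariant: accepted lists are strictly ascending and fully uploaded.
            assert!(requested.windows(2).all(|w| w[0] < w[1]));
            assert!(requested.iter().all(|p| uploaded.contains(p)));
            accepted += 1;
        }
    }
    accepted
}
```

In a real cargo-fuzz target the random generator would be replaced by the fuzzer-provided byte slice, while the invariant checks inside the loop would stay largely the same.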
Integration testing (aruna crate):
Multi-node integration tests run multiple Aruna instances within a single test binary, each bound to different ports. Tests verify cluster formation, cross-node data access, search aggregation, permission propagation, and behavior during simulated node failures.
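One way to run several instances in a single test binary without port clashes is to bind each to port 0 and let the OS assign a free port. The sketch below shows only that mechanism with plain TCP; `TestNode`, `spawn_node`, and `query_node` are hypothetical names, and a real harness would start full Aruna nodes instead of one-shot responders.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Handle to a toy node running on its own thread.
pub struct TestNode {
    pub addr: SocketAddr,
    handle: thread::JoinHandle<()>,
}

/// Start a toy node on an OS-assigned loopback port.
/// It answers exactly one connection with its own name.
pub fn spawn_node(id: usize) -> std::io::Result<TestNode> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // port 0: OS picks a free port
    let addr = listener.local_addr()?;
    let handle = thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            let _ = write!(stream, "node-{}", id);
        } // stream is dropped here, which closes the connection
    });
    Ok(TestNode { addr, handle })
}

/// Connect to a node, read its reply until EOF, and wait for its thread to finish.
pub fn query_node(node: TestNode) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(node.addr)?;
    let mut reply = String::new();
    stream.read_to_string(&mut reply)?;
    let _ = node.handle.join();
    Ok(reply)
}
```

Simulated node failure fits the same pattern: dropping a node's listener or handle mid-test makes subsequent connections fail, which the remaining nodes must tolerate.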
End-to-end testing:
Scripted scenarios exercise the full user journey: authentication, group creation, data upload with replication verification, concurrent metadata editing, search with authorization filtering, permission revocation, and continued access during node failure.
Risks / Blockers
Technical risks:
- Iroh ecosystem evolving; API changes may require adaptation. Mitigation: pin versions, abstract behind traits.
- Automerge performance on large documents untested at scale. Mitigation: benchmark early, consider document splitting if needed.
- Roaring bitmap memory usage with many roles and large indexes. Mitigation: monitor during testing, evaluate sparse optimizations if needed.
Design questions to resolve:
- Message routing from network layer to domain crates: channel-based dispatch with typed message enums appears most testable, but needs validation.
- Replication factor configuration: per-realm default with per-resource override, exact configuration format TBD.
- DHT query caching duration: affects consistency vs network overhead tradeoff.
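The channel-based dispatch option from the first design question could look roughly like this. It is a sketch for discussion, not a decision: `NetworkMsg`, `AuthMsg`, `DataMsg`, and `Dispatcher` are hypothetical names, and `std::sync::mpsc` is used only to keep the example dependency-free (an async runtime's channels would likely replace it).

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Messages destined for the auth crate (hypothetical).
#[derive(Debug, PartialEq)]
pub enum AuthMsg {
    RebuildBitmap { group: String },
}

/// Messages destined for the data crate (hypothetical).
#[derive(Debug, PartialEq)]
pub enum DataMsg {
    ReplicaRequest { hash: String },
}

/// Envelope produced by the network layer after decoding.
pub enum NetworkMsg {
    Auth(AuthMsg),
    Data(DataMsg),
}

/// Routes each decoded message to the channel of the owning domain crate.
pub struct Dispatcher {
    auth_tx: Sender<AuthMsg>,
    data_tx: Sender<DataMsg>,
}

impl Dispatcher {
    /// Create a dispatcher plus the receiving ends handed to each domain crate.
    pub fn new() -> (Self, Receiver<AuthMsg>, Receiver<DataMsg>) {
        let (auth_tx, auth_rx) = channel();
        let (data_tx, data_rx) = channel();
        (Dispatcher { auth_tx, data_tx }, auth_rx, data_rx)
    }

    /// Returns false if the receiving side has shut down.
    pub fn dispatch(&self, msg: NetworkMsg) -> bool {
        match msg {
            NetworkMsg::Auth(m) => self.auth_tx.send(m).is_ok(),
            NetworkMsg::Data(m) => self.data_tx.send(m).is_ok(),
        }
    }
}
```

The testability argument is visible here: a unit test can build a `Dispatcher`, push typed messages, and assert on what arrives at each receiver without any network involved.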
Dependencies:
- OIDC provider availability for authentication testing (can use mock provider for CI).
- iroh-dht-experiment stability (experimental status acknowledged).
Scope boundaries:
- Multi-realm federation explicitly out of scope; single realm only.
- Policy engine (DR-007) deferred to post-MVP.
- Compute orchestration, TES, DRS (DR-008) deferred to post-MVP.
- Full S3 API compatibility not targeted; focus on core operations.