S3UTILS-213: add CLAUDE.md file #363
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,4 @@ | ||
| node_modules/ | ||
| */node_modules | ||
| **/CLAUDE.md | ||
| **/AGENTS.md |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| CLAUDE.md |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,145 @@ | ||
| # CLAUDE.md | ||
|
|
||
| This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. | ||
|
|
||
| ## Project Overview | ||
|
|
||
| s3utils is a collection of utilities and scripts for managing S3-compatible storage at scale, specifically designed for Scality S3 Connector (S3C) and Zenko deployments. The project provides operational tools for Cross-Region Replication (CRR), metadata verification, Raft consensus debugging, and data integrity checks. | ||
|
|
||
| ## Common Commands | ||
|
|
||
| ### Testing | ||
|
|
||
| ```bash | ||
| # Run unit tests | ||
| yarn test:unit | ||
|
|
||
| # Run functional tests (requires MongoDB and workbench environment) | ||
| # With workbench installed, use: workbench up --env-dir ./workbench/env -d | ||
| yarn test:functional | ||
|
|
||
| # Run both test suites with coverage | ||
| yarn test:unit && yarn test:functional | ||
| ``` | ||
|
|
||
| ### Linting | ||
|
|
||
| ```bash | ||
| # Run ESLint on tracked files | ||
| yarn lint | ||
|
|
||
| # Lint with zero warnings (CI mode) | ||
| yarn lint -- --max-warnings 0 | ||
| ``` | ||
|
|
||
| ### Running Scripts | ||
|
|
||
| Most scripts are executed directly with Node.js and require environment variables for configuration: | ||
|
|
||
| ```bash | ||
| # Example: CRR existing objects | ||
| node crrExistingObjects.js bucket1,bucket2 | ||
|
|
||
| # Example: Verify bucket sproxyd keys | ||
| node verifyBucketSproxydKeys.js | ||
|
|
||
| # Example: Cleanup noncurrent versions | ||
| node cleanupNoncurrentVersions.js bucket1,bucket2 | ||
| ``` | ||
|
|
||
| Many scripts are also designed to run inside Docker containers with proper network and volume mounts (see README.md for specific examples). | ||
|
|
||
| ## Architecture Overview | ||
|
|
||
| ### Core Component Structure | ||
|
|
||
| **Script Categories:** | ||
| - **CRR Management** (`CRR/`, `crrExistingObjects.js`, `autoRetryFailedCRR.js`, `requeueFailedCRRCronJob.js`): Cross-Region Replication utilities for triggering and managing replication of existing objects | ||
| - **Metadata Verification** (`verifyBucketSproxydKeys.js`, `VerifyBucketSproxydKeys/`): Validates sproxyd keys referenced in S3 metadata exist on storage backend | ||
| - **Raft Consensus Tools** (`CompareRaftMembers/`): Compares metadata databases between Raft leader and followers to detect and repair divergences | ||
| - **Data Cleanup** (`cleanupBuckets.js`, `cleanupNoncurrentVersions.js`, `removeDeleteMarkers.js`): Utilities for managing object versions and lifecycle | ||
| - **Replication Verification** (`VerifyReplication/`): Compares source and destination buckets to verify replication completeness | ||
| - **Object Repair** (`ObjectRepair/`, `repairDuplicateVersions*.js`): Repairs metadata inconsistencies and duplicate keys | ||
| - **Monitoring & Reporting** (`service-level-sidecar/`, `DataReport/`, `bucketVersionsStats.js`): UtapiV2 service-level metrics and bucket statistics | ||
| - **Utilities** (`Clients/`, `utils/`): Shared S3 and CloudServer client wrappers | ||
|
|
||
| ### Key Architectural Patterns | ||
|
|
||
| **Client Architecture:** | ||
| - `Clients/s3Client.js`: AWS SDK v3 S3 client with custom retry strategy (100 retries with exponential backoff) | ||
| - `Clients/CloudserverClient.js`: Internal CloudServer API client for direct metadata operations | ||
| - Both clients use configurable retry logic and keep-alive connections for high-throughput operations | ||
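The retry behaviour described for the clients can be illustrated with a small, generic sketch. This is not the code in `Clients/s3Client.js`: the helper names `backoffDelayMs` and `withRetries`, the 100 ms base delay, and the 30 s cap are assumptions chosen for illustration. Only the figure of 100 retries comes from the description above.

```javascript
// Hypothetical helper: delay in milliseconds before retry number `attempt`
// (0-based). The base delay and cap are illustrative, not the repository's.
function backoffDelayMs(attempt, baseMs = 100, maxMs = 30000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

function defaultSleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Call `operation` until it succeeds or `maxRetries` retries are exhausted,
// sleeping with exponentially growing delays between attempts.
async function withRetries(operation, maxRetries = 100, sleep = defaultSleep) {
    let lastError;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            return await operation(attempt);
        } catch (err) {
            lastError = err;
            if (attempt < maxRetries) {
                await sleep(backoffDelayMs(attempt));
            }
        }
    }
    throw lastError;
}
```

The cap keeps later delays bounded; without it, the delay for the last of 100 retries would be astronomically large.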
|
|
||
| **Replication Status Management:** | ||
| - `CRR/ReplicationStatusUpdater.js`: Core class for updating object replication metadata | ||
| - Processes objects in batches with configurable worker concurrency | ||
| - Supports filtering by replication status: NEW, PENDING, COMPLETED, FAILED, REPLICA | ||
| - Can target specific prefixes and resume from markers for large-scale operations | ||
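As a rough illustration of status filtering combined with bounded worker concurrency, the sketch below uses a simple worker pool. It is not the implementation in `CRR/ReplicationStatusUpdater.js`: the function names and the `{ key, replicationStatus }` object shape are assumptions. Only the list of statuses comes from the text above.

```javascript
// Statuses named in this document.
const REPLICATION_STATUSES = ['NEW', 'PENDING', 'COMPLETED', 'FAILED', 'REPLICA'];

// Keep only objects whose (assumed) `replicationStatus` field is targeted.
function filterByStatus(objects, targetStatuses) {
    for (const status of targetStatuses) {
        if (!REPLICATION_STATUSES.includes(status)) {
            throw new Error(`unknown replication status: ${status}`);
        }
    }
    const wanted = new Set(targetStatuses);
    return objects.filter(obj => wanted.has(obj.replicationStatus));
}

// Run `updateFn` over `items` with at most `workers` calls in flight.
// Results are returned in the same order as `items`.
async function processWithWorkers(items, workers, updateFn) {
    const results = new Array(items.length);
    let next = 0;
    async function worker() {
        while (next < items.length) {
            const index = next++;
            results[index] = await updateFn(items[index]);
        }
    }
    const pool = Array.from({ length: Math.min(workers, items.length) }, worker);
    await Promise.all(pool);
    return results;
}
```

Each worker pulls the next unprocessed index, so a slow update delays only its own worker rather than the whole batch.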
|
|
||
| **Metadata Database Tools:** | ||
| - Uses LevelDB for local metadata storage comparisons | ||
| - `CompareRaftMembers/followerDiff.js`: Compares follower databases against leader view | ||
| - `CompareRaftMembers/compareFollowerDbs.js`: Compares two follower database sets | ||
| - `CompareRaftMembers/repairObjects.js`: Automatically repairs detected inconsistencies | ||
| - Implements block digests for efficient comparison of large datasets | ||
| - Supports filtering by Raft oplog cseq to eliminate false positives from concurrent updates | ||
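The idea behind block digests can be sketched as follows: hash fixed-size runs of sorted entries, then compare the short digests instead of every entry. This is a simplified illustration, not the scheme used by the `CompareRaftMembers` tools. The function names, the choice of SHA-256, and count-based blocks are assumptions. It also assumes both sides hold the same keys in the same order, since an inserted or missing key would shift every later block.

```javascript
const crypto = require('node:crypto');

// Split sorted [key, value] entries into blocks of `blockSize` entries and
// compute one digest per block.
function blockDigests(entries, blockSize) {
    const digests = [];
    for (let i = 0; i < entries.length; i += blockSize) {
        const block = entries.slice(i, i + blockSize);
        const hash = crypto.createHash('sha256');
        for (const [key, value] of block) {
            // JSON encoding keeps key/value boundaries unambiguous.
            hash.update(JSON.stringify([key, value]));
        }
        digests.push({
            firstKey: block[0][0],
            lastKey: block[block.length - 1][0],
            digest: hash.digest('hex'),
        });
    }
    return digests;
}

// Return the indexes of blocks whose digests differ or exist on one side only.
function differingBlocks(a, b) {
    const length = Math.max(a.length, b.length);
    const diff = [];
    for (let i = 0; i < length; i++) {
        if (!a[i] || !b[i] || a[i].digest !== b[i].digest) {
            diff.push(i);
        }
    }
    return diff;
}
```

Only the blocks reported by `differingBlocks` need an entry-by-entry comparison, which is what makes the approach attractive for large databases.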
|
|
||
| **Stream Processing:** | ||
| - Extensive use of Node.js streams for memory-efficient processing of large datasets | ||
| - Custom transform streams for filtering, diffing, and aggregating metadata | ||
| - Examples: `DBListStream`, `DiffStream`, `BlockDigestsStream`, `RaftOplogStream` | ||
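A minimal object-mode transform stream shows the general pattern. `PrefixFilterStream` and `collect` are hypothetical names used for illustration and are not among the repository's stream classes listed above. The `{ key }` entry shape is also an assumption.

```javascript
const { Transform, Readable } = require('node:stream');

// Hypothetical filter: forwards only entries whose key starts with `prefix`.
class PrefixFilterStream extends Transform {
    constructor(prefix) {
        super({ objectMode: true });
        this.prefix = prefix;
    }

    _transform(entry, encoding, callback) {
        if (entry.key.startsWith(this.prefix)) {
            this.push(entry);
        }
        callback();
    }
}

// Drain a readable stream into an array (for small examples only).
async function collect(stream) {
    const out = [];
    for await (const item of stream) {
        out.push(item);
    }
    return out;
}

// Usage: filter an in-memory listing.
const listing = Readable.from([
    { key: 'photos/a.jpg' },
    { key: 'docs/b.txt' },
    { key: 'photos/c.jpg' },
]);
const filtered = listing.pipe(new PrefixFilterStream('photos/'));
```

Because entries flow through one at a time, memory use stays flat regardless of how many entries the source produces.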
|
|
||
| ### Technology Stack | ||
|
|
||
| - **Runtime:** Node.js 22+ | ||
| - **AWS SDK:** v3 (client-s3, smithy packages) | ||
| - **Scality Libraries:** | ||
|   - `arsenal`: Core S3 data models and utilities | ||
|   - `bucketclient`: Bucket metadata client | ||
|   - `vaultclient`: Authentication service client | ||
|   - `werelogs`: Structured logging | ||
| - **Storage:** LevelDB for local metadata caching and comparison | ||
| - **Kafka:** node-rdkafka for replication queue management | ||
| - **Testing:** Jest with mongodb-memory-server | ||
| - **Monitoring:** Warp10 and Scuba for UtapiV2 metrics | ||
|
|
||
| ### Environment Variable Patterns | ||
|
|
||
| Scripts follow consistent patterns for configuration: | ||
| - **Required:** `ACCESS_KEY`, `SECRET_KEY`, `ENDPOINT` for S3 operations | ||
| - **Optional:** `DEBUG=1` for debug logging, `WORKERS` for concurrency control | ||
| - **Limits:** `MAX_UPDATES`, `MAX_SCANNED` for batch processing with resumption | ||
| - **Markers:** `KEY_MARKER`, `VERSION_ID_MARKER` for resuming interrupted operations | ||
| - **Filters:** `TARGET_PREFIX`, `TARGET_REPLICATION_STATUS` for selective processing | ||
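The variable names above could be read into a configuration object along these lines. This is a sketch rather than how any particular script parses its environment: `loadConfig`, `parseIntOr`, and the default of 10 workers are assumptions.

```javascript
// Parse an optional integer variable, falling back when it is unset.
function parseIntOr(value, fallback) {
    if (value === undefined || value === '') {
        return fallback;
    }
    const parsed = Number.parseInt(value, 10);
    if (Number.isNaN(parsed)) {
        throw new Error(`expected an integer, got "${value}"`);
    }
    return parsed;
}

// Build a config object from environment variables named in this document.
function loadConfig(env = process.env) {
    for (const name of ['ACCESS_KEY', 'SECRET_KEY', 'ENDPOINT']) {
        if (!env[name]) {
            throw new Error(`missing required environment variable ${name}`);
        }
    }
    return {
        accessKey: env.ACCESS_KEY,
        secretKey: env.SECRET_KEY,
        endpoint: env.ENDPOINT,
        debug: env.DEBUG === '1',
        workers: parseIntOr(env.WORKERS, 10), // default is an assumption
        maxUpdates: parseIntOr(env.MAX_UPDATES, undefined),
        maxScanned: parseIntOr(env.MAX_SCANNED, undefined),
        keyMarker: env.KEY_MARKER,
        versionIdMarker: env.VERSION_ID_MARKER,
        targetPrefix: env.TARGET_PREFIX,
        targetReplicationStatus: env.TARGET_REPLICATION_STATUS,
    };
}
```

Failing fast on a missing required variable avoids starting a long-running scan that would fail on its first request.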
|
|
||
| ### Testing Architecture | ||
|
|
||
| - **Unit Tests:** `tests/unit/` - Mock-based testing of individual components | ||
| - **Functional Tests:** `tests/functional/` - Integration tests requiring MongoDB and workbench | ||
| - **Workbench:** `workbench/env/default/` - Local development environment with CloudServer and dependencies | ||
| - **CI/CD:** GitHub Actions with Docker-based MongoDB and service orchestration | ||
| - Test configuration in `jest.config.js` with separate coverage directories | ||
|
|
||
| ### Special Considerations | ||
|
|
||
| **Raft Metadata System:** | ||
| - Scripts interact with Raft-based distributed metadata storage | ||
| - Multiple raft sessions (typically 8) distribute metadata across nodes | ||
| - Leader-follower architecture requires special handling for consistency checks | ||
| - Oplog filtering prevents false positives from in-flight operations | ||
|
|
||
| **Sproxyd Storage:** | ||
| - Backend storage system for object data | ||
| - Scripts verify sproxyd key references in metadata match actual stored data | ||
| - Supports MPU (multipart upload) validation with multiple keys per object | ||
|
|
||
| **Replication Semantics:** | ||
| - Supports one-to-many replication to AWS S3, Azure Blob Storage, GCP | ||
| - Replication status lifecycle: NEW → PENDING → COMPLETED/FAILED | ||
| - REPLICA status indicates objects created via replication (not originals) | ||
|
|
||
| **Production Usage:** | ||
| - Scripts are designed for production operations on live systems | ||
| - Many scripts support dry-run modes for safety | ||
| - Progress logging and resumption markers for long-running operations | ||
| - Operations may require stopping Raft followers to ensure consistency | ||