# Access Query Fields Migration Script

This script backfills the `access_check_query` and `history_check_query` fields on existing OpenSearch documents. Both fields were introduced in [PR #20](https://github.com/linuxfoundation/lfx-v2-indexer-service/pull/20).

## Background

The LFX indexer service now includes two additional document fields that support Fine-Grained Authorization (FGA) checks performed by the [query service](https://github.com/linuxfoundation/lfx-v2-query-service):

- `access_check_query`: `access_check_object` and `access_check_relation` joined with `#`
- `history_check_query`: `history_check_object` and `history_check_relation` joined with `#`

The indexer populates these fields automatically for newly indexed documents (implemented in [PR #20](https://github.com/linuxfoundation/lfx-v2-indexer-service/pull/20)), but documents indexed before that change lack them and must be migrated.

## Usage

### Basic Usage

```bash
# Run in dry-run mode to see what would be changed
DRY_RUN=true go run scripts/migration/001_add_access_query_fields/main.go

# Run the actual migration
go run scripts/migration/001_add_access_query_fields/main.go
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OPENSEARCH_URL` | `http://localhost:9200` | OpenSearch cluster URL |
| `OPENSEARCH_INDEX` | `resources` | Target index name |
| `BATCH_SIZE` | `100` | Number of documents to process per batch |
| `DRY_RUN` | `false` | If `true`, only log what would be updated without making changes |
| `SCROLL_TIMEOUT` | `5m` | How long OpenSearch keeps the scroll context alive between batch requests |

## Safety Features

- **Dry Run Mode**: Set `DRY_RUN=true` to preview changes without applying them
- **Idempotent**: Safe to run multiple times; documents that already have the new fields are skipped
- **Graceful Shutdown**: Stops cleanly on SIGINT or SIGTERM
- **Progress Tracking**: Reports per-batch progress and a final statistics summary
- **Error Handling**: A failed batch is logged and processing continues with the next one

## Migration Logic

The script:

1. Searches for documents that have access control fields but are missing the new query fields
2. Builds each query field only when both its object and its relation are non-empty
3. Writes the updates in batches using the OpenSearch bulk API
4. Reports progress and summary statistics

### Query Construction Rules

- `access_check_query` is created only if both `access_check_object` and `access_check_relation` are non-empty
- `history_check_query` is created only if both `history_check_object` and `history_check_relation` are non-empty
- Format: `{object}#{relation}` (e.g., `committee:abc123#viewer`)
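
The rules above amount to a small pure function. In this hypothetical Go sketch, `buildCheckQuery` applies the `{object}#{relation}` format, and `partialUpdate` collects only the fields a document is still missing; an empty result corresponds to a skipped document. Representing `_source` as `map[string]string` is a simplification for illustration, not how the script necessarily models documents.

```go
package main

import "fmt"

// buildCheckQuery joins object and relation as "{object}#{relation}".
// ok is false when either part is empty, in which case no field is written.
func buildCheckQuery(object, relation string) (query string, ok bool) {
	if object == "" || relation == "" {
		return "", false
	}
	return object + "#" + relation, true
}

// partialUpdate returns the fields to set on one document. Fields that are
// already present are left alone, which keeps re-runs idempotent.
// An empty map means the document can be skipped.
func partialUpdate(src map[string]string) map[string]string {
	update := map[string]string{}
	if q, ok := buildCheckQuery(src["access_check_object"], src["access_check_relation"]); ok && src["access_check_query"] == "" {
		update["access_check_query"] = q
	}
	if q, ok := buildCheckQuery(src["history_check_object"], src["history_check_relation"]); ok && src["history_check_query"] == "" {
		update["history_check_query"] = q
	}
	return update
}

func main() {
	doc := map[string]string{
		"access_check_object":   "committee:abc123",
		"access_check_relation": "viewer",
		"history_check_object":  "committee:abc123",
		// history_check_relation is absent, so history_check_query is not built.
	}
	fmt.Println(partialUpdate(doc)) // prints map[access_check_query:committee:abc123#viewer]
}
```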

## Example Output

```text
Starting access query fields migration...
=== Migration Configuration ===
  OpenSearch URL: http://opensearch:9200
  Index Name: resources
  Batch Size: 100
  Dry Run: false
  Scroll Timeout: 5m0s
==============================
✓ Connected to OpenSearch successfully
Searching for documents that need migration...
Found 1250 documents that may need migration

Processing batch 1 (100 documents)...
Progress: 100/1250 documents (8.0%)

Processing batch 2 (100 documents)...
Progress: 200/1250 documents (16.0%)
...

=== Migration Statistics ===
Total Documents Found: 1250
Documents Processed: 1250
Documents Updated: 987
Documents Skipped: 263
Documents with Errors: 0
Duration: 45.6s
Processing Rate: 27.4 docs/sec
============================

✓ Migration completed successfully!
```

## Troubleshooting

### Connection Issues

- Verify that OpenSearch is running and reachable from where the script runs
- Check the `OPENSEARCH_URL` environment variable
- Confirm network connectivity, and authentication if the cluster requires it

### Performance Tuning

- Increase `BATCH_SIZE` to reduce the number of round trips on large datasets, at the cost of larger search and bulk requests
- Increase `SCROLL_TIMEOUT` if a single batch can take longer to process than the current timeout; OpenSearch discards the scroll context once it expires
- Monitor OpenSearch cluster load during the migration

### Partial Failures

- The script keeps going when an individual batch fails
- Check the error logs for the specific failure reasons
- Re-run the script to retry failed documents; because it is idempotent, documents that were already updated are skipped
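
A bulk request can also fail partially: the response carries a top-level `errors` flag and a per-item `status`, so some documents in a batch may be updated while others are rejected. The following Go sketch, with a hypothetical `failedIDs` helper, shows one way to extract and log the IDs of rejected items; it is not taken from the script.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// itemResult holds the per-item fields of a _bulk response used here.
type itemResult struct {
	ID     string          `json:"_id"`
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error"`
}

// bulkResponse models a _bulk response; each item is keyed by its action
// name ("update", "index", ...).
type bulkResponse struct {
	Errors bool                    `json:"errors"`
	Items  []map[string]itemResult `json:"items"`
}

// failedIDs logs and returns the IDs of items whose status is outside 2xx.
func failedIDs(raw []byte) ([]string, error) {
	var resp bulkResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var failed []string
	if !resp.Errors {
		return failed, nil // no item-level errors reported
	}
	for _, item := range resp.Items {
		for _, res := range item {
			if res.Status < 200 || res.Status > 299 {
				fmt.Printf("document %s failed (status %d): %s\n", res.ID, res.Status, res.Error)
				failed = append(failed, res.ID)
			}
		}
	}
	return failed, nil
}

func main() {
	raw := []byte(`{"took":3,"errors":true,"items":[
		{"update":{"_id":"doc-1","status":200}},
		{"update":{"_id":"doc-2","status":409,"error":{"type":"version_conflict_engine_exception"}}}]}`)
	ids, err := failedIDs(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("failed:", ids)
}
```

Documents rejected this way still lack the new fields, so a re-run's search picks them up again.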

## Testing

Always test in a non-production environment first:

1. Run with `DRY_RUN=true` to preview changes
2. Start with a small `BATCH_SIZE`
3. Verify that the query fields are constructed correctly
4. Confirm that existing fields were left unchanged
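
For step 3, one approach is to fetch a sample of migrated documents and compare each stored query field with what the construction rules would produce. This hypothetical Go check operates on a raw `_source` JSON document. It flags a wrong or missing value, and also flags a query field that is present even though its object or relation is empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkPair reports a problem when the stored query field disagrees with
// the "{object}#{relation}" rule; it returns "" when the document is consistent.
func checkPair(src map[string]any, objField, relField, queryField string) string {
	obj, _ := src[objField].(string)
	rel, _ := src[relField].(string)
	got, has := src[queryField].(string)
	switch {
	case obj != "" && rel != "":
		if want := obj + "#" + rel; got != want {
			return fmt.Sprintf("%s = %q, want %q", queryField, got, want)
		}
	case has:
		return fmt.Sprintf("%s = %q, but %s or %s is empty", queryField, got, objField, relField)
	}
	return ""
}

// verifySource returns every mismatch found in one document's _source.
func verifySource(raw []byte) ([]string, error) {
	var src map[string]any
	if err := json.Unmarshal(raw, &src); err != nil {
		return nil, err
	}
	var problems []string
	for _, f := range [][3]string{
		{"access_check_object", "access_check_relation", "access_check_query"},
		{"history_check_object", "history_check_relation", "history_check_query"},
	} {
		if p := checkPair(src, f[0], f[1], f[2]); p != "" {
			problems = append(problems, p)
		}
	}
	return problems, nil
}

func main() {
	problems, err := verifySource([]byte(`{"access_check_object":"committee:abc123","access_check_relation":"viewer"}`))
	if err != nil {
		panic(err)
	}
	for _, p := range problems {
		fmt.Println(p) // access_check_query = "", want "committee:abc123#viewer"
	}
}
```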

## Technical Details

- Uses the OpenSearch scroll API to page through large result sets efficiently
- Applies updates with the bulk API to reduce per-request overhead
- Fetches only the fields it needs, to minimize network transfer
- Handles SIGINT/SIGTERM for graceful shutdown
- Tracks errors and statistics throughout the run
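
The bulk API takes newline-delimited JSON (NDJSON): an action line followed by a source line for each operation, with a trailing newline at the end of the body. Below is a minimal Go sketch of encoding partial-document updates that way, assuming one `update` action per document. The `update` type and `bulkBody` function are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// update is one partial-document update (hypothetical type).
type update struct {
	ID     string
	Fields map[string]string
}

// bulkBody encodes updates as NDJSON: an {"update": ...} action line, then a
// {"doc": ...} line with only the fields to set. json.Encoder.Encode appends
// "\n" after each value, which also supplies the required trailing newline.
func bulkBody(index string, updates []update) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, u := range updates {
		action := map[string]any{"update": map[string]string{"_index": index, "_id": u.ID}}
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(map[string]any{"doc": u.Fields}); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	body, err := bulkBody("resources", []update{
		{ID: "doc-1", Fields: map[string]string{"access_check_query": "committee:abc123#viewer"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body))
}
```

The resulting body would be sent as `POST /_bulk` with `Content-Type: application/x-ndjson`. Sending `doc` rather than the full source means only the new fields are written and all other fields are left as they are.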