-
Notifications
You must be signed in to change notification settings - Fork 2
[LFXV2-567] Add data migration script for access query fields #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Create migration script to update existing OpenSearch documents with new access_check_query and history_check_query fields - Implement safe batch processing using OpenSearch scroll and bulk APIs - Add comprehensive error handling, progress tracking, and dry-run mode - Include graceful shutdown handling and detailed statistics reporting - Support configurable batch sizes and connection settings via environment variables - Query construction matches enricher logic: object#relation format - Migration is idempotent and safe to run multiple times - Add complete documentation with usage examples and troubleshooting guide The script addresses the need to migrate existing documents that were indexed before the new FGA query fields were added to the enricher system. 🤖 Generated with [Claude Code](https://claude.ai/code) Signed-off-by: Andres Tobon <[email protected]>
- Create pkg/env package with utilities for reading environment variables with type conversion - Implement GetString, GetInt, GetBool, and GetDuration functions with default value support - Update migration script to use new env package instead of duplicated helper functions - Simplifies environment variable handling across scripts and services - Reduces code duplication and provides consistent interface for env var access 🤖 Generated with [Claude Code](https://claude.ai/code) Signed-off-by: Andres Tobon <[email protected]>
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a new pkg/env package providing typed env var helpers and a new migration under scripts/migration/001_add_access_query_fields that computes and bulk-applies access_check_query and history_check_query to OpenSearch documents (configurable, supports dry-run, progress, and graceful shutdown). Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Operator
participant Main as Migration Main
participant Env as pkg/env
participant OS as OpenSearch
Operator->>Main: Start
Main->>Env: Load config (URL, index, batch, dry-run, timeouts)
Main->>OS: Connect / Info
Main->>OS: Search (scroll) for docs needing queries
loop per batch
Main->>Main: Determine needs, build queries (JoinFgaQuery)
alt Dry-run
Main->>Operator: Log planned updates
else Execute
Main->>OS: Bulk update documents
OS-->>Main: Bulk response
end
Main->>Main: Update stats, log progress
Main->>OS: Scroll next batch
end
alt Completion / Cancel
Main->>OS: Clear scroll
Main->>Operator: Print final stats
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a data migration script to populate new FGA (Fine-Grained Authorization) query fields in OpenSearch documents and creates a shared environment utilities package.
- Creates migration script to add
access_check_queryandhistory_check_queryfields to existing documents - Introduces reusable environment variable utilities package
- Provides comprehensive documentation and safety features including dry-run mode and graceful shutdown
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
scripts/migration/001_add_access_query_fields/main.go |
Complete migration script with OpenSearch integration, batch processing, and error handling |
scripts/migration/001_add_access_query_fields/README.md |
Comprehensive documentation covering usage, configuration, and troubleshooting |
pkg/env/env.go |
Shared utilities for environment variable handling with type conversion |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (6)
pkg/env/env.go (1)
13-49: Trim env values before parsing to avoid surprising defaults.Untrimmed values like " true " or " 500 " will fall back to defaults. Trim once and reuse across helpers.
-import ( - "os" - "strconv" - "time" -) +import ( + "os" + "strconv" + "strings" + "time" +) @@ -func GetString(key, defaultValue string) string { - if value := os.Getenv(key); value != "" { +func GetString(key, defaultValue string) string { + if value := strings.TrimSpace(os.Getenv(key)); value != "" { return value } return defaultValue } @@ -func GetInt(key string, defaultValue int) int { - if value := os.Getenv(key); value != "" { +func GetInt(key string, defaultValue int) int { + if value := strings.TrimSpace(os.Getenv(key)); value != "" { if intValue, err := strconv.Atoi(value); err == nil { return intValue } } return defaultValue } @@ -func GetBool(key string, defaultValue bool) bool { - if value := os.Getenv(key); value != "" { +func GetBool(key string, defaultValue bool) bool { + if value := strings.TrimSpace(os.Getenv(key)); value != "" { if boolValue, err := strconv.ParseBool(value); err == nil { return boolValue } } return defaultValue } @@ -func GetDuration(key string, defaultValue time.Duration) time.Duration { - if value := os.Getenv(key); value != "" { +func GetDuration(key string, defaultValue time.Duration) time.Duration { + if value := strings.TrimSpace(os.Getenv(key)); value != "" { if duration, err := time.ParseDuration(value); err == nil { return duration } } return defaultValue }scripts/migration/001_add_access_query_fields/main.go (5)
23-37: Add imports for bytes and url to support safer logging and request bodies.import ( + "bytes" "context" "encoding/json" "fmt" "log" + "net/url" "os" "os/signal" "strings" "syscall" "time"
176-215: Prefer scroll-friendly sort and accurate totals; also tighten request body creation.
- Add sort: "_doc" for scroll performance.
- Enable track_total_hits to make progress stats accurate.
searchBody := map[string]interface{}{ "query": map[string]interface{}{ "bool": map[string]interface{}{ "should": []map[string]interface{}{ @@ "minimum_should_match": 1, }, }, + "sort": []interface{}{"_doc"}, + "track_total_hits": true, "size": config.BatchSize, "_source": []string{ "access_check_object", "access_check_relation", "history_check_object", "history_check_relation", "access_check_query", "history_check_query", }, } @@ - req := opensearchapi.SearchRequest{ - Index: []string{config.IndexName}, - Body: strings.NewReader(string(searchBodyJSON)), - Scroll: config.ScrollTimeout, - } + req := opensearchapi.SearchRequest{ + Index: []string{config.IndexName}, + Body: bytes.NewReader(searchBodyJSON), + Scroll: config.ScrollTimeout, + }
410-413: Guard against zero duration in rate calculation.Avoid logging Inf docs/sec on very fast runs.
- if stats.ProcessedDocuments > 0 { - rate := float64(stats.ProcessedDocuments) / duration.Seconds() + if stats.ProcessedDocuments > 0 && duration > 0 { + rate := float64(stats.ProcessedDocuments) / duration.Seconds() log.Printf("Processing Rate: %.2f docs/sec", rate) }
484-491: Rely on empty hits to terminate scrolls.Breaking when len(hits) < batch size can prematurely end if the server returns a short page mid-scroll. Prefer fetching until a 0-length page.
- // Check if we have more documents to process - if len(searchResponse.Hits.Hits) < config.BatchSize { - break - } + // Continue scrolling; termination is handled when the next page returns 0 hits.
120-131: Support OPENSEARCH_USERNAME/OPENSEARCH_PASSWORD for optional basic authscripts/migration/001_add_access_query_fields/main.go currently only reads OPENSEARCH_URL; extend Config with Username/Password, read OPENSEARCH_USERNAME/OPENSEARCH_PASSWORD in loadConfig, pass them into opensearch.Config when creating the client, and update the file's environment-variable docs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
pkg/env/env.go(1 hunks)scripts/migration/001_add_access_query_fields/README.md(1 hunks)scripts/migration/001_add_access_query_fields/main.go(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
PR: linuxfoundation/lfx-v2-indexer-service#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-13T19:35:15.510Z
Learning: Set required environment variables (NATS_URL, OPENSEARCH_URL, JWKS_URL, etc.) for local and deployment environments
🧬 Code graph analysis (1)
scripts/migration/001_add_access_query_fields/main.go (1)
pkg/env/env.go (4)
GetString(14-19)GetInt(22-29)GetBool(32-39)GetDuration(42-49)
🪛 markdownlint-cli2 (0.18.1)
scripts/migration/001_add_access_query_fields/README.md
74-74: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: MegaLinter
🔇 Additional comments (1)
scripts/migration/001_add_access_query_fields/main.go (1)
139-141: LGTM: client connectivity check.Connection probe and resource cleanup are correct.
- Add proper error handling for JSON marshaling in bulk update processing - Fix scroll timeout format to use duration string instead of minutes - Add links to PR #20 and query service repository in documentation - Simplify usage examples to use go run directly instead of building - Remove redundant custom configuration section from README 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Andres Tobon <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
scripts/migration/001_add_access_query_fields/main.go (3)
175-214: Ensure accurate totals for progress by enabling track_total_hitsWithout
track_total_hits: true, total can be a low ceiling, skewing progress percentages.searchBody := map[string]interface{}{ "query": map[string]interface{}{ "bool": map[string]interface{}{ @@ }, }, - "size": config.BatchSize, + "size": config.BatchSize, + "track_total_hits": true, "_source": []string{ "access_check_object", "access_check_relation", "history_check_object", "history_check_relation", "access_check_query", "history_check_query", }, }
492-499: Don’t early-exit the scroll loop based on page sizeScroll can return fewer than
sizebefore exhaustion in some scenarios; rely on the next scroll returning zero hits instead.- // Check if we have more documents to process - if len(searchResponse.Hits.Hits) < config.BatchSize { - break - } - // Get next batch searchResponse, err = scrollDocuments(ctx, client, scrollID, config.ScrollTimeout) if err != nil { log.Printf("Warning: Failed to scroll documents: %v", err) break }
221-225: Avoid extra allocations when sending JSON bodiesUse
bytes.NewReaderto prevent converting JSON bytes to string.- Body: strings.NewReader(string(searchBodyJSON)), + Body: bytes.NewReader(searchBodyJSON),- Body: strings.NewReader(string(scrollBodyJSON)), + Body: bytes.NewReader(scrollBodyJSON),Add import:
import "bytes"Also applies to: 257-259
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
scripts/migration/001_add_access_query_fields/README.md(1 hunks)scripts/migration/001_add_access_query_fields/main.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/migration/001_add_access_query_fields/README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T21:34:45.507Z
Learnt from: andrest50
PR: linuxfoundation/lfx-v2-indexer-service#21
File: scripts/migration/001_add_access_query_fields/main.go:108-114
Timestamp: 2025-09-23T21:34:45.507Z
Learning: In the lfx-v2-indexer-service repository, logging OpenSearch URLs directly is acceptable and the URLs are not considered secrets that need redaction.
Applied to files:
scripts/migration/001_add_access_query_fields/main.go
🧬 Code graph analysis (1)
scripts/migration/001_add_access_query_fields/main.go (1)
pkg/env/env.go (4)
GetString(14-19)GetInt(22-29)GetBool(32-39)GetDuration(42-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: MegaLinter
🔇 Additional comments (1)
scripts/migration/001_add_access_query_fields/main.go (1)
108-114: Config logging looks goodPer team guidance, logging the OpenSearch URL here is acceptable.
prabodhcs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary
access_check_queryandhistory_check_queryfieldspkg/envpackage for environment variable utilitiesChanges
Migration Script:
scripts/migration/001_add_access_query_fields/main.goobject#relationformat matching enricher logicEnvironment Package:
pkg/env/env.goGetString,GetInt,GetBool,GetDurationDocumentation: Complete README with usage examples and troubleshooting
Test plan
Manual Testing
Output from running the script with data in my local environment containing 149 documents that needed updating to include the new fields. Upon checking the opensearch index, the documents were in fact updated to include the
access_check_queryandhistory_check_queryfields correctly.Ticket
LFXV2-567
🤖 Generated with Claude Code