Skip to content

Conversation

@andrest50
Copy link
Contributor

@andrest50 andrest50 commented Sep 23, 2025

Summary

  • Add OpenSearch data migration script to populate new access_check_query and history_check_query fields
  • Create shared pkg/env package for environment variable utilities
  • Migration script supports batch processing, dry-run mode, and comprehensive error handling

Changes

  • Migration Script: scripts/migration/001_add_access_query_fields/main.go

    • Migrates existing OpenSearch documents to include new FGA query fields
    • Uses scroll API for efficient processing of large datasets
    • Constructs query fields as object#relation format matching enricher logic
    • Includes comprehensive safety features and progress tracking
  • Environment Package: pkg/env/env.go

    • Shared utilities for environment variable handling with type conversion
    • Functions: GetString, GetInt, GetBool, GetDuration
    • Reduces code duplication across scripts and services
  • Documentation: Complete README with usage examples and troubleshooting

Test plan

  • Script builds successfully
  • Environment package functions work correctly
  • Migration logic correctly constructs FGA query fields
  • Dry-run mode validates changes without applying them
  • Comprehensive documentation provided

Manual Testing

Output from running the script with data in my local environment containing 149 documents that needed updating to include the new fields. Upon checking the opensearch index, the documents were in fact updated to include the access_check_query and history_check_query fields correctly.

001_add_access_query_fields % go run main.go
2025/09/23 13:36:09 Starting access query fields migration...
2025/09/23 13:36:09 === Migration Configuration ===
2025/09/23 13:36:09   OpenSearch URL: http://opensearch-cluster-master.lfx.svc.cluster.local:9200
2025/09/23 13:36:09   Index Name: resources
2025/09/23 13:36:09   Batch Size: 100
2025/09/23 13:36:09   Dry Run: false
2025/09/23 13:36:09   Scroll Timeout: 5m0s
2025/09/23 13:36:09 ==============================
2025/09/23 13:36:09 ✓ Connected to OpenSearch successfully
2025/09/23 13:36:09 Searching for documents that need migration...
2025/09/23 13:36:09 Found 149 documents that may need migration
2025/09/23 13:36:09 
Processing batch 1 (100 documents)...
2025/09/23 13:36:09 Progress: 100/149 documents (67.1%)
2025/09/23 13:36:09 
Processing batch 2 (49 documents)...
2025/09/23 13:36:10 Progress: 149/149 documents (100.0%)
2025/09/23 13:36:10 
=== Migration Statistics ===
2025/09/23 13:36:10 Total Documents Found: 149
2025/09/23 13:36:10 Documents Processed: 149
2025/09/23 13:36:10 Documents Updated: 149
2025/09/23 13:36:10 Documents Skipped: 0
2025/09/23 13:36:10 Documents with Errors: 0
2025/09/23 13:36:10 Duration: 656.927875ms
2025/09/23 13:36:10 Processing Rate: 226.81 docs/sec
2025/09/23 13:36:10 ============================
2025/09/23 13:36:10 
✓ Migration completed successfully!

Ticket

LFXV2-567

🤖 Generated with Claude Code

- Create migration script to update existing OpenSearch documents with new access_check_query and history_check_query fields
- Implement safe batch processing using OpenSearch scroll and bulk APIs
- Add comprehensive error handling, progress tracking, and dry-run mode
- Include graceful shutdown handling and detailed statistics reporting
- Support configurable batch sizes and connection settings via environment variables
- Query construction matches enricher logic: object#relation format
- Migration is idempotent and safe to run multiple times
- Add complete documentation with usage examples and troubleshooting guide

The script addresses the need to migrate existing documents that were indexed before the new FGA query fields were added to the enricher system.

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Andres Tobon <[email protected]>
- Create pkg/env package with utilities for reading environment variables with type conversion
- Implement GetString, GetInt, GetBool, and GetDuration functions with default value support
- Update migration script to use new env package instead of duplicated helper functions
- Simplifies environment variable handling across scripts and services
- Reduces code duplication and provides consistent interface for env var access

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Andres Tobon <[email protected]>
Copilot AI review requested due to automatic review settings September 23, 2025 21:20
@andrest50 andrest50 requested a review from a team as a code owner September 23, 2025 21:20
@coderabbitai
Copy link

coderabbitai bot commented Sep 23, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a new pkg/env package providing typed env var helpers and a new migration under scripts/migration/001_add_access_query_fields that computes and bulk-applies access_check_query and history_check_query to OpenSearch documents (configurable, supports dry-run, progress, and graceful shutdown).

Changes

Cohort / File(s) Summary
Env utilities
pkg/env/env.go
New package with exported helpers: GetString, GetInt, GetBool, GetDuration — read env vars, parse to types, fall back to provided defaults on missing/invalid values.
Access Query Fields migration
scripts/migration/001_add_access_query_fields/main.go, scripts/migration/001_add_access_query_fields/README.md
New migration program and README: load config from env, create OpenSearch client, search/scroll documents missing query fields, construct FGA-style queries, bulk-update fields (or dry-run), track stats, handle scroll cleanup and graceful shutdown.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Operator
  participant Main as Migration Main
  participant Env as pkg/env
  participant OS as OpenSearch

  Operator->>Main: Start
  Main->>Env: Load config (URL, index, batch, dry-run, timeouts)
  Main->>OS: Connect / Info
  Main->>OS: Search (scroll) for docs needing queries
  loop per batch
    Main->>Main: Determine needs, build queries (JoinFgaQuery)
    alt Dry-run
      Main->>Operator: Log planned updates
    else Execute
      Main->>OS: Bulk update documents
      OS-->>Main: Bulk response
    end
    Main->>Main: Update stats, log progress
    Main->>OS: Scroll next batch
  end
  alt Completion / Cancel
    Main->>OS: Clear scroll
    Main->>Operator: Print final stats
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title "LFXV2-567 Add data migration script for access query fields" is concise, includes the ticket reference, and accurately captures the primary change (adding the migration script to populate access query fields), so it clearly reflects the main changeset. It is specific and relevant to the files and objectives described in the PR.
Description Check ✅ Passed The PR description details the migration script, the new pkg/env utilities, README, test plan, and manual run output, and these items map directly to the provided changeset summaries, so the description is on-topic and informative for reviewers. It is specific enough for this lenient check and documents the intent and verification steps performed.
Docstring Coverage ✅ Passed Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch andrest50/reindex-script

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a data migration script to populate new FGA (Fine-Grained Authorization) query fields in OpenSearch documents and creates a shared environment utilities package.

  • Creates migration script to add access_check_query and history_check_query fields to existing documents
  • Introduces reusable environment variable utilities package
  • Provides comprehensive documentation and safety features including dry-run mode and graceful shutdown

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
scripts/migration/001_add_access_query_fields/main.go Complete migration script with OpenSearch integration, batch processing, and error handling
scripts/migration/001_add_access_query_fields/README.md Comprehensive documentation covering usage, configuration, and troubleshooting
pkg/env/env.go Shared utilities for environment variable handling with type conversion

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (6)
pkg/env/env.go (1)

13-49: Trim env values before parsing to avoid surprising defaults.

Untrimmed values like " true " or " 500 " will fall back to defaults. Trim once and reuse across helpers.

-import (
-	"os"
-	"strconv"
-	"time"
-)
+import (
+	"os"
+	"strconv"
+	"strings"
+	"time"
+)
@@
-func GetString(key, defaultValue string) string {
-	if value := os.Getenv(key); value != "" {
+func GetString(key, defaultValue string) string {
+	if value := strings.TrimSpace(os.Getenv(key)); value != "" {
 		return value
 	}
 	return defaultValue
 }
@@
-func GetInt(key string, defaultValue int) int {
-	if value := os.Getenv(key); value != "" {
+func GetInt(key string, defaultValue int) int {
+	if value := strings.TrimSpace(os.Getenv(key)); value != "" {
 		if intValue, err := strconv.Atoi(value); err == nil {
 			return intValue
 		}
 	}
 	return defaultValue
 }
@@
-func GetBool(key string, defaultValue bool) bool {
-	if value := os.Getenv(key); value != "" {
+func GetBool(key string, defaultValue bool) bool {
+	if value := strings.TrimSpace(os.Getenv(key)); value != "" {
 		if boolValue, err := strconv.ParseBool(value); err == nil {
 			return boolValue
 		}
 	}
 	return defaultValue
 }
@@
-func GetDuration(key string, defaultValue time.Duration) time.Duration {
-	if value := os.Getenv(key); value != "" {
+func GetDuration(key string, defaultValue time.Duration) time.Duration {
+	if value := strings.TrimSpace(os.Getenv(key)); value != "" {
 		if duration, err := time.ParseDuration(value); err == nil {
 			return duration
 		}
 	}
 	return defaultValue
 }
scripts/migration/001_add_access_query_fields/main.go (5)

23-37: Add imports for bytes and url to support safer logging and request bodies.

 import (
+	"bytes"
 	"context"
 	"encoding/json"
 	"fmt"
 	"log"
+	"net/url"
 	"os"
 	"os/signal"
 	"strings"
 	"syscall"
 	"time"

176-215: Prefer scroll-friendly sort and accurate totals; also tighten request body creation.

  • Add sort: "_doc" for scroll performance.
  • Enable track_total_hits to make progress stats accurate.
 	searchBody := map[string]interface{}{
 		"query": map[string]interface{}{
 			"bool": map[string]interface{}{
 				"should": []map[string]interface{}{
@@
 				"minimum_should_match": 1,
 			},
 		},
+		"sort": []interface{}{"_doc"},
+		"track_total_hits": true,
 		"size": config.BatchSize,
 		"_source": []string{
 			"access_check_object",
 			"access_check_relation",
 			"history_check_object",
 			"history_check_relation",
 			"access_check_query",
 			"history_check_query",
 		},
 	}
@@
-	req := opensearchapi.SearchRequest{
-		Index:  []string{config.IndexName},
-		Body:   strings.NewReader(string(searchBodyJSON)),
-		Scroll: config.ScrollTimeout,
-	}
+	req := opensearchapi.SearchRequest{
+		Index:  []string{config.IndexName},
+		Body:   bytes.NewReader(searchBodyJSON),
+		Scroll: config.ScrollTimeout,
+	}

410-413: Guard against zero duration in rate calculation.

Avoid logging Inf docs/sec on very fast runs.

-	if stats.ProcessedDocuments > 0 {
-		rate := float64(stats.ProcessedDocuments) / duration.Seconds()
+	if stats.ProcessedDocuments > 0 && duration > 0 {
+		rate := float64(stats.ProcessedDocuments) / duration.Seconds()
 		log.Printf("Processing Rate: %.2f docs/sec", rate)
 	}

484-491: Rely on empty hits to terminate scrolls.

Breaking when len(hits) < batch size can prematurely end if the server returns a short page mid-scroll. Prefer fetching until a 0-length page.

-		// Check if we have more documents to process
-		if len(searchResponse.Hits.Hits) < config.BatchSize {
-			break
-		}
+		// Continue scrolling; termination is handled when the next page returns 0 hits.

120-131: Support OPENSEARCH_USERNAME/OPENSEARCH_PASSWORD for optional basic auth

scripts/migration/001_add_access_query_fields/main.go currently only reads OPENSEARCH_URL; extend Config with Username/Password, read OPENSEARCH_USERNAME/OPENSEARCH_PASSWORD in loadConfig, pass them into opensearch.Config when creating the client, and update the file's environment-variable docs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8765755 and 87d6932.

📒 Files selected for processing (3)
  • pkg/env/env.go (1 hunks)
  • scripts/migration/001_add_access_query_fields/README.md (1 hunks)
  • scripts/migration/001_add_access_query_fields/main.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
PR: linuxfoundation/lfx-v2-indexer-service#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-13T19:35:15.510Z
Learning: Set required environment variables (NATS_URL, OPENSEARCH_URL, JWKS_URL, etc.) for local and deployment environments
🧬 Code graph analysis (1)
scripts/migration/001_add_access_query_fields/main.go (1)
pkg/env/env.go (4)
  • GetString (14-19)
  • GetInt (22-29)
  • GetBool (32-39)
  • GetDuration (42-49)
🪛 markdownlint-cli2 (0.18.1)
scripts/migration/001_add_access_query_fields/README.md

74-74: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: MegaLinter
🔇 Additional comments (1)
scripts/migration/001_add_access_query_fields/main.go (1)

139-141: LGTM: client connectivity check.

Connection probe and resource cleanup are correct.

- Add proper error handling for JSON marshaling in bulk update processing
- Fix scroll timeout format to use duration string instead of minutes
- Add links to PR #20 and query service repository in documentation
- Simplify usage examples to use go run directly instead of building
- Remove redundant custom configuration section from README

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Signed-off-by: Andres Tobon <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
scripts/migration/001_add_access_query_fields/main.go (3)

175-214: Ensure accurate totals for progress by enabling track_total_hits

Without track_total_hits: true, total can be a low ceiling, skewing progress percentages.

   searchBody := map[string]interface{}{
     "query": map[string]interface{}{
       "bool": map[string]interface{}{
@@
       },
     },
-    "size": config.BatchSize,
+    "size": config.BatchSize,
+    "track_total_hits": true,
     "_source": []string{
       "access_check_object",
       "access_check_relation",
       "history_check_object",
       "history_check_relation",
       "access_check_query",
       "history_check_query",
     },
   }

492-499: Don’t early-exit the scroll loop based on page size

Scroll can return fewer than size before exhaustion in some scenarios; rely on the next scroll returning zero hits instead.

-    // Check if we have more documents to process
-    if len(searchResponse.Hits.Hits) < config.BatchSize {
-      break
-    }
-
     // Get next batch
     searchResponse, err = scrollDocuments(ctx, client, scrollID, config.ScrollTimeout)
     if err != nil {
       log.Printf("Warning: Failed to scroll documents: %v", err)
       break
     }

221-225: Avoid extra allocations when sending JSON bodies

Use bytes.NewReader to prevent converting JSON bytes to string.

-    Body:   strings.NewReader(string(searchBodyJSON)),
+    Body:   bytes.NewReader(searchBodyJSON),
-    Body: strings.NewReader(string(scrollBodyJSON)),
+    Body: bytes.NewReader(scrollBodyJSON),

Add import:

import "bytes"

Also applies to: 257-259

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87d6932 and 70c809e.

📒 Files selected for processing (2)
  • scripts/migration/001_add_access_query_fields/README.md (1 hunks)
  • scripts/migration/001_add_access_query_fields/main.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/migration/001_add_access_query_fields/README.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-23T21:34:45.507Z
Learnt from: andrest50
PR: linuxfoundation/lfx-v2-indexer-service#21
File: scripts/migration/001_add_access_query_fields/main.go:108-114
Timestamp: 2025-09-23T21:34:45.507Z
Learning: In the lfx-v2-indexer-service repository, logging OpenSearch URLs directly is acceptable and the URLs are not considered secrets that need redaction.

Applied to files:

  • scripts/migration/001_add_access_query_fields/main.go
🧬 Code graph analysis (1)
scripts/migration/001_add_access_query_fields/main.go (1)
pkg/env/env.go (4)
  • GetString (14-19)
  • GetInt (22-29)
  • GetBool (32-39)
  • GetDuration (42-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: MegaLinter
🔇 Additional comments (1)
scripts/migration/001_add_access_query_fields/main.go (1)

108-114: Config logging looks good

Per team guidance, logging the OpenSearch URL here is acceptable.

Copy link
Contributor

@prabodhcs prabodhcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andrest50 andrest50 merged commit f544390 into main Sep 24, 2025
5 checks passed
@andrest50 andrest50 deleted the andrest50/reindex-script branch September 24, 2025 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants