Skip to content

Conversation

@mtclinton
Copy link
Contributor

@mtclinton mtclinton commented Dec 16, 2025

fix(taskomatic): remove systems with null inventory ids

  • Systems with null as inventory_id are not properly deleted (see steps below)
  • The delete_systems job selects inventory_id without filtering NULL values.
  • When inventory_id is NULL, fetchone() returns (None,), and delete_system(None) is called.
  • The delete_system() function uses WHERE inventory_id = inventory_id_in, which fails when inventory_id_in is NULL (because NULL = NULL is false in SQL).
  • This leaves systems with NULL inventory_id stuck in the database.

Secure Coding Practices Checklist GitHub Link

Secure Coding Checklist

  • Input Validation
  • Output Encoding
  • Authentication and Password Management
  • Session Management
  • Access Control
  • Cryptographic Practices
  • Error Handling and Logging
  • Data Protection
  • Communication Security
  • System Configuration
  • Database Security
  • File Management
  • Memory Management
  • General Coding Practices

Summary by Sourcery

Bug Fixes:

  • Ensure systems with NULL inventory_id are also deleted during periodic cleanup without leaving orphaned related data.

@jira-linking
Copy link

jira-linking bot commented Dec 16, 2025

Commits missing Jira IDs:
a152235

@sourcery-ai
Copy link

sourcery-ai bot commented Dec 16, 2025

Reviewer's Guide

This PR updates the taskomatic cleanup job to ensure that systems with null inventory_ids are also fully cleaned up, by tightening the existing deletion query to only process valid inventory_ids and adding a new deletion loop that safely removes legacy/orphaned systems lacking inventory_ids along with their related records.

ER diagram for cleanup of systems with null inventory_id

erDiagram
    system_platform {
        bigint id PK
        uuid inventory_id
        bigint rh_account_id
        timestamp when_deleted
        boolean opt_out
    }

    system_vulnerabilities {
        bigint id PK
        bigint system_id FK
        bigint rh_account_id
        text vulnerability_data
    }

    system_vulnerable_package {
        bigint id PK
        bigint system_id FK
        bigint rh_account_id
        text package_data
    }

    system_repo {
        bigint id PK
        bigint system_id FK
        text repo_data
    }

    system_cve_data {
        bigint id PK
        bigint system_id FK
        text cve_data
    }

    system_platform ||--o{ system_vulnerabilities : system_id
    system_platform ||--o{ system_vulnerable_package : system_id
    system_platform ||--o{ system_repo : system_id
    system_platform ||--o{ system_cve_data : system_id
Loading

Flow diagram for updated system deletion job logic

flowchart TD
    A[start_run] --> B[open_db_connection]
    B --> C[init_deleted_counter]

    subgraph Loop_valid_inventory_id
        C --> D[query_next_system_where_when_deleted_threshold_and_inventory_id_not_null]
        D -->|no_row| H[proceed_to_null_inventory_loop]
        D -->|row_found| E[delete_system_with_inventory_id_via_existing_logic]
        E --> F[increment_deleted_counter]
        F --> G[commit_transaction]
        G --> D
    end

    H --> I[query_next_system_where_when_deleted_threshold_and_inventory_id_null]

    subgraph Loop_null_inventory_id
        I -->|no_row| O[close_cursor_and_connection]
        I -->|row_found| J[load_system_id_and_rh_account_id]
        J --> K[update_system_platform_set_opt_out_true]
        K --> L[delete_from_system_vulnerabilities_by_system_id_and_rh_account_id]
        L --> M[delete_from_system_vulnerable_package_by_system_id_and_rh_account_id]
        M --> N[delete_from_system_repo_by_system_id]
        N --> P[delete_from_system_cve_data_by_system_id]
        P --> Q{delete_from_system_platform_by_id}
        Q -->|success| S[increment_deleted_counter]
        Q -->|foreign_key_violation| R[delete_from_system_vulnerabilities_all_partitions_and_system_vulnerable_package_all_partitions_then_delete_system_platform]
        R --> S
        S --> T[log_success_for_null_inventory_system]
        T --> U[commit_transaction]
        U --> I
    end

    O --> V[end_run]
Loading

File-Level Changes

Change Details Files
Restrict the existing deletion loop to systems with non-null inventory_ids and add a new cleanup loop to safely delete systems with null inventory_ids and their related data.
  • Add an inventory_id IS NOT NULL predicate to the existing system selection query used for inventory-based deletion.
  • Introduce a second while loop that selects one legacy system at a time by id and rh_account_id where inventory_id IS NULL and when_deleted is older than the configured threshold, using FOR UPDATE for row locking.
  • Within the new loop, mark each system as opted out and delete associated records from system_vulnerabilities, system_vulnerable_package, system_repo, and system_cve_data before attempting to delete the system row itself.
  • Handle ForeignKeyViolation when deleting the system row by retrying deletions on system_vulnerabilities and system_vulnerable_package without rh_account_id filtering, then deleting the system row again.
  • Increment the deletion counter and add structured logging for both successful deletions and failures when processing systems with null inventory_ids.
taskomatic/jobs/delete_systems.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The two while True cleanup loops share a lot of structure (time threshold, single-row SELECT with FOR UPDATE, per-row commit, logging); consider extracting a helper to reduce duplication and make the deletion strategies (with/without inventory_id) easier to reason about and maintain.
  • The nested try for handling psycopg2.errors.ForeignKeyViolation could be made clearer and safer by using an explicit savepoint/rollback to avoid partially-applied statements in the outer try and by documenting why a second round of deletions without rh_account_id is necessary.
  • Committing inside each loop iteration can be expensive and may lead to inconsistent batches; consider processing multiple rows per transaction (e.g., by increasing the LIMIT or batching deletes) to improve performance while still keeping transactions small.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The two `while True` cleanup loops share a lot of structure (time threshold, single-row SELECT with `FOR UPDATE`, per-row commit, logging); consider extracting a helper to reduce duplication and make the deletion strategies (with/without inventory_id) easier to reason about and maintain.
- The nested `try` for handling `psycopg2.errors.ForeignKeyViolation` could be made clearer and safer by using an explicit savepoint/rollback to avoid partially-applied statements in the outer `try` and by documenting why a second round of deletions without `rh_account_id` is necessary.
- Committing inside each loop iteration can be expensive and may lead to inconsistent batches; consider processing multiple rows per transaction (e.g., by increasing the `LIMIT` or batching deletes) to improve performance while still keeping transactions small.

## Individual Comments

### Comment 1
<location> `taskomatic/jobs/delete_systems.py:31` </location>
<code_context>
     deleted = 0

+    # Delete systems with valid inventory_id
     while True:
         curr_time = datetime.now(tz=pytz.utc)
         cur.execute(
</code_context>

<issue_to_address>
**issue (complexity):** Consider refactoring the two deletion loops into shared helper functions that encapsulate the looping, per-row deletion, and error handling logic to reduce duplication and flatten control flow.

You can keep the new behavior but reduce complexity by extracting the common loop structure and the per-system delete logic into helpers. That removes duplication, flattens control flow, and centralizes error handling.

### 1. Extract shared cursor loop

Both loops share the same pattern: “select one row with `FOR UPDATE`, process it, commit, repeat”. You can move that into a small helper to avoid the duplicated `while True` logic:

```python
def process_deletions(cur, select_sql, select_params_fn, handle_row_fn):
    deleted = 0
    while True:
        curr_time = datetime.now(tz=pytz.utc)
        cur.execute(select_sql, select_params_fn(curr_time))
        row = cur.fetchone()
        if not row:
            break

        try:
            if handle_row_fn(row):
                deleted += 1
        except Exception as e:
            LOGGER.error("Error processing row %s: %s", row, e)

        cur.connection.commit()

    return deleted
```

Then use it for both cases:

```python
# 1) inventory_id not null
deleted += process_deletions(
    cur,
    """SELECT inventory_id FROM system_platform sp
       WHERE when_deleted IS NOT NULL
         AND when_deleted < %s
         AND inventory_id IS NOT NULL
       LIMIT 1 FOR UPDATE OF sp""",
    lambda curr_time: (curr_time - timedelta(hours=CFG.system_deletion_threshold),),
    lambda row: delete_with_inventory_id(cur, row[0]),
)

# 2) inventory_id is null
deleted += process_deletions(
    cur,
    """SELECT id, rh_account_id FROM system_platform sp
       WHERE when_deleted IS NOT NULL
         AND when_deleted < %s
         AND inventory_id IS NULL
       LIMIT 1 FOR UPDATE OF sp""",
    lambda curr_time: (curr_time - timedelta(hours=CFG.system_deletion_threshold),),
    lambda row: delete_orphan_system(cur, row[0], row[1]),
)
```

### 2. Extract per-system delete logic for NULL `inventory_id`

Move the “delete all data for system X” sequence into its own function. This keeps the main loop linear and removes nested `try/except`:

```python
def delete_orphan_system(cur, system_id, rh_account_id):
    # Mark as opt_out to match current behavior
    cur.execute(
        "UPDATE system_platform SET opt_out = true WHERE id = %s",
        (system_id,),
    )

    # Delete partitioned data first (with account filter)
    cur.execute(
        "DELETE FROM system_vulnerabilities "
        "WHERE system_id = %s AND rh_account_id = %s",
        (system_id, rh_account_id),
    )
    cur.execute(
        "DELETE FROM system_vulnerable_package "
        "WHERE system_id = %s AND rh_account_id = %s",
        (system_id, rh_account_id),
    )

    # Non-partitioned tables
    cur.execute("DELETE FROM system_repo WHERE system_id = %s", (system_id,))
    cur.execute("DELETE FROM system_cve_data WHERE system_id = %s", (system_id,))

    # Try delete system_platform, fall back to “all partitions” delete if needed
    try:
        cur.execute("DELETE FROM system_platform WHERE id = %s", (system_id,))
    except psycopg2.errors.ForeignKeyViolation:
        cur.execute(
            "DELETE FROM system_vulnerabilities WHERE system_id = %s",
            (system_id,),
        )
        cur.execute(
            "DELETE FROM system_vulnerable_package WHERE system_id = %s",
            (system_id,),
        )
        cur.execute("DELETE FROM system_platform WHERE id = %s", (system_id,))

    LOGGER.info(
        "Deleted system with NULL inventory_id: system_id=%s, rh_account_id=%s",
        system_id,
        rh_account_id,
    )
    return True
```

### 3. Reuse existing `delete_system` behavior

Similarly, wrap the existing `delete_system` call so the loop uses the same error/logging pattern:

```python
def delete_with_inventory_id(cur, inventory_id):
    cur.execute("SELECT deleted_inventory_id FROM delete_system(%s)", (inventory_id,))
    success = cur.fetchone()
    if success:
        return True

    LOGGER.error("Unable to delete inventory_id: %s", inventory_id)
    return False
```

This keeps all existing behavior (including the partition-aware FK handling) but:

- Removes the duplicated `while True` loops.
- Centralizes error handling and logging in one place.
- Makes the per-system delete path linear and easier to reason about.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

deleted = 0

# Delete systems with valid inventory_id
while True:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider refactoring the two deletion loops into shared helper functions that encapsulate the looping, per-row deletion, and error handling logic to reduce duplication and flatten control flow.

You can keep the new behavior but reduce complexity by extracting the common loop structure and the per-system delete logic into helpers. That removes duplication, flattens control flow, and centralizes error handling.

1. Extract shared cursor loop

Both loops share the same pattern: “select one row with FOR UPDATE, process it, commit, repeat”. You can move that into a small helper to avoid the duplicated while True logic:

def process_deletions(cur, select_sql, select_params_fn, handle_row_fn):
    deleted = 0
    while True:
        curr_time = datetime.now(tz=pytz.utc)
        cur.execute(select_sql, select_params_fn(curr_time))
        row = cur.fetchone()
        if not row:
            break

        try:
            if handle_row_fn(row):
                deleted += 1
        except Exception as e:
            LOGGER.error("Error processing row %s: %s", row, e)

        cur.connection.commit()

    return deleted

Then use it for both cases:

# 1) inventory_id not null
deleted += process_deletions(
    cur,
    """SELECT inventory_id FROM system_platform sp
       WHERE when_deleted IS NOT NULL
         AND when_deleted < %s
         AND inventory_id IS NOT NULL
       LIMIT 1 FOR UPDATE OF sp""",
    lambda curr_time: (curr_time - timedelta(hours=CFG.system_deletion_threshold),),
    lambda row: delete_with_inventory_id(cur, row[0]),
)

# 2) inventory_id is null
deleted += process_deletions(
    cur,
    """SELECT id, rh_account_id FROM system_platform sp
       WHERE when_deleted IS NOT NULL
         AND when_deleted < %s
         AND inventory_id IS NULL
       LIMIT 1 FOR UPDATE OF sp""",
    lambda curr_time: (curr_time - timedelta(hours=CFG.system_deletion_threshold),),
    lambda row: delete_orphan_system(cur, row[0], row[1]),
)

2. Extract per-system delete logic for NULL inventory_id

Move the “delete all data for system X” sequence into its own function. This keeps the main loop linear and removes nested try/except:

def delete_orphan_system(cur, system_id, rh_account_id):
    # Mark as opt_out to match current behavior
    cur.execute(
        "UPDATE system_platform SET opt_out = true WHERE id = %s",
        (system_id,),
    )

    # Delete partitioned data first (with account filter)
    cur.execute(
        "DELETE FROM system_vulnerabilities "
        "WHERE system_id = %s AND rh_account_id = %s",
        (system_id, rh_account_id),
    )
    cur.execute(
        "DELETE FROM system_vulnerable_package "
        "WHERE system_id = %s AND rh_account_id = %s",
        (system_id, rh_account_id),
    )

    # Non-partitioned tables
    cur.execute("DELETE FROM system_repo WHERE system_id = %s", (system_id,))
    cur.execute("DELETE FROM system_cve_data WHERE system_id = %s", (system_id,))

    # Try delete system_platform, fall back to “all partitions” delete if needed
    try:
        cur.execute("DELETE FROM system_platform WHERE id = %s", (system_id,))
    except psycopg2.errors.ForeignKeyViolation:
        cur.execute(
            "DELETE FROM system_vulnerabilities WHERE system_id = %s",
            (system_id,),
        )
        cur.execute(
            "DELETE FROM system_vulnerable_package WHERE system_id = %s",
            (system_id,),
        )
        cur.execute("DELETE FROM system_platform WHERE id = %s", (system_id,))

    LOGGER.info(
        "Deleted system with NULL inventory_id: system_id=%s, rh_account_id=%s",
        system_id,
        rh_account_id,
    )
    return True

3. Reuse existing delete_system behavior

Similarly, wrap the existing delete_system call so the loop uses the same error/logging pattern:

def delete_with_inventory_id(cur, inventory_id):
    cur.execute("SELECT deleted_inventory_id FROM delete_system(%s)", (inventory_id,))
    success = cur.fetchone()
    if success:
        return True

    LOGGER.error("Unable to delete inventory_id: %s", inventory_id)
    return False

This keeps all existing behavior (including the partition-aware FK handling) but:

  • Removes the duplicated while True loops.
  • Centralizes error handling and logging in one place.
  • Makes the per-system delete path linear and easier to reason about.

fix(taskomatic): remove systems with null inventory ids
@mtclinton mtclinton force-pushed the remove-system-null-inventory-id branch from fd03f27 to a152235 Compare December 16, 2025 23:15
@mtclinton
Copy link
Contributor Author

/retest

@mtclinton
Copy link
Contributor Author

@jdobes not sure if I should add a new database function to migrations or if this is even needed (based on how FedRamp is testing systems)

@jdobes
Copy link
Member

jdobes commented Jan 2, 2026

Closing as NULL inventory IDs are not supported, they can't occur in database - https://github.com/RedHatInsights/vulnerability-engine/blob/master/database/schema/ve_db_postgresql.sql#L197

@jdobes jdobes closed this Jan 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants