-
Notifications
You must be signed in to change notification settings - Fork 51
fix(taskomatic): remove systems with null inventory ids #2106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(taskomatic): remove systems with null inventory ids #2106
Conversation
|
Commits missing Jira IDs: |
Reviewer's GuideThis PR updates the taskomatic cleanup job to ensure that systems with null inventory_ids are also fully cleaned up, by tightening the existing deletion query to only process valid inventory_ids and adding a new deletion loop that safely removes legacy/orphaned systems lacking inventory_ids along with their related records. ER diagram for cleanup of systems with null inventory_iderDiagram
system_platform {
bigint id PK
uuid inventory_id
bigint rh_account_id
timestamp when_deleted
boolean opt_out
}
system_vulnerabilities {
bigint id PK
bigint system_id FK
bigint rh_account_id
text vulnerability_data
}
system_vulnerable_package {
bigint id PK
bigint system_id FK
bigint rh_account_id
text package_data
}
system_repo {
bigint id PK
bigint system_id FK
text repo_data
}
system_cve_data {
bigint id PK
bigint system_id FK
text cve_data
}
system_platform ||--o{ system_vulnerabilities : system_id
system_platform ||--o{ system_vulnerable_package : system_id
system_platform ||--o{ system_repo : system_id
system_platform ||--o{ system_cve_data : system_id
Flow diagram for updated system deletion job logicflowchart TD
A[start_run] --> B[open_db_connection]
B --> C[init_deleted_counter]
subgraph Loop_valid_inventory_id
C --> D[query_next_system_where_when_deleted_threshold_and_inventory_id_not_null]
D -->|no_row| H[proceed_to_null_inventory_loop]
D -->|row_found| E[delete_system_with_inventory_id_via_existing_logic]
E --> F[increment_deleted_counter]
F --> G[commit_transaction]
G --> D
end
H --> I[query_next_system_where_when_deleted_threshold_and_inventory_id_null]
subgraph Loop_null_inventory_id
I -->|no_row| O[close_cursor_and_connection]
I -->|row_found| J[load_system_id_and_rh_account_id]
J --> K[update_system_platform_set_opt_out_true]
K --> L[delete_from_system_vulnerabilities_by_system_id_and_rh_account_id]
L --> M[delete_from_system_vulnerable_package_by_system_id_and_rh_account_id]
M --> N[delete_from_system_repo_by_system_id]
N --> P[delete_from_system_cve_data_by_system_id]
P --> Q{delete_from_system_platform_by_id}
Q -->|success| S[increment_deleted_counter]
Q -->|foreign_key_violation| R[delete_from_system_vulnerabilities_all_partitions_and_system_vulnerable_package_all_partitions_then_delete_system_platform]
R --> S
S --> T[log_success_for_null_inventory_system]
T --> U[commit_transaction]
U --> I
end
O --> V[end_run]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- The two
while Truecleanup loops share a lot of structure (time threshold, single-row SELECT withFOR UPDATE, per-row commit, logging); consider extracting a helper to reduce duplication and make the deletion strategies (with/without inventory_id) easier to reason about and maintain. - The nested
tryfor handlingpsycopg2.errors.ForeignKeyViolationcould be made clearer and safer by using an explicit savepoint/rollback to avoid partially-applied statements in the outertryand by documenting why a second round of deletions withoutrh_account_idis necessary. - Committing inside each loop iteration can be expensive and may lead to inconsistent batches; consider processing multiple rows per transaction (e.g., by increasing the
LIMITor batching deletes) to improve performance while still keeping transactions small.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The two `while True` cleanup loops share a lot of structure (time threshold, single-row SELECT with `FOR UPDATE`, per-row commit, logging); consider extracting a helper to reduce duplication and make the deletion strategies (with/without inventory_id) easier to reason about and maintain.
- The nested `try` for handling `psycopg2.errors.ForeignKeyViolation` could be made clearer and safer by using an explicit savepoint/rollback to avoid partially-applied statements in the outer `try` and by documenting why a second round of deletions without `rh_account_id` is necessary.
- Committing inside each loop iteration can be expensive and may lead to inconsistent batches; consider processing multiple rows per transaction (e.g., by increasing the `LIMIT` or batching deletes) to improve performance while still keeping transactions small.
## Individual Comments
### Comment 1
<location> `taskomatic/jobs/delete_systems.py:31` </location>
<code_context>
deleted = 0
+ # Delete systems with valid inventory_id
while True:
curr_time = datetime.now(tz=pytz.utc)
cur.execute(
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the two deletion loops into shared helper functions that encapsulate the looping, per-row deletion, and error handling logic to reduce duplication and flatten control flow.
You can keep the new behavior but reduce complexity by extracting the common loop structure and the per-system delete logic into helpers. That removes duplication, flattens control flow, and centralizes error handling.
### 1. Extract shared cursor loop
Both loops share the same pattern: “select one row with `FOR UPDATE`, process it, commit, repeat”. You can move that into a small helper to avoid the duplicated `while True` logic:
```python
def process_deletions(cur, select_sql, select_params_fn, handle_row_fn):
deleted = 0
while True:
curr_time = datetime.now(tz=pytz.utc)
cur.execute(select_sql, select_params_fn(curr_time))
row = cur.fetchone()
if not row:
break
try:
if handle_row_fn(row):
deleted += 1
except Exception as e:
LOGGER.error("Error processing row %s: %s", row, e)
cur.connection.commit()
return deleted
```
Then use it for both cases:
```python
# 1) inventory_id not null
deleted += process_deletions(
cur,
"""SELECT inventory_id FROM system_platform sp
WHERE when_deleted IS NOT NULL
AND when_deleted < %s
AND inventory_id IS NOT NULL
LIMIT 1 FOR UPDATE OF sp""",
lambda curr_time: (curr_time - timedelta(hours=CFG.system_deletion_threshold),),
lambda row: delete_with_inventory_id(cur, row[0]),
)
# 2) inventory_id is null
deleted += process_deletions(
cur,
"""SELECT id, rh_account_id FROM system_platform sp
WHERE when_deleted IS NOT NULL
AND when_deleted < %s
AND inventory_id IS NULL
LIMIT 1 FOR UPDATE OF sp""",
lambda curr_time: (curr_time - timedelta(hours=CFG.system_deletion_threshold),),
lambda row: delete_orphan_system(cur, row[0], row[1]),
)
```
### 2. Extract per-system delete logic for NULL `inventory_id`
Move the “delete all data for system X” sequence into its own function. This keeps the main loop linear and removes nested `try/except`:
```python
def delete_orphan_system(cur, system_id, rh_account_id):
# Mark as opt_out to match current behavior
cur.execute(
"UPDATE system_platform SET opt_out = true WHERE id = %s",
(system_id,),
)
# Delete partitioned data first (with account filter)
cur.execute(
"DELETE FROM system_vulnerabilities "
"WHERE system_id = %s AND rh_account_id = %s",
(system_id, rh_account_id),
)
cur.execute(
"DELETE FROM system_vulnerable_package "
"WHERE system_id = %s AND rh_account_id = %s",
(system_id, rh_account_id),
)
# Non-partitioned tables
cur.execute("DELETE FROM system_repo WHERE system_id = %s", (system_id,))
cur.execute("DELETE FROM system_cve_data WHERE system_id = %s", (system_id,))
# Try delete system_platform, fall back to “all partitions” delete if needed
try:
cur.execute("DELETE FROM system_platform WHERE id = %s", (system_id,))
except psycopg2.errors.ForeignKeyViolation:
cur.execute(
"DELETE FROM system_vulnerabilities WHERE system_id = %s",
(system_id,),
)
cur.execute(
"DELETE FROM system_vulnerable_package WHERE system_id = %s",
(system_id,),
)
cur.execute("DELETE FROM system_platform WHERE id = %s", (system_id,))
LOGGER.info(
"Deleted system with NULL inventory_id: system_id=%s, rh_account_id=%s",
system_id,
rh_account_id,
)
return True
```
### 3. Reuse existing `delete_system` behavior
Similarly, wrap the existing `delete_system` call so the loop uses the same error/logging pattern:
```python
def delete_with_inventory_id(cur, inventory_id):
cur.execute("SELECT deleted_inventory_id FROM delete_system(%s)", (inventory_id,))
success = cur.fetchone()
if success:
return True
LOGGER.error("Unable to delete inventory_id: %s", inventory_id)
return False
```
This keeps all existing behavior (including the partition-aware FK handling) but:
- Removes the duplicated `while True` loops.
- Centralizes error handling and logging in one place.
- Makes the per-system delete path linear and easier to reason about.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| deleted = 0 | ||
|
|
||
| # Delete systems with valid inventory_id | ||
| while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider refactoring the two deletion loops into shared helper functions that encapsulate the looping, per-row deletion, and error handling logic to reduce duplication and flatten control flow.
You can keep the new behavior but reduce complexity by extracting the common loop structure and the per-system delete logic into helpers. That removes duplication, flattens control flow, and centralizes error handling.
1. Extract shared cursor loop
Both loops share the same pattern: “select one row with FOR UPDATE, process it, commit, repeat”. You can move that into a small helper to avoid the duplicated while True logic:
def process_deletions(cur, select_sql, select_params_fn, handle_row_fn):
deleted = 0
while True:
curr_time = datetime.now(tz=pytz.utc)
cur.execute(select_sql, select_params_fn(curr_time))
row = cur.fetchone()
if not row:
break
try:
if handle_row_fn(row):
deleted += 1
except Exception as e:
LOGGER.error("Error processing row %s: %s", row, e)
cur.connection.commit()
return deletedThen use it for both cases:
# 1) inventory_id not null
deleted += process_deletions(
cur,
"""SELECT inventory_id FROM system_platform sp
WHERE when_deleted IS NOT NULL
AND when_deleted < %s
AND inventory_id IS NOT NULL
LIMIT 1 FOR UPDATE OF sp""",
lambda curr_time: (curr_time - timedelta(hours=CFG.system_deletion_threshold),),
lambda row: delete_with_inventory_id(cur, row[0]),
)
# 2) inventory_id is null
deleted += process_deletions(
cur,
"""SELECT id, rh_account_id FROM system_platform sp
WHERE when_deleted IS NOT NULL
AND when_deleted < %s
AND inventory_id IS NULL
LIMIT 1 FOR UPDATE OF sp""",
lambda curr_time: (curr_time - timedelta(hours=CFG.system_deletion_threshold),),
lambda row: delete_orphan_system(cur, row[0], row[1]),
)2. Extract per-system delete logic for NULL inventory_id
Move the “delete all data for system X” sequence into its own function. This keeps the main loop linear and removes nested try/except:
def delete_orphan_system(cur, system_id, rh_account_id):
# Mark as opt_out to match current behavior
cur.execute(
"UPDATE system_platform SET opt_out = true WHERE id = %s",
(system_id,),
)
# Delete partitioned data first (with account filter)
cur.execute(
"DELETE FROM system_vulnerabilities "
"WHERE system_id = %s AND rh_account_id = %s",
(system_id, rh_account_id),
)
cur.execute(
"DELETE FROM system_vulnerable_package "
"WHERE system_id = %s AND rh_account_id = %s",
(system_id, rh_account_id),
)
# Non-partitioned tables
cur.execute("DELETE FROM system_repo WHERE system_id = %s", (system_id,))
cur.execute("DELETE FROM system_cve_data WHERE system_id = %s", (system_id,))
# Try delete system_platform, fall back to “all partitions” delete if needed
try:
cur.execute("DELETE FROM system_platform WHERE id = %s", (system_id,))
except psycopg2.errors.ForeignKeyViolation:
cur.execute(
"DELETE FROM system_vulnerabilities WHERE system_id = %s",
(system_id,),
)
cur.execute(
"DELETE FROM system_vulnerable_package WHERE system_id = %s",
(system_id,),
)
cur.execute("DELETE FROM system_platform WHERE id = %s", (system_id,))
LOGGER.info(
"Deleted system with NULL inventory_id: system_id=%s, rh_account_id=%s",
system_id,
rh_account_id,
)
return True3. Reuse existing delete_system behavior
Similarly, wrap the existing delete_system call so the loop uses the same error/logging pattern:
def delete_with_inventory_id(cur, inventory_id):
cur.execute("SELECT deleted_inventory_id FROM delete_system(%s)", (inventory_id,))
success = cur.fetchone()
if success:
return True
LOGGER.error("Unable to delete inventory_id: %s", inventory_id)
return FalseThis keeps all existing behavior (including the partition-aware FK handling) but:
- Removes the duplicated
while Trueloops. - Centralizes error handling and logging in one place.
- Makes the per-system delete path linear and easier to reason about.
fix(taskomatic): remove systems with null inventory ids
fd03f27 to
a152235
Compare
|
/retest |
|
@jdobes not sure if I should add a new database function to migrations or if this is even needed (based on how FedRamp is testing systems) |
|
Closing as NULL inventory IDs are not supported, they can't occur in database - https://github.com/RedHatInsights/vulnerability-engine/blob/master/database/schema/ve_db_postgresql.sql#L197 |
fix(taskomatic): remove systems with null inventory ids
Secure Coding Practices Checklist GitHub Link
Secure Coding Checklist
Summary by Sourcery
Bug Fixes: