

@Ulthran (Collaborator) commented Oct 27, 2025

No description provided.

Copilot AI review requested due to automatic review settings October 27, 2025 17:40

Copilot AI left a comment


Pull Request Overview

This PR refactors the Kraken2 report parsing logic to use indentation-based depth calculation instead of rank-based hierarchical tracking. The change simplifies the lineage tracking by inferring taxonomic depth from leading spaces in the scientific name field.

Key Changes:

  • Replaced rank-based parsing with indentation-based depth calculation
  • Simplified lineage stack management using depth instead of rank order
  • Removed rank filtering and global rank order dictionaries
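A minimal sketch of the indentation-based approach follows, assuming the standard six-column, tab-separated Kraken2 report (percentage, clade reads, directly assigned reads, rank code, NCBI taxid, indented scientific name) with two spaces of indentation per level. `parse_report_line`, `update_lineage`, `ReportLine`, and `INDENT_WIDTH` are hypothetical names, not taken from the PR.

```python
from typing import NamedTuple

# Assumption: Kraken2 indents the name column by two spaces per taxonomic level
INDENT_WIDTH = 2


class ReportLine(NamedTuple):
    depth: int
    rank: str
    name: str


def parse_report_line(line: str) -> ReportLine:
    """Split one tab-separated report line and infer depth from its indentation."""
    fields = line.rstrip("\n").split("\t")
    rank, raw_name = fields[3].strip(), fields[5]
    name = raw_name.lstrip(" ")
    depth = (len(raw_name) - len(name)) // INDENT_WIDTH
    return ReportLine(depth, rank, name)


def update_lineage(stack: list[str], depth: int, name: str) -> list[str]:
    """Drop every entry at this depth or deeper, then push the new node."""
    stack = stack[:depth]
    # Pad with placeholders if the report skipped an intermediate depth
    stack.extend(["__"] * (depth - len(stack)))
    stack.append(name)
    return stack
```

Fed names indented by 0, 2, and 4 spaces, the stack grows to three entries; a later line back at depth 1 truncates it to two. That truncation is what replaces the old rank-order bookkeeping.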

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| scripts/summarize_kraken2_reports.py | Refactored parsing to calculate depth from indentation, simplified lineage tracking logic, and moved Snakemake execution code into a conditional block |
| scripts/test_summarize_kraken2_reports.py | Updated import path and added an assertion for debugging |


In scripts/test_summarize_kraken2_reports.py:

```python
reports, _ = reports
report = reports[0]
parsed_report = parse_kraken2_tsv_report(open(report))
assert False
```

Copilot AI Oct 27, 2025


This `assert False` statement makes the test fail unconditionally. Remove this debug assertion before merging.

Suggested change (delete this line):

```diff
-assert False
```

Copilot uses AI. Check for mistakes.

In scripts/summarize_kraken2_reports.py:

```python
def consensus_lineage_str(lineage_stack: list[str]) -> str:
    # Fill missing levels with empty placeholders if needed
    full_lineage = lineage_stack + [f"__"] * (max(7 - len(lineage_stack), 0))
```

Copilot AI Oct 27, 2025


The magic number 7 represents the expected number of taxonomic levels but is unexplained. Consider defining it as a named constant (e.g., `EXPECTED_TAXONOMIC_LEVELS = 7`) to improve clarity.

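One way to apply the named-constant suggestion is sketched below. `EXPECTED_TAXONOMIC_LEVELS` is the name proposed in the comment; `PLACEHOLDER` and the `"; "` join are assumptions, since the hunk does not show how the function builds its return value. The seven levels are likewise assumed to be domain/kingdom, phylum, class, order, family, genus, and species. The sketch also drops the `f` prefix from `f"__"`, which interpolates nothing.

```python
# EXPECTED_TAXONOMIC_LEVELS is the name suggested in the review; PLACEHOLDER
# and the "; " separator are assumptions for this sketch.
EXPECTED_TAXONOMIC_LEVELS = 7  # domain/kingdom, phylum, class, order, family, genus, species
PLACEHOLDER = "__"


def consensus_lineage_str(lineage_stack: list[str]) -> str:
    # Pad short lineages with placeholders up to the expected number of levels
    missing = max(EXPECTED_TAXONOMIC_LEVELS - len(lineage_stack), 0)
    full_lineage = lineage_stack + [PLACEHOLDER] * missing
    return "; ".join(full_lineage)
```

Lineages longer than seven entries pass through unpadded, matching the `max(..., 0)` guard in the original line.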
In scripts/summarize_kraken2_reports.py:

```python
if len(lineage_stack) <= depth:
    lineage_stack.extend(["__"] * (depth - len(lineage_stack) + 1))
lineage_stack = lineage_stack[: depth + 1]
lineage_stack[depth] = f"__{name}"
```

Copilot AI Oct 27, 2025


The lineage format is inconsistent with the placeholder format. Placeholders are `"__"`, but actual entries are `f"__{name}"`, producing strings like `__Species_name`. Entries should probably be `f"{rank}__{name}"` to match standard taxonomic notation (e.g., `s__Species_name`), but the rank information is no longer available in the current implementation.

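If rank prefixes are wanted, one option is to keep the Kraken2 rank code (the fourth report column) next to each name on the depth-indexed stack and apply the prefix only when building the lineage. This is a sketch under assumptions: `push_node`, `format_lineage`, and `STANDARD_RANKS` are hypothetical names, and sub-rank codes such as `R1` or `G1` are simply dropped rather than mapped.

```python
# Assumption: single-letter Kraken2 rank codes that map onto the usual
# d__/k__/p__/c__/o__/f__/g__/s__ prefixes. Codes like "R", "R1", or "G1"
# are not in this set and are left out of the formatted lineage.
STANDARD_RANKS = frozenset("DKPCOFGS")


def push_node(
    stack: list[tuple[str, str]], depth: int, rank: str, name: str
) -> list[tuple[str, str]]:
    """Keep (rank, name) pairs on the depth-indexed stack instead of bare names."""
    stack = stack[:depth]
    stack.extend([("", "")] * (depth - len(stack)))  # pad skipped depths
    stack.append((rank, name))
    return stack


def format_lineage(stack: list[tuple[str, str]]) -> list[str]:
    """Render standard-rank nodes as e.g. 's__Escherichia coli'."""
    return [f"{rank.lower()}__{name}" for rank, name in stack if rank in STANDARD_RANKS]
```

Because depth still drives the stack, this keeps the simpler indentation-based tracking while restoring the rank letter the comment asks for.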

2 participants