Parse kraken by indentation #23

Ulthran · 2025-10-27T17:40:39Z

No description provided.

Copilot

Pull Request Overview

This PR refactors the Kraken2 report parsing logic to use indentation-based depth calculation instead of rank-based hierarchical tracking. The change simplifies the lineage tracking by inferring taxonomic depth from leading spaces in the scientific name field.

Key Changes:

Replaced rank-based parsing with indentation-based depth calculation
Simplified lineage stack management using depth instead of rank order
Removed rank filtering and global rank order dictionaries
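A minimal sketch of indentation-based depth tracking, assuming a standard six-column Kraken2 report (percentage, clade reads, direct reads, rank code, taxid, indented name) with two spaces of indentation per level. The names `parse_report_rows`, `iter_lineages`, and `INDENT_WIDTH` are hypothetical; the stack update mirrors the one in this PR's diff.

```python
from collections.abc import Iterable, Iterator

INDENT_WIDTH = 2  # assumption: Kraken2 indents the name column by two spaces per level


def parse_report_rows(lines: Iterable[str]) -> Iterator[tuple[int, str, str]]:
    """Yield (depth, rank_code, name) from a standard six-column Kraken2 report."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank/malformed lines (and the 8-column minimizer format)
        rank_code, raw_name = fields[3].strip(), fields[5]
        name = raw_name.lstrip(" ")
        # Depth is inferred purely from the number of leading spaces
        depth = (len(raw_name) - len(name)) // INDENT_WIDTH
        yield depth, rank_code, name


def iter_lineages(lines: Iterable[str]) -> Iterator[tuple[str, list[str]]]:
    """Yield (name, lineage) pairs, tracking ancestry by depth instead of rank."""
    lineage_stack: list[str] = []
    for depth, _rank_code, name in parse_report_rows(lines):
        # Grow the stack so index `depth` exists; skipped levels stay as "__"
        if len(lineage_stack) <= depth:
            lineage_stack.extend(["__"] * (depth - len(lineage_stack) + 1))
        # Discard entries deeper than this row: they belong to a previous branch
        lineage_stack = lineage_stack[: depth + 1]
        lineage_stack[depth] = f"__{name}"
        yield name, list(lineage_stack)
```

Because depth counts every indented level, any intermediate rows the report contains (for example numbered sub-ranks) also occupy a slot in the stack.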

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `scripts/summarize_kraken2_reports.py` | Refactored parsing to calculate depth from indentation, simplified lineage tracking logic, and moved Snakemake execution code into a conditional block |
| `scripts/test_summarize_kraken2_reports.py` | Updated import path and added an assertion for debugging |


Copilot · 2025-10-27T17:41:22Z

scripts/test_summarize_kraken2_reports.py

    reports, _ = reports
    report = reports[0]
    parsed_report = parse_kraken2_tsv_report(open(report))
+    assert False


This assert False statement will cause the test to always fail. Remove this debug assertion before merging.

Suggested change (delete the line):

-    assert False

Copilot · 2025-10-27T17:41:22Z

scripts/summarize_kraken2_reports.py

+
+def consensus_lineage_str(lineage_stack: list[str]) -> str:
+    # Fill missing levels with empty placeholders if needed
+    full_lineage = lineage_stack + [f"__"] * (max(7 - len(lineage_stack), 0))


The magic number 7 represents the expected taxonomic levels but is unexplained. Consider defining it as a named constant (e.g., EXPECTED_TAXONOMIC_LEVELS = 7) to improve code clarity.
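One way to apply the named-constant suggestion, as a sketch. `EXPECTED_TAXONOMIC_LEVELS` and `PLACEHOLDER` are hypothetical names, the seven levels are assumed to be kingdom through species, and the `"; "` separator and return statement are assumptions because the quoted diff ends before the function returns. Dropping the `f` prefix from `f"__"` also removes an f-string that has no placeholders.

```python
# Assumed: seven ranks, kingdom/domain, phylum, class, order, family, genus, species
EXPECTED_TAXONOMIC_LEVELS = 7
PLACEHOLDER = "__"  # marks a level with no assigned taxon


def consensus_lineage_str(lineage_stack: list[str]) -> str:
    """Pad the lineage to the expected number of levels and join it (separator assumed)."""
    missing = max(EXPECTED_TAXONOMIC_LEVELS - len(lineage_stack), 0)
    full_lineage = lineage_stack + [PLACEHOLDER] * missing
    return "; ".join(full_lineage)
```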

Copilot · 2025-10-27T17:41:23Z

scripts/summarize_kraken2_reports.py

+            if len(lineage_stack) <= depth:
+                lineage_stack.extend(["__"] * (depth - len(lineage_stack) + 1))
+            lineage_stack = lineage_stack[: depth + 1]
+            lineage_stack[depth] = f"__{name}"


The lineage format is inconsistent with the placeholder format. Placeholders use '__' but actual entries use '__{name}', creating strings like '__Species_name'. This should likely be f'{rank}__{name}' to match standard taxonomic notation (e.g., 's__Species_name'), but the rank information is no longer available in the current implementation.
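Each row of a standard Kraken2 report still carries a rank code in its fourth column (`D`, `K`, `P`, `C`, `O`, `F`, `G`, `S`, plus `U`, `R`, and numbered intermediate ranks such as `G1`), so the prefix could likely be restored by reading that column alongside the indentation depth. A hedged sketch: `lineage_entry` and `STANDARD_RANK_CODES` are hypothetical, and mapping each code to its lowercase letter is an assumed convention, not something this PR defines.

```python
# Assumed mapping: single-letter Kraken2 rank codes become lowercase prefixes
STANDARD_RANK_CODES = {"D", "K", "P", "C", "O", "F", "G", "S"}


def lineage_entry(rank_code: str, name: str) -> str:
    """Format one lineage entry as '<rank>__<name>' when the rank is a standard one."""
    if rank_code in STANDARD_RANK_CODES:
        return f"{rank_code.lower()}__{name}"
    # Root, unclassified, and numbered sub-ranks such as "G1" keep a bare prefix
    return f"__{name}"
```

In the depth-based loop, `lineage_stack[depth] = lineage_entry(rank_code, name)` would replace the bare `f"__{name}"` assignment.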

Parse kraken by indentation

2d6846d

Copilot AI review requested due to automatic review settings October 27, 2025 17:40

Copilot AI reviewed Oct 27, 2025

2 participants
