@hverdonk
Collaborator

Fixes #1

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, add a test dataset to the test_data directory.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • [N/A] Usage Documentation in docs/usage.md is updated.
  • [N/A] Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

What changed

  • Added a dedicated module to sanitize user-provided foreground taxa lists
    • Introduced the cleanforegroundlist module (CLEAN_FOREGROUND_LIST) to convert “messy” IDs in the foreground taxa list into HyPhy-compatible IDs by replacing non-alphanumeric characters with underscores.
    • Updated/added module tests and refreshed snapshots to reflect the module’s actual runtime environment / versions.
  • Integrated foreground list sanitization into the viral non-recombinant subworkflow
    • Updated subworkflows/local/process_viral_nonrecombinant/main.nf to run CLEAN_FOREGROUND_LIST when --foreground_list is provided and pass the sanitized list into downstream labeling steps.
    • Updated the associated nf-tests to cover lists with “messy” unsanitized taxa IDs.
  • Hardened parameter validation for foreground_regexp
    • Added validation in subworkflows/local/utils_nfcore_capheine_pipeline/main.nf:
      • --foreground_regexp can no longer be an empty string (e.g. "" or whitespace only).
      • Tightened handling of invalid parameter types and invalid foreground list paths.
  • Docs + test data updates
    • Updated README parameter docs to clarify that matching/sanitization uses underscore escaping.
    • Added new test data for messy foreground sequences (pretend-foreground-seqs-messy.txt).
    • Cleaned up and re-enabled/adjusted relevant workflow/subworkflow tests after introducing the sanitization step.
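
The sanitization rule described above (replace every non-alphanumeric character with an underscore) can be sketched as follows. This is a minimal illustration in Python, not the module's actual implementation, which is not shown in this description. The function names `sanitize_taxon_id` and `clean_foreground_list` are hypothetical, and trimming whitespace and skipping blank lines are assumptions about how the list file is read.

```python
import re


def sanitize_taxon_id(taxon_id: str) -> str:
    """Replace each character that is not an ASCII letter or digit with '_'.

    Hypothetical helper: mirrors the rule stated in this PR, one underscore
    per replaced character (no collapsing of repeated underscores).
    """
    return re.sub(r"[^A-Za-z0-9]", "_", taxon_id)


def clean_foreground_list(lines):
    """Sanitize one taxon ID per line.

    Assumption: surrounding whitespace is trimmed and blank lines are skipped.
    """
    cleaned = []
    for line in lines:
        stripped = line.strip()
        if stripped:
            cleaned.append(sanitize_taxon_id(stripped))
    return cleaned


if __name__ == "__main__":
    messy = ["hCoV-19/USA/CA-1234/2020\n", "\n", "A|B.1\n"]
    for taxon in clean_foreground_list(messy):
        print(taxon)
```

Under these assumptions, an ID such as `hCoV-19/USA/CA-1234/2020` becomes `hCoV_19_USA_CA_1234_2020`, which is the form the downstream labeling steps would then match against.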

Why

  • Ensures foreground sequence selection works reliably with HyPhy’s sanitized taxon naming conventions.
  • Prevents confusing “provided but empty” configurations for --foreground_regexp, which previously could slip through and cause failures after the pipeline had run several modules.
  • Improves test coverage for realistic “messy ID” inputs and keeps module metadata up-to-date.
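
The fail-fast check on --foreground_regexp could look roughly like the sketch below. It is written in Python for illustration only; the pipeline's validation lives in the Nextflow subworkflow named above and is not reproduced here. The function name `validate_foreground_regexp` and the error messages are hypothetical, and treating an unset parameter as `None` is an assumption.

```python
def validate_foreground_regexp(value):
    """Reject 'provided but empty' values before any module runs.

    Hypothetical helper. Assumption: an unset parameter arrives as None
    and is allowed, since the option itself is optional.
    """
    if value is None:
        return None
    if not isinstance(value, str):
        raise ValueError(
            f"--foreground_regexp must be a string, got {type(value).__name__}"
        )
    if not value.strip():
        raise ValueError("--foreground_regexp must not be empty or whitespace only")
    return value


if __name__ == "__main__":
    for candidate in [None, "^fg_", "", "   "]:
        try:
            print(repr(candidate), "->", repr(validate_foreground_regexp(candidate)))
        except ValueError as err:
            print(repr(candidate), "-> rejected:", err)
```

Running the check at parameter-validation time means a bad value is reported immediately instead of surfacing after several modules have already run.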

Testing / Validation

  • nf-test module tests for cleanforegroundlist updated and passing (snapshots generated).
  • Subworkflow nf-tests updated to cover additional foreground list, messy foreground list, and regexp scenarios.

hverdonk and others added 30 commits October 29, 2025 12:47

2 participants