Skip to content

Conversation

@yumisims
Copy link
Contributor

…ages db

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@sanger-tolsoft
Copy link
Contributor

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.2.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@github-actions
Copy link

github-actions bot commented Sep 22, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit f999572

+| ✅ 187 tests passed       |+
#| ❔  28 tests were ignored |#
!| ❗  19 tests had warnings |!

❗ Test warnings:

❔ Tests ignored:

  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: assets/nf-core-blobtoolkit_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-blobtoolkit_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-blobtoolkit_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: conf/igenomes.config
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File does not exist: .github/ISSUE_TEMPLATE/config.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File ignored due to lint config: assets/nf-core-blobtoolkit_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-blobtoolkit_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-blobtoolkit_logo_dark.png
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/blobtoolkit/blobtoolkit/.github/workflows/awstest.yml
  • template_strings - template_strings
  • merge_markers - merge_markers

✅ Tests passed:

Run details

  • nf-core/tools version 3.2.1
  • Run at 2025-10-01 17:34:15

Copy link
Member

@muffato muffato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice usage of Groovy !

Usually the .nf files contain:

  1. imports
  2. workflow
  3. functions

Could you move the functions after the workflow block ?

// Direct file provided - validate it's a .nal file
if (path_file.name.endsWith('.nal')) {
log.info "Using directly specified BLAST database: ${path_file}"
return path_file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning the path here leads to some problems downstream because other processes expect a directory.

For instance, if you run the test profile with --blastn /data/tol/resources/nt/latest/nt.nal, generate_config.py will fail:

Command error:
  Traceback (most recent call last):
    File "/nfs/users/nfs_m/mm49/workspace/tol-it/nextflow/sanger-tol/blobtoolkit_param/bin/generate_config.py", line 399, in <module>
      sys.exit(main())
    File "/nfs/users/nfs_m/mm49/workspace/tol-it/nextflow/sanger-tol/blobtoolkit_param/bin/generate_config.py", line 377, in main
      taxon_id = adjust_taxon_id(args.nt, taxon_info)
    File "/nfs/users/nfs_m/mm49/workspace/tol-it/nextflow/sanger-tol/blobtoolkit_param/bin/generate_config.py", line 236, in adjust_taxon_id
      con = sqlite3.connect(os.path.join(nt, "taxonomy4blast.sqlite3"))
  sqlite3.OperationalError: unable to open database file

It could be modified to find the file taxonomy4blast.sqlite3 in the directory that contains nt.nal, but I suspect other things will fail too.
Especially, blastn needs the entire directory to be staged in. I fear it won't work if the channel only has the file path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so the function should accept direct .nal file paths and returns the parent directory to ensure all associated files are available. I have added validation logic which to check file existance before proceeding. Testing now.

@yumisims yumisims requested a review from muffato September 23, 2025 14:29
* Function to validate and resolve BLAST nucleotide database paths
* Handles both directory paths (for backwards compatibility) and direct .nal file paths
*/
def validateBlastnDatabase(db_path) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a problem with the way the function is used. When people mix multiple Blast databases under the same directory (like in #184), yes, they can now refer to one .nal specifically, but if only the parent directory makes it to the blastn job, the module will then still pick up all the .nal files.
I think there should be another parameter coming out of the function, to record the name of the .nal file, and pass it down to the module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the db_name parameter is now implemented and flows from the validation function to the BLAST module.

Copy link
Member

@muffato muffato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you test the Blast db symlinking ? I'm getting this error in my tests

ERROR ~ Unknown method invocation `createLink` on UnixPath type

def db_name = path_file.name.replaceAll('\\.nal$', '')

// Create a temporary directory with symlinks to only the specified database files
def temp_dir = file("${parent_dir}/.btk_isolated_${db_name}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we may not always have write access to that directory (for instance, in production we don't – /data/tol/resources/nt).
Maybe we can use the work dir (I think there's a Nextflow variable that tells what the work dir is). But then, my concern is that the symlinks may not be resolved by Nextflow, and the files may not be mounted in the container.

// Create a temporary directory with symlinks to only the specified database files
def temp_dir = file("${parent_dir}/.btk_isolated_${db_name}")
if (!temp_dir.exists()) {
temp_dir.mkdirs()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funnily, this doesn't seem to complain if the directory can't be created. It just carries on (and the pipeline fails later).

$ nextflow run -profile sanger,singularity,test --blastn /data/tol/resources/nt/latest/nt.nal
(...)
Direct BLAST database file specified: /data/tol/resources/nt/latest/nt.nal
Database name: nt
Created isolated directory: /data/tol/resources/nt/latest/.btk_isolated_nt
This ensures only the specified database is available to BLAST
(...)
Execution cancelled -- Finishing pending tasks before exit
-[sanger-tol/blobtoolkit] Pipeline completed with errors-
(...)
Command error:
(...)
    File "/nfs/users/nfs_m/mm49/workspace/tol-it/nextflow/sanger-tol/blobtoolkit_param/bin/generate_config.py", line 230, in adjust_taxon_id
      con = sqlite3.connect(os.path.join(nt, "taxonomy4blast.sqlite3"))
  sqlite3.OperationalError: unable to open database file
(...)
$ ls -ld /data/tol/resources/nt/latest/.btk_isolated_nt
ls: cannot access '/data/tol/resources/nt/latest/.btk_isolated_nt': No such file or directory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants