Commit 5c4f4b8

Merge pull request #881 from d4straub/add-MetaBinner
add MetaBinner
2 parents b60e1f6 + 96da27f commit 5c4f4b8

28 files changed, +422 -21 lines changed

CHANGELOG.md

Lines changed: 5 additions & 3 deletions
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Added`
 
+- [#881](https://github.com/nf-core/mag/pull/881) - Add binner MetaBinner (by @d4straub, inspired by @HeshamAlmessady & @AlphaSquad)
+
 ### `Changed`
 
 ### `Fixed`
@@ -18,9 +20,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Dependencies`
 
-| Tool | Previous version | New version |
-| ---- | ---------------- | ----------- |
-| | | |
+| Tool | Previous version | New version |
+| ---------- | ---------------- | ----------- |
+| MetaBinner | | 1.4.4-0 |
 
 ### `Deprecated`
 

CITATIONS.md

Lines changed: 4 additions & 0 deletions
@@ -52,6 +52,10 @@
 
   > Alneberg, J., Bjarnason, B. S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., Lahti, L., Loman, N. J., Andersson, A. F., & Quince, C. (2014). Binning metagenomic contigs by coverage and composition. Nature Methods, 11(11), 1144–1146. doi: 10.1038/nmeth.3103
 
+- [MetaBinner](https://doi.org/10.1186/s13059-022-02832-6)
+
+  > Wang Z, Huang P, You R, Sun F, Zhu S. MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities. Genome Biol. 2023 Jan 6;24(1):1. doi: 10.1186/s13059-022-02832-6. PMID: 36609515; PMCID: PMC9817263.
+
 - [DAS Tool](https://doi.org/10.1038/s41564-018-0171-1)
 
   > Sieber, C. M. K., et al. 2018. "Recovery of Genomes from Metagenomes via a Dereplication, Aggregation and Scoring Strategy." Nature Microbiology 3 (7): 836-43. doi: 10.1038/s41564-018-0171-1

README.md

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ The pipeline then:
 - performs assembly using [MEGAHIT](https://github.com/voutcn/megahit) and [SPAdes](http://cab.spbu.ru/software/spades/), and checks their quality using [Quast](http://quast.sourceforge.net/quast)
 - (optionally) performs ancient DNA assembly validation using [PyDamage](https://github.com/maxibor/pydamage) and contig consensus sequence recalling with [Freebayes](https://github.com/freebayes/freebayes) and [BCFtools](http://samtools.github.io/bcftools/bcftools.html)
 - predicts protein-coding genes for the assemblies using [Prodigal](https://github.com/hyattpd/Prodigal), and bins with [Prokka](https://github.com/tseemann/prokka) and optionally [MetaEuk](https://www.google.com/search?channel=fs&client=ubuntu-sn&q=MetaEuk)
-- performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), [MaxBin2](https://sourceforge.net/projects/maxbin2/), [CONCOCT](https://github.com/BinPro/CONCOCT), and/or [COMEBin](https://github.com/ziyewang/COMEBin)
+- performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), [MaxBin2](https://sourceforge.net/projects/maxbin2/), [CONCOCT](https://github.com/BinPro/CONCOCT), [COMEBin](https://github.com/ziyewang/COMEBin), and/or [MetaBinner](https://github.com/ziyewang/MetaBinner)
 - checks the quality of the genome bins using [Busco](https://busco.ezlab.org/), [CheckM](https://ecogenomics.github.io/CheckM/), or [CheckM2](https://github.com/chklovski/CheckM2) and optionally [GUNC](https://grp-bork.embl-community.io/gunc/)
 - Performs ancient DNA validation and repair with [pyDamage](https://github.com/maxibor/pydamage) and [freebayes](https://github.com/freebayes/freebayes)
 - optionally refines bins with [DAS Tool](https://github.com/cmks/DAS_Tool)

bin/create_metabinner_bins.py

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+#!/usr/bin/env python
+
+## Originally written by Hesham Almessady (@HeshamAlmessady) and Adrian Fritz (@AlphaSquad) in https://github.com/hzi-bifo/mag and released under the MIT license.
+## See git repository (https://github.com/nf-core/mag) for full license text.
+
+import sys
+import os
+from Bio import SeqIO
+
+def main():
+    # Argument parsing
+    if len(sys.argv) != 6:
+        print("Usage: python create_metabinner_bins.py <binning_file> <fasta_file> <output_path> <prefix> <length_threshold>")
+        sys.exit(1)
+
+    binning = sys.argv[1]
+    fasta = sys.argv[2]
+    path = sys.argv[3]
+    prefix = sys.argv[4]
+    length = int(sys.argv[5])
+
+    # Create output directory if it doesn't exist
+    os.makedirs(path, exist_ok=True)
+
+    # Load binning data into a dictionary
+    Metabinner_bins = {}
+    with open(binning, 'r') as b:
+        for line in b:
+            contig, bin = line.strip().split('\t')
+            Metabinner_bins[contig] = bin
+
+    # Process the input fasta file
+    with open(fasta) as handle:
+        for record in SeqIO.parse(handle, "fasta"):
+            if len(record) < length:
+                f = prefix + ".tooShort.fa"
+            elif record.id not in Metabinner_bins:
+                f = prefix + ".unbinned.fa"
+            else:
+                f = prefix + "." + Metabinner_bins[record.id] + ".fa"
+            with open(os.path.join(path, f), 'a') as out:
+                SeqIO.write(record, out, "fasta")
+
+if __name__ == "__main__":
+    main()
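
The routing rule in `create_metabinner_bins.py` (too short first, then unbinned, then the assigned bin) can be sketched without Biopython or file I/O. The following is a minimal illustration, not the committed script: `load_bins`, `route_contigs`, the contig names, and the prefix are hypothetical, and skipping blank lines in the binning table is an assumption added here that the script above does not make.

```python
from collections import defaultdict


def load_bins(lines):
    """Parse 'contig<TAB>bin' lines into a dict.

    Blank lines are skipped; this is an addition for the sketch,
    the committed script unpacks every line as-is.
    """
    bins = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        contig, bin_id = line.split("\t")
        bins[contig] = bin_id
    return bins


def route_contigs(contig_lengths, bins, prefix, min_length):
    """Map each output file name to the contig IDs that would go into it."""
    routed = defaultdict(list)
    for contig, length in contig_lengths.items():
        if length < min_length:
            # The length check comes first, so a binned contig can still be discarded.
            name = prefix + ".tooShort.fa"
        elif contig not in bins:
            name = prefix + ".unbinned.fa"
        else:
            name = prefix + "." + bins[contig] + ".fa"
        routed[name].append(contig)
    return dict(routed)


# Hypothetical inputs: c2 is binned but below the threshold, c3 has no bin.
bins = load_bins(["c1\t1\n", "c2\t2\n", "\n", "c4\t1\n"])
lengths = {"c1": 5000, "c2": 800, "c3": 3000, "c4": 2500}
routed = route_contigs(lengths, bins, "MEGAHIT-MetaBinner-s1", 1000)
for name, contigs in sorted(routed.items()):
    print(name, contigs)
```

Grouping contigs per output name first would also allow each file to be opened once rather than once per record, which is a design choice the committed script does not take.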

conf/base.config

Lines changed: 3 additions & 0 deletions
@@ -169,6 +169,9 @@ process {
     withName: COMEBIN_RUNCOMEBIN {
         errorStrategy = { task.exitStatus in [1, 255] ? 'ignore' : 'retry' }
     }
+    withName: METABINNER_METABINNER {
+        errorStrategy = { task.exitStatus in [1, 255] ? 'ignore' : 'retry' }
+    }
     withName: DASTOOL_DASTOOL {
         errorStrategy = { task.exitStatus in ((130..145) + 104 + 175) ? 'retry' : task.exitStatus == 1 ? 'ignore' : 'finish' }
     }
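
The two `errorStrategy` closures in this hunk are plain ternary decisions on the task exit status. As a rough sketch, they can be mirrored in Python as follows; the function names are hypothetical, and the code only reproduces the selection logic, not how Nextflow acts on the returned strategy.

```python
def metabinner_error_strategy(exit_status):
    """Mirror of the closure added for METABINNER_METABINNER (same as COMEBIN_RUNCOMEBIN)."""
    return "ignore" if exit_status in (1, 255) else "retry"


def dastool_error_strategy(exit_status):
    """Mirror of the existing DASTOOL_DASTOOL closure, shown for contrast."""
    # Groovy's (130..145) is an inclusive range, hence range(130, 146).
    retry_codes = list(range(130, 146)) + [104, 175]
    if exit_status in retry_codes:
        return "retry"
    return "ignore" if exit_status == 1 else "finish"


for status in (1, 137, 255):
    print(status, metabinner_error_strategy(status), dastool_error_strategy(status))
```

The contrast shows that the MetaBinner rule retries on any exit status other than 1 or 255, whereas the DAS Tool rule retries only on the listed codes.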

conf/modules.config

Lines changed: 46 additions & 0 deletions
@@ -824,6 +824,52 @@ process {
         ext.prefix = { "${meta.assembler}-COMEBin-${meta.id}" }
     }
 
+    withName: METABINNER_KMER {
+        ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
+    }
+
+    withName: METABINNER_TOOSHORT {
+        ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
+    }
+
+    withName: METABINNER_METABINNER {
+        publishDir = [
+            [
+                path: { "${params.outdir}/GenomeBinning/MetaBinner/stats"},
+                mode: params.publish_dir_mode,
+                pattern: '*.{log,log.gz,tsv.gz}'
+            ]
+        ]
+        ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
+        ext.args = { "-s ${params.bin_metabinner_scale}" }
+    }
+
+    withName: METABINNER_BINS {
+        publishDir = [
+            [
+                path: { "${params.outdir}/GenomeBinning/MetaBinner/"},
+                mode: params.publish_dir_mode,
+                pattern: 'bins/*.fa.gz'
+            ],
+            [
+                path: { "${params.outdir}/GenomeBinning/MetaBinner/discarded" },
+                mode: params.publish_dir_mode,
+                pattern: '*tooShort.fa.gz'
+            ],
+            [
+                path: { "${params.outdir}/GenomeBinning/MetaBinner/discarded" },
+                mode: params.publish_dir_mode,
+                pattern: '*lowDepth.fa.gz'
+            ],
+            [
+                path: { "${params.outdir}/GenomeBinning/MetaBinner/unbinned" },
+                mode: params.publish_dir_mode,
+                pattern: '*unbinned.fa.gz'
+            ]
+        ]
+        ext.prefix = { "${meta.assembler}-MetaBinner-${meta.id}" }
+    }
+
     withName: SEQKIT_STATS {
         ext.args = ""
         publishDir = [enabled: false]
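
To see which `publishDir` rule of `METABINNER_BINS` a given output would fall under, the patterns can be approximated with Python's `fnmatch`. This is only a sketch: `fnmatchcase` is not the glob matcher Nextflow uses and may differ on edge cases such as `*` crossing directory separators, and `publish_dirs`, the `results` output directory, and the file names are hypothetical.

```python
from fnmatch import fnmatchcase

# (pattern, sub-directory below params.outdir), copied from the METABINNER_BINS block.
RULES = [
    ("bins/*.fa.gz", "GenomeBinning/MetaBinner/"),
    ("*tooShort.fa.gz", "GenomeBinning/MetaBinner/discarded"),
    ("*lowDepth.fa.gz", "GenomeBinning/MetaBinner/discarded"),
    ("*unbinned.fa.gz", "GenomeBinning/MetaBinner/unbinned"),
]


def publish_dirs(filename, outdir="results"):
    """Return the publish directories whose pattern matches the output file name."""
    return [f"{outdir}/{sub}" for pattern, sub in RULES if fnmatchcase(filename, pattern)]


# Hypothetical values standing in for meta.assembler and meta.id.
prefix = "{}-MetaBinner-{}".format("MEGAHIT", "sample1")
for name in (
    f"bins/{prefix}.1.fa.gz",
    f"{prefix}.tooShort.fa.gz",
    f"{prefix}.unbinned.fa.gz",
    f"{prefix}.log",
):
    print(name, publish_dirs(name))
```

A file that matches none of the patterns, such as the `.log` name above, yields an empty list in this sketch.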

conf/test.config

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ params {
     // Including (even length filtered) CONCOCT bins adds another 5 minutes, so we skip it in the default test (testing in assemblyinput)
     skip_concoct = true
     skip_comebin = true
+    skip_metabinner = true
     busco_db = params.pipelines_testdata_base_path + 'mag/databases/busco/bacteria_odb10.2024-01-08.tar.gz'
     busco_db_lineage = 'bacteria_odb10'
     busco_clean = true

conf/test_alternatives.config

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ params {
     skip_maxbin2 = true
     skip_concoct = true
     skip_comebin = true
+    skip_metabinner = true
     skip_metaeuk = true
     megahit_fix_cpu_1 = true
 }

conf/test_assembly_input.config

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ params {
     gtdbtk_skip_aniscreen = true
     skip_concoct = false
     skip_comebin = true
+    skip_metabinner = true
 
     refine_bins_dastool = true
     refine_bins_dastool_threshold = 0.0

conf/test_hybrid.config

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ params {
     gtdbtk_skip_aniscreen = true
     skip_concoct = true
     skip_comebin = true
+    skip_metabinner = true
 
     spadeshybrid_fix_cpus = 2
 }
