README.md: 12 additions & 5 deletions
@@ -14,13 +14,16 @@ and the fraction of reads assigned to each taxon.
14
14
15
15
Slacken is based on Apache Spark and is thus a distributed application. It can run on a single machine, but can
16
16
also scale to a cluster with hundreds or thousands of machines. It does not keep all data in RAM during processing, but
17
-
processes data in batches.
17
+
processes data in batches. On a 16-core PC, Slacken needs only 16 GB of RAM to classify with the genomes from the Kraken 2 standard library.
18
18
19
-
We do not currently support translated mode (protein/AA sequence classification) but only nucleotide sequences. Also,
19
+
Unfortunately, Slacken does not currently support translated mode (protein/AA sequence classification) but only nucleotide sequences. Also,
20
20
Slacken has its own database format (Parquet-based) and cannot use pre-built Kraken 2 databases as they are.
21
21
22
22
For more motivation and details, please see [our 2025 paper in NAR Genomics and Bioinformatics](https://academic.oup.com/nargab/article/7/2/lqaf076/8158581).
23
23
24
+
**Users of version 1.x, please note the new command line syntax in version 2.0.** All commands and examples in this
25
+
README and on the Wiki have been updated. [See the commands overview.](https://github.com/JNP-Solutions/Slacken/wiki/Slacken-commands-overview)
26
+
24
27
Copyright (c) Johan Nyström-Persson 2019-2025.
25
28
26
29
## Contents
@@ -168,7 +171,7 @@ Here,
168
171
* `--reads 100` is the threshold for including a taxon in the initial set (R100).
169
172
* `-l /data/standard-224c` is required, and indicates where genomes for library building may be found.
170
173
* `--bracken-length 150` specifies that Bracken weights for the given read length (150) should be generated. That can be slow, and
171
-
also requires extra space, so we recommend omitting `--bracken-length` when Bracken is not needed.
174
+
also requires extra space, so we recommend omitting `--bracken-length` when Bracken is not needed. When generating Bracken weights, we recommend giving Slacken at least 32 GB of RAM.
172
175
173
176
When the command has finished, the following files will be generated:
174
177
@@ -404,13 +407,17 @@ These options may also be permanently configured by editing `slacken.sh`.
404
407
405
408
Slacken can run on AWS EMR (Elastic MapReduce) and should also work similarly on other commercial cloud providers
406
409
that support Apache Spark. In this scenario, data can be stored on AWS S3 and the computation can run on a mix of
407
-
on-demand and spot (interruptible) instances. We refer the reader to the AWS EMR documentation for more details.
410
+
on-demand and spot (interruptible) instances.
411
+
412
+
A [tutorial on Slacken with AWS EMR](https://github.com/JNP-Solutions/Slacken/wiki/Classifying-metagenomic-samples-on-AWS-Elastic-MapReduce)
413
+
is available. The tutorial shows how to use Slacken to classify samples using the public indexes on AWS S3.
408
414
409
415
The cluster configuration we generally recommend is 4 GB RAM per CPU (but 2 GB per CPU may be enough for small workloads).
410
416
For large workloads, the worker nodes should have fast physical hard drives, such as NVMe. On EMR, Spark will automatically use
411
417
these drives for temporary space. We have found the m7gd and m6gd machine families to work well.
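
As a rough illustration of the RAM-per-CPU guideline above, the following sketch computes the memory a worker node would need. The core count of 16 is an assumed value chosen only for the example; substitute the vCPU count of the instance type you actually plan to use.

```shell
# Assumed worker size, for illustration only: 16 vCPUs per node.
cores=16

# General recommendation from the text: 4 GB of RAM per CPU.
recommended_gb=$((cores * 4))

# Lower bound that may be enough for small workloads: 2 GB per CPU.
minimum_gb=$((cores * 2))

echo "Recommended RAM per node: ${recommended_gb} GB (small workloads: ${minimum_gb} GB)"
```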
412
418
413
-
To run on AWS EMR, first, install the AWS CLI.
419
+
The tutorial above shows how to run Slacken using the EMR GUI. You can also run it on EMR from the command line.
420
+
To do this, first install the [AWS CLI](https://aws.amazon.com/cli/).
414
421
Copy `slacken-aws.sh.template` to a new file, e.g. `slacken-aws.sh`, and edit the file to configure
415
422
some settings such as the S3 bucket to use for the Slacken jar. Then, create the AWS EMR cluster. You will receive a
416
423
cluster ID, either from the web GUI or from the CLI. Set the `AWS_EMR_CLUSTER` environment variable to this ID:
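
A minimal sketch of this step is shown below. The cluster ID `j-EXAMPLE1234567` is a made-up placeholder, not a real cluster; replace it with the ID you received when creating the cluster.

```shell
# Placeholder cluster ID, for illustration only. Use the ID reported by
# the EMR web GUI or by the AWS CLI (for example, `aws emr list-clusters --active`).
export AWS_EMR_CLUSTER="j-EXAMPLE1234567"

# Confirm the value that later commands in this shell session will see.
echo "Using EMR cluster: $AWS_EMR_CLUSTER"
```

Because the variable is exported, it is visible to scripts started from the same shell session, but it must be set again in each new session.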