This work can be reproduced by installing conda and creating the Snakemake
environment defined in workflow/envs.
For the analysis, install the conda environment defined in workflow/envs/r.yaml,
or the corresponding "pinned" (i.e. explicit) environment definition file.
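As a rough sketch of the two installation routes, the helper below builds (but does not run) the conda command for either a YAML environment file or an explicit spec file. The environment name analysis-r and the helper itself are assumptions made for illustration; only workflow/envs/r.yaml is named in this repository, and the name of the pinned file is not stated here.

```python
from pathlib import Path


def conda_install_command(env_file, env_name="analysis-r"):
    """Build the conda command that would create an environment.

    env_name is a hypothetical name chosen for this sketch.
    """
    path = Path(env_file)
    if path.suffix in {".yaml", ".yml"}:
        # YAML environment definitions are created with `conda env create`.
        return ["conda", "env", "create", "--name", env_name,
                "--file", path.as_posix()]
    # Explicit ("pinned") spec files are installed with `conda create --file`.
    return ["conda", "create", "--name", env_name, "--file", path.as_posix()]


if __name__ == "__main__":
    print(" ".join(conda_install_command("workflow/envs/r.yaml")))
```

To actually create the environment, the returned list could be passed to subprocess.run(cmd, check=True).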
The most interesting content lives in a handful of directories. The workflow/
directory includes all data processing scripts as well as the workflow
definitions for Snakemake. The analysis/ and results/analysis/ directories
contain the analysis of the data as Quarto source files (.qmd) and
rendered HTML, respectively.
./
├── analysis/ # (!!!) Analysis conducted in R; rendered to HTML
├── logs/ # Logs from the data processing steps
├── plots/ # (!!!) Plots produced in the workflow
├── reports/ # Snakemake reports and rulegraphs
├── resources/ # Raw-data and other resources
│ ├── adapters/ # Adapters for trimming
│ └── raw_data/ # Raw-data downloaded in here
├── results/ # Results produced in the workflow
│ └── analysis/ # (!!!) Rendered analysis reports
├── workflow/ # (!!!) Workflow definitions
│ ├── envs/ # Conda environments
│ ├── profile/ # Snakemake profile
│ ├── rules/ # Snakemake rules
│ ├── scripts/ # Scripts (Python, R, Bash)
│ ├── config.yaml # Workflow config
│ └── Snakefile # Main Snakefile
└── README.md
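A quick way to confirm that a checkout matches this layout is to test for the directories flagged with (!!!) in the tree. The following is only a minimal sketch; the function name is hypothetical and not part of the workflow.

```python
from pathlib import Path

# Directories flagged with (!!!) in the tree above.
KEY_DIRECTORIES = ("analysis", "plots", "results/analysis", "workflow")


def missing_key_directories(repo_root="."):
    """Return the flagged directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in KEY_DIRECTORIES if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_key_directories()
    print("missing:", ", ".join(missing) if missing else "none")
```

Note that plots/ and results/ are produced by the workflow, so they may legitimately be absent before the first run.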
- Prefetch SRA files using the prefetch utility (01_preprocessing.smk).
- Convert SRA reads to .fastq.gz using fastq-dump (01_preprocessing.smk).
- Gather all file paths for raw data as well as metadata from in-house datasets using the inhouse_data.R script.
- Convert BAM files to .fastq.gz using bamtofastq (01_preprocessing.smk).
- Quality check of reads using fastqc. Gather all results for each dataset in a multiqc report (qc.smk).
- Trimming reads using trimmomatic (02_trimming.smk).
- Quality check of trimmed reads using fastqc. Again, gather results for each dataset with multiqc (02_trimming.smk, qc.smk).
- Align reads to the PICRH genome using STAR (03_alignment.smk).
- Quantify mapped reads using featureCounts (04_quantification.smk).
- Chromatin states enrichments with chromHMM (05_chromatin-states.smk).
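The processing steps above can be summarised as a small lookup table that records which rule file each step belongs to. This is an illustrative sketch only: the step descriptions are paraphrased from the list, and the helper functions are hypothetical rather than part of the workflow.

```python
# Each entry: (step description, tool or script, Snakemake rule files).
PROCESSING_STEPS = [
    ("Prefetch SRA files", "prefetch", ("01_preprocessing.smk",)),
    ("Convert SRA reads to .fastq.gz", "fastq-dump", ("01_preprocessing.smk",)),
    ("Gather in-house file paths and metadata", "inhouse_data.R", ()),
    ("Convert BAM files to .fastq.gz", "bamtofastq", ("01_preprocessing.smk",)),
    ("Quality check of reads", "fastqc, multiqc", ("qc.smk",)),
    ("Trim reads", "trimmomatic", ("02_trimming.smk",)),
    ("Quality check of trimmed reads", "fastqc, multiqc",
     ("02_trimming.smk", "qc.smk")),
    ("Align reads to the PICRH genome", "STAR", ("03_alignment.smk",)),
    ("Quantify mapped reads", "featureCounts", ("04_quantification.smk",)),
    ("Chromatin state enrichments", "chromHMM", ("05_chromatin-states.smk",)),
]


def rule_files_in_order(steps=PROCESSING_STEPS):
    """Return each rule file once, in the order the steps first use it."""
    seen = []
    for _description, _tool, rule_files in steps:
        for rule_file in rule_files:
            if rule_file not in seen:
                seen.append(rule_file)
    return seen


def steps_for_rule_file(rule_file, steps=PROCESSING_STEPS):
    """Return the descriptions of all steps defined in a given rule file."""
    return [desc for desc, _tool, files in steps if rule_file in files]


if __name__ == "__main__":
    for name in rule_files_in_order():
        print(name, "->", "; ".join(steps_for_rule_file(name)))
```

The in-house metadata step has no rule file listed because the list names only the inhouse_data.R script for it.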
The analysis is mostly done in Quarto files. See analysis/ for the .qmd files
and results/analysis/ for the corresponding HTML reports.
The corresponding Snakemake rules are defined in
06_analysis.smk.
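To locate the report for a given Quarto file, a helper along the following lines may be convenient. It assumes that each rendered report keeps the file stem of its .qmd source, which is not stated explicitly here and should be checked against the rules in 06_analysis.smk; the function name is hypothetical.

```python
from pathlib import PurePosixPath

REPORT_DIR = PurePosixPath("results/analysis")


def report_path_for(qmd_file):
    """Map a .qmd source to its presumed HTML report path.

    Assumption: the report has the same stem as the source file.
    """
    source = PurePosixPath(qmd_file)
    if source.suffix != ".qmd":
        raise ValueError(f"not a Quarto file: {qmd_file}")
    return REPORT_DIR / source.with_suffix(".html").name


if __name__ == "__main__":
    print(report_path_for("analysis/06_chromatin-states.qmd"))
```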
- Figure 1: analysis/figure1.R
- Figure 2: analysis/figure2.R
- Figure 3: analysis/indeterminate-genes.R
- Figure 4: analysis/figure4.R
- Figure S1: analysis/figureS1.R
- Figure S2: analysis/figureS2.R
- Figure S3: analysis/figureS3.R
- Figure S4: analysis/figureS4.R
- Figure S5: analysis/indeterminate-genes.R
- Figure S6: analysis/figureS6.R
- Figure S7: analysis/06_chromatin-states.qmd
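The figure list above can also be read in the other direction, from a script to the figures it produces. The sketch below simply encodes the list as a dictionary; the helper name is hypothetical.

```python
# Figure-to-source mapping, copied from the list above.
FIGURE_SOURCES = {
    "Figure 1": "analysis/figure1.R",
    "Figure 2": "analysis/figure2.R",
    "Figure 3": "analysis/indeterminate-genes.R",
    "Figure 4": "analysis/figure4.R",
    "Figure S1": "analysis/figureS1.R",
    "Figure S2": "analysis/figureS2.R",
    "Figure S3": "analysis/figureS3.R",
    "Figure S4": "analysis/figureS4.R",
    "Figure S5": "analysis/indeterminate-genes.R",
    "Figure S6": "analysis/figureS6.R",
    "Figure S7": "analysis/06_chromatin-states.qmd",
}


def figures_from(source_file):
    """Return all figures produced by one source file, in list order."""
    return [fig for fig, src in FIGURE_SOURCES.items() if src == source_file]


if __name__ == "__main__":
    print(figures_from("analysis/indeterminate-genes.R"))
```

This makes it easy to see that analysis/indeterminate-genes.R is the one script that feeds two figures.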