This snakemake pipeline identifies expressed regions (ERs) from RNA-seq data (bigwig files) using the package ODER (Optimise the Definition of Expressed Regions). The type of each detected ER is annotated by comparing it to the known annotation (from the provided GTF), and each ER is associated with its nearest gene. From this dataset, unannotated intergenic ERs within 10 kb of a protein-coding gene are selected.
- Ensembl GTF annotation: http://ftp.ensembl.org/pub/current_gtf/homo_sapiens/
- Aligned RNA-seq reads in bigwig format. Multiple RNA-seq replicates can be provided. Please note that contig names in the bigwig should be Ensembl style: 1, 2, ..., MT.
- Chromosome lengths are required by the code. A file containing chromosome lengths for hg38 is provided in /data and is automatically used by the pipeline.
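If your data uses UCSC-style contig names (chr1, chrM), the names need converting before the pipeline is run. The sketch below shows one way to rewrite the first column of a tab-separated file, such as a chromosome-lengths file or a bedGraph exported from a bigwig. The function name `ucsc_to_ensembl` is hypothetical and not part of this pipeline, and converting the bigwig itself would additionally need an external tool, which is not shown here.

```shell
# Hypothetical helper: rewrite UCSC-style contig names in column 1
# (chr1, chrX, chrM) to Ensembl style (1, X, MT). Reads stdin, writes stdout.
# Only primary chromosomes are mapped correctly; scaffold and alt contigs
# would merely lose their "chr" prefix, which is not a valid translation.
ucsc_to_ensembl() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         sub(/^chr/, "", $1)        # chr1 -> 1, chrX -> X, chrM -> M
         if ($1 == "M") $1 = "MT"   # Ensembl names the mitochondrion MT
         print
       }'
}

# Example: two lines of a chromosome-lengths table
printf 'chr1\t248956422\nchrM\t16569\n' | ucsc_to_ensembl
```

Lines that already use Ensembl names pass through unchanged, so the function is safe to apply to files of mixed origin.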
The main output files generated by the pipeline are:
- ERs/<sample_name>_ers_raw.txt - detected ERs with metadata
- Intergenic_ERs/<sample_name>_3prime_intergenic_ers.txt - filtered 3' intergenic ERs
- Intergenic_ERs/<sample_name>_5prime_intergenic_ers.txt - filtered 5' intergenic ERs
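After a run, it can be handy to confirm that the three main output files were produced for a given sample. The following is a minimal sketch: the function name `check_outputs` is hypothetical, and it assumes the ERs/ and Intergenic_ERs/ folders sit directly under the directory you pass in (for example, the working directory set in config.yml).

```shell
# Hypothetical helper: report whether each main output file for one sample
# exists and is non-empty. Returns non-zero if any file is missing or empty.
#   $1 = sample name
#   $2 = directory containing ERs/ and Intergenic_ERs/ (default: current dir)
check_outputs() {
  sample="$1"
  outdir="${2:-.}"
  missing=0
  for f in \
    "ERs/${sample}_ers_raw.txt" \
    "Intergenic_ERs/${sample}_3prime_intergenic_ers.txt" \
    "Intergenic_ERs/${sample}_5prime_intergenic_ers.txt"
  do
    if [ -s "$outdir/$f" ]; then
      echo "ok       $f"
    else
      echo "missing  $f"
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `check_outputs <sample_name> <working_dir>`, with the exit status used to decide whether downstream steps should proceed.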
- miniconda
- snakemake - can be installed via conda (snakemake>=5.3)
- The rest of the dependencies (R packages) are installed via conda.
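To check whether an existing snakemake installation meets the version requirement, something like the sketch below may help. The `version_ge` helper is hypothetical and relies on `sort -V` (version sort, as in GNU coreutils); the conda channels shown in the hint are an assumption and may need adapting to your setup.

```shell
# Succeeds when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v snakemake >/dev/null 2>&1; then
  if version_ge "$(snakemake --version)" "5.3"; then
    echo "snakemake is recent enough"
  else
    echo "snakemake is older than 5.3; consider upgrading via conda"
  fi
else
  # Assumed channels; adjust as needed for your conda configuration.
  echo "snakemake not found; one option is:"
  echo "  conda install -c conda-forge -c bioconda 'snakemake>=5.3'"
fi
```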
Clone the pipeline:
git clone --recursive https://github.com/sid-sethi/Generate-ERs.git
Edit config.yml to set up the working directory and input files. The snakemake command should be issued from within the pipeline directory.
cd Generate-ERs
snakemake --use-conda -j <num_cores> all
If you provide more than one core, independent snakemake rules will be processed simultaneously. This pipeline uses at most 2 cores. It is a good idea to do a dry run (using the -n parameter) to see what the pipeline would do before executing it.
snakemake --use-conda -n all
Snakemake can also be run to only install the required conda environments without running the full workflow. Subsequent runs with --use-conda will then use the local environments without requiring internet access, which makes this suitable for running the pipeline offline.
snakemake --use-conda --conda-create-envs-only
Copyright 2020 Astex Therapeutics Ltd.
This repository is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the LICENSE file (GNU General Public License) for more details.