Pipeline for generating expressed regions (ERs) and selecting optimal unannotated intergenic ERs

This Snakemake pipeline identifies ERs from RNA-seq data (bigwig files) using the package ODER (Optimise the Definition of Expressed Regions). Each detected ER is classified by comparing it with the known annotation in the provided GTF and is assigned to its nearest gene. From the resulting set, unannotated intergenic ERs within 10 kb of a protein-coding gene are selected.
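The final selection step can be pictured with a minimal Python sketch. This is not the pipeline's own code: the record fields (annotation, nearest_gene_biotype, distance_to_gene), the label strings, and the inclusive 10,000 bp cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_DISTANCE_BP = 10_000  # "within 10 kb"; treating the bound as inclusive is an assumption


@dataclass(frozen=True)
class ExpressedRegion:
    # Hypothetical record layout, not the pipeline's actual output columns.
    chrom: str
    start: int
    end: int
    annotation: str            # e.g. "intergenic", "exon", "intron"
    nearest_gene_biotype: str  # e.g. "protein_coding", "lncRNA"
    distance_to_gene: int      # distance to the nearest gene, in bp


def select_intergenic_ers(
    ers: Iterable[ExpressedRegion], max_distance: int = MAX_DISTANCE_BP
) -> List[ExpressedRegion]:
    """Keep intergenic ERs whose nearest gene is protein-coding and close enough."""
    return [
        er
        for er in ers
        if er.annotation == "intergenic"
        and er.nearest_gene_biotype == "protein_coding"
        and er.distance_to_gene <= max_distance
    ]
```

An ER is kept only when all three conditions hold, so a region 5 kb from a lncRNA or 12 kb from a protein-coding gene would both be dropped.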

Getting Started

Input

Ensembl GTF annotation: http://ftp.ensembl.org/pub/current_gtf/homo_sapiens/
Aligned RNA-seq reads in bigwig format. Multiple RNA-seq replicates can be provided. Please note that contig names in the bigwig files must be Ensembl style: 1, 2, ..., MT.
Chromosome lengths, which the code requires. A file containing chromosome lengths for hg38 is provided in /data and is used automatically by the pipeline.
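If your bigwig files use UCSC-style contig names (chr1, chrM), the names need translating before use. The sketch below shows only the name mapping, not how to rewrite a bigwig file; the function name to_ensembl_contig is hypothetical, and the mapping covers primary chromosomes only (unplaced scaffolds are named differently in Ensembl and are not handled).

```python
def to_ensembl_contig(name: str) -> str:
    """Map a UCSC-style contig name to its Ensembl-style equivalent."""
    # Drop the "chr" prefix if present; names already in Ensembl style pass through.
    stripped = name[3:] if name.startswith("chr") else name
    # UCSC calls the mitochondrial genome chrM; Ensembl calls it MT.
    return "MT" if stripped == "M" else stripped


for ucsc_name in ("chr1", "chrX", "chrM", "MT"):
    print(ucsc_name, "->", to_ensembl_contig(ucsc_name))
```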

Output

The main output files generated by the pipeline are:

ERs/<sample_name>_ers_raw.txt - detected ERs with meta-data
Intergenic_ERs/<sample_name>_3prime_intergenic_ers.txt - filtered 3' intergenic ERs
Intergenic_ERs/<sample_name>_5prime_intergenic_ers.txt - filtered 5' intergenic ERs
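To check that a run produced everything listed above, the expected paths can be built from the sample name. This is a small sketch: the helper name expected_outputs is hypothetical, and it assumes the output folders sit directly under the working directory set in config.yml.

```python
from pathlib import PurePosixPath
from typing import List


def expected_outputs(sample_name: str, workdir: str = ".") -> List[str]:
    """Return the three main output paths for one sample."""
    base = PurePosixPath(workdir)  # assumed to be the working directory from config.yml
    return [
        str(base / "ERs" / f"{sample_name}_ers_raw.txt"),
        str(base / "Intergenic_ERs" / f"{sample_name}_3prime_intergenic_ers.txt"),
        str(base / "Intergenic_ERs" / f"{sample_name}_5prime_intergenic_ers.txt"),
    ]


for path in expected_outputs("my_sample", "results"):
    print(path)
```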

Dependencies

miniconda
snakemake - can be installed via conda (snakemake>=5.3)
The rest of the dependencies (R packages) are installed via conda.

Installation

Clone the pipeline:

git clone --recursive https://github.com/sid-sethi/Generate-ERs.git

Usage

Edit config.yml to set the working directory and input files. Run the snakemake command from within the pipeline directory.

cd Generate-ERs
snakemake --use-conda -j <num_cores> all

If you provide more than one core, independent snakemake rules are processed simultaneously. The pipeline uses at most 2 cores. It is a good idea to do a dry run first (using the -n flag) to see what the pipeline would do before executing it.

snakemake --use-conda -n all

Snakemake can be run to only install the required conda environments without running the full workflow. Subsequent runs with --use-conda will make use of the local environments without requiring internet access. This is suitable for running the pipeline offline.

snakemake --use-conda --conda-create-envs-only
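When the pipeline is launched from a script rather than typed by hand, the three invocations above can be assembled as argument lists. The sketch below only builds and prints the commands; the function name snakemake_command and its mode labels are hypothetical, while the flags are exactly those shown in this section.

```python
from typing import List


def snakemake_command(mode: str = "run", cores: int = 2) -> List[str]:
    """Build the argument list for one of the invocations described above."""
    base = ["snakemake", "--use-conda"]
    if mode == "run":
        # Full workflow; the pipeline uses at most 2 cores.
        return base + ["-j", str(cores), "all"]
    if mode == "dry-run":
        # Show what would be done without executing anything.
        return base + ["-n", "all"]
    if mode == "envs-only":
        # Create the conda environments only, e.g. ahead of an offline run.
        return base + ["--conda-create-envs-only"]
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("dry-run", "envs-only", "run"):
    print(" ".join(snakemake_command(mode)))
```

A returned list can be passed to subprocess.run with cwd set to the pipeline directory and check=True, so that a failed run raises an error instead of passing silently.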

Licence

This repository is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the LICENSE file (GNU General Public License) for more details.
