sid-sethi/Generate-ERs

Pipeline for generating expressed regions (ERs) and selecting optimal unannotated intergenic ERs

This Snakemake pipeline identifies ERs from RNA-seq data (bigwig files) using the package ODER (Optimise the Definition of Expressed Regions). Each detected ER is classified by comparing it with the known annotation in the provided GTF, and is then assigned to its nearest gene. From the resulting dataset, unannotated intergenic ERs within 10 kb of a protein-coding gene are selected.
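
The selection step can be pictured with a small Python sketch. This is not the pipeline's own code, which relies on ODER and R packages. The record fields (`annotation`, `nearest_gene_biotype`, `distance_to_gene`) and their label strings are hypothetical names chosen for illustration. Only the 10 kb cutoff and the protein-coding restriction come from the description above, and treating the cutoff as inclusive is also an assumption.

```python
from dataclasses import dataclass

MAX_DISTANCE_BP = 10_000  # "within 10 kb", as stated in the description


@dataclass
class ExpressedRegion:
    # All field names are hypothetical; the pipeline's real columns may differ.
    chrom: str
    start: int
    end: int
    annotation: str            # e.g. "exon", "intron", "intergenic"
    nearest_gene_biotype: str  # e.g. "protein_coding", "lncRNA"
    distance_to_gene: int      # distance in bp to the nearest gene


def select_intergenic_ers(ers, max_distance=MAX_DISTANCE_BP):
    """Keep intergenic ERs within max_distance bp of a protein-coding gene."""
    return [
        er for er in ers
        if er.annotation == "intergenic"
        and er.nearest_gene_biotype == "protein_coding"
        and er.distance_to_gene <= max_distance  # inclusive cutoff (assumed)
    ]


# Made-up records, purely to exercise the filter.
example_ers = [
    ExpressedRegion("1", 1000, 1200, "intergenic", "protein_coding", 2500),
    ExpressedRegion("1", 5000, 5300, "exon", "protein_coding", 0),
    ExpressedRegion("2", 800, 950, "intergenic", "lncRNA", 400),
    ExpressedRegion("MT", 100, 180, "intergenic", "protein_coding", 25000),
]
selected = select_intergenic_ers(example_ers)
```

In this toy input only the first record passes all three conditions: the second is annotated as exonic, the third is nearest to a non-coding gene, and the fourth lies beyond the distance cutoff.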

Getting Started

Input

  • Ensembl GTF annotation: http://ftp.ensembl.org/pub/current_gtf/homo_sapiens/
  • Aligned RNA-seq reads in bigwig format. Multiple RNA-seq replicates can be provided. Please note that contig names in the bigwig files should be Ensembl style: 1, 2, ..., MT.
  • Chromosome lengths are required by the code. A file containing chromosome lengths for hg38 is provided in /data and is automatically used by the pipeline.
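
Since the contig naming requirement is easy to miss, here is a minimal Python sketch for checking a list of contig names before running the pipeline. It is not part of the pipeline. The function names are hypothetical, and the set of valid names assumes the human primary chromosomes only (1 to 22, X, Y, MT), so scaffold and patch contigs are reported as non-matching. How the names are extracted from a bigwig file, and how a file is renamed, is left to whichever tool you already use.

```python
import re

# Ensembl-style names of the human primary chromosomes (assumption: scaffolds
# and patches are deliberately not listed here).
ENSEMBL_CONTIGS = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def to_ensembl_contig(name):
    """Map a UCSC-style contig name (chr1, chrM) to Ensembl style (1, MT)."""
    stripped = re.sub(r"^chr", "", name)
    # UCSC calls the mitochondrial genome chrM, Ensembl calls it MT.
    return "MT" if stripped == "M" else stripped


def non_ensembl_contigs(names):
    """Return the names that are not Ensembl-style primary chromosomes."""
    return [n for n in names if n not in ENSEMBL_CONTIGS]


# Example with a mixed list of names.
problems = non_ensembl_contigs(["1", "chr2", "X", "chrM", "MT"])
suggested = [to_ensembl_contig(n) for n in problems]
```

Here `problems` holds the two UCSC-style names and `suggested` holds their Ensembl-style equivalents.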

Output

The main output files generated by the pipeline are:

  • ERs/<sample_name>_ers_raw.txt - detected ERs with meta-data
  • Intergenic_ERs/<sample_name>_3prime_intergenic_ers.txt - filtered 3' intergenic ERs
  • Intergenic_ERs/<sample_name>_5prime_intergenic_ers.txt - filtered 5' intergenic ERs
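
As a convenience for downstream scripts, the output locations listed above can be assembled with a short Python helper. This is a sketch only: the function name and the `workdir` parameter are hypothetical, and it assumes the output folders are created directly under the working directory set in config.yml.

```python
from pathlib import PurePosixPath


def expected_outputs(sample_name, workdir="."):
    """Build the three main output paths for one sample.

    Assumes the ERs/ and Intergenic_ERs/ folders sit directly under workdir.
    """
    base = PurePosixPath(workdir)
    return {
        "raw": base / "ERs" / f"{sample_name}_ers_raw.txt",
        "3prime": base / "Intergenic_ERs" / f"{sample_name}_3prime_intergenic_ers.txt",
        "5prime": base / "Intergenic_ERs" / f"{sample_name}_5prime_intergenic_ers.txt",
    }


# "liver_rep1" is a made-up sample name.
paths = expected_outputs("liver_rep1")
```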

Dependencies

  • miniconda
  • snakemake - can be installed via conda (snakemake>=5.3)
  • The rest of the dependencies (R packages) are installed via conda.

Installation

Clone the pipeline:

git clone --recursive https://github.com/sid-sethi/Generate-ERs.git

Usage

Edit config.yml to set the working directory and input files. The snakemake command should be issued from within the pipeline directory.

cd Generate-ERs
snakemake --use-conda -j <num_cores> all

If you provide more than one core, independent snakemake rules will be processed simultaneously. This pipeline uses at most 2 cores, so requesting more brings no benefit. It is a good idea to do a dry run (using the -n flag) to see what the pipeline would do before executing it.

snakemake --use-conda -n all
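
If you launch the pipeline from a script, the two commands above can be assembled programmatically. The following Python sketch only builds the argument list and uses no flags beyond those shown in this README. The function name and the constant are hypothetical, and capping the core count at 2 reflects the statement above that the pipeline uses at most 2 cores.

```python
MAX_USEFUL_CORES = 2  # the pipeline uses at most 2 cores, per the note above


def snakemake_command(cores=1, dry_run=False):
    """Return the snakemake argument list for a real run or a dry run."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    cmd = ["snakemake", "--use-conda"]
    if dry_run:
        cmd.append("-n")  # dry run: only list what would be done
    else:
        cmd += ["-j", str(min(cores, MAX_USEFUL_CORES))]
    cmd.append("all")
    return cmd


dry = snakemake_command(dry_run=True)
real = snakemake_command(cores=8)
```

The resulting list can be passed to `subprocess.run(real, check=True, cwd="Generate-ERs")` so that the command is issued from within the pipeline directory.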

Snakemake can be run to only install the required conda environments without running the full workflow. Subsequent runs with --use-conda will make use of the local environments without requiring internet access. This is suitable for running the pipeline offline.

snakemake --use-conda --conda-create-envs-only

Licence

Copyright 2020 Astex Therapeutics Ltd.

This repository is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the LICENSE file (GNU General Public License) for more details.
