NBorthLab/CHO-coding-transcriptome

CHO Coding Transcriptomes

This work can be reproduced by installing conda and creating the Snakemake environment defined in workflow/envs. For the analysis, install the conda environment from workflow/envs/r.yaml or from the corresponding "pinned", i.e. explicit, environment definition file.
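A minimal setup sketch, assuming conda is already installed and the workflow is run from the repository root. workflow/envs/r.yaml is named above; the Snakemake environment file name, environment name, and core count are assumptions to adapt to the actual files in workflow/envs/:

```shell
# Create the workflow environment (file name under workflow/envs/ is assumed)
conda env create -f workflow/envs/snakemake.yaml

# Create the R analysis environment named in this README
conda env create -f workflow/envs/r.yaml

# Activate the workflow environment and run the pipeline
conda activate snakemake
snakemake --cores 8 --use-conda
```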

File structure

The most important/interesting content is contained in a handful of directories: the workflow/ directory, which includes all data-processing scripts as well as the workflow definitions for Snakemake, and the analysis/ and results/analysis/ directories, which contain the analysis of the data in raw R Markdown and rendered HTML, respectively.

./
├── analysis/                 # (!!!) Analysis conducted in R; rendered to HTML
├── logs/                     # Logs from the data processing steps
├── plots/                    # (!!!) Plots produced in the workflow
├── reports/                  # Snakemake reports and rulegraphs
├── resources/                # Raw-data and other resources
│   ├── adapters/             # Adapters for trimming
│   └── raw_data/             # Raw-data downloaded in here
├── results/                  # Results produced in the workflow
│   └── analysis/             # (!!!) Rendered analysis reports
├── workflow/                 # (!!!) Workflow definitions
│   ├── envs/                 # Conda environments
│   ├── profile/              # Snakemake profile
│   ├── rules/                # Snakemake rules
│   ├── scripts/              # Scripts (Python, R, Bash)
│   ├── config.yaml           # Workflow config
│   └── Snakefile             # Main Snakefile
└── README.md

Pipeline

![Pipeline rule graph](reports/pipeline.svg)

NCBI-specific pre-processing workflow steps

  1. Prefetch SRA files using the prefetch utility (01_preprocessing.smk).
  2. Convert SRA reads to .fastq.gz using fastq-dump (01_preprocessing.smk).
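The two steps above correspond roughly to the following sra-tools invocations; the accession and output paths below are placeholders, not values from the workflow:

```shell
# Step 1: download the .sra archive for an accession (SRRXXXXXXX is a placeholder)
prefetch --output-directory resources/raw_data SRRXXXXXXX

# Step 2: extract gzipped FASTQ files; --split-3 writes paired-end mates
# to separate files
fastq-dump --gzip --split-3 \
    --outdir resources/raw_data \
    resources/raw_data/SRRXXXXXXX/SRRXXXXXXX.sra
```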

In-house-specific pre-processing workflow steps

  1. Gather all file paths for raw data as well as metadata from in-house datasets using the inhouse_data.R script.
  2. Convert BAM files to .fastq.gz using bamtofastq (01_preprocessing.smk).
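Step 2 can be sketched as follows, assuming the 10x Genomics bamtofastq binary (if the workflow uses a different bamtofastq implementation, e.g. bedtools, the invocation differs); the sample name and paths are placeholders:

```shell
# Convert an in-house BAM back to gzipped FASTQ
# (10x Genomics bamtofastq assumed; paths are placeholders)
bamtofastq --nthreads 4 \
    resources/raw_data/sample.bam \
    resources/raw_data/sample_fastq/
```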

General Workflow

Data processing

  1. Quality check of reads using fastqc. Gather all results for each dataset in a multiqc report (qc.smk).
  2. Trimming reads using trimmomatic (02_trimming.smk).
  3. Quality check of trimmed reads using fastqc. Again, gather results for each dataset with multiqc (02_trimming.smk, qc.smk).
  4. Align reads to the PICRH genome using STAR (03_alignment.smk).
  5. Quantify mapped reads using featureCounts (04_quantification.smk).
  6. Chromatin-state enrichment analysis with ChromHMM (05_chromatin-states.smk).
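For orientation, steps 1–5 above chain together per sample roughly as follows when run outside Snakemake (single-end case shown; the adapter file, STAR index, and annotation paths are placeholders, and the actual rule parameters live in workflow/config.yaml and the .smk files):

```shell
# 1. QC of raw reads
fastqc -o qc/raw sample.fastq.gz

# 2. Adapter and quality trimming (single-end mode; adapter file is a placeholder)
trimmomatic SE sample.fastq.gz sample.trimmed.fastq.gz \
    ILLUMINACLIP:resources/adapters/TruSeq3-SE.fa:2:30:10 SLIDINGWINDOW:4:20

# 3. QC of trimmed reads
fastqc -o qc/trimmed sample.trimmed.fastq.gz

# 4. Align to the PICRH genome with STAR (index path is a placeholder)
STAR --runThreadN 8 --genomeDir star_index/ \
    --readFilesIn sample.trimmed.fastq.gz --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample.

# 5. Count reads per gene (annotation path is a placeholder)
featureCounts -T 8 -a annotation.gtf -o counts.txt \
    sample.Aligned.sortedByCoord.out.bam
```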

Analysis

The analysis is mostly done in Quarto files. See the analysis/ directory for the .qmd files and results/analysis/ for the corresponding HTML reports. The corresponding Snakemake rules are defined in 06_analysis.smk.
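Outside Snakemake, an individual report can be rendered with the Quarto CLI; the file name below is a placeholder, and rendering into results/analysis/ is an assumption based on the layout described above:

```shell
# Render one analysis report into the rendered-reports directory
quarto render analysis/some_report.qmd --output-dir results/analysis
```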

Figures
