This repository documents and makes available to the community the code for the publication 'To join or not to join: handling biological replicates in long-read RNA sequencing data'.
A pre-print will be posted to bioRxiv shortly, once the manuscript has been submitted to a journal for review.
The paper investigates strategies for combining long-read RNA-seq data from multiple biological replicates for transcriptome reconstruction. We investigate two strategies: "Join & Call" (J&C), where reads from all replicates are combined before performing transcriptome reconstruction, and "Call & Join" (C&J), where transcriptome reconstruction is performed on each replicate individually before combining the resulting annotations. We compare IsoQuant, FLAIR, Bambu, and TALON on both PacBio and ONT data, as well as Mandalorion and IsoSeq + SQANTI3 Filter on PacBio data only, using a data set of mouse brain and kidney tissue with 5 biological replicates per tissue.
The data used in this study has been submitted to the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena/browser/home). Mouse brain and kidney data generated using PacBio sequencing are accessible under accession numbers PRJEB85167 and PRJEB94912, respectively.
TODO: Add ONT and Illumina accession numbers
The code is organized as a Nextflow pipeline that runs a specified transcriptome reconstruction tool (from those mentioned above) with both strategies, on both brain and kidney tissue, and on a specified data type (ONT or PacBio, where compatible).
The scripts used by the pipeline are designed specifically for execution on a SLURM cluster and will not work in other environments out of the box.
There are further options (e.g. running FLAIR without supporting short-read data, or performing partial joins with 2, 3, or 4 samples) that are not used for the analyses in the paper.
The .yaml files for setting up the required conda environments can be found under /src/util/conda_envs.
Instructions for cloning the repositories of tools that require a local copy can be found under /src/util/tool_setup.
Examples of using the SLURM wrapper script nextflow_wrapper.sbatch to launch main_workflow.nf:
- Run FLAIR with supporting short reads on ONT data:
sbatch nextflow_wrapper.sbatch --data ont --algorithm flair --stringent true --use_sr true --sr_config star --result_name ont/flair_ar_sr/run1
- Run IsoQuant on PacBio data:
sbatch nextflow_wrapper.sbatch --data isoseq --algorithm isoquant --result_name isoseq/isoquant/run1
/src/util contains utility scripts, including the aforementioned environment and tool setup.
/src/data_preparation_scripts contains scripts to set up the data, including creating the concatenated fastq files needed for the J&C strategy.
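As a minimal sketch of the J&C input preparation (file names and paths here are hypothetical, not the repository's actual layout), joining replicates amounts to concatenating the per-replicate FASTQ files of a tissue into one file:

```shell
#!/usr/bin/env bash
set -euo pipefail

tissue="brain"

# Toy replicate files standing in for the real per-replicate FASTQs.
for rep in 1 2 3; do
  printf '@read%s\nACGT\n+\nIIII\n' "$rep" | gzip > "${tissue}_rep${rep}.fastq.gz"
done

# gzip streams stay valid when appended, so plain cat is enough to join them.
mkdir -p joined
cat "${tissue}"_rep*.fastq.gz > "joined/${tissue}_joined.fastq.gz"

# The joined file contains one header line per replicate read:
gzip -cd "joined/${tissue}_joined.fastq.gz" | grep -c '^@read'
```

The actual scripts additionally handle the real sample naming and the partial-join options mentioned above.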
/src/nextflow contains the nextflow pipeline, separated into the following subdirectories:
/src/nextflow/modules contains the .nf files defining the modules for the different steps.
/src/nextflow/scripts contains the actual .sh and .sbatch scripts used by the modules for execution on the SLURM cluster.
/src/nextflow/workflows contains a variety of workflows; the primary one is main_workflow.nf. On a SLURM cluster, this workflow is run through the nextflow_wrapper.sbatch script.
/src/plotting contains scripts to create the plots used in the paper from the results generated by the workflows.
/src/resouce_inspection contains the scripts used to obtain runtime and memory usage information from the SLURM jobs.
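For illustration, runtime and peak memory of finished SLURM jobs can be queried with sacct; the sketch below parses a canned sacct-style table so it runs anywhere (the job IDs and field choices are illustrative, not the repository's actual scripts):

```shell
#!/usr/bin/env bash
set -euo pipefail

# On the cluster, a table like the one below would come from e.g.:
#   sacct -j "$job_id" --parsable2 --units=M --format=JobID,Elapsed,MaxRSS
sample='JobID|Elapsed|MaxRSS
123456|01:02:03|
123456.batch|01:02:03|2048M'

# Keep only the job steps that report a MaxRSS value, skipping the header:
printf '%s\n' "$sample" | awk -F'|' 'NR > 1 && $3 != "" { print $1, $2, $3 }'
```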
/reports/empty_report contains the basic directory structure for the reports created by the nextflow workflow along with some helper scripts.
For questions about this code and its reuse or adaptation, please use the GitHub issues or contact me through [email protected].