Merged
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -24,7 +24,7 @@ jobs:
uses: conda-incubator/setup-miniconda@v3
with:
mamba-version: "*"
-channels: conda-forge,bioconda,defaults
+channels: conda-forge,bioconda
auto-activate-base: false
activate-environment: psqan_venv
environment-file: environment.yml
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,2 +1,3 @@
.snakemake/
-psqan_venv/
+psqan_venv/
+base_env.yml
19 changes: 16 additions & 3 deletions README.md
@@ -32,12 +32,25 @@
## Introduction
Despite the advances in tools to process long-read RNA-seq data, the downstream analysis of transcriptional data remains challenging due to the detection of thousands of novel transcripts. From such a large number of transcripts, it is difficult to distinguish between stable transcripts of potential biological importance, partially processed RNAs and splicing noise. It is important to select only the novel transcript models which are reproducible across the samples with a minimum expression value. However, it is difficult to identify optimal expression thresholds to remove artefacts. Consequently, researchers find it challenging to interpret long-read RNA-seq data effectively and generate relevant hypothesis which could be experimentally validated in the laboratory.

-PSQAN (Post Sqanti QC ANalysis) is a Snakemake workflow designed to help researchers identify high-confidence transcripts associated with candidate genes. PSQAN performs a gene-based analysis on characterised transcripts generated by [SQANTI3](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") and [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). PSQAN normalises transcript expression per gene and re-groups transcripts into categories which are more appropriate for a transcript discovery analysis, hence making the results more interpretable. PSQAN generates visualisations to help users determine optimal expression thresholds for detecting both known and novel transcripts of probable biological importance. Furthermore, PSQAN allows users to apply multiple transcript level expression thresholds, both to per sample and across all samples. Lastly, PSQAN generates visualisations and an HTML report, enabling users to explore the known and novel transcripts expressed by a gene, alongside their transcript categories and transcript expression. An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).
+PSQAN (Post Sqanti QC ANalysis) is a Snakemake workflow designed to help researchers identify high-confidence transcripts associated with candidate genes. PSQAN performs a gene-based analysis on characterised transcripts generated by [SQANTI3](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") and [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). PSQAN normalises transcript expression per gene and re-groups transcripts into actionable categories to support transcript prioritisation, hence making the results more interpretable. PSQAN generates visualisations to help users determine optimal expression thresholds for detecting both known and novel transcripts of probable biological importance. Furthermore, PSQAN allows users to apply multiple transcript level expression thresholds, both to per sample and across all samples. Lastly, PSQAN generates visualisations and an HTML report, enabling users to explore the known and novel transcripts expressed by a gene, alongside their transcript categories and transcript expression. An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).


### Input data

-PSQAN can be used with the transcript characterisation output of either SQANTI3 or TALON, which are the two most prominently used tools in long-read RNA-seq data analysis. PSQAN takes the output produced by SQANTI3 or TALON as input, along with a list of candidate genes to analyse. For each gene, PSQAN extracts the isoforms associated with the gene from the output generated by SQANTI3/TALON and applies a set of filtering criteria to remove potential genomic contamination and rare PCR artifacts. PSQAN removes isoforms with a high percentage of genomic "A"s in their downstream 20 bp window (80% is the default), or if one of its junctions is predicted to be a template switching artifact (tagged as "RTS_stage" by SQANTI3).
+PSQAN can be used with the transcript characterisation output of either SQANTI3 or TALON, which are the two most prominently used tools in long-read RNA-seq data analysis. PSQAN takes the output produced by SQANTI3 or TALON as input, along with a list of candidate genes to analyse. For each gene, PSQAN extracts the isoforms associated with the gene from the output generated by SQANTI3/TALON. Since the filtering steps in SQANTI3 and TALON are optional and may be skipped, PSQAN applies its own filtering criteria prior to processing to ensure the removal of potential genomic contamination and rare PCR artifacts. PSQAN removes isoforms with a high percentage of genomic "A"s in their downstream 20 bp window (80% is the default), or if one of its junctions is predicted to be a template switching artifact (tagged as "RTS_stage" by SQANTI3).

+> **_Note:_** Output of TALON does not contain all the transcript-level descriptors required by PSQAN. As a result, certain PSQAN processes are skipped when using TALON output. The processes performed by PSQAN for SQANTI3 and TALON are summarised below:
+
+PSQAN process | SQANTI3 | TALON
+------------- | ------- | --------
+Filtering internal priming artifacts | Yes | Yes
+Filtering template switching artifacts | Yes | No (missing required data)
+Normalising transcript expression | Yes | Yes
+Isoform re-categorisation | Yes | No (missing required data)
+Transcript-level filtering | Yes | Yes
+Visualisations | Yes | Yes
+
+
+
### Normalising transcript expression per gene

@@ -153,7 +166,7 @@ working directory
|--- report.html # if snakemake report is generated at the end of the run
|--- Gene_A/
|--- pre-filtering/ # plots generated before performing filtering
-|--- post-filtering/ # plots generated after performing filtering
+|--- post-filtering/transcriptsRanked.txt # plots generated after performing filtering
|--- logs/
|--- gene_normalised_abundance.txt
|--- filtered_transcripts.txt
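
For readers of this diff, the artifact filter described in the README's "Input data" paragraph can be illustrated with a minimal sketch. This is not PSQAN's implementation. The column names `perc_A_downstream_TTS` and `RTS_stage` are assumed to follow the SQANTI3 classification file, the function name `filter_isoforms` and the demo rows are hypothetical, and treating the 80% threshold as inclusive is an assumption.

```python
import csv
import io


def filter_isoforms(rows, max_perc_a=80.0):
    """Drop isoforms that look like internal priming or template switching artifacts.

    rows: iterable of dicts, one per isoform (for example from csv.DictReader).
    max_perc_a: percentage of genomic A's in the downstream window at or above
        which an isoform is removed (inclusive comparison is an assumption).
    """
    kept = []
    for row in rows:
        # Internal priming: high percentage of genomic A's downstream of the TTS.
        try:
            perc_a = float(row["perc_A_downstream_TTS"])
        except (KeyError, ValueError):
            perc_a = 0.0  # assumption: a missing value is not treated as an artifact

        # Template switching: flag assumed to be the string TRUE or FALSE.
        # The column is absent for TALON input, so the check is skipped there.
        is_rts = str(row.get("RTS_stage", "FALSE")).strip().upper() == "TRUE"

        if perc_a >= max_perc_a or is_rts:
            continue
        kept.append(row)
    return kept


# Hypothetical tab-separated input with only the columns used above.
demo = (
    "isoform\tassociated_gene\tperc_A_downstream_TTS\tRTS_stage\n"
    "PB.1.1\tGENE_A\t25.0\tFALSE\n"
    "PB.1.2\tGENE_A\t85.0\tFALSE\n"
    "PB.1.3\tGENE_A\t30.0\tTRUE\n"
    "PB.1.4\tGENE_A\t10.0\tFALSE\n"
)
rows = list(csv.DictReader(io.StringIO(demo), delimiter="\t"))
kept_ids = [r["isoform"] for r in filter_isoforms(rows)]
print(kept_ids)
```

In this sketch a row without an `RTS_stage` column is filtered on the A-content criterion only, which mirrors the table row "Filtering template switching artifacts | Yes | No (missing required data)" for TALON input.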