Commit d653a48

Merge pull request #3 from sid-sethi/dev
Saving output of rankTranscripts
2 parents 0b8afb8 + d7427e6 commit d653a48

6 files changed: +335 -295 lines changed

.github/workflows/main.yml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ jobs:
 uses: conda-incubator/setup-miniconda@v3
 with:
 mamba-version: "*"
-channels: conda-forge,bioconda,defaults
+channels: conda-forge,bioconda
 auto-activate-base: false
 activate-environment: psqan_venv
 environment-file: environment.yml

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,2 +1,3 @@
 .snakemake/
-psqan_venv/
+psqan_venv/
+base_env.yml

README.md

Lines changed: 16 additions & 3 deletions
@@ -32,12 +32,25 @@
 ## Introduction
 Despite the advances in tools to process long-read RNA-seq data, the downstream analysis of transcriptional data remains challenging due to the detection of thousands of novel transcripts. From such a large number of transcripts, it is difficult to distinguish between stable transcripts of potential biological importance, partially processed RNAs and splicing noise. It is important to select only the novel transcript models which are reproducible across the samples with a minimum expression value. However, it is difficult to identify optimal expression thresholds to remove artefacts. Consequently, researchers find it challenging to interpret long-read RNA-seq data effectively and generate relevant hypotheses which could be experimentally validated in the laboratory.

-PSQAN (Post Sqanti QC ANalysis) is a Snakemake workflow designed to help researchers identify high-confidence transcripts associated with candidate genes. PSQAN performs a gene-based analysis on characterised transcripts generated by [SQANTI3](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") and [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). PSQAN normalises transcript expression per gene and re-groups transcripts into categories which are more appropriate for a transcript discovery analysis, hence making the results more interpretable. PSQAN generates visualisations to help users determine optimal expression thresholds for detecting both known and novel transcripts of probable biological importance. Furthermore, PSQAN allows users to apply multiple transcript-level expression thresholds, both per sample and across all samples. Lastly, PSQAN generates visualisations and an HTML report, enabling users to explore the known and novel transcripts expressed by a gene, alongside their transcript categories and transcript expression. An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).
+PSQAN (Post Sqanti QC ANalysis) is a Snakemake workflow designed to help researchers identify high-confidence transcripts associated with candidate genes. PSQAN performs a gene-based analysis on characterised transcripts generated by [SQANTI3](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") and [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). PSQAN normalises transcript expression per gene and re-groups transcripts into actionable categories to support transcript prioritisation, hence making the results more interpretable. PSQAN generates visualisations to help users determine optimal expression thresholds for detecting both known and novel transcripts of probable biological importance. Furthermore, PSQAN allows users to apply multiple transcript-level expression thresholds, both per sample and across all samples. Lastly, PSQAN generates visualisations and an HTML report, enabling users to explore the known and novel transcripts expressed by a gene, alongside their transcript categories and transcript expression. An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).


 ### Input data

-PSQAN can be used with the transcript characterisation output of either SQANTI3 or TALON, which are the two most prominently used tools in long-read RNA-seq data analysis. PSQAN takes the output produced by SQANTI3 or TALON as input, along with a list of candidate genes to analyse. For each gene, PSQAN extracts the isoforms associated with the gene from the output generated by SQANTI3/TALON and applies a set of filtering criteria to remove potential genomic contamination and rare PCR artifacts. PSQAN removes an isoform if it has a high percentage of genomic "A"s in its downstream 20 bp window (80% is the default) or if one of its junctions is predicted to be a template-switching artifact (tagged as "RTS_stage" by SQANTI3).
+PSQAN can be used with the transcript characterisation output of either SQANTI3 or TALON, which are the two most prominently used tools in long-read RNA-seq data analysis. PSQAN takes the output produced by SQANTI3 or TALON as input, along with a list of candidate genes to analyse. For each gene, PSQAN extracts the isoforms associated with the gene from the output generated by SQANTI3/TALON. Since the filtering steps in SQANTI3 and TALON are optional and may be skipped, PSQAN applies its own filtering criteria prior to processing to ensure the removal of potential genomic contamination and rare PCR artifacts. PSQAN removes an isoform if it has a high percentage of genomic "A"s in its downstream 20 bp window (80% is the default) or if one of its junctions is predicted to be a template-switching artifact (tagged as "RTS_stage" by SQANTI3).
+
+> **_Note:_** The output of TALON does not contain all the transcript-level descriptors required by PSQAN. As a result, certain PSQAN processes are skipped when using TALON output. The processes performed by PSQAN for SQANTI3 and TALON are summarised below:
+
+PSQAN process | SQANTI3 | TALON
+------------- | ------- | --------
+Filtering internal priming artifacts | Yes | Yes
+Filtering template switching artifacts | Yes | No (missing required data)
+Normalising transcript expression | Yes | Yes
+Isoform re-categorisation | Yes | No (missing required data)
+Transcript-level filtering | Yes | Yes
+Visualisations | Yes | Yes
+
+

 ### Normalising transcript expression per gene

@@ -153,7 +166,7 @@ working directory
 |--- report.html # if snakemake report is generated at the end of the run
 |--- Gene_A/
 |--- pre-filtering/ # plots generated before performing filtering
-|--- post-filtering/ # plots generated after performing filtering
+|--- post-filtering/transcriptsRanked.txt # plots generated after performing filtering
 |--- logs/
 |--- gene_normalised_abundance.txt
 |--- filtered_transcripts.txt
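
The artifact filtering described in the README's "Input data" section could be sketched roughly as follows. This is only an illustration under stated assumptions, not PSQAN's actual implementation: the function name `filter_isoforms`, the record layout, the field name `perc_A_downstream_TTS`, and the use of `>=` against the 80% default are all hypothetical. Only the `RTS_stage` tag and the 80% default come from the README text. Records without an `RTS_stage` field stand in for TALON-derived input, for which the README says template-switching filtering is skipped.

```python
from typing import Dict, List, Union

# Hypothetical record type: one dict per isoform.
Isoform = Dict[str, Union[str, float, bool]]


def filter_isoforms(isoforms: List[Isoform], max_perc_a: float = 80.0) -> List[Isoform]:
    """Drop likely internal-priming and template-switching artifacts."""
    kept = []
    for iso in isoforms:
        # Internal priming: too many genomic A's in the downstream 20 bp window.
        # Treating the threshold as inclusive is an assumption.
        if float(iso["perc_A_downstream_TTS"]) >= max_perc_a:
            continue
        # Template switching: only checked when the flag is present
        # (SQANTI3-style input); records lacking it are not filtered here.
        if iso.get("RTS_stage", False):
            continue
        kept.append(iso)
    return kept


# Made-up example records, not real data.
isoforms = [
    {"isoform": "tx1", "perc_A_downstream_TTS": 35.0, "RTS_stage": False},
    {"isoform": "tx2", "perc_A_downstream_TTS": 90.0, "RTS_stage": False},
    {"isoform": "tx3", "perc_A_downstream_TTS": 10.0, "RTS_stage": True},
    {"isoform": "tx4", "perc_A_downstream_TTS": 50.0},  # no RTS_stage field
]

kept_ids = [iso["isoform"] for iso in filter_isoforms(isoforms)]
print(kept_ids)  # ['tx1', 'tx4']
```

Here `tx2` is dropped for exceeding the A-content threshold and `tx3` for carrying the template-switching flag, while `tx4` passes because it has no flag to check.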
