Commit 0b8afb8

Merge pull request #2 from sid-sethi/dev
updating readme
2 parents 19141d9 + 54ea5cc commit 0b8afb8

2 files changed

+12
-5
lines changed

README.md

Lines changed: 12 additions & 5 deletions
@@ -1,4 +1,4 @@
1-
# PSQAN - Post Sqanti QC ANalysis of long-read RNA sequencing
1+
# PSQAN - a pipeline to prioritise novel and biologically relevant transcripts from long-read RNA sequencing
22

33
<!-- badges: start -->
44
[![Snakemake](https://img.shields.io/badge/snakemake-≥6.1.0-brightgreen.svg)](https://snakemake.github.io)
@@ -10,7 +10,7 @@
1010

1111
## Table of contents
1212
- [Introduction](#introduction)
13-
- [Removing internal priming artefacts](#removing-internal-priming-artefacts)
13+
- [Input data](#input-data)
1414
- [Normalising transcript expression per gene](#normalising-transcript-expression-per-gene)
1515
- [Isoform categorisation](#isoform-categorisation)
1616
- [Filtering isoforms](#filtering-isoforms)
@@ -30,11 +30,14 @@
3030

3131

3232
## Introduction
33-
PSQAN is a Snakemake workflow for performing QC analysis on long-read sequencing data post transcript characterisation by [SQANTI](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") or [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). Current long-read platforms are prone to errors, which makes the downstream analysis of such data very challenging. A typical analysis workflow coupled with SQANTI or TALON, results in a large number of novel transcripts. It is important to select only the novel transcript models which are reproducible across the samples with a minimum expression value. However, it is difficult to identify optimal expression thresholds to remove artefacts. PSQAN aims to help in the process of identifying high-confidence novel transcripts associated with a gene. PSQAN performs **gene-based analysis** which can be used to explore the novel transcript categories and transcript expression associated with a gene. PSQAN filters transcripts which could be possible artifacts, normalises transcript expression and generates multiple visualisations which can help in determining optimal expression thresholds to identify genuine transcripts (for both known and novel). An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).
33+
Despite advances in tools for processing long-read RNA-seq data, downstream analysis of transcriptional data remains challenging because thousands of novel transcripts are detected. Among so many transcripts, it is difficult to distinguish stable transcripts of potential biological importance from partially processed RNAs and splicing noise. It is important to select only the novel transcript models that are reproducible across samples and meet a minimum expression value. However, it is difficult to identify optimal expression thresholds to remove artefacts. Consequently, researchers find it challenging to interpret long-read RNA-seq data effectively and to generate relevant hypotheses that can be validated experimentally in the laboratory.
3434

35-
### Removing internal priming artefacts
35+
PSQAN (Post Sqanti QC ANalysis) is a Snakemake workflow designed to help researchers identify high-confidence transcripts associated with candidate genes. PSQAN performs a gene-based analysis on characterised transcripts generated by [SQANTI3](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") or [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). PSQAN normalises transcript expression per gene and re-groups transcripts into categories that are more appropriate for a transcript discovery analysis, making the results more interpretable. PSQAN generates visualisations to help users determine optimal expression thresholds for detecting both known and novel transcripts of probable biological importance. Furthermore, PSQAN allows users to apply multiple transcript-level expression thresholds, both per sample and across all samples. Lastly, PSQAN generates an HTML report that lets users explore the known and novel transcripts expressed by a gene, alongside their transcript categories and transcript expression. An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).
3636

37-
Following transcript characterisation from SQANTI or TALON, PSQAN applies a set of filtering criteria to remove potential genomic contamination and rare PCR artifacts. PSQAN removes isoforms with high percent of genomic "A"s in the downstream 20 bp window and if one of its junctions is predicted to be template switching artifact (tagged as "RTS_stage" = TRUE in SQANTI output).
37+
38+
### Input data
39+
40+
PSQAN can be used with the transcript characterisation output of either SQANTI3 or TALON, two of the most widely used tools in long-read RNA-seq data analysis. PSQAN takes the output produced by SQANTI3 or TALON as input, along with a list of candidate genes to analyse. For each gene, PSQAN extracts the isoforms associated with the gene from the SQANTI3/TALON output and applies a set of filtering criteria to remove potential genomic contamination and rare PCR artifacts. PSQAN removes an isoform if it has a high percentage of genomic "A"s in its downstream 20 bp window (80% is the default), or if one of its junctions is predicted to be a template switching artifact (tagged as "RTS_stage" by SQANTI3).
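
The two filtering criteria described above can be illustrated with a minimal Python sketch. This is not PSQAN's actual implementation: the function names are hypothetical, the column names `perc_A_downstream_TTS` and `RTS_stage` are assumed to match a SQANTI3 classification table, and treating the 80% cutoff as inclusive is also an assumption.

```python
from typing import Dict, Iterable, List


def keep_isoform(row: Dict[str, str], max_perc_a: float = 80.0) -> bool:
    """Return True if an isoform passes both artefact filters.

    Column names are assumptions; check them against your own
    SQANTI3 classification file before reusing this sketch.
    """
    # Percentage of genomic "A"s in the 20 bp window downstream of the isoform.
    perc_a = float(row["perc_A_downstream_TTS"])
    # Flag marking a junction predicted to be a template switching artefact.
    is_rts = str(row["RTS_stage"]).strip().upper() == "TRUE"
    # Assumption: an isoform exactly at the threshold is removed.
    return perc_a < max_perc_a and not is_rts


def filter_isoforms(
    rows: Iterable[Dict[str, str]], max_perc_a: float = 80.0
) -> List[Dict[str, str]]:
    """Keep only the isoforms that pass both filters."""
    return [row for row in rows if keep_isoform(row, max_perc_a)]


# Toy records standing in for rows of a classification table.
rows = [
    {"isoform": "tx1", "perc_A_downstream_TTS": "35.0", "RTS_stage": "FALSE"},
    {"isoform": "tx2", "perc_A_downstream_TTS": "90.0", "RTS_stage": "FALSE"},
    {"isoform": "tx3", "perc_A_downstream_TTS": "20.0", "RTS_stage": "TRUE"},
]
kept = [row["isoform"] for row in filter_isoforms(rows)]
print(kept)
```

In this toy input, `tx2` fails the "A" content filter and `tx3` fails the template switching filter, so only `tx1` is retained.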
3841

3942
### Normalising transcript expression per gene
4043

@@ -206,6 +209,10 @@ conda deactivate
206209
```
207210

208211
## Licence
212+
<p align="left">
213+
<img src="images/Astex_Logo_H_no_tag_R.jpg" width="600" height="150"/>
214+
</p>
215+
209216
Copyright 2024 Astex Therapeutics Ltd.
210217

211218
This repository is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

images/Astex_Logo_H_no_tag_R.jpg

643 KB