sid-sethi · Mar 5, 2025
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-# PSQAN - Post Sqanti QC ANalysis of long-read RNA sequencing
+# PSQAN - a pipeline to prioritise novel and biologically relevant transcripts from long-read RNA sequencing
 
 <!-- badges: start -->
 [![Snakemake](https://img.shields.io/badge/snakemake-≥6.1.0-brightgreen.svg)](https://snakemake.github.io)
@@ -10,7 +10,7 @@
 
 ## Table of contents
 - [Introduction](#introduction)
-  - [Removing internal priming artefacts](#removing-internal-priming-artefacts)
+  - [Input data](#input-data)
   - [Normalising transcript expression per gene](#normalising-transcript-expression-per-gene)
   - [Isoform categorisation](#isoform-categorisation)
   - [Filtering isoforms](#filtering-isoforms)
@@ -30,11 +30,14 @@
 
 
 ## Introduction
-PSQAN is a Snakemake workflow for performing QC analysis on long-read sequencing data post transcript characterisation by [SQANTI](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") or [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). Current long-read platforms are prone to errors, which makes the downstream analysis of such data very challenging. A typical analysis workflow coupled with SQANTI or TALON, results in a large number of novel transcripts. It is important to select only the novel transcript models which are reproducible across the samples with a minimum expression value. However, it is difficult to identify optimal expression thresholds to remove artefacts. PSQAN aims to help in the process of identifying high-confidence novel transcripts associated with a gene. PSQAN performs **gene-based analysis** which can be used to explore the novel transcript categories and transcript expression associated with a gene. PSQAN filters transcripts which could be possible artifacts, normalises transcript expression and generates multiple visualisations which can help in determining optimal expression thresholds to identify genuine transcripts (for both known and novel). An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).
+Despite advances in tools for processing long-read RNA-seq data, the downstream analysis of transcriptional data remains challenging because thousands of novel transcripts are detected. Among so many transcripts, it is difficult to distinguish stable transcripts of potential biological importance from partially processed RNAs and splicing noise. It is important to select only the novel transcript models which are reproducible across the samples with a minimum expression value. However, it is difficult to identify optimal expression thresholds to remove artefacts. Consequently, researchers find it challenging to interpret long-read RNA-seq data effectively and to generate relevant hypotheses that can be experimentally validated in the laboratory.
 
-### Removing internal priming artefacts
+PSQAN (Post Sqanti QC ANalysis) is a Snakemake workflow designed to help researchers identify high-confidence transcripts associated with candidate genes. PSQAN performs a gene-based analysis on characterised transcripts generated by [SQANTI3](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") or [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). PSQAN normalises transcript expression per gene and re-groups transcripts into categories which are more appropriate for a transcript discovery analysis, making the results more interpretable. PSQAN generates visualisations to help users determine optimal expression thresholds for detecting both known and novel transcripts of probable biological importance. Furthermore, PSQAN allows users to apply multiple transcript-level expression thresholds, both per sample and across all samples. Lastly, PSQAN generates an HTML report, enabling users to explore the known and novel transcripts expressed by a gene, alongside their transcript categories and transcript expression. An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).
 
-Following transcript characterisation from SQANTI or TALON, PSQAN applies a set of filtering criteria to remove potential genomic contamination and rare PCR artifacts. PSQAN removes isoforms with high percent of genomic "A"s in the downstream 20 bp window and if one of its junctions is predicted to be template switching artifact (tagged as "RTS_stage" = TRUE in SQANTI output). 
+
+### Input data
+
+PSQAN takes as input the transcript characterisation output of either SQANTI3 or TALON, two of the most widely used tools in long-read RNA-seq data analysis, along with a list of candidate genes to analyse. For each gene, PSQAN extracts the isoforms associated with that gene from the SQANTI3/TALON output and applies a set of filtering criteria to remove potential genomic contamination and rare PCR artifacts. PSQAN removes isoforms that have a high percentage of genomic "A"s in the downstream 20 bp window (80% by default), or that have a junction predicted to be a template switching artifact (tagged as "RTS_stage" by SQANTI3).
 
 ### Normalising transcript expression per gene
 
@@ -206,6 +209,10 @@ conda deactivate
 ```
 
 ## Licence
+<p align="left">
+  <img src="images/Astex_Logo_H_no_tag_R.jpg" width="600" height="150"/>  
+</p>
+
 Copyright 2024 Astex Therapeutics Ltd.
 
 This repository is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

diff --git a/images/Astex_Logo_H_no_tag_R.jpg b/images/Astex_Logo_H_no_tag_R.jpg