sid-sethi · Jul 18, 2025
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
@@ -24,7 +24,7 @@ jobs:
         uses: conda-incubator/setup-miniconda@v3
         with:
           mamba-version: "*"
-          channels: conda-forge,bioconda,defaults
+          channels: conda-forge,bioconda
           auto-activate-base: false
           activate-environment: psqan_venv
           environment-file: environment.yml

diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,3 @@
 .snakemake/
-psqan_venv/
+psqan_venv/
+base_env.yml
diff --git a/README.md b/README.md
@@ -32,12 +32,25 @@
 ## Introduction
 Despite the advances in tools to process long-read RNA-seq data, the downstream analysis of transcriptional data remains challenging due to the detection of thousands of novel transcripts. From such a large number of transcripts, it is difficult to distinguish between stable transcripts of potential biological importance, partially processed RNAs and splicing noise. It is important to select only the novel transcript models which are reproducible across the samples with a minimum expression value. However, it is difficult to identify optimal expression thresholds to remove artefacts. Consequently, researchers find it challenging to interpret long-read RNA-seq data effectively and generate relevant hypothesis which could be experimentally validated in the laboratory.
 
-PSQAN (Post Sqanti QC ANalysis) is a Snakemake workflow designed to help researchers identify high-confidence transcripts associated with candidate genes. PSQAN performs a gene-based analysis on characterised transcripts generated by [SQANTI3](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") and [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). PSQAN normalises transcript expression per gene and re-groups transcripts into categories which are more appropriate for a transcript discovery analysis, hence making the results more interpretable. PSQAN generates visualisations to help users determine optimal expression thresholds for detecting both known and novel transcripts of probable biological importance. Furthermore, PSQAN allows users to apply multiple transcript level expression thresholds, both to per sample and across all samples. Lastly, PSQAN generates visualisations and an HTML report, enabling users to explore the known and novel transcripts expressed by a gene, alongside their transcript categories and transcript expression. An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).
+PSQAN (Post Sqanti QC ANalysis) is a Snakemake workflow designed to help researchers identify high-confidence transcripts associated with candidate genes. PSQAN performs a gene-based analysis on characterised transcripts generated by [SQANTI3](https://github.com/ConesaLab/SQANTI3 "SQANTI homepage") and [TALON](https://github.com/mortazavilab/TALON/tree/master "TALON homepage"). PSQAN normalises transcript expression per gene and re-groups transcripts into actionable categories that support transcript prioritisation, making the results more interpretable. PSQAN generates visualisations to help users determine optimal expression thresholds for detecting both known and novel transcripts of probable biological importance. Furthermore, PSQAN allows users to apply multiple transcript-level expression thresholds, both per sample and across all samples. Lastly, PSQAN generates visualisations and an HTML report, enabling users to explore the known and novel transcripts expressed by a gene, alongside their transcript categories and transcript expression. An example of the report generated by PSQAN for a single gene can be downloaded [here](example_output/report.html).
 
 
 ### Input data
 
-PSQAN can be used with the transcript characterisation output of either SQANTI3 or TALON, which are the two most prominently used tools in long-read RNA-seq data analysis. PSQAN takes the output produced by SQANTI3 or TALON as input, along with a list of candidate genes to analyse. For each gene, PSQAN extracts the isoforms associated with the gene from the output generated by SQANTI3/TALON and applies a set of filtering criteria to remove potential genomic contamination and rare PCR artifacts. PSQAN removes isoforms with a high percentage of genomic "A"s in their downstream 20 bp window (80% is the default), or if one of its junctions is predicted to be a template switching artifact (tagged as "RTS_stage" by SQANTI3). 
+PSQAN can be used with the transcript characterisation output of either SQANTI3 or TALON, two of the most widely used tools in long-read RNA-seq data analysis. PSQAN takes the output produced by SQANTI3 or TALON as input, along with a list of candidate genes to analyse. For each gene, PSQAN extracts the associated isoforms from the SQANTI3/TALON output. Because the filtering steps in SQANTI3 and TALON are optional and may have been skipped, PSQAN applies its own filtering criteria before further processing to remove potential genomic contamination and rare PCR artifacts. An isoform is removed if it has a high percentage of genomic "A"s in the 20 bp window downstream of its 3' end (80% by default), or if one of its junctions is predicted to be a template switching artifact (tagged as "RTS_stage" by SQANTI3).
+
+> **_Note:_** TALON output does not contain all of the transcript-level descriptors required by PSQAN, so certain PSQAN processes are skipped when TALON output is used. The processes performed by PSQAN for SQANTI3 and TALON are summarised below:
+
+PSQAN process | SQANTI3 | TALON
+------------- | ------- | -------
+Filtering internal priming artifacts | Yes | Yes
+Filtering template switching artifacts | Yes | No (missing required data)
+Normalising transcript expression | Yes | Yes
+Isoform re-categorisation | Yes | No (missing required data)
+Transcript-level filtering | Yes | Yes
+Visualisations | Yes | Yes
+
+
 
 ### Normalising transcript expression per gene
 
@@ -153,7 +166,7 @@ working directory
 |--- report.html               # if snakemake report is generated at the end of the run
 |--- Gene_A/  
      |--- pre-filtering/       # plots generated before performing filtering  
-     |--- post-filtering/      # plots generated after performing filtering   
+     |--- post-filtering/transcriptsRanked.txt      # plots generated after performing filtering   
      |--- logs/              
      |--- gene_normalised_abundance.txt              
      |--- filtered_transcripts.txt
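
Reviewer sketch (not part of this PR): the artifact filter described in the updated *Input data* section can be illustrated in Python. This is a minimal sketch, not PSQAN's implementation. It assumes a tab-separated SQANTI3 classification table with the columns `isoform`, `associated_gene`, `perc_A_downstream_TTS` and `RTS_stage`. Treating the 80% cutoff as inclusive is an assumption, and so are the helper names `is_artifact` and `filter_gene`. The `RTS_stage` check is skipped when that column is absent, mirroring the note that template switching filtering is not performed for TALON output.

```python
import csv
import io
from typing import Iterable

# Default from the README: 80% genomic "A"s in the 20 bp downstream window.
# Whether PSQAN's cutoff is inclusive (>=) is an assumption of this sketch.
PERC_A_CUTOFF = 80.0


def is_artifact(row: dict, perc_a_cutoff: float = PERC_A_CUTOFF) -> bool:
    """Return True if an isoform looks like internal priming or template switching.

    Column names follow SQANTI3's classification file (assumed layout).
    """
    perc_a = row.get("perc_A_downstream_TTS", "")
    if perc_a not in ("", "NA") and float(perc_a) >= perc_a_cutoff:
        return True  # likely internal priming / genomic contamination
    # Template switching flag; only checked when the column is present.
    return row.get("RTS_stage", "FALSE").upper() == "TRUE"


def filter_gene(rows: Iterable[dict], gene: str) -> list:
    """Keep the isoforms of `gene` that pass both artifact filters."""
    return [r for r in rows if r["associated_gene"] == gene and not is_artifact(r)]


if __name__ == "__main__":
    # Tiny made-up table, tab-separated like SQANTI3 output.
    table = (
        "isoform\tassociated_gene\tperc_A_downstream_TTS\tRTS_stage\n"
        "PB.1.1\tGene_A\t15\tFALSE\n"
        "PB.1.2\tGene_A\t85\tFALSE\n"
        "PB.1.3\tGene_A\t10\tTRUE\n"
        "PB.2.1\tGene_B\t5\tFALSE\n"
    )
    rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))
    print([r["isoform"] for r in filter_gene(rows, "Gene_A")])  # ['PB.1.1']
```

Here `PB.1.2` is dropped by the poly(A) cutoff, `PB.1.3` by its `RTS_stage` flag, and `PB.2.1` belongs to another gene.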
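
A second reviewer sketch, also hypothetical, illustrates per-gene normalisation combined with per-sample and across-sample expression thresholds, as mentioned in the updated Introduction. It assumes that per-gene normalisation expresses each transcript's reads as a percentage of all reads assigned to its gene in the same sample. It further assumes that a transcript is kept only if it reaches `min_pct` in at least `min_samples` samples and has a mean of at least `min_mean_pct` across all samples. These definitions and parameter names are illustrative, and PSQAN's own may differ.

```python
from statistics import mean

# Hypothetical layout for one gene: transcript ID -> sample name -> read count.
Counts = dict[str, dict[str, float]]


def normalise_per_gene(counts: Counts) -> Counts:
    """Express each transcript as a percentage of its gene's total reads per sample."""
    samples = sorted({s for per_sample in counts.values() for s in per_sample})
    totals = {s: sum(c.get(s, 0.0) for c in counts.values()) for s in samples}
    return {
        tx: {
            # Guard against samples where the gene has no reads at all.
            s: 100.0 * c.get(s, 0.0) / totals[s] if totals[s] else 0.0
            for s in samples
        }
        for tx, c in counts.items()
    }


def passes_thresholds(
    norm: dict[str, float], min_pct: float, min_samples: int, min_mean_pct: float
) -> bool:
    """Apply a per-sample threshold and an across-all-samples threshold."""
    if not norm:
        return False
    per_sample_ok = sum(v >= min_pct for v in norm.values()) >= min_samples
    across_ok = mean(list(norm.values())) >= min_mean_pct
    return per_sample_ok and across_ok


if __name__ == "__main__":
    gene_a = {"tx1": {"S1": 90, "S2": 60}, "tx2": {"S1": 10, "S2": 40}}
    norm = normalise_per_gene(gene_a)
    kept = [tx for tx, v in norm.items() if passes_thresholds(v, 20.0, 2, 20.0)]
    print(kept)  # ['tx1']
```

In this toy example `tx2` has a mean of 25% but reaches 20% in only one of the two samples, so the per-sample threshold removes it.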