Gillingham-Lab/deliqc

deliqc

A tool to measure and quantify the quality of DNA encoded libraries.

Installation

pip install -e .

Usage

Extract point mutations from a file

deliqc run [-h] [--threads THREADS] [--max-reads MAX_READS] [--max-pair-mismatches MAX_PAIR_MISMATCHES]
                 [--max-point-mutations MAX_POINT_MUTATIONS] [--split-on-codon SPLIT_ON_CODON] [--codons CODONS [CODONS ...]] [--title TITLE]
                 sequence r1 r2 save-as

sequence is the DNA template sequence that the reads are compared to. r1 and r2 are the filenames of a read pair (from paired-end sequencing). Using * within a filename selects multiple files (via "glob"). save-as is the target filename where the result is saved. The file is a standard Python pickle containing the extracted data, which can be imported by Python. It is essentially a pickled dictionary containing the numpy arrays of each individual sample as well as the average. Example (using the example data provided in example/cuaac):

deliqc run "AGAGTATCCATCCGTAGTAAAAAATCCATTCACCGAACTTGGATCCGCACACAAAAGACAATTCACACACGTCCCATCCAGAATTCACAAGCTCC" example/cuaac/clicked*_1.fq.gz example/cuaac/clicked_rep*_2.fq.gz cuaac.pickle --max-reads 100 --threads 5 --max-point-mutations=1
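Because the saved file is described as a standard pickle holding a dictionary, it can presumably be read back with Python's pickle module. The following is a minimal sketch and not part of deliqc: the helper name load_result and the key check are assumptions made for illustration, and the key names are taken from the dictionary structure documented in this README. Unpickling a file that contains numpy arrays also requires numpy to be installed.

```python
import pickle
from pathlib import Path

# A subset of the keys documented in this README. The check itself is an
# assumption for illustration; deliqc does not provide this helper.
EXPECTED_KEYS = {"title", "reference", "mismatches", "indels", "reads", "replicates"}


def load_result(path):
    """Load a result pickle and verify that it looks like the documented dictionary."""
    with Path(path).open("rb") as handle:
        # Only unpickle files you created yourself: pickle can execute arbitrary code.
        result = pickle.load(handle)
    if not isinstance(result, dict):
        raise TypeError(f"expected a dictionary, got {type(result).__name__}")
    missing = EXPECTED_KEYS - result.keys()
    if missing:
        raise KeyError(f"missing documented keys: {sorted(missing)}")
    return result
```

With the file produced by the command above, load_result("cuaac.pickle")["title"] would then give the title stored for that run.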

Plot point mutation rate

deliqc plot [-h] filename

Provide a pickle file generated by deliqc run to create a quick plot of the point mutation rate contained within. Example:

deliqc plot cuaac.pickle

Dictionary structure for custom scripts

  • title: str, the title given via the --title parameter of deliqc run, or the first filename of r1
  • reference: str, the sanitised DNA sequence as given to deliqc run
  • splitOnCodon: int, the codon number on which the result is split. Default is 0 (no splitting).
  • splitOnCodonSequences: List[str], a list of the codons on which the result is split
  • mismatches: Tuple[np.array, np.array], mean and standard deviation of mismatch counts across all replicates. LxC dimensional, where L is the length of the reference and C is the number of codons given + 1.
  • indels: Tuple[np.array, np.array], mean and standard deviation of insertions and deletions. Lx2xC dimensional, [:, 0, :] refers to insertions and [:, 1, :] to deletions.
  • mutationTarget: Tuple[np.array, np.array], mean and standard deviation of mutation targets. Lx5xC dimensional, where the second dimension is A (0), C (1), G (2), T (3) or N (4).
  • insertionTarget: Tuple[np.array, np.array], mean and standard deviation of insertion targets. Lx5xC dimensional, where the second dimension is A (0), C (1), G (2), T (3) or N (4).
  • reads: float, mean of the number of reads used
  • mismatchedPairs: mean of the number of pairs that mismatched.
  • tooManyPointMutations: mean of the number of reads that exceeded the point mutation threshold.
  • alignedReads: mean of the number of reads aligned.
  • replicates: the individual results, a dictionary using the same structure.
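The array layout listed above can be illustrated with a short indexing sketch. This is not deliqc code: the function name summarise, its parameters and the synthetic demo dictionary are assumptions for illustration, and the demo values are placeholders rather than real results. Which column of the last dimension belongs to which codon is not stated here, so the sketch simply takes a column index.

```python
import numpy as np

# Index order of the second dimension of mutationTarget, as documented above.
BASES = ("A", "C", "G", "T", "N")


def summarise(result, column=0, base="T"):
    """Pick per-position means for one column of the last (codon) dimension."""
    mismatch_mean, mismatch_std = result["mismatches"]  # each L x C
    indel_mean, _ = result["indels"]                    # L x 2 x C
    target_mean, _ = result["mutationTarget"]           # L x 5 x C
    return {
        "mismatches": mismatch_mean[:, column],
        "mismatches_std": mismatch_std[:, column],
        "insertions": indel_mean[:, 0, column],
        "deletions": indel_mean[:, 1, column],
        "target": target_mean[:, BASES.index(base), column],
    }


# Synthetic stand-in with the documented shapes (L = 4; no codons given, so C = 1).
L, C = 4, 1
demo = {
    "reference": "ACGT",
    "mismatches": (np.zeros((L, C)), np.zeros((L, C))),
    "indels": (np.zeros((L, 2, C)), np.zeros((L, 2, C))),
    "mutationTarget": (np.zeros((L, 5, C)), np.zeros((L, 5, C))),
}
demo["mismatches"][0][2, 0] = 3.0          # placeholder: mean of 3 mismatches at position 2
demo["mutationTarget"][0][2, 3, 0] = 3.0   # placeholder: all of them recorded under T

summary = summarise(demo)
print(summary["mismatches"])
print(summary["target"])
```

For a real result, the same function could be applied to the loaded dictionary or to each entry of its replicates, since those are documented to share the same structure.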

About

DNA encoded library information quality control
