A tool to measure and quantify the quality of DNA-encoded libraries.

Installation:

    pip install -e .

Usage:

    deliqc run [-h] [--threads THREADS] [--max-reads MAX_READS] [--max-pair-mismatches MAX_PAIR_MISMATCHES]
               [--max-point-mutations MAX_POINT_MUTATIONS] [--split-on-codon SPLIT_ON_CODON] [--codons CODONS [CODONS ...]] [--title TITLE]
               sequence r1 r2 save-as

sequence is the DNA template sequence that the reads are compared against. r1 and r2 are the filenames of a read pair (from paired-end sequencing). A * within a filename is expanded as a glob pattern, so multiple files can be given at once. save-as is the target filename for the result: a standard Python pickle containing the extracted data, which can be loaded from Python. It is essentially a pickled dictionary containing the numpy arrays of each individual sample as well as their average.

Example (using the example data provided in example/cuaac):

    deliqc run "AGAGTATCCATCCGTAGTAAAAAATCCATTCACCGAACTTGGATCCGCACACAAAAGACAATTCACACACGTCCCATCCAGAATTCACAAGCTCC" example/cuaac/clicked*_1.fq.gz example/cuaac/clicked_rep*_2.fq.gz cuaac.pickle --max-reads 100 --threads 5 --max-point-mutations=1

    deliqc plot [-h] filename

Provide a pickle file generated by deliqc run to create a quick plot of the point mutation rate it contains.

Example:

    deliqc plot cuaac.pickle

The pickled dictionary contains the following keys:

- title: str, the title given via the --title parameter of deliqc run, or the first filename of r1.
- reference: str, the sanitised DNA sequence as passed to deliqc run.
- splitOnCodon: int, the codon number on which the result is split. Default 0 (no splitting).
- splitOnCodonSequences: List[str], the list of codons the result is split on.
- mismatches: Tuple[np.array, np.array], mean and standard deviation of mismatch counts across all replicates. LxC dimensional, where L is the length of the reference and C is the number of codons given + 1.
- indels: Tuple[np.array, np.array], mean and standard deviation of insertions and deletions. Lx2xC dimensional; [:, 0, :] refers to insertions and [:, 1, :] to deletions.
- mutationTarget: Tuple[np.array, np.array], mean and standard deviation of mutation targets. Lx5xC dimensional, where the second dimension is A (0), C (1), G (2), T (3) or N (4).
- insertionTarget: Tuple[np.array, np.array], mean and standard deviation of insertion targets. Lx5xC dimensional, where the second dimension is A (0), C (1), G (2), T (3) or N (4).
- reads: float, mean number of reads used.
- mismatchedPairs: mean number of pairs that mismatched.
- tooManyPointMutations: mean number of reads that exceeded the point mutation threshold.
- alignedReads: mean number of reads aligned.
- replicates: the individual results, a dictionary using the same structure.
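
The dictionary layout described above can be read back with Python's standard pickle module. The following is a minimal sketch, not part of deliqc itself: the helper names load_result and summarise are hypothetical, and the data is a small synthetic stand-in that only mimics the documented keys and array shapes. It assumes a run without codon splitting, so that C = 1 and column 0 holds the whole result.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_result(path):
    """Load a result dictionary as written by `deliqc run` (hypothetical helper)."""
    # Only unpickle files you trust: pickle can execute code on load.
    with open(path, "rb") as handle:
        return pickle.load(handle)


def summarise(result):
    """Pull a few headline numbers out of the documented structure."""
    mismatch_mean, _mismatch_std = result["mismatches"]  # each L x C
    indel_mean, _indel_std = result["indels"]            # each L x 2 x C
    return {
        "title": result["title"],
        "reference_length": len(result["reference"]),
        # Assumption: no codon split, so column 0 is the only column.
        "worst_position": int(np.argmax(mismatch_mean[:, 0])),
        "insertions_total": float(indel_mean[:, 0, :].sum()),
        "deletions_total": float(indel_mean[:, 1, :].sum()),
    }


# Synthetic stand-in for a real result file (values are made up).
L, C = 6, 1
mismatch_mean = np.zeros((L, C))
mismatch_mean[3, 0] = 4.0
indel_mean = np.zeros((L, 2, C))
indel_mean[1, 0, 0] = 2.0  # insertions live in [:, 0, :]
indel_mean[4, 1, 0] = 1.0  # deletions live in [:, 1, :]

fake_result = {
    "title": "demo",
    "reference": "ACGTAC",
    "splitOnCodon": 0,
    "splitOnCodonSequences": [],
    "mismatches": (mismatch_mean, np.zeros((L, C))),
    "indels": (indel_mean, np.zeros((L, 2, C))),
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.pickle"
    with open(path, "wb") as handle:
        pickle.dump(fake_result, handle)
    summary = summarise(load_result(path))

print(summary)
```

With a real file, load_result("cuaac.pickle") would replace the synthetic part. The per-sample dictionaries under replicates are documented as using the same structure, so the same summarise call should apply to each of them.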