deliqc

A tool to measure and quantify the quality of DNA-encoded libraries.

Installation

pip install -e .

Usage

Extract point mutations from a file

deliqc run [-h] [--threads THREADS] [--max-reads MAX_READS] [--max-pair-mismatches MAX_PAIR_MISMATCHES]
                 [--max-point-mutations MAX_POINT_MUTATIONS] [--split-on-codon SPLIT_ON_CODON] [--codons CODONS [CODONS ...]] [--title TITLE]
                 sequence r1 r2 save-as

sequence is the DNA template sequence that the reads are compared against. r1 and r2 are the filenames of a read pair (from paired-end sequencing). A * within a filename is expanded via "glob", so several files can be matched at once. save-as is the target filename for the result: a standard Python pickle containing the extracted data, which can be loaded from Python. It is essentially a pickled dictionary containing the numpy arrays of each individual sample as well as the average. Example (using the example data provided in example/cuaac):

deliqc run "AGAGTATCCATCCGTAGTAAAAAATCCATTCACCGAACTTGGATCCGCACACAAAAGACAATTCACACACGTCCCATCCAGAATTCACAAGCTCC" example/cuaac/clicked*_1.fq.gz example/cuaac/clicked_rep*_2.fq.gz cuaac.pickle --max-reads 100 --threads 5 --max-point-mutations=1
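The result file written by deliqc run can be opened from a custom script with the standard pickle module. The following is a minimal sketch, not part of deliqc itself: the helper names load_result and describe are hypothetical, and it assumes numpy is installed so that the arrays inside the file can be unpickled.

```python
import pickle
from pathlib import Path


def load_result(path):
    """Load a result dictionary from a pickle file (hypothetical helper).

    Pickle can execute arbitrary code while loading, so only open
    files that you generated yourself or otherwise trust.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(result):
    """Map each top-level key to the type name of its value."""
    return {key: type(value).__name__ for key, value in sorted(result.items())}


# Usage, assuming cuaac.pickle was produced by the command above:
#   result = load_result("cuaac.pickle")
#   print(describe(result))
```

Printing the key-to-type mapping first is a quick way to check that the file matches the dictionary layout documented in this README before writing further analysis code.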

Plot point mutation rate

deliqc plot [-h] filename

Provide a pickle file generated by deliqc run to create a quick plot of the point mutation rate contained within. Example:

deliqc plot cuaac.pickle

Dictionary structure for custom scripts

title: str, the title given via the --title parameter of deliqc run, or the first filename of r1
reference: str, the sanitised DNA sequence as passed to deliqc run
splitOnCodon: int, the codon number on which the result is split. Default is 0 (no splitting).
splitOnCodonSequences: List[str], a list of the codons the result is split on
mismatches: Tuple[np.array, np.array], mean and standard deviation of mismatch counts across all replicates. LxC dimensional, where L is the length of the reference and C is the number of codons given + 1.
indels: Tuple[np.array, np.array], mean and standard deviation of insertions and deletions. Lx2xC dimensional, where [:, 0, :] refers to insertions and [:, 1, :] to deletions.
mutationTarget: Tuple[np.array, np.array], mean and standard deviation of mutation targets. Lx5xC dimensional, where the second dimension is A (0), C (1), G (2), T (3) or N (4).
insertionTarget: Tuple[np.array, np.array], mean and standard deviation of insertion targets. Lx5xC dimensional, where the second dimension is A (0), C (1), G (2), T (3) or N (4).
reads: float, mean of the number of reads used
mismatchedPairs: mean of the number of pairs that mismatched
tooManyPointMutations: mean of the number of reads that exceeded the point mutation threshold
alignedReads: mean of the number of reads aligned
replicates: the individual results, a dictionary using the same structure
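As an illustration of the array layout described above, the sketch below slices the mean arrays of indels and mutationTarget for a single column of the last axis. The function names are hypothetical and not part of deliqc. Which column corresponds to which codon is not stated in this README, so the column argument is left to the caller.

```python
import numpy as np

# Order of the second axis of mutationTarget, as listed above.
BASES = "ACGTN"


def split_indels(result, column=0):
    """Return (insertions, deletions) mean arrays of length L for one column."""
    mean, _std = result["indels"]
    mean = np.asarray(mean)
    # Axis 1: index 0 holds insertions, index 1 holds deletions.
    return mean[:, 0, column], mean[:, 1, column]


def most_common_targets(result, column=0):
    """Return one base per reference position: the mutation target with the
    highest mean value.

    argmax takes the first maximum, so ties and positions without any
    mutation resolve to the earliest base in BASES.
    """
    mean, _std = result["mutationTarget"]
    mean = np.asarray(mean)
    return "".join(BASES[i] for i in mean[:, :, column].argmax(axis=1))
```

The same slicing pattern applies to the standard deviation (the second tuple element) and to each entry of replicates, since those use the same structure.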


Gillingham-Lab/deliqc
