Glycosyltransferase Acceptor Specificity Prediction (GASP)

Pan-specific prediction of reactivity between any Glycosyltransferase Superfamily 1 (GT1) enzyme and any chemical acceptor.
The installation instructions are thorough so that any of the work can be reproduced.
The main random forest training and testing script is `src/randomforest.py`.
- Unix shell with zsh.
- R.
- Java runtime for some chemical feature generation.
  E.g. install with Homebrew on Mac: `brew install --cask temurin`
- pip.
- (mini)conda for Python 3, e.g. miniforge.
- Julia for
  - adding new chemicals to an already established E3FP MDS (chemical features).
  - feature selection (v1.8.5 used).
- Run `./install.sh`. It installs Python packages into the environment "GT" and Julia packages into a project-local environment. Activate the Python environment with `conda activate GT` and the Julia environment with, e.g., `julia --project=@.` when running Julia from within the project. `install.sh` also defines the alias `git root` and calls `./install.R`, which installs R packages.
- Miller, e.g. with Homebrew on Mac: `brew install miller`
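
Before running `./install.sh`, it can help to confirm that the prerequisites listed above are actually reachable from the shell. The following is a minimal sketch, not part of the repository: the executable names are assumptions (for instance that Miller is installed as `mlr` and that R provides `Rscript`), and the helper `missing_tools` is hypothetical.

```python
import shutil

# Executable names are assumptions about how each prerequisite is installed:
# Miller is assumed to provide `mlr`, R to provide `Rscript`.
REQUIRED_TOOLS = ["zsh", "Rscript", "java", "pip", "conda", "julia", "mlr"]


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisite executables were found.")
```

An empty result only shows that the executables exist on `PATH`; it does not check versions such as the Julia release mentioned above.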
- Chemical features are generated from PubChem IDs or SMILES strings with `src/chemistry/pipeline.sh`. See `results/5-chemicalFeatures/` for examples of adding acceptors.
- Adding new enzymes: see `results/6-unaligned/` and/or `results/7-align/`.
- See `encode_features.py --help` and `randomforest.py --help` for instructions.
- Scripts or Python modules unavailable: make sure they can be found by adding the code directories to `$PATH` and `$PYTHONPATH`. Assuming zsh, this can be done by running `./PATHS.sh >> ~/.zshrc` or by copy-pasting the output of `./PATHS.sh` somewhere into `~/.zshrc`.
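
After updating `~/.zshrc` and opening a new shell, a quick check like the one below can show whether the scripts and modules now resolve. This is a hedged sketch rather than project code: `check_access` is a hypothetical helper, the script names are taken from this README, and treating `randomforest` as an importable module name is an assumption.

```python
import importlib.util
import shutil


def check_access(scripts, modules):
    """Map each name to True if it resolves.

    Scripts are looked up on PATH; modules are looked up on the Python
    import path (which includes PYTHONPATH) without importing them.
    """
    found = {script: shutil.which(script) is not None for script in scripts}
    found.update(
        {module: importlib.util.find_spec(module) is not None for module in modules}
    )
    return found


if __name__ == "__main__":
    # Script names come from this README; the module name is an assumption.
    report = check_access(["encode_features.py", "randomforest.py"], ["randomforest"])
    for name, ok in report.items():
        print(f"{name}: {'ok' if ok else 'NOT FOUND'}")
```

Scripts are only found by this check if they are executable and their directory is on `$PATH`.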