Pipeline:
- Apply Gaussian model and extract features
- Perform QC (rmse - var)
- Detect non-responders
- Subpopulation detection
Install packages
numpy
pandas
scipy
matplotlib
openpyxl
tqdm
scikit-learn # needed for prediction only
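Before running the scripts, it can help to confirm that the packages above are importable. The snippet below is a minimal sketch, not part of the pipeline; the helper name `missing_packages` is made up for this example. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "pandas", "scipy", "matplotlib", "openpyxl", "tqdm"]
OPTIONAL = ["sklearn"]  # needed for prediction only


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required packages:", missing_packages(REQUIRED))
    print("Missing optional packages:", missing_packages(OPTIONAL))
```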
To run the pipeline, use the script gm.py, which requires experiment files in *.xlsm format.
Example 1) Run the pipeline for one experiment
python gm.py --input_dir Data/experiments_all/20240305_EXP13.xlsm
Example 2) Run the pipeline for all experiments in a directory (e.g. Data/experiments_all)
python gm.py --input_dir Data/experiments_all
Command-line parameters:
- --input_dir: Input path. Either a *.xlsm file or a folder containing them.
- --param_dir: Path to the parameters json file (see below).
- --no_plot: No plots are created if specified.
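Since --input_dir accepts either a single *.xlsm file or a folder of them, the two cases could be resolved into one list of files as sketched below. This is an illustration only and not taken from gm.py; the function name `list_experiment_files` and the choice to ignore non-.xlsm files are assumptions.

```python
from pathlib import Path


def list_experiment_files(input_dir):
    """Return the .xlsm files named by input_dir (a single file or a folder)."""
    path = Path(input_dir)
    if path.is_file():
        # A single experiment file was given.
        return [path] if path.suffix.lower() == ".xlsm" else []
    if path.is_dir():
        # A folder was given: collect its .xlsm files in a stable order.
        return sorted(p for p in path.iterdir() if p.suffix.lower() == ".xlsm")
    raise FileNotFoundError(f"No such file or directory: {input_dir}")
```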
A parameter .json file is also needed to specify the parameters. By default, the json file should be named gm_parameters.json and placed in the same folder as gm.py. You can change this with the parameter --param_dir.
Parameters in the json file:
- save_image2d_dir: Directory to save 2D landscapes; set it to "" to prevent saving.
- save_image3d_dir: Directory to save 3D plots; set it to "" to prevent saving.
- save_feature_dir: Directory to save the feature csv file; set it to "" to prevent saving.
- tag: Name tag for the feature .csv file.
- qc_thr_rmse: QC threshold for RMSE; default=[0.2, 0.25]
- qc_thr_n_peaks: QC threshold for the number of peaks; default=[5, 8]
- qc_thr_variation: QC threshold for variation; default=[0.1, 0.25]
- elev: Elevation of the camera in the 3D plots; default=30
- azim: Angle of the camera in the 3D plots (in degrees); default=-120
- nbins: Number of bins used for the interpolation (mesh grid of size nbins x nbins); default=1000
- sm_method: Method used for the interpolation and smoothing ("regular_grid" or "linear_ndi"); default="regular_grid"
- rescale: Boolean value to rescale the data with respect to the WT; default=True
- max_val_scale: Maximum value of the non-rescaled data; default=10000
- info_box: Boolean value to add an information box to the plots; default=True
- max_val: Boolean value to add the maximum value of the data to the information box; default=True
- peak_coords: Boolean value to add the coordinates of the peaks to the information box; default=True
- fifty_coords: Boolean value to add the coordinates of 50% of the maximum value to the information box; default=True
- plot_replicates: Boolean value to plot and save data for all the replicates; default=False
- plot_extra: Boolean value to plot and save the data in the log scale; default=False
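As an illustration, the parameter file could be generated from Python with the defaults listed above. This is a sketch under assumptions: it presumes the json file is a flat object keyed by the parameter names, the three save_* paths and the tag are placeholders (no defaults are documented for them), and the helper `write_parameters` is hypothetical.

```python
import json

# Defaults as documented above. The three save_* paths and the tag are
# placeholders chosen for this example, not values taken from the project.
DEFAULT_PARAMS = {
    "save_image2d_dir": "gm_output/images2d",
    "save_image3d_dir": "gm_output/images3d",
    "save_feature_dir": "gm_output/features",
    "tag": "example",
    "qc_thr_rmse": [0.2, 0.25],
    "qc_thr_n_peaks": [5, 8],
    "qc_thr_variation": [0.1, 0.25],
    "elev": 30,
    "azim": -120,
    "nbins": 1000,
    "sm_method": "regular_grid",
    "rescale": True,
    "max_val_scale": 10000,
    "info_box": True,
    "max_val": True,
    "peak_coords": True,
    "fifty_coords": True,
    "plot_replicates": False,
    "plot_extra": False,
}


def write_parameters(path, overrides=None):
    """Write the parameters as JSON, with optional overrides, and return them."""
    params = {**DEFAULT_PARAMS, **(overrides or {})}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle, indent=4)
    return params
```

For example, `write_parameters("gm_parameters.json", {"save_image3d_dir": ""})` would write a file that disables saving of the 3D plots while keeping the other values.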
When new data samples are available, the script predict.py can be used to predict the cluster label of each sample.
Example)
python predict.py --train_data_dir clustering/data/clustering_result.csv --new_data_dir gm_output/features/extracted_features.csv
Command-line parameters:
- --train_data_dir: Path to the training data; the output of the clustering pipeline provided in R.
- --new_data_dir: Path to the new data; the feature .csv file produced by gm.py for the new samples.
- --k: Number of neighbours k in KNN.
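To make the role of --k concrete, the sketch below shows the general k-nearest-neighbours idea in plain Python: a new sample receives the label held by the majority of its k closest training samples. It is not the code of predict.py, which relies on scikit-learn; the function name `knn_predict`, the toy feature rows, and the labels "A" and "B" are made up for this example.

```python
from collections import Counter
from math import dist


def knn_predict(train_rows, train_labels, new_row, k=3):
    """Label new_row by majority vote among its k nearest training rows."""
    # Sort training indices by Euclidean distance to the new sample.
    order = sorted(range(len(train_rows)), key=lambda i: dist(train_rows[i], new_row))
    nearest = [train_labels[i] for i in order[:k]]
    # Ties are resolved in favor of the label met first among the nearest rows.
    return Counter(nearest).most_common(1)[0][0]


# Toy feature rows with cluster labels (made-up numbers, not real features).
train_rows = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (1.0, 0.9), (0.9, 1.0), (0.95, 0.95)]
train_labels = ["A", "A", "A", "B", "B", "B"]

predicted = [knn_predict(train_rows, train_labels, row, k=3)
             for row in [(0.02, 0.03), (0.97, 1.02)]]
```

A small k follows local structure closely but is sensitive to noisy samples, while a larger k smooths the vote across more neighbours.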