This README contains guidelines and information about running the code for the following paper:
- Paper Name: Ensembling of Gene Clusters utilizing Deep Learning and Protein-protein Interaction Information
- Authors: Pratik Dutta¹, Sriparna Saha¹, Saraansh Chopra¹ and Varnika Miglani²
- Affiliation: ¹Indian Institute of Technology Patna, India; ²Samsung R&D Institute India-Noida
- Accepted (8th May, 2019): IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Corresponding Author: Pratik Dutta ([email protected] )
If you find this code or the article useful, consider citing our work:
@article{dutta2019ensembling,
title={Ensembling of Gene Clusters utilizing Deep Learning and Protein-protein Interaction Information},
author={Dutta, Pratik and Saha, Sriparna and Chopra, Saraansh and Miglani, Varnika},
journal={IEEE/ACM transactions on computational biology and bioinformatics},
year={2019},
publisher={IEEE}
}
This folder contains five preprocessed datasets which are used as the input of the proposed MOO-based clustering algorithm.
This folder contains the Python code of the proposed MOO-based clustering. Open a terminal (for Linux users), go to the '1. MOO-based clustering' folder, and run the code with the following steps:
cd examples
Write the path of the dataset on line 27 of main.py, then run:
python main.py <initial_population_size> <number_of_generation>
Output: a file named non_dominated_solutions.txt that contains all the cluster information.
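The MOO-based clustering keeps only the non-dominated solutions. As a minimal illustration of what that filter means (this is not the code in main.py), the sketch below keeps the points that no other point dominates, assuming every objective is maximised. The objective values are made-up numbers; the objectives actually optimised by the algorithm are not listed in this README.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are assumed to be maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that are not dominated by any other point."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]
```

For example, `non_dominated([(0.8, 0.3), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)])` returns `[0, 1, 3]`, because (0.5, 0.5) is dominated by (0.6, 0.6). This pairwise check is quadratic in the population size, which is acceptable for small populations.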
This folder contains .ipynb (Jupyter Notebook) files for creating a set of disconnected walks, which are further used to generate the labelled dataset. This labelled dataset is used as the training dataset for the proposed neural network models. The main components of the folder are:
- BCLL_FuLL_Labels: labels of all the non-dominated solutions for the B-CLL dataset.
- algorithm1.ipynb: This Jupyter notebook takes all non-dominated solutions as input and produces a weighted coincidence matrix. This coincidence matrix is fed to Algorithm 1, which produces a set of disconnected walks. The set of disconnected walks is saved in disconnected_walk.txt.
- create_train_test.ipynb: This Jupyter notebook generates labeled_file.txt and unlabeled_file.txt.
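algorithm1.ipynb builds a weighted coincidence matrix from the non-dominated solutions. The notebook's exact weighting scheme is not described in this README, so the sketch below assumes that each solution is a vector of cluster labels over the same n genes and that each solution carries a hypothetical non-negative weight (uniform by default). Entry (i, j) is then the weighted fraction of solutions that place genes i and j in the same cluster. It does not implement Algorithm 1 itself.

```python
import numpy as np


def weighted_coincidence_matrix(solutions, weights=None):
    """Return an n x n matrix whose (i, j) entry is the weighted fraction of
    solutions that put genes i and j in the same cluster.

    solutions: list of label vectors, one per non-dominated solution,
               each of length n (cluster id of every gene).
    weights:   optional per-solution weights (hypothetical; uniform if None).
    """
    labels = np.asarray(solutions)  # shape (m, n)
    m, n = labels.shape
    if weights is None:
        weights = np.ones(m)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so entries lie in [0, 1]

    matrix = np.zeros((n, n))
    for w, lab in zip(weights, labels):
        # same[i, j] is True when genes i and j share a cluster in this solution
        same = lab[:, None] == lab[None, :]
        matrix += w * same
    return matrix
```

With `solutions = [[0, 0, 1, 1], [0, 0, 0, 1]]` and uniform weights, genes 0 and 1 get 1.0 (always together), genes 0 and 2 get 0.5, and genes 0 and 3 get 0.0.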
This folder contains .ipynb files for training the models that are used to generate the final consensus partitionings for approach 2. We recommend running the files in Jupyter Notebook. The developed deep learning models and related files are:
- NN Model.ipynb: PyTorch implementation of the proposed multi-layer perceptron with two hidden layers.
- CNN Model.ipynb: PyTorch implementation of the proposed convolutional neural network.
- Label Script.ipynb: combines the originally labeled gene expressions and the model-labeled gene expressions into one file for further metric evaluation (BHI and BSI).
- BHI_labels_CNN.txt and BHI_labels_NN_2Hidden.txt: the labels assigned to the unlabeled gene expressions by the trained models, plus the originally labeled gene expression profiles.
- trained_model_10000_epochs.pt, trained_model_10000_epochs_2.pt, trained_model_10000_epochs_3.pt, trained_model_CNN_10000_epochs_1.pt, trained_model_CNN_10000_epochs_2.pt: the weight and bias matrices obtained after training the above models.
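NN Model.ipynb is described as a multi-layer perceptron with two hidden layers. The sketch below shows what such a model can look like in PyTorch, under stated assumptions: the hidden widths (64 and 32), the ReLU activations, the Adam optimiser, the learning rate and the toy data are all hypothetical choices, not values taken from the notebook. Each gene expression profile is treated as a feature vector, and the output is one logit per cluster label.

```python
import torch
from torch import nn


class TwoHiddenLayerMLP(nn.Module):
    """MLP with two hidden layers; the layer widths are hypothetical defaults."""

    def __init__(self, n_features: int, n_classes: int,
                 hidden1: int = 64, hidden2: int = 32) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden1),
            nn.ReLU(),
            nn.Linear(hidden1, hidden2),
            nn.ReLU(),
            nn.Linear(hidden2, n_classes),  # raw logits, one per cluster label
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Toy data (hypothetical): 5 labelled genes, 12 expression values each, 4 labels.
torch.manual_seed(0)
x = torch.randn(5, 12)
y = torch.tensor([0, 1, 2, 3, 0])

model = TwoHiddenLayerMLP(n_features=12, n_classes=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # takes logits and integer class labels

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Assign a label to each gene by taking the index of its largest logit.
with torch.no_grad():
    predicted = model(x).argmax(dim=1)
```

After training, `torch.save(model.state_dict(), path)` and `model.load_state_dict(torch.load(path))` are the standard way to persist and restore weights. This README does not say whether the provided .pt files hold a state_dict or a whole pickled model, so inspect what `torch.load` returns before loading them into a class like this one.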
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.