Graph-based Hub Gene Selection Technique using Protein Interaction Information: Application to Sample Classification

duttaprat/GraphPPI

GraphPPI

This README contains guidelines and information for running the code that accompanies the following paper.

Paper name: Graph-based Hub Gene Selection Technique using Protein Interaction Information: Application to Sample Classification

This paper uses protein-protein interaction (PPI) information together with a graph mining technique to find a suitable subset of features (genes), which is then used for sample classification. The feature selection contribution is three-fold. First, all genes are grouped into clusters based on the integrated information of their gene expression values and their protein interactions, using a multi-objective optimization (MOO) based clustering approach. Second, the confidence scores of the protein interactions are incorporated into a popular graph mining algorithm, the Goldberg algorithm, to find the relevant features. These features are the topologically and functionally significant genes, referred to as hub genes. Finally, the hub genes are identified by varying the degree of the nodes and are then used for the sample classification task.

If you find this work useful, please cite it as:

@article{dutta2019graph,
  title={Graph-based Hub Gene Selection Technique using Protein Interaction Information: Application to Sample Classification},
  author={Dutta, Pratik and Saha, Sriparna and Gulati, Saurabh},
  journal={IEEE Journal of Biomedical and Health Informatics},
  year={2019},
  publisher={IEEE}
}

Prerequisites

Description

1. MOO-based clustering

This folder contains the Python code for the proposed MOO-based clustering. Open a terminal (for Linux users) and go to the '1. MOO-based clustering' folder. Then run the code with the following commands:

cd examples

Set the path to the dataset on line 31 of main.py, then run:

python main.py <initial_population_size> <number_of_generation>

Output: Generates a file named non_dominated_solutions.txt that contains all the cluster information.
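To make the term "non-dominated solutions" concrete, here is a minimal sketch of a Pareto filter. It is not taken from this repository: the function names, the example objective values, and the assumption that every objective is minimised are all hypothetical, and the actual format of non_dominated_solutions.txt is not described here.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every objective and
    strictly better on at least one (assumes all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors for four candidate clusterings.
candidates = [(0.2, 0.9), (0.4, 0.4), (0.5, 0.5), (0.9, 0.1)]
front = non_dominated(candidates)
print(front)  # (0.5, 0.5) is dropped because (0.4, 0.4) dominates it
```

Each clustering solution kept by a MOO procedure is a trade-off of this kind: improving one objective further would worsen another.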

2. Modified Goldberg Algorithm

This folder contains the modified Goldberg Algorithm.
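As a rough illustration of how interaction confidence scores can drive dense-subgraph selection, the sketch below assumes that "Goldberg algorithm" refers to Goldberg's maximum-density-subgraph method. It does not reproduce the code in this folder. Instead of Goldberg's exact flow-based procedure it uses the simpler greedy peeling approximation, with confidence scores as edge weights. The function name, edge format, and example values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (gene_a, gene_b, confidence score)


def greedy_densest_subgraph(edges: Iterable[Edge]) -> Tuple[Set[str], float]:
    """Greedy peeling approximation of the weighted densest subgraph.

    Density is (sum of edge weights inside the subgraph) / (number of nodes).
    This is NOT Goldberg's exact algorithm, only an approximation of the
    same objective.
    """
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        if u != v:  # ignore self-loops
            adj[u][v] = w
            adj[v][u] = w

    nodes = set(adj)
    if not nodes:
        return set(), 0.0

    # Weighted degree of every node, and the total edge weight of the graph.
    wdeg = {n: sum(adj[n].values()) for n in nodes}
    total = sum(wdeg.values()) / 2.0

    best_nodes, best_density = set(nodes), total / len(nodes)
    while len(nodes) > 1:
        # Peel off the node with the smallest weighted degree (ties by name).
        u = min(nodes, key=lambda n: (wdeg[n], n))
        nodes.remove(u)
        for v, w in adj[u].items():
            if v in nodes:
                wdeg[v] -= w
                total -= w
        density = total / len(nodes)
        if density > best_density:
            best_nodes, best_density = set(nodes), density
    return best_nodes, best_density


# Hypothetical PPI edges: a tight triangle A-B-C plus two weakly attached genes.
ppi = [("A", "B", 0.9), ("A", "C", 0.9), ("B", "C", 0.9),
       ("C", "D", 0.2), ("D", "E", 0.1)]
dense_nodes, density = greedy_densest_subgraph(ppi)
print(sorted(dense_nodes), round(density, 3))
```

In this toy graph the weakly connected genes D and E are peeled away first, leaving the high-confidence triangle as the densest region.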

3. Significant_genes_expression_values.py

Obtains the gene expression values of the selected genes.

4. all_classifiers.py

Implementation of four classifiers (SVM, Random Forest, kNN, and ANN) with 10-fold cross-validation.
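The evaluation protocol can be sketched without any dependencies as follows. This is an illustrative assumption, not the contents of all_classifiers.py: it shows stratified 10-fold cross-validation with only a simple kNN classifier, and every function name and the toy data are hypothetical. In practice a library such as scikit-learn would normally be used for the classifiers.

```python
import random
from collections import Counter, defaultdict
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    return folds


def knn_predict(train_X, train_y, x, k: int = 3) -> int:
    """Majority vote among the k training samples closest to x (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def cross_val_accuracy(X, y, n_folds: int = 10, k: int = 3, seed: int = 0) -> float:
    """Mean accuracy over the folds; each fold is held out exactly once."""
    scores = []
    for test_idx in stratified_folds(y, n_folds, seed):
        if not test_idx:
            continue
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy data: two well-separated classes of 20 samples with 2 features each.
X = [[0.1 * i, 0.0] for i in range(20)] + [[10 + 0.1 * i, 1.0] for i in range(20)]
y = [0] * 20 + [1] * 20
accuracy = cross_val_accuracy(X, y)
print(accuracy)
```

With real data, the rows of `X` would be samples and the columns the expression values of the selected hub genes.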

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
