Tumor Classification

A random forest classifier that predicts tumor diagnosis (benign or malignant) with 96% test accuracy on the Breast Cancer (Wisconsin) Data Set. Built for Python 2.7, pandas 0.22.0, matplotlib 2.1.1, scikit-learn 0.19.1, and seaborn 0.8.1.

Data

The data (data.csv) is taken from the Breast Cancer (Wisconsin) Data Set.
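
A minimal sketch (Python 3) of computing the class shares from the CSV file. It assumes data.csv has a header row with a "diagnosis" column holding "B" (benign) or "M" (malignant), as in the commonly distributed CSV version of this data set; the function name and the column default are illustrative, not taken from tumor-classification.py.

```python
import csv
from collections import Counter


def class_shares(csv_lines, label_column="diagnosis"):
    """Return the fraction of rows carrying each diagnosis label.

    csv_lines can be an open file or any iterable of CSV text lines.
    Assumes (hypothetically) a header with a "diagnosis" column whose
    values are "B" or "M"; pass label_column if your copy differs.
    """
    reader = csv.DictReader(csv_lines)
    counts = Counter(row[label_column] for row in reader)
    total = sum(counts.values())
    # Python 3 true division; Python 2.7 would need float(total).
    return {label: count / total for label, count in counts.items()}
```

For example, open data.csv with newline="" and pass the file object to class_shares. Note that the example output below reports these shares for the training split only, not the whole file.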

Example

$ python tumor-classification.py

Percentage of training data that is benign:	0.63
Percentage of training data that is malignant:	0.37

Random Forest Classifier Accuracy: 0.96

Random Forest Classifier Confusion Matrix:
benign classified as benign:	106
malignant classified as benign:	4
benign classified as malignant:	2
malignant classified as malignant:	59

             precision    recall  f1-score   support

     benign       0.96      0.98      0.97       108
  malignant       0.97      0.94      0.95        63

avg / total       0.96      0.96      0.96       171
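
The per-class numbers in the report follow directly from the four confusion-matrix counts. The sketch below (Python 3, illustrative function name) recomputes them. The counts are arranged so that each row sums to the support column (108 benign, 63 malignant), using the row = true label, column = predicted label layout of sklearn.metrics.confusion_matrix.

```python
def classification_report_2x2(cm, labels=("benign", "malignant")):
    """Per-class precision, recall, F1 and support from a 2x2 confusion matrix.

    cm[i][j] counts samples whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    report = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        predicted = cm[0][k] + cm[1][k]  # column sum: all samples predicted as this class
        support = cm[k][0] + cm[k][1]    # row sum: all samples truly in this class
        precision = tp / predicted
        recall = tp / support
        f1 = 2 * precision * recall / (precision + recall)
        report[name] = (precision, recall, f1, support)
    total = sum(row[0] + row[1] for row in cm)
    # The "avg / total" row is the support-weighted mean of each metric.
    weighted = tuple(
        sum(report[name][m] * report[name][3] for name in labels) / total
        for m in range(3)
    )
    report["avg / total"] = weighted + (total,)
    return report


# Counts whose row sums match the support column (108 benign, 63 malignant).
cm = [[106, 2],
      [4, 59]]
for name, (p, r, f, s) in classification_report_2x2(cm).items():
    print("%12s %9.2f %9.2f %9.2f %9d" % (name, p, r, f, s))
```

Running it prints rows matching the table above: 0.96, 0.98, 0.97 for benign; 0.97, 0.94, 0.95 for malignant; and 0.96 for each support-weighted average.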

Results

The classifier assumes that the true prevalence of malignant tumors is approximately equal to their prevalence in the sample (37%). 70% of the Breast Cancer (Wisconsin) Data Set was used for training and 30% for testing.
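
Two quick checks on this paragraph, sketched in Python 3 with illustrative function names. The original data set has 569 samples, and rounding the 30% test share up (as scikit-learn's train_test_split does for a fractional test_size) gives the 171 test samples in the report's support column. The second function uses Bayes' rule to show why the prevalence assumption matters: holding the two recalls fixed, the malignant precision (positive predictive value) still changes with prevalence.

```python
import math


def split_sizes(n_samples, test_fraction=0.3):
    """Train/test sizes for a fractional split, rounding the test set up."""
    n_test = int(math.ceil(test_fraction * n_samples))
    return n_samples - n_test, n_test


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value (malignant precision) via Bayes' rule.

    sensitivity = malignant recall, specificity = benign recall.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Recalls from the report: 59 of 63 malignant and 106 of 108 benign correct.
sens, spec = 59 / 63, 106 / 108
print(split_sizes(569))
for prev in (63 / 171, 0.10, 0.01):
    print("prevalence %.3f -> malignant precision %.3f"
          % (prev, ppv_at_prevalence(sens, spec, prev)))
```

At the sample prevalence (63 of 171) this reproduces the report's malignant precision of 59/61, about 0.97; at lower assumed prevalences the same classifier's malignant predictions are correct less often.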

Six features (texture_mean, perimeter_mean, smoothness_mean, compactness_mean, symmetry_mean, and fractal_dimension_mean) were used to predict tumor diagnosis. These features were chosen because they show low correlation with one another in the correlation heatmap (correlation-heatmap.png).
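
The features above were picked by inspecting the heatmap. As an illustration only, the same idea can be automated with a greedy filter: keep a feature if its absolute Pearson correlation with every already-kept feature is below a threshold. The 0.5 threshold and the function names are hypothetical choices, not values from the original script, and every column is assumed to be non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def low_correlation_features(columns, threshold=0.5):
    """Greedily keep features whose |r| with every kept feature is below threshold.

    columns maps feature name -> list of values. Features are visited in
    insertion order, so an earlier feature is kept in preference to a
    later feature that is strongly correlated with it.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the column order decides which member of a correlated pair survives; a heatmap review like the README's can weigh that choice by hand.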

A random forest classifier with 100 estimators correctly classifies 96% of the test data. Precision and recall are 94% or higher for both benign and malignant tumors. In tumor classification, it is important to minimize the number of malignant tumors classified as benign. The classifier's false negative rate (the share of malignant tumors classified as benign, equal to 1 minus the malignant recall of 0.94) is about 6%.
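
A sketch of how the "X classified as Y" lines and the false negative rate map onto a matrix in the sklearn.metrics.confusion_matrix layout (rows are true labels, columns are predictions). With the string labels "B" and "M", scikit-learn sorts the labels, so row 0 is benign and row 1 is malignant. The counts are the ones consistent with the report's support column, and the function names are illustrative.

```python
def describe_confusion(cm, labels=("benign", "malignant")):
    """Yield '<truth> classified as <prediction>' lines from a 2x2 matrix.

    cm[i][j] counts samples whose true label is labels[i] and whose
    predicted label is labels[j].
    """
    for i, truth in enumerate(labels):
        for j, predicted in enumerate(labels):
            yield "%s classified as %s:\t%d" % (truth, predicted, cm[i][j])


def false_negative_rate(cm, positive=1):
    """Share of truly positive (malignant) samples predicted as another class.

    Equal to 1 minus the recall of the positive class.
    """
    row = cm[positive]
    return (sum(row) - row[positive]) / sum(row)


if __name__ == "__main__":
    cm = [[106, 2],
          [4, 59]]
    for line in describe_confusion(cm):
        print(line)
    print("false negative rate: %.3f" % false_negative_rate(cm))
```

With these counts, false_negative_rate(cm) is 4/63, about 0.063, which agrees with 1 minus the malignant recall of 0.94.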

SeanCooke/tumor-classification

Folders and files

Latest commit

History

Repository files navigation

Tumor Classification

Data

Example

Results

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages