This project compares machine learning algorithms to predict water potability using physicochemical properties from the Water Potability Dataset (Kaggle). The goal is to find the best model for classifying water as potable or non-potable.

Adhishree87/DataMiningProject_ComparisionOfAlgorithm

Water Potability Classification Using Machine Learning Algorithms


📌 Dataset Information

  • Source: Kaggle Water Potability Dataset
  • Features: pH, Hardness, Solids, Chloramines, Sulfates, Conductivity, Organic Carbon, Trihalomethanes, Turbidity
  • Target: Potability (potable or non-potable)
  • Rows: 11,115
  • Columns with missing values: pH, Sulfates, Trihalomethanes

Data Preprocessing

  • Missing values imputed with the mean of each column
  • Outliers handled using the IQR (Interquartile Range) method
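
The two preprocessing steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the helper names `impute_mean` and `clip_outliers_iqr`, the column names, the toy values, and the 1.5 × IQR fence are all assumptions. The README does not say whether outliers were dropped or capped; this sketch caps (clips) them at the fences.

```python
import numpy as np
import pandas as pd


def impute_mean(df, columns):
    """Fill missing values in each listed column with that column's mean."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].mean())
    return out


def clip_outliers_iqr(df, columns, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at the nearest fence.

    Assumption: capping rather than dropping rows; k=1.5 is the usual default.
    """
    out = df.copy()
    for col in columns:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Toy frame with made-up values (hypothetical column names, not real data).
toy = pd.DataFrame({
    "ph": [7.0, np.nan, 6.5, 7.5, 14.0],
    "Sulfate": [330.0, 340.0, np.nan, 320.0, 335.0],
})

cols = ["ph", "Sulfate"]
clean = clip_outliers_iqr(impute_mean(toy, cols), cols)
print(clean)
```

When these steps feed a model, the means and IQR fences are usually computed on the training split only and then applied to the test split, so that no information from the test data leaks into preprocessing.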

Models Implemented

  • Random Forest
  • Decision Tree
  • Logistic Regression
  • K-Nearest Neighbors (KNN)
  • Support Vector Machine (SVM)
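
A comparison of the five models listed above might look like the sketch below, using scikit-learn. Everything specific here is an assumption rather than the project's actual setup: the data is a synthetic stand-in generated with `make_classification` (nine features, to mirror the nine physicochemical properties), the 80/20 stratified split, the default hyperparameters, and the feature scaling applied to the distance- and margin-based models. The scores it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the water data: 9 numeric features, binary target.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Tree-based models do not need scaling; the other three are wrapped in a
# pipeline so the scaler is fitted on the training split only.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

# Print the models ranked by F1 score, best first.
for name, scores in sorted(
    results.items(), key=lambda kv: kv[1]["f1"], reverse=True
):
    print(f"{name:20s} accuracy={scores['accuracy']:.3f} f1={scores['f1']:.3f}")
```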


🔥 Best Performing Model

✅ Random Forest achieved the highest accuracy and F1 score.
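
For reference, the two metrics named here can be computed from a confusion matrix as in the short sketch below. The function name `accuracy_and_f1` and the label lists are made up for illustration and are not results from this project; potable is assumed to be encoded as 1.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Made-up labels: 1 = potable, 0 = non-potable.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, f1)  # 0.75 0.75
```

Accuracy counts every correct prediction equally, while F1 focuses on the positive class, so reporting both helps when the two classes are not equally frequent.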


📌 Conclusion

Random Forest was the most effective of the five models for predicting water potability, achieving the highest accuracy and F1 score.

