- Source: Kaggle Water Potability Dataset
- Features: pH, Hardness, Solids, Chloramines, Sulfates, Conductivity, Organic Carbon, Trihalomethanes, Turbidity
- Target: Potability
- Rows: 11,115
- Columns with missing values: pH, Sulfates, Trihalomethanes
- Missing values imputed with each column's mean
- Outliers handled using IQR (Interquartile Range) method
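
The two preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `impute_mean` and `clip_iqr`, the tiny `demo` frame, and the 1.5 multiplier are assumptions, and the source does not say whether outliers were clipped or dropped, so clipping to the IQR fences is assumed here.

```python
import numpy as np
import pandas as pd


def impute_mean(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Fill NaNs in the given columns with that column's mean."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].mean())
    return out


def clip_iqr(df: pd.DataFrame, columns: list[str], k: float = 1.5) -> pd.DataFrame:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence.

    Assumption: outliers are clipped rather than dropped.
    """
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Tiny hypothetical frame standing in for the real dataset.
demo = pd.DataFrame({
    "pH": [6.0, 6.5, np.nan, 7.5, 8.0],
    "Sulfates": [300.0, 310.0, 320.0, np.nan, 900.0],
})
cols = ["pH", "Sulfates"]
cleaned = clip_iqr(impute_mean(demo, cols), cols)
print(cleaned)
```

Imputing before computing the quartiles means the filled-in means also influence the IQR fences; reversing the order is an equally plausible choice and may give slightly different fences.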
- Random Forest
- Decision Tree
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Random Forest achieved the highest accuracy and F1 score.
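
A comparison of the five models on accuracy and F1 score might look like the sketch below, using scikit-learn. It is an illustration under assumptions, not the project's code: the data is a synthetic stand-in generated with `make_classification`, all hyperparameters are library defaults, the 80/20 split and `random_state` values are arbitrary, and feature scaling for the distance- and margin-based models is an added choice the source does not mention.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for nine water-quality features and a binary label.
X, y = make_classification(
    n_samples=500, n_features=9, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling is assumed for the models that are sensitive to feature ranges.
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model and record accuracy and F1 on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

# Print models from highest to lowest F1.
for name, scores in sorted(
    results.items(), key=lambda kv: kv[1]["f1"], reverse=True
):
    print(f"{name:<20} accuracy={scores['accuracy']:.3f}  f1={scores['f1']:.3f}")
```

The ranking on synthetic data says nothing about the ranking on the real dataset; the sketch only shows how the two metrics can be collected side by side.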
Of the five models compared, Random Forest was the most effective at predicting water potability, with the highest accuracy and F1 score.