This project aims to predict whether a patient has diabetes using a variety of medical information such as glucose levels, BMI, insulin levels, and more. The dataset used for this project contains diagnostic information collected from patients and includes features that are relevant for predicting diabetes.
Diabetes is a chronic disease where the body cannot effectively process blood glucose, leading to high blood sugar levels. Early prediction and diagnosis can help manage the disease more effectively.
In this project, a machine learning model is built using the provided dataset to predict whether a patient has diabetes. The model takes into account several health metrics, such as blood pressure, glucose levels, and BMI.
The dataset used in this project includes diagnostic information for several patients and their corresponding diabetes status. The dataset is publicly available and contains the following features:
- Pregnancies: Number of times the patient has been pregnant
- Glucose: Plasma glucose concentration (mg/dL)
- BloodPressure: Diastolic blood pressure (mm Hg)
- SkinThickness: Triceps skinfold thickness (mm)
- Insulin: 2-hour serum insulin level (µU/mL)
- BMI: Body mass index (weight in kg/height in m²)
- DiabetesPedigreeFunction: Family history of diabetes (diabetes pedigree function)
- Age: Age of the patient (years)
- Outcome: Class label (0 for non-diabetic, 1 for diabetic)
| Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|
| 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
We will use a machine learning approach to build a classification model that can predict whether a patient has diabetes (Outcome = 1) or not (Outcome = 0). Possible models include:
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines (SVM)
- Neural Networks
- Accuracy
- Precision
- Recall
- F1-score
- ROC AUC
To run this project on your local machine, follow these steps:
You need to have Python installed on your machine, along with the following libraries:
pandasnumpyscikit-learnmatplotlib(optional for visualization)
- Clone the repository:
git clone https://github.com/your-username/diabetes-prediction.git
- Install the required dependencies:
pip install -r requirements.txt
- Load the dataset into a pandas DataFrame.
- Preprocess the data (handle missing values, normalization, etc.).
- Train the machine learning model on the training dataset.
- Evaluate the model on the test dataset.
# Example: Train a logistic regression model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load the data
data = pd.read_csv('diabetes.csv')
# Split the data into features and target
X = data.drop('Outcome', axis=1)
y = data['Outcome']
# Split into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions and evaluate
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')Contributions are welcome! If you'd like to improve this project, feel free to fork the repository, create a branch, and submit a pull request.
- Fork the Project
- Create your feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a pull request