fraud-detection-ML-pipeline

An end-to-end machine learning pipeline for credit card fraud detection that reaches about 99.6% accuracy. The project covers the complete workflow, from data exploration to model deployment, with an integrated CI/CD pipeline.

Project Overview

This project implements a machine learning system to detect fraudulent credit card transactions. It uses a custom-built fraud detection model trained on transaction data that includes temporal patterns, merchant information, and customer demographics.

Key features of this pipeline:

Data preprocessing with custom transformations for categorical and temporal features
Exploratory data analysis revealing insights into fraud patterns
Model training and evaluation with high performance metrics
Complete CI pipeline for automated testing and deployment
Comprehensive logging system for tracking model performance and issues
Visualization tools for understanding model decisions and fraud patterns
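
As a rough illustration of the preprocessing feature listed above, the sketch below derives temporal features from a timestamp and one-hot encodes a category, using only the Python standard library. The field names (`trans_time`, `category`), the category list, and the night-time cutoff are assumptions made for this example, not the project's actual schema or transformations.

```python
from datetime import datetime


def temporal_features(timestamp: str) -> dict:
    """Derive hour, weekday and a night flag from a 'YYYY-MM-DD HH:MM:SS' string."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        # Hypothetical cutoff: treat 22:00-05:59 as "night"
        "is_night": int(ts.hour >= 22 or ts.hour < 6),
    }


def one_hot(value: str, categories: list[str]) -> list[int]:
    """Encode a categorical value as a one-hot vector; unknown values give all zeros."""
    return [int(value == c) for c in categories]


# Hypothetical category list and transaction row, for illustration only
CATEGORIES = ["shopping", "grocery", "travel"]
row = {"trans_time": "2020-06-21 23:14:05", "category": "grocery"}

features = temporal_features(row["trans_time"])
features["category_onehot"] = one_hot(row["category"], CATEGORIES)
print(features)
```

Mapping unknown categories to an all-zero vector is one simple way to keep inference from failing on values not seen during training; the project may handle this differently.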

Key Insights from Data Analysis

Our exploratory data analysis revealed several important patterns:

Age-Based Vulnerability: People over 50 tend to be more vulnerable to fraud than younger age groups.
Temporal Patterns: Fraud rates vary significantly by hour of day and day of week, with late-night and early-morning hours showing higher risk.
Geographic Hotspots: Large cities like Washington, New York, and Los Angeles have the highest number of fraudulent transactions.
Transaction Categories: Shopping and grocery transactions show higher fraud rates than other categories.
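
Patterns like these come down to comparing fraud rates across groups. The sketch below shows one minimal way to compute a per-group fraud rate with the standard library. The records and the field names (`hour`, `category`, `is_fraud`) are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def fraud_rate_by(records, key):
    """Return {group: fraud_count / total_count} for the given grouping key."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        frauds[group] += rec["is_fraud"]
    return {group: frauds[group] / totals[group] for group in totals}


# Toy records, invented for this example
transactions = [
    {"hour": 2, "category": "shopping", "is_fraud": 1},
    {"hour": 2, "category": "grocery", "is_fraud": 0},
    {"hour": 14, "category": "grocery", "is_fraud": 0},
    {"hour": 14, "category": "travel", "is_fraud": 0},
    {"hour": 23, "category": "shopping", "is_fraud": 1},
]

rates_by_hour = fraud_rate_by(transactions, "hour")
rates_by_category = fraud_rate_by(transactions, "category")
print(rates_by_hour)
print(rates_by_category)
```

The same function works for any grouping column, so hour of day, day of week, and transaction category can all be compared on a rate basis rather than by raw counts.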

Installation

Clone the Repository:

```
git clone https://github.com/ol1g3/fraud-detection-ML-pipeline.git
cd fraud-detection-ML-pipeline
```

Set Up the Virtual Environment: Choose one of the methods below.

Virtual Environment Setup Options

Using venv (default)

Create the Virtual Environment (macOS/Linux):
```
python3 -m venv .venv
```

Activate the Virtual Environment and Install Requirements:
```
source .venv/bin/activate
pip install -r requirements.txt
```

Deactivate the Virtual Environment (when done):
```
deactivate
```

Using uv (alternative)

Create the Virtual Environment:
```
uv venv --python 3.11
```

Activate the Virtual Environment and Install Requirements:
```
source .venv/bin/activate
uv pip install -r requirements.txt
```

Deactivate the Virtual Environment (when done):
```
deactivate
```

Automated CI Pipeline

This project includes a complete CI pipeline that:

Runs automated checks on every commit, including a build step and unit tests
Validates model performance on test data (regression test)
Generates performance reports and logs
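
The model regression test mentioned above could look roughly like the following sketch, which fails when precision or recall on held-out data drops below a floor. The thresholds, function names, and toy labels are assumptions for illustration; the project's actual tests and metrics may differ.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive (fraud = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical minimum acceptable scores for the regression check
MIN_PRECISION = 0.70
MIN_RECALL = 0.70


def passes_regression_check(y_true, y_pred):
    """Return True if the model still meets both performance floors."""
    precision, recall = precision_recall(y_true, y_pred)
    return precision >= MIN_PRECISION and recall >= MIN_RECALL


# Toy labels standing in for held-out test data and model predictions
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]

print(precision_recall(y_true, y_pred))
print(passes_regression_check(y_true, y_pred))
```

Precision and recall are used here instead of accuracy because fraud labels are typically heavily imbalanced, so a check on the fraud class alone is more sensitive to a degraded model.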

Future improvements

Implement neural network-based approaches for potentially higher accuracy
Add more sophisticated feature engineering based on domain knowledge

Data Source

The dataset used for this project can be found on Kaggle: Fraud Detection Dataset.
