This project analyzes customer shopping behavior using a comprehensive dataset of customer transactions. It encompasses data preprocessing, exploratory data analysis (EDA), statistical insights, SQL-based querying for deeper analytics, an interactive Power BI dashboard for visualization, and detailed reports and presentations. The analysis aims to uncover patterns in customer demographics, purchasing habits, subscription impacts, and more to inform business strategies.
The analysis follows a structured approach:
-
Data Preparation and Cleaning: Loaded data using Pandas, handled missing values (e.g., imputed missing review ratings with median by category), standardized column names to snake case, and performed feature engineering (e.g., age groups, purchase frequency mapping).
-
Exploratory Data Analysis (EDA): Used Python libraries (Matplotlib, Seaborn) to visualize distributions, correlations, and initial insights into customer behavior.
-
Database Integration: Connected to PostgreSQL and uploaded the cleaned DataFrame for advanced querying.
-
SQL Analysis: Executed 10 targeted queries to answer key business questions regarding revenue, customer segments, product performance, and subscription impacts.
-
Visualization: Built an interactive Power BI dashboard for dynamic exploration of trends and KPIs.
-
Reporting: Compiled comprehensive findings into PDF reports and presentation slides with recommendations.
The project integrates Python, Jupyter Notebook, SQL, and Power BI in a seamless workflow:
-
Data Processing in Python/Jupyter Notebook: Raw data from
customer_shopping_behavior.csvis loaded into a Pandas DataFrame within a Jupyter Notebook. Data cleaning (handling missing values, column renaming), feature engineering (age groups, frequency mapping), and initial EDA visualizations are performed using libraries like Matplotlib and Seaborn. -
Database Integration via Python: The cleaned DataFrame is uploaded to a PostgreSQL database using SQLAlchemy and psycopg2 libraries in Python, creating a structured table for querying.
-
SQL Analysis: Analytical queries are written and executed in SQL against the PostgreSQL database to extract insights on revenue, customer segments, product performance, and more.
-
Power BI Visualization: Power BI connects directly to the PostgreSQL database to import data, enabling the creation of interactive dashboards for dynamic exploration of KPIs, trends, and customer segments.
The data flow can be visualized as:
SQL Database
↓
Python (Data Processing)
↓
Jupyter Notebook (Analysis)
↓
Power BI (Dashboard)
This integrated approach ensures data flows from raw processing to advanced analytics and visualization.
The dataset (customer_shopping_behavior.csv) includes the following key columns:
- Customer demographics: Age, Gender, Location
- Purchase details: Item Purchased, Category, Purchase Amount (USD), Payment Method, Shipping Type
- Behavioral data: Frequency of Purchases, Previous Purchases, Review Rating, Discount Applied, Subscription Status
customer_shopping_behavior.csv: Raw dataset with customer shopping data.Customer_Behavior.sql: SQL script containing 10 analytical queries to answer key business questions.Customer Shopping Behavior Analysis.ipynb: Jupyter Notebook for data cleaning, EDA, visualizations, and database integration using Python.Customer Shopping Behavior Analysis.pdf: Detailed PDF report of the analysis, including charts and insights.Customer_Behavior_Dashboard.pbix: Power BI dashboard file for interactive visualizations of customer segments, trends, and KPIs.Customer-Shopping-Behavior-Analysis_presentation.pdf: Presentation slides summarizing the project, methodology, and recommendations.
- Programming Language: Python
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, SQLAlchemy, psycopg2-binary
- Database: PostgreSQL
- Visualization: Power BI, Jupyter Notebook
- Other Tools: SQL for querying
- Python 3.x with Jupyter Notebook or JupyterLab.
- PostgreSQL database (for running the notebook's database integration).
- Power BI Desktop (for the dashboard).
- Required Python Libraries: Pandas, NumPy, Matplotlib, Seaborn, SQLAlchemy, psycopg2-binary.
- Install via:
pip install pandas numpy matplotlib seaborn sqlalchemy psycopg2-binary
- Install via:
- Clone or download the project files.
- Set up a PostgreSQL database named
customer_behaviorswith a tablecustomer. - Update database credentials in the notebook (username, password, host, port).
- Install required libraries as listed above.
-
Data Analysis with Jupyter Notebook:
- Open
Customer Shopping Behavior Analysis.ipynbin Jupyter Notebook. - Run cells sequentially: data loading, cleaning (handling missing values, column renaming), feature engineering (age groups, purchase frequency mapping), and database upload.
- Visualize EDA plots for insights on distributions, correlations, etc.
- Open
-
SQL Queries:
- Connect to your PostgreSQL database.
- Execute queries from
Customer_Behavior.sqlto get answers to key business questions.
Query # Question 1 What is the total revenue generated by male vs female customers? 2 Which customers used a discount but still spent more than the average purchase amount? 3 Which are the top 5 products with the highest average review rating? 4 Compare the average Purchase Amounts between Standard and Express Shipping. 5 Do subscribed customers spend more? Compare average spend and total revenue between subscribers and non-subscribers. 6 Which 5 products have the highest percentage of purchases with discounts applied? 7 Segment customers into New, Returning, and Loyal based on their total number of previous purchases, and show the count of each segment. 8 What are the top 3 most purchased products within each category? 9 Are customers who are repeat buyers (more than 5 previous purchases) also likely to subscribe? 10 What is the revenue contribution of each age group? -
Power BI Dashboard:
- Open
Customer_Behavior_Dashboard.pbixin Power BI Desktop. - Explore interactive visuals for customer segmentation, purchase trends, and key metrics.
- Open
-
Reports and Presentation:
- Review
Customer Shopping Behavior Analysis.pdffor comprehensive findings. - Use
Customer-Shopping-Behavior-Analysis_presentation.pdffor presentations.
- Review
- Revenue by Gender: Comparison of total revenue generated by male vs. female customers.
- High-Spending Discount Users: Identification of customers who used discounts but still spent above the average purchase amount.
- Top Products by Rating: Products with the highest average review ratings.
- Shipping Type Impact: Comparison of average purchase amounts between Standard and Express shipping.
- Subscription Analysis: Comparison of average spend and total revenue between subscribers and non-subscribers.
- Discount-Dependent Products: Products with the highest percentage of discounted purchases.
- Customer Segmentation: Classification into New, Returning, and Loyal segments based on purchase history.
- Top Products per Category: Most purchased products within each category.
- Repeat Buyers and Subscriptions: Analysis of whether customers with more than 5 previous purchases are more likely to subscribe.
- Revenue by Age Group: Total revenue contribution of each age group.
- Promote subscriptions with exclusive benefits to increase engagement.
- Implement loyalty programs to reward repeat buyers and encourage progression to the "Loyal" segment.
- Review discount policies to balance sales boosts with profit margins.
- Highlight top-rated and best-selling products in marketing campaigns.
- Target marketing efforts on high-revenue age groups and customers preferring express shipping.
Contributions are welcome. Please submit issues or pull requests for improvements.
Aditi Sah
Email: [email protected]