This repository contains an analysis of the Online Retail dataset, which includes transactional data from a UK-based online retailer. The analysis is performed using PySpark in Jupyter Notebooks.
The dataset used in this analysis can be found in the data folder. The dataset contains information about customer purchases, including product descriptions, quantities, and prices.
The analysis is divided into several Jupyter Notebooks, each focusing on a specific aspect of the data:
Exploratory_Data_Analysis.ipynb: Exploratory data analysis to understand the structure and distribution of the data.
RFM_Analysis.ipynb: RFM analysis to segment customers based on their purchasing behavior.
KMeans_Clustering.ipynb: K-means clustering to segment customers based on their order history.
Product_Affinity_Analysis.ipynb: Product affinity analysis to identify which products tend to be purchased together.
Market_Basket_Analysis.ipynb: Market basket analysis to examine which products tend to be purchased together at different times of day, week, or year.
Churn_Analysis.ipynb: Churn analysis to identify customers who are likely to churn based on their past behavior.
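As a rough illustration of what the RFM step measures, here is a minimal plain-Python sketch. It is not the code from RFM_Analysis.ipynb, which uses PySpark. The sample rows, the field layout, and the compute_rfm helper are all hypothetical and exist only for this example. Recency is taken as days since the last purchase relative to an assumed snapshot date, frequency as the number of distinct invoices, and monetary as total spend.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows (not from the dataset):
# (customer_id, invoice_no, invoice_date, quantity, unit_price)
transactions = [
    ("C1", "INV1", date(2011, 11, 1), 2, 5.00),
    ("C1", "INV2", date(2011, 12, 1), 1, 10.00),
    ("C2", "INV3", date(2011, 6, 15), 4, 2.50),
]


def compute_rfm(rows, snapshot_date):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}            # most recent purchase date per customer
    invoices = defaultdict(set)   # distinct invoice numbers per customer
    spend = defaultdict(float)    # total spend per customer

    for customer, invoice, day, quantity, unit_price in rows:
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        invoices[customer].add(invoice)
        spend[customer] += quantity * unit_price

    return {
        customer: (
            (snapshot_date - last_purchase[customer]).days,  # recency
            len(invoices[customer]),                         # frequency
            round(spend[customer], 2),                       # monetary
        )
        for customer in last_purchase
    }


# The snapshot date is an assumption chosen for this example.
rfm = compute_rfm(transactions, snapshot_date=date(2011, 12, 10))
for customer, scores in sorted(rfm.items()):
    print(customer, scores)
```

Customers with low recency and high frequency and monetary values are typically treated as the most valuable segment. How the three values are binned into segments is a design choice this sketch leaves open.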
The analysis requires PySpark and Jupyter Notebook. The necessary Python libraries can be installed using the requirements.txt file.
To run the analysis, clone the repository, install the dependencies, and open the Jupyter Notebooks in the order listed above.
This project is open to contributions. If you have any suggestions or improvements, please feel free to create a pull request.
© 2023 Abel Tavares