Welcome to my Churn Prediction project, where I explored the challenge of identifying customers at risk of leaving a business. By applying machine learning techniques, I built a pipeline capable of delivering accurate predictions and valuable insights to support decision-making.
- Exploratory Data Analysis (EDA):
- Feature Engineering:
- Machine Learning Models:
  - Developed and evaluated Logistic Regression and Random Forest Classifier models.
  - Both models achieved over 90% accuracy, with performance assessed using F1-score, ROC AUC, and a confusion matrix.
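
The evaluation described above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for the real customer features, and the hyperparameters are illustrative guesses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered customer features (assumption:
# the real project uses its own feature table and churn labels).
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5,
    weights=[0.7, 0.3], random_state=42,
)

# Stratify so both splits keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hyperparameters here are illustrative, not taken from the project.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard labels for accuracy / F1
    y_score = model.predict_proba(X_test)[:, 1]  # churn probability for ROC AUC
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, metrics in results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f} "
          f"f1={metrics['f1']:.3f} roc_auc={metrics['roc_auc']:.3f}")
    print(metrics["confusion_matrix"])
```

Reporting F1 and ROC AUC next to accuracy matters when churners are a minority class, since accuracy alone can look high for a model that rarely predicts churn.
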
- Model Comparison:
  - The Random Forest Classifier performs slightly better, though both models achieve over 90% accuracy.
- Data Limitations:
  - Despite the high accuracy, the models show no clear signs of overfitting, which suggests they generalize well.
  - However, the dataset covers only one year, which may limit how well the models capture long-term churn trends.
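
One common way to look for overfitting is to compare training and held-out scores and to check how stable the score is across cross-validation folds. The sketch below assumes scikit-learn and synthetic data; it illustrates the general check, not the procedure used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic placeholder data (assumption: replace with the real features).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# A large gap between training and test accuracy is a typical overfitting signal.
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
gap = train_acc - test_acc

# 5-fold cross-validated F1 on the training split; a wide spread across
# folds suggests the score depends heavily on which rows were sampled.
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X_train, y_train, cv=5, scoring="f1",
)

print(f"train={train_acc:.3f} test={test_acc:.3f} gap={gap:.3f}")
print(f"cv f1 mean={cv_scores.mean():.3f} std={cv_scores.std():.3f}")
```

With only one year of data, a random split like this cannot show whether the model holds up on later periods; a split by date would be a closer test once more history is available.
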
- Feature Importance:
  - Recency strongly correlates with churn, aligning with the 90-day threshold.
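
To make the recency feature and the 90-day threshold concrete, here is a small pandas sketch. The column names, the snapshot date, and the rule "churned if no purchase in the last 90 days" are assumptions for illustration; the notebook may define them differently.

```python
import pandas as pd

CHURN_THRESHOLD_DAYS = 90  # the 90-day threshold mentioned above

# Hypothetical transaction log; column names are assumptions.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3],
    "transaction_date": pd.to_datetime([
        "2023-01-10", "2023-11-20", "2023-03-05", "2023-08-01", "2023-12-15",
    ]),
})

# Reference date from which recency is measured (assumed: end of the data window).
snapshot_date = pd.Timestamp("2023-12-31")

# Recency = days since each customer's most recent transaction.
last_purchase = transactions.groupby("customer_id")["transaction_date"].max()
recency_days = (snapshot_date - last_purchase).dt.days

features = pd.DataFrame({
    "recency_days": recency_days,
    # Assumed labeling rule: no purchase within the threshold counts as churn.
    "churned": (recency_days > CHURN_THRESHOLD_DAYS).astype(int),
})
print(features)
```

If the churn label is derived from the same recency value that is used as a feature, a strong correlation is expected by construction, so it is worth computing features and labels from separate time windows.
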
- Cohort Impact:
  - Customer cohorts, defined by each customer's first transaction, significantly affect churn predictions, highlighting the importance of segmentation.
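
A cohort based on the first transaction can be derived as in the pandas sketch below. The monthly granularity, the column names, and the churn labels are hypothetical and only serve to show the mechanics.

```python
import pandas as pd

# Hypothetical transaction log; column names are assumptions.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 4, 4],
    "transaction_date": pd.to_datetime([
        "2023-01-10", "2023-04-02", "2023-01-25",
        "2023-03-05", "2023-03-20", "2023-09-14",
    ]),
})

# Cohort = calendar month of each customer's first transaction.
customers = (
    transactions.groupby("customer_id")["transaction_date"].min()
    .dt.to_period("M")
    .rename("cohort_month")
    .to_frame()
)

# Made-up churn labels, aligned on customer_id, purely for illustration.
customers["churned"] = pd.Series({1: 0, 2: 1, 3: 1, 4: 1})

# Churn rate per cohort shows whether acquisition period matters.
churn_rate_by_cohort = customers.groupby("cohort_month")["churned"].mean()
print(churn_rate_by_cohort)

# One-hot encode the cohort so a model can use it as a feature.
cohort_features = pd.get_dummies(
    customers["cohort_month"].astype(str), prefix="cohort"
)
print(cohort_features.columns.tolist())
```

Comparing churn rates across cohorts before modeling is a quick way to see whether segmentation by acquisition period carries signal.
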
- Churn Prediction - churn_prediction.ipynb