
shivangi-soni/Telecom-Churn-Analysis


Customer churn is the percentage of customers who stopped using a company's product or service during a given period. It is one of the most important metrics for businesses because high churn impedes growth. Since it is easier to retain existing customers than to acquire new ones, machine learning models that can predict whether a customer will stop using a company's products/services can be very valuable. Once the customers likely to churn are identified, appropriate measures can be taken to retain them. This notebook follows the end-to-end machine learning workflow to build a classification model on a telecom company's churn dataset. The dataset was obtained from the following link: https://www.kaggle.com/blastchar/telco-customer-churn
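As a minimal sketch of the definition above, assuming each customer's outcome is recorded as a "Yes"/"No" label (as in the Kaggle dataset's `Churn` column, which is an assumption here), the churn rate is simply the share of "Yes" labels. The function name `churn_rate` and the sample labels are hypothetical.

```python
def churn_rate(churn_labels):
    """Return the fraction of customers whose label is "Yes" (churned).

    Assumes Yes/No labels, one per customer observed during the period.
    """
    if not churn_labels:
        raise ValueError("need at least one customer")
    churned = sum(1 for label in churn_labels if label == "Yes")
    return churned / len(churn_labels)


# Hypothetical labels for five customers.
labels = ["No", "Yes", "No", "No", "Yes"]
print(f"{churn_rate(labels):.0%}")  # prints "40%"
```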

The dataset includes information about:

  1. Customers who churned, i.e. stopped using the company's services
  2. Services available to customers: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
  3. Customer account information: how long they’ve been a customer, the type of contract (month-to-month, one-year, two-year), payment method, paperless billing, monthly charges, and total charges
  4. Demographic information about customers: gender, age, and whether they have partners and dependents

To build the classification model, the data was first cleaned and explored to understand the relationships between variables. Feature engineering and feature selection were then performed, and several supervised machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosting Tree, and KNN) were used to build a classification model. Because the dataset is imbalanced, with churned customers in the minority, an oversampling technique called SMOTE was applied to the training set before training the models. F1 score was chosen as the performance metric because it balances Precision and Recall, which is especially useful for imbalanced datasets, where plain accuracy can be misleading. The best model was the Gradient Boosting Tree, which had the highest F1 score.
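The two imbalance-related ideas in the paragraph above can be sketched in plain Python. This is not the notebook's code: `smote` and `f1_score` are hypothetical helper names, the toy data is made up, and in practice a library implementation (for example imbalanced-learn's `SMOTE` and scikit-learn's `sklearn.metrics.f1_score`) would normally be used. The sketch only shows the mechanics: SMOTE creates synthetic minority (churn) samples by interpolating between a minority point and one of its nearest minority-class neighbours, and F1 is the harmonic mean of precision and recall.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (SMOTE idea).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Sort the other minority samples by Euclidean distance to x.
        others = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is 0 (or undefined), so F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 2-feature minority class (e.g. scaled tenure and monthly charges; made up).
minority = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
print(len(smote(minority, n_new=4, k=2)))  # prints "4"

# 1 = churned; 3 of 10 customers churn, so the classes are imbalanced.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
always_no = [0] * 10  # a model that never predicts churn
accuracy = sum(t == p for t, p in zip(y_true, always_no)) / len(y_true)
print(accuracy, f1_score(y_true, always_no))  # prints "0.7 0.0"
```

The toy example shows why F1 suits this problem: a model that never predicts churn still reaches 70% accuracy but scores an F1 of 0. Oversampling is applied only to the training split, as described above, so synthetic points never leak into the data used for evaluation.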

Additionally, causal inference using DoWhy was conducted to test different hypotheses and detect whether there were any causal relationships between different treatment options and the outcome (whether a customer churns or not). The results of these hypotheses help reveal whether particular factors/features are causing customers to stop using the company's services, and understanding these factors can help the company devise a strategy to retain them. The hypotheses were developed from insights derived during data exploration, for example whether a certain feature showed a stronger or weaker correlation with churn. The results of feature selection were also used to form hypotheses, to check whether the most important features were actually responsible for causing churn.
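DoWhy structures a causal analysis as four steps: model the assumptions as a causal graph, identify the target estimand, estimate it, and refute the estimate with robustness checks. As a minimal, library-free sketch of the estimation step, the function below applies the backdoor adjustment formula for a single discrete confounder. It is not DoWhy's API and not the notebook's code; the function name `backdoor_ate`, the choice of a month-to-month contract as the treatment and a tenure bucket as the confounder, and the data are all hypothetical.

```python
from collections import defaultdict


def backdoor_ate(rows):
    """Average treatment effect of a binary treatment T on a binary outcome Y,
    adjusting for one discrete confounder Z by stratification:

        ATE = sum_z P(Z=z) * (E[Y | T=1, Z=z] - E[Y | T=0, Z=z])

    rows: iterable of (t, z, y) tuples with t and y in {0, 1}.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for t, z, y in rows:
        strata[z][t].append(y)
    n = sum(len(g[0]) + len(g[1]) for g in strata.values())
    ate = 0.0
    for z, g in strata.items():
        if not g[0] or not g[1]:
            # Positivity: each stratum needs both treated and untreated customers.
            raise ValueError(f"stratum {z!r} lacks treated or untreated rows")
        weight = (len(g[0]) + len(g[1])) / n  # estimate of P(Z = z)
        effect = sum(g[1]) / len(g[1]) - sum(g[0]) / len(g[0])
        ate += weight * effect
    return ate


# Made-up data: T = month-to-month contract, Z = tenure bucket, Y = churned.
rows = (
    [(1, "new", y) for y in (1, 1, 1, 0)]
    + [(0, "new", y) for y in (1, 0)]
    + [(1, "long", y) for y in (1, 0)]
    + [(0, "long", y) for y in (0, 0, 0, 0)]
)
print(backdoor_ate(rows))  # prints 0.375
```

In this toy data the naive difference in churn rates (4/6 - 1/6 = 0.5) overstates the adjusted effect of 0.375, because treated customers are disproportionately new, and new customers churn more in either contract group. In DoWhy itself, the corresponding workflow runs through `CausalModel`, `identify_effect`, `estimate_effect`, and `refute_estimate`.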
