GitHub - shivangi-soni/Telecom-Churn-Analysis

Customer churn is the percentage of customers who stopped using a company's product or service during a given period. It is one of the most important metrics for a business because high churn impedes growth. Since it is easier to retain existing customers than to acquire new ones, machine learning models that predict whether a customer will stop using a company's products or services can be very valuable. Once the customers likely to churn are identified, appropriate measures can be taken to retain them. This notebook follows the end-to-end machine learning steps to build a classification model for a telecom company's churn dataset. The dataset was obtained from Kaggle: https://www.kaggle.com/blastchar/telco-customer-churn

The dataset includes information about:

- Customers who churned (stopped using the company's services)
- Services available to customers: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information: how long they’ve been a customer, the type of contract (month-to-month, one-year, two-year), payment method, paperless billing, monthly charges, and total charges
- Demographic information about customers: gender, age, and whether they have partners and dependents
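The definition of churn given earlier can be made concrete with a small sketch. The rows below are made up for illustration and are not taken from the dataset. The column names follow the Kaggle CSV as best understood here and should be checked against the actual file. The helper names `load_customers` and `churn_rate` are hypothetical and do not come from the notebooks.

```python
import csv
import io

# Hypothetical sample rows for illustration only; the real CSV has more columns.
SAMPLE_CSV = """customerID,gender,SeniorCitizen,Partner,Dependents,tenure,Contract,MonthlyCharges,TotalCharges,Churn
0001-AAAAA,Female,0,Yes,No,1,Month-to-month,29.85,29.85,No
0002-BBBBB,Male,0,No,No,34,One year,56.95,1889.5,No
0003-CCCCC,Male,1,No,No,2,Month-to-month,53.85,108.15,Yes
0004-DDDDD,Female,0,Yes,Yes,45,Two year,42.30,1840.75,No
"""


def load_customers(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def churn_rate(rows):
    """Return the percentage of customers whose Churn column is 'Yes'."""
    rows = list(rows)
    if not rows:
        raise ValueError("no customers to compute a churn rate from")
    churned = sum(1 for row in rows if row["Churn"] == "Yes")
    return 100.0 * churned / len(rows)


customers = load_customers(SAMPLE_CSV)
rate = churn_rate(customers)
print(f"{rate:.1f}% of {len(customers)} customers churned")
```

With the real file, the same functions could be applied to the text read from `WA_Fn-UseC_-Telco-Customer-Churn.csv`, assuming its `Churn` column holds `Yes`/`No` values.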

To build the classification model, the data was first cleaned, and exploratory analysis was performed to understand the relationships between variables. Feature engineering and feature selection followed, and supervised machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosting Tree, and KNN) were used to build classification models. An oversampling technique called SMOTE was applied to the training dataset before training the models to address the class imbalance in the dataset. The F1 score was chosen as the performance metric because it balances Precision and Recall, which is particularly useful for imbalanced datasets. The best model was the Gradient Boosting Tree, which had the highest F1 score.
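The two ideas in this paragraph, SMOTE-style oversampling and the F1 score, can be sketched with the standard library alone. This is a simplified illustration and not the code used in the notebook, which would more likely rely on a library implementation. The function names, the toy points, and the toy labels are all assumptions made for the example.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy minority-class points (two features each), for illustration only.
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0)]
synthetic = smote_like(minority, n_new=3)

# Toy labels: 2 churners (1) among 10 customers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_no = [0] * 10                      # never predicts churn
mixed = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]    # one hit, one miss, one false alarm

print(accuracy(y_true, always_no), f1_score(y_true, always_no))
print(accuracy(y_true, mixed), f1_score(y_true, mixed))
```

Both toy predictions have the same accuracy, yet the one that never predicts churn has an F1 score of zero, which is why F1 is the more informative metric on an imbalanced dataset. Oversampling is applied only to the training split, as the text states, so that evaluation still reflects the original class balance.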

Additionally, causal inference using DoWhy was conducted to test different hypotheses and detect whether there were any causal relationships between different treatment options and the outcome (whether a customer churns or not). The results of these hypothesis tests help in understanding whether particular factors or features are causing people to stop using the company's services. Understanding these factors can help the company come up with a strategy to retain its customers. The hypotheses were developed from insights derived during data exploration, for example when a certain feature showed a stronger or weaker correlation with churn. The feature selection process was also used to form hypotheses about whether the most important features were causally responsible for churn.
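To illustrate why a causal estimate can differ from a raw correlation, here is a minimal sketch of backdoor adjustment by stratification, one of the standard estimation strategies in this kind of analysis. It does not use the DoWhy API and is not the notebook's code. The README does not name the treatments or confounders that were tested, so the variables (month-to-month contract as treatment, short tenure as confounder) and all counts below are hypothetical.

```python
from collections import defaultdict


def make_records():
    """Build synthetic (short_tenure, month_to_month, churned) records.

    The counts are invented for illustration and are not dataset results.
    """
    # (confounder, treatment, n_customers, n_churned)
    cells = [
        (1, 1, 40, 20),
        (1, 0, 10, 4),
        (0, 1, 10, 2),
        (0, 0, 40, 4),
    ]
    records = []
    for conf, treat, n, churned in cells:
        records += [(conf, treat, 1)] * churned
        records += [(conf, treat, 0)] * (n - churned)
    return records


def naive_difference(records):
    """Churn rate of treated minus churn rate of untreated, ignoring confounders."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def adjusted_difference(records):
    """Backdoor adjustment: within-stratum differences weighted by stratum size.

    Assumes every stratum contains both treated and untreated customers.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for conf, treat, y in records:
        strata[conf][treat].append(y)
    total = len(records)
    effect = 0.0
    for groups in strata.values():
        n = len(groups[0]) + len(groups[1])
        diff = (sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0]))
        effect += diff * n / total
    return effect


records = make_records()
print(f"naive difference:    {naive_difference(records):.2f}")
print(f"adjusted difference: {adjusted_difference(records):.2f}")
```

In this toy data the naive difference overstates the effect because short-tenure customers are both more likely to be on the hypothetical treatment and more likely to churn. Comparing within tenure groups removes that part of the association, which is the kind of adjustment a causal inference library automates once the causal graph is specified.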

Repository files:

- Causal Inference.ipynb
- Data Exploration and Modelling.ipynb
- README.md
- WA_Fn-UseC_-Telco-Customer-Churn.csv
