Credit Scoring Model Comparison

Predictive credit scoring using machine learning algorithms under the SEMMA methodology in SAS Enterprise Miner

The dataset used can be found by clicking here


Project Introduction and Goals

Introduction:

Customer churn, also known as customer attrition, is the loss of clients or customers. Banks rely on exploratory data analysis and predictive techniques to identify the most telling behaviors of the customers most likely to churn.


Aim of the project

Build a model that identifies the most prominent characteristics of customers likely to churn from their credit card service, so that banks can act proactively and prevent customer attrition.


Strategy

Use SAS Enterprise Miner, following the SEMMA methodology, to compare the predictive performance of the following models:

  • Decision Trees
  • Logistic Regression
  • Gradient Boosting

Hypothesis

Total Transaction Count (last 12 months) explains whether a customer will churn better than the customer's age and credit limit do. We expect gradient boosting, a model widely used in credit scoring, to perform best (ZhenyaTian, 2020).


Bank Churners Dataset Description

  • 21 Variables
  • 6 Class variables
  • 15 numerical variables (ordinal and interval)
  • No missing values

Diagram

Diagram


Sample


  • Training set: 80%
  • Validation set: 10%
  • Test set: 10%
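The 80/10/10 partition above can be sketched outside SAS Enterprise Miner as follows. This is a minimal illustration, not the project's actual partition node: the stratification by target, the `stratified_split` helper, the `churn` field name, and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(rows, target_key, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split rows into train/validation/test while preserving the target mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target_key]].append(row)

    train, valid, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        train.extend(members[:n_train])
        valid.extend(members[n_train:n_train + n_valid])
        test.extend(members[n_train + n_valid:])  # remainder goes to test
    return train, valid, test

# Hypothetical toy data: 1000 customers, 10% of them churners.
rows = [{"id": i, "churn": int(i % 10 == 0)} for i in range(1000)]
train, valid, test = stratified_split(rows, "churn")
print(len(train), len(valid), len(test))
```

Splitting within each target class keeps the share of churners the same in all three sets, which matters when the target is imbalanced.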

Explore


DM Project (1)

The target variable is imbalanced: most customers did not churn (churners make up 8.6% of the total).


In the Variable Selection node we identify the input variables that are useful for predicting the target variable or variables.

We use the Chi-square selection criterion (a method available only for binary target variables).

  • Number of Bins: default 50

  • Maximum Pass Number: default 6

  • Minimum Chi-Square: default 3.84

DM Project (2)

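Variables with a Chi-square statistic higher than 3.84 are accepted for training the model, since for these variables we reject the null hypothesis that the feature is independent of the target variable. The value 3.84 is the critical value of the Chi-square distribution with one degree of freedom at the 5% significance level.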
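As an illustration of the acceptance rule above, the sketch below computes the Pearson Chi-square statistic for a 2x2 table (one binary split of an input against the binary target) and compares it with the 3.84 threshold. The function name and the counts are hypothetical, and this is not a reproduction of the node's internal binning procedure.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

MIN_CHI_SQUARE = 3.84  # 5% critical value for 1 degree of freedom

# Hypothetical counts. Rows: input below / above a cut point.
# Columns: customers who stayed, customers who churned.
table = [[400, 100],
         [450, 50]]
stat = chi_square_statistic(table)
accepted = stat > MIN_CHI_SQUARE
print(round(stat, 2), accepted)
```

For a table where churn rates are identical in both rows, the statistic is zero and the variable would be rejected.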

DM Project (3)


The Variable Clustering node helps us choose the best variables or cluster components for analysis.

Variable clustering removes collinearity, decreases variable redundancy, and helps reveal the underlying structure of the input variables in a data set.

Because the clustering source is the covariance matrix, variables with larger variances carry more weight in the analysis.

We include class variables through the use of dummy variables.

We keep hierarchies on in order to create a hierarchical cluster structure.
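To see why variables with larger variances dominate a covariance-based analysis, the sketch below compares two inputs on very different scales before and after standardization. The values and the `standardize` helper are hypothetical; the covariance matrix of standardized variables equals their correlation matrix, which is the usual alternative clustering source.

```python
from statistics import mean, pstdev, pvariance

def standardize(values):
    """Rescale values to zero mean and unit variance."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical inputs on very different scales.
credit_limit = [1500.0, 3000.0, 12000.0, 8000.0, 25000.0]
utilization = [0.10, 0.45, 0.80, 0.30, 0.05]

# Raw variances differ by many orders of magnitude.
print(pvariance(credit_limit), pvariance(utilization))

# After standardization both variables contribute equally.
print(pvariance(standardize(credit_limit)), pvariance(standardize(utilization)))
```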


Modify


When using the variable selection method before:

DM Project (6)

When using the variable selection method before:

DM Project (7)

Interactive binning results, using the Gini statistic as the variable selection method. Rare levels are grouped using a cutoff of 0.5%.
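Grouping rare levels can be sketched as follows. This is an assumption-laden illustration: the `group_rare_levels` helper, the `_OTHER_` label, the strict "less than" comparison, and the card categories are all hypothetical, and the binning node's exact rule may differ.

```python
from collections import Counter

def group_rare_levels(values, cutoff_pct=0.5, other_label="_OTHER_"):
    """Replace levels whose share is below cutoff_pct percent with one label."""
    counts = Counter(values)
    n = len(values)
    rare = {level for level, c in counts.items() if 100.0 * c / n < cutoff_pct}
    return [other_label if v in rare else v for v in values]

# Hypothetical class variable with two very rare levels.
cards = ["Blue"] * 990 + ["Silver"] * 7 + ["Gold"] * 2 + ["Platinum"] * 1
grouped = Counter(group_rare_levels(cards))
print(grouped)
```

Here "Gold" (0.2%) and "Platinum" (0.1%) fall below the cutoff and are merged, while "Silver" (0.7%) is kept as its own level.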


Model

Decision Trees


  • Decision Tree (1):

  Variable selection & clustering.

  Target criterion = Gini Coef

  Leaf size = 5

  • Decision Tree (2):

  Interactive Binning without variable selection or clustering.

  Target criterion = Gini Coef

  Leaf size = 50

  • Decision Tree (3):

  Without binning, variable selection or clustering.

  Target criterion = Gini Coef

  Leaf size = 5
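The two settings listed for each tree, the Gini criterion and the minimum leaf size, can be illustrated with a single-split search. This is a minimal sketch assuming the criterion is the reduction in Gini impurity; the function names and data are hypothetical and the code is not the Decision Tree node's actual algorithm.

```python
def gini_impurity(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(xs, ys, min_leaf=5):
    """Return (threshold, impurity reduction) of the best split, or None."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    parent = gini_impurity(ys)
    best = None
    # Only consider splits that leave at least min_leaf rows on each side.
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical input values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        gain = parent - weighted
        if best is None or gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best

# Hypothetical data: customers with fewer than 60 transactions churn.
xs = list(range(20, 120, 5))
ys = [1 if x < 60 else 0 for x in xs]
print(best_split(xs, ys, min_leaf=5))
print(best_split(xs, ys, min_leaf=50))
```

With only 20 rows, a leaf size of 50 allows no split at all, which shows how a larger leaf size restrains tree growth.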

  DM Project (8)

The ROC curve above compares the three decision trees. The vertical axis shows sensitivity (the true positive rate) and the horizontal axis shows 1 - specificity (the false positive rate). Performance is best when the true positive rate is maximized while the false positive rate is minimized. On the test set the curve dips, meaning that a given true positive rate costs more false positives, which suggests the model is slightly overfitted.
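For reference, the points of an ROC curve and the area under it can be computed as in the sketch below. The labels and scores are hypothetical, and the helper names are chosen for this example only.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last row of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical churn labels and predicted churn probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points, auc(points))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of churners from non-churners.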

DM Project (9)

Logistic Regression


  • Logistic Regression(1): Variable selection & interactive binning. Selection model - stepwise.

  • Logistic Regression(2): Variable selection & stepwise selection model.

  • Logistic Regression(3): Variable selection with no selection model.

DM Project (10)

The first regression model has the strongest predictive power of the three, with the lowest misclassification rate (0.081935).

DM Project (11)

Grouped variables with a p-value lower than 0.05, in order of importance in the stepwise selection process.

DM Project (12)

Gradient Boosting Model

Since this is a classification problem, we use the misclassification rate as our assessment criterion. The misclassification rates are impressively low:

  • Train - .035
  • Valid - .028
  • Test - .030
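The misclassification rate is simply the share of wrong predictions. The sketch below, on hypothetical labels, also computes the rate of a trivial baseline that predicts "no churn" for everyone, which is a useful reference point when the target is imbalanced.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted class differs from the actual."""
    wrong = sum(1 for actual, predicted in zip(y_true, y_pred) if actual != predicted)
    return wrong / len(y_true)

# Hypothetical data: 100 customers, 10 of them churners.
y_true = [1] * 10 + [0] * 90

# Baseline that always predicts the majority class.
baseline = [0] * 100

# Hypothetical model that misses 2 churners and raises 1 false alarm.
model = [1] * 8 + [0] * 2 + [1] + [0] * 89

print(misclassification_rate(y_true, baseline))
print(misclassification_rate(y_true, model))
```

A model is only informative if its rate is clearly below that of the majority-class baseline.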

DM Project (13)

Results from the gradient boosting model indicate these variables as most important:

  • Total_Trans_Ct
  • Total_Trans_Amt
  • Total_Revolving_Bal

DM Project (14)

Gradient boosting is a tree-based algorithm that improves iteratively: each new tree is fit to correct the errors of the trees built before it. It is by nature prone to overfitting. The lowest misclassification rate is reached around the 200th iteration.
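The idea of building each tree on the errors of the previous ones, and of picking the iteration with the lowest validation error, can be sketched with one-split trees (stumps) on a single input. This is a simplified illustration assuming squared-error loss; the function names, learning rate, and data are hypothetical and do not reproduce the Gradient Boosting node.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (one threshold) on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def boost(train, valid, n_rounds=20, learning_rate=0.3):
    """Return (best round, validation misclassification rate per round)."""
    xs, ys = zip(*train)
    base = sum(ys) / len(ys)  # initial prediction: overall churn rate
    stumps, history = [], []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # Each new stump is fit to the residuals of the current ensemble.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
        errors = sum((predict(x) >= 0.5) != bool(y) for x, y in valid)
        history.append(errors / len(valid))
    return history.index(min(history)) + 1, history

# Hypothetical (transaction count, churned) pairs.
train = [(10, 1), (20, 1), (30, 1), (35, 0), (45, 1),
         (50, 0), (60, 0), (70, 0), (80, 0), (90, 0)]
valid = [(15, 1), (25, 1), (38, 1), (48, 0),
         (55, 0), (65, 0), (75, 0), (85, 0)]
best_round, history = boost(train, valid)
print(best_round, history)
```

Tracking the validation error per round and stopping at its minimum is the usual guard against the overfitting that boosting is prone to.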

DM Project (15)

Final Model Assessment

Sensitivity vs specificity:

DM Project (16)

Gradient boosting most accurately separated customers who churn from those who do not, as shown by its AUC, the largest of the three models.

DM Project (17)

In terms of cumulative lift, gradient boosting outperforms the other two models over the first 50% of observations.
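Cumulative lift at a given depth is the churn rate among the top-scored customers divided by the overall churn rate. A minimal sketch with hypothetical labels and scores:

```python
def cumulative_lift(y_true, scores, depth):
    """Lift within the top `depth` fraction of observations ranked by score."""
    ranked = [y for _, y in sorted(zip(scores, y_true), key=lambda pair: -pair[0])]
    k = max(1, round(len(ranked) * depth))
    top_rate = sum(ranked[:k]) / k
    overall_rate = sum(ranked) / len(ranked)
    return top_rate / overall_rate

# Hypothetical churn labels, listed with their predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

for depth in (0.2, 0.5, 1.0):
    print(depth, cumulative_lift(y_true, scores, depth))
```

At full depth the lift is always 1, so models are compared on how far above 1 they stay in the first deciles.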

DM Project (18)