Supervised Machine Learning and Credit Risk

Project Overview

In this project, I used Python to build and evaluate several machine learning models that predict credit risk, using the following techniques:

Oversample the data using the RandomOverSampler and SMOTE algorithms.
Undersample the data using the ClusterCentroids algorithm.
Use a combinatorial approach of over- and undersampling using the SMOTEENN algorithm.
Compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier.

I then evaluated the performance of these models and made a written recommendation on whether they should be used to predict credit risk.

Resources

Data Source: LoanStats_2019Q1.csv
Software and Tools: Python, Anaconda, Jupyter Notebook & Git Bash

Results

RandomOverSampler Model

Accuracy Score is 65.4%
Precision High Risk Score is 1%
Precision Low Risk Score is 100%
Recall High Risk Score is 73%
Recall Low Risk Score is 58%
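
The precision and recall figures reported for each model follow the usual per-class definitions. As a minimal sketch, using hypothetical confusion-matrix counts that are not taken from the project, they can be computed like this:

```python
def per_class_metrics(tp, fp, fn, tn):
    """Precision and recall for both classes, treating high risk as positive.

    tp: high-risk loans predicted high risk
    fp: low-risk loans predicted high risk
    fn: high-risk loans predicted low risk
    tn: low-risk loans predicted low risk
    """
    return {
        "precision_high": tp / (tp + fp),
        "recall_high": tp / (tp + fn),
        "precision_low": tn / (tn + fn),
        "recall_low": tn / (tn + fp),
    }


# Hypothetical test set: 100 high-risk loans and 10,000 low-risk loans.
metrics = per_class_metrics(tp=70, fp=4000, fn=30, tn=6000)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

When the high-risk class is rare, even a moderate false-positive rate swamps the true positives, which is why high-risk precision can sit in the low single digits while low-risk precision rounds to 100%.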

SMOTE Model

Accuracy Score is 66.3%
Precision High Risk Score is 1%
Precision Low Risk Score is 100%
Recall High Risk Score is 63%
Recall Low Risk Score is 69%

ClusterCentroids Model

Accuracy Score is 66.3%
Precision High Risk Score is 1%
Precision Low Risk Score is 100%
Recall High Risk Score is 69%
Recall Low Risk Score is 40%

SMOTEENN Model

Accuracy Score is 54.5%
Precision High Risk Score is 1%
Precision Low Risk Score is 100%
Recall High Risk Score is 79%
Recall Low Risk Score is 56%

BalancedRandomForestClassifier Model

Accuracy Score is 78.9%
Precision High Risk Score is 3%
Precision Low Risk Score is 100%
Recall High Risk Score is 70%
Recall Low Risk Score is 87%

EasyEnsembleClassifier Model

Accuracy Score is 93.2%
Precision High Risk Score is 9%
Precision Low Risk Score is 100%
Recall High Risk Score is 92%
Recall Low Risk Score is 94%
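
One way to sanity-check the figures above: if the reported accuracy is a balanced accuracy score (an assumption, since the text does not say which accuracy metric was used), it should equal the mean of the two recall values, up to rounding. The sketch below runs that comparison on the numbers reported in this section.

```python
# (accuracy %, high-risk recall %, low-risk recall %) as reported above.
reported = {
    "RandomOverSampler": (65.4, 73, 58),
    "SMOTE": (66.3, 63, 69),
    "ClusterCentroids": (66.3, 69, 40),
    "SMOTEENN": (54.5, 79, 56),
    "BalancedRandomForestClassifier": (78.9, 70, 87),
    "EasyEnsembleClassifier": (93.2, 92, 94),
}

gaps = {}
for model, (accuracy, recall_high, recall_low) in reported.items():
    # Balanced accuracy is the unweighted mean of the per-class recalls.
    mean_recall = (recall_high + recall_low) / 2
    gaps[model] = accuracy - mean_recall
    print(
        f"{model:32s} reported {accuracy:5.1f}  "
        f"mean recall {mean_recall:5.1f}  gap {gaps[model]:+5.1f}"
    )
```

Because the recalls are rounded to whole percentages, gaps of up to about one point are expected. Under the balanced-accuracy assumption, four of the models agree to within half a point, while ClusterCentroids and SMOTEENN differ by more than ten points, so those two rows may be worth re-checking against the notebook output.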

Summary

The two ensemble classifiers performed best. The Balanced Random Forest Classifier and Easy Ensemble Classifier had the highest accuracy scores, at 78.9% and 93.2% respectively. Because the goal of the analysis is to detect high-risk loans, the most important metric is high-risk recall, and here the Easy Ensemble Classifier scored highest at 92%. It also had the highest high-risk precision, although at 9% most of the loans it flags as high risk are in fact low risk. I therefore recommend the Easy Ensemble Classifier for predicting high-risk loans, based on its high-risk recall and its strong overall performance.
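
The comparison in this summary can be reproduced directly from the reported scores. The sketch below ranks the six models by high-risk recall, using accuracy as a tie-breaker; the tie-breaking rule is an illustrative choice, not something the analysis specifies.

```python
# (accuracy %, high-risk recall %) as reported in the Results section.
scores = {
    "RandomOverSampler": (65.4, 73),
    "SMOTE": (66.3, 63),
    "ClusterCentroids": (66.3, 69),
    "SMOTEENN": (54.5, 79),
    "BalancedRandomForestClassifier": (78.9, 70),
    "EasyEnsembleClassifier": (93.2, 92),
}

# Sort by high-risk recall first, then by accuracy, both descending.
ranking = sorted(scores, key=lambda m: (scores[m][1], scores[m][0]), reverse=True)

for rank, model in enumerate(ranking, start=1):
    accuracy, recall_high = scores[model]
    print(f"{rank}. {model}: high-risk recall {recall_high}%, accuracy {accuracy}%")
```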


christybell/Credit_Risk_Analysis
