For this project, I predicted credit risk with the supervised machine learning models I built and evaluated using Python.

christybell/Credit_Risk_Analysis

Supervised Machine Learning and Credit Risk

Project Overview

In this project, I used Python to build and evaluate several machine learning models that predict credit risk. I applied the following techniques:

  • Oversample the data using the RandomOverSampler and SMOTE algorithms.
  • Undersample the data using the ClusterCentroids algorithm.
  • Use a combinatorial approach of over- and undersampling using the SMOTEENN algorithm.
  • Compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier.
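
To make the resampling idea concrete, here is a minimal sketch of random oversampling written with only the Python standard library. It illustrates the concept and is not the RandomOverSampler implementation used in the project; the function name `random_oversample` and the toy data are assumptions made up for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=1):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [row for row, label in zip(rows, labels) if label == cls]
        # Sample with replacement to fill the gap (nothing is added when n == target).
        extra = rng.choices(pool, k=target - n)
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data invented for illustration: 8 low-risk loans and 2 high-risk loans.
rows = [[i] for i in range(10)]
labels = ["low_risk"] * 8 + ["high_risk"] * 2

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))  # both classes now have 8 rows
```

Undersampling works in the opposite direction by shrinking the majority class, and a combined method such as SMOTEENN applies both ideas in sequence.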

I then evaluated the performance of these models and wrote a recommendation on whether they should be used to predict credit risk.

Resources

  • Data Source: LoanStats_2019Q1.csv
  • Software and Tools: Python, Anaconda, Jupyter Notebook & Git Bash

Results

RandomOverSampler Model

  • Accuracy Score is 65.4%
  • Precision High Risk Score is 1%
  • Precision Low Risk Score is 100%
  • Recall High Risk Score is 73%
  • Recall Low Risk Score is 58%

SMOTE Model

  • Accuracy Score is 66.3%
  • Precision High Risk Score is 1%
  • Precision Low Risk Score is 100%
  • Recall High Risk Score is 63%
  • Recall Low Risk Score is 69%

ClusterCentroids Model

  • Accuracy Score is 66.3%
  • Precision High Risk Score is 1%
  • Precision Low Risk Score is 100%
  • Recall High Risk Score is 69%
  • Recall Low Risk Score is 40%

SMOTEENN Model

  • Accuracy Score is 54.5%
  • Precision High Risk Score is 1%
  • Precision Low Risk Score is 100%
  • Recall High Risk Score is 79%
  • Recall Low Risk Score is 56%

BalancedRandomForestClassifier Model

  • Accuracy Score is 78.9%
  • Precision High Risk Score is 3%
  • Precision Low Risk Score is 100%
  • Recall High Risk Score is 70%
  • Recall Low Risk Score is 87%

EasyEnsembleClassifier Model

  • Accuracy Score is 93.2%
  • Precision High Risk Score is 9%
  • Precision Low Risk Score is 100%
  • Recall High Risk Score is 92%
  • Recall Low Risk Score is 94%

Summary

In summary, the two ensemble classifiers performed best. The Balanced Random Forest Classifier and Easy Ensemble Classifier had the highest accuracy scores, at 78.9% and 93.2% respectively. Because the goal of the analysis is to find the model that best detects high-risk loans, the most important metric is recall for the high-risk class. On that metric, the Easy Ensemble Classifier scored highest at 92%. Its high-risk precision is only 9%, but that is still the highest of the six models. I therefore recommend the Easy Ensemble Classifier for predicting high-risk loans, based on its high-risk recall and its strong overall performance.
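
The comparison behind this recommendation can be reproduced with a short script. This is a minimal sketch that copies the accuracy and high-risk recall figures from the Results section into a dictionary; the variable names are chosen for this example only.

```python
# (accuracy %, high-risk recall %) as reported in the Results section.
results = {
    "RandomOverSampler": (65.4, 73),
    "SMOTE": (66.3, 63),
    "ClusterCentroids": (66.3, 69),
    "SMOTEENN": (54.5, 79),
    "BalancedRandomForestClassifier": (78.9, 70),
    "EasyEnsembleClassifier": (93.2, 92),
}

# Rank models by high-risk recall, the metric that matters most here.
ranked = sorted(results, key=lambda name: results[name][1], reverse=True)
best_recall = ranked[0]
best_accuracy = max(results, key=lambda name: results[name][0])

for name in ranked:
    accuracy, recall_high = results[name]
    print(f"{name:32s} recall(high risk)={recall_high}%  accuracy={accuracy}%")

print("Highest high-risk recall:", best_recall)
print("Highest accuracy:", best_accuracy)
```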
