leowei08/CreditRisk

CreditRisk

Data Analysis on Loan Default through Lending Club data

Motivation

Almost everyone takes out a loan of some kind at some point in their life, whether for a car or a house, and together these loans are a major source of revenue for banks. Interest is certainly where the profit is, but defaults are where the risk is, so it is worth digging into which factors affect loan defaults to help financial institutions understand and regulate loans in the financial market. The dataset is the Lending Club dataset on Kaggle: https://www.kaggle.com/wordsforthewise/lending-club.

Approach

Borrowers provide many kinds of personal information when filling out a loan application, and we can use that information to predict the probability of default with a logistic regression. In industry, we could go beyond this financial information and capture a borrower's values and habits to better model the probability of default.
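As an illustration of the idea, here is a minimal sketch of a logistic regression fitted by gradient descent using only the Python standard library. It is not the code from this repository: the two features (a standardized interest rate and debt-to-income ratio), the synthetic applicants, and the helper names `fit_logistic` and `predict_proba` are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one applicant.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of default for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic, standardized applicants: [interest_rate, debt_to_income].
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(400)]
# Hypothetical ground truth: a higher rate and a higher DTI raise default odds.
y = [1 if rng.random() < sigmoid(1.5 * r + 1.0 * d - 1.0) else 0 for r, d in X]

w, b = fit_logistic(X, y)
risky = predict_proba(w, b, [2.0, 2.0])    # high rate, high DTI
safe = predict_proba(w, b, [-2.0, -2.0])   # low rate, low DTI
print(f"weights={w}, bias={b:.3f}, risky={risky:.3f}, safe={safe:.3f}")
```

The sign and size of each fitted weight are what make this model easy to interpret: a positive weight means the feature pushes the predicted default probability up.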

Files

EDA: Exploratory data analysis on the dataset with plots using seaborn and matplotlib.

Model: Builds a baseline model and several larger models, performs model selection using Cumulative Accuracy Profiles (CAP) and Area Under the Curve (AUC), investigates the tradeoff between interpretability and predictability of a model, and analyzes the results to gain insight into the important factors behind loan defaults.
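The two selection metrics named above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the repository's implementation; the function names and the eight-loan toy example are assumptions. The accuracy ratio is computed here as the area between the model's CAP and the random baseline, divided by the same area for a perfect model, and when scores have no ties it equals 2 * AUC - 1.

```python
def auc(y_true, scores):
    """Probability that a random defaulter outscores a random non-defaulter
    (ties count as half a win)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cap_curve(y_true, scores):
    """CAP points: (fraction of loans reviewed, fraction of defaults captured),
    reviewing the riskiest-scored loans first."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_defaults = sum(y_true)
    xs, ys, captured = [0.0], [0.0], 0
    for k, i in enumerate(order, start=1):
        captured += y_true[i]
        xs.append(k / len(order))
        ys.append(captured / total_defaults)
    return xs, ys


def accuracy_ratio(y_true, scores):
    """Area between model CAP and random line, over the same area for a perfect model."""
    xs, ys = cap_curve(y_true, scores)
    # Trapezoidal area under the model's CAP curve.
    area = sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
               for i in range(len(xs) - 1))
    default_rate = sum(y_true) / len(y_true)
    perfect_area = 1 - default_rate / 2  # perfect CAP hits 1.0 at x = default_rate
    return (area - 0.5) / (perfect_area - 0.5)


# Toy example: 1 = default, 0 = repaid; scores are predicted default probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
model_auc = auc(y_true, scores)
model_ar = accuracy_ratio(y_true, scores)
print(f"AUC={model_auc:.4f}, accuracy ratio={model_ar:.4f}")
```

An accuracy ratio of 0 corresponds to random ranking and 1 to a perfect ranking, which makes it convenient for comparing models trained on the same loans.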

Key Takeaways and Notes

  • Learned to use the accuracy ratio and the cumulative accuracy profile
  • The model is consistent with the credit grades assigned by Lending Club
  • Investigated the tradeoff between interpretability and predictability in simple and complicated models
  • Analyzed the subgroup performance of the models
  • Random forest performs 10% better than logistic regression, so in industry we would adopt the random forest, because 10% of total loans is a large number. Logistic regression can still support the analysis, alongside the feature importances from the random forest
  • Concluded that interest rate, loan term, and debt-to-income ratio are the most important factors in credit risk
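
The subgroup analysis mentioned in the notes above can be sketched as a per-group accuracy breakdown. This is a minimal illustration, not the repository's code: grouping by credit grade, the helper name `accuracy_by_group`, and the toy labels and predictions are all assumptions.

```python
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Return {group: fraction of correct predictions} for each subgroup."""
    hits, counts = defaultdict(int), defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        counts[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / counts[g] for g in sorted(counts)}


# Hypothetical loans: credit grade, actual outcome (1 = default), model prediction.
grades = ["A", "A", "B", "B", "B", "C", "C", "C"]
y_true = [0, 0, 0, 1, 0, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 0, 0]

by_grade = accuracy_by_group(grades, y_true, y_pred)
for grade, acc in by_grade.items():
    print(f"grade {grade}: accuracy {acc:.2f}")
```

A breakdown like this can show whether a model that looks strong overall is weaker on the riskier grades, where defaults are concentrated.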
