In this assignment, you will build a machine learning model that attempts to predict whether a loan from LendingClub will become high risk.
LendingClub is a peer-to-peer lending services company that allows individual investors to partially fund personal loans, as well as buy and sell the notes backing those loans on a secondary market. LendingClub makes its historical data available through an API.
You will use this data to create machine learning models that classify the risk level of given loans. Specifically, you will compare a Logistic Regression model with a Random Forest Classifier.
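The comparison described above can be sketched as follows. This is a minimal illustration, not the assignment's actual code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the LendingClub data, and assumes that "scaling" means standardization with `StandardScaler`. The helper `fit_and_score` and all variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the LendingClub features and risk labels (assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit the scaler on the training split only, then apply it to both splits,
# so that no information from the test split leaks into training.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)


def fit_and_score(model, X_tr, X_te):
    """Fit the model and return (training accuracy, testing accuracy)."""
    model.fit(X_tr, y_train)
    return model.score(X_tr, y_train), model.score(X_te, y_test)


results = {
    "Logistic Regression (unscaled)": fit_and_score(
        LogisticRegression(max_iter=1000), X_train, X_test
    ),
    "Random Forest (unscaled)": fit_and_score(
        RandomForestClassifier(random_state=1), X_train, X_test
    ),
    "Logistic Regression (scaled)": fit_and_score(
        LogisticRegression(max_iter=1000), X_train_scaled, X_test_scaled
    ),
    "Random Forest (scaled)": fit_and_score(
        RandomForestClassifier(random_state=1), X_train_scaled, X_test_scaled
    ),
}

for name, (train_score, test_score) in results.items():
    print(name)
    print(f"Training Score: {train_score}")
    print(f"Testing Score: {test_score}")
```

The scores printed by this sketch come from synthetic data and will not match the LendingClub results.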
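Logistic Regression (unscaled data)
Training Score: 0.6509031198686371
Testing Score: 0.5165886856656742

Random Forest Classifier (unscaled data)
Training Score: 1.0
Testing Score: 0.6433432581880051

Logistic Regression (scaled data)
Training Score: 0.7078817733990148
Testing Score: 0.767333049766057

Random Forest Classifier (scaled data)
Training Score: 1.0
Testing Score: 0.6420672054444917
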
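
One way to read these numbers is to compute the gap between training and testing score for each model; a large positive gap is a common sign of overfitting. The sketch below uses the scores above rounded to four decimals. It assumes the four score pairs are, in order, Logistic Regression and Random Forest on unscaled data, then Logistic Regression and Random Forest on scaled data. The helper name `generalization_gap` is hypothetical.

```python
# Reported scores, rounded to four decimals: (training score, testing score).
# The model-to-score pairing is an assumption based on the order of the output.
scores = {
    "Logistic Regression (unscaled)": (0.6509, 0.5166),
    "Random Forest (unscaled)": (1.0, 0.6433),
    "Logistic Regression (scaled)": (0.7079, 0.7673),
    "Random Forest (scaled)": (1.0, 0.6421),
}


def generalization_gap(train_score, test_score):
    """Return training score minus testing score (positive = worse on test data)."""
    return train_score - test_score


gaps = {
    name: round(generalization_gap(train, test), 4)
    for name, (train, test) in scores.items()
}

# List the models from the largest gap to the smallest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: gap = {gap}")
```

Under this pairing, both Random Forest runs show a gap of roughly 0.36, while Logistic Regression on the scaled data scores slightly higher on the test set than on the training set.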
- Logistic Regression fails to predict high-risk customers: its recall of 0.30 means that most of them are misclassified as low risk (false negatives).
- Random Forest Classifier has a perfect training score but a testing score of only 0.64. This gap indicates that the model is overfitting.
- Random Forest Classifier on the scaled data has a training score of 1.0 and a testing score of 0.64, and the observed recall is 0. This indicates that the model is overfitting and that false negatives are high, which leads to prediction errors.
- Logistic Regression on the scaled data has a training score of 0.71 and a testing score of 0.77. Although the model does not predict perfectly, its predictions are reasonably close to the actual outcomes. This is supported by a recall of 0.72 for high-risk clients, which means fewer false negatives.
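
The recall figures quoted above follow the standard definition: true positives divided by the sum of true positives and false negatives for the class of interest. The sketch below computes it in plain Python. The labels are made up for illustration and are not taken from the LendingClub data; the class names `high_risk` and `low_risk` and the function name `recall` are assumptions.

```python
def recall(y_true, y_pred, positive):
    """Recall for one class: TP / (TP + FN)."""
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    false_negatives = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p != positive
    )
    if true_positives + false_negatives == 0:
        return 0.0  # No actual positives in y_true; report 0.0 by convention here.
    return true_positives / (true_positives + false_negatives)


# Hypothetical labels for illustration only.
y_true = ["high_risk"] * 5 + ["low_risk"] * 5
y_pred = [
    "high_risk", "high_risk", "high_risk", "low_risk", "low_risk",  # 3 TP, 2 FN
    "low_risk", "low_risk", "low_risk", "low_risk", "high_risk",
]

high_risk_recall = recall(y_true, y_pred, positive="high_risk")
print(f"High-risk recall: {high_risk_recall}")

# A model that labels every loan as low risk misses all high-risk loans.
always_low = ["low_risk"] * len(y_true)
print(f"Recall when predicting only low risk: {recall(y_true, always_low, 'high_risk')}")
```

Each false negative is a high-risk loan that was classified as low risk, which is why a recall of 0 is compatible with a testing score well above zero when low-risk loans are common.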