Skip to content

Latest commit

 

History

History
57 lines (26 loc) · 2.42 KB

README.md

File metadata and controls

57 lines (26 loc) · 2.42 KB

Car Evaluation

Definition of Problem

Features of Dataset

  • Buying Degree of purchase of the vehicle.

  • Maint Degree of vehicle maintenance.

  • Doors Number of vehicle doors.

  • Persons Number of people that fit in the car.

  • Lug Boot Car trunk size.

  • Safety Degree of vehicle safety.

EDA

Most of the classes have a good proportion, except for the variable to be predicted, there is a clear disproportion between both classes. It can cause the classification model to have preference in the class with the most presence, for this reason it is convenient to make a balance adjustment.

SMOTE

Applying the SMOTE algorithm, a perfect match between the variables to be predicted is achieved, generating new examples from the data set, allowing the algorithm to have more data to learn.

Random Forest

I decided to apply this algorithm, because it is not a complex problem and it is based on logical decisions, in addition to not requiring a data standardization process.

It works in a similar way to its smaller brother the decision tree, with the difference that you only use one tree you can use many more.

The prediction system for a classification problem works as follows: Each Random Forest estimator will generate its prediction and the final value will be the most voted category, solving the decision tree overfitting problem.

cm

metrics

Most of the categories are classified correctly, in addition to relentless cprecision per category, which gives us a robust model without the risk of overfitting.

Finally we save the model, with the purpose of using it to generate a web application using Shiny, which allows users to enter new data in a comfortable way and generate predictions in real time.