# Decision Tree Project

Python implementation of a decision tree for classifying a meteorological dataset.

## 1. Models and Methods

### 1.1 Decision Tree Models

Here I used an improved CART model as the basic decision tree model; it splits the dataset and chooses the best feature at each node based on Gini impurity or entropy.
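As an illustration (not the repository's actual code), the two split criteria can be computed from the class proportions at a node like this:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p_k * log2(p_k)) over the class proportions p_k."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

labels = ["rain", "rain", "sun", "sun"]
print(gini(labels))     # 0.5  (maximally impure two-class node)
print(entropy(labels))  # 1.0
```

A split is then chosen to maximize the impurity decrease: the parent's impurity minus the weighted impurity of the two child nodes.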

### 1.2 Pruning Methods

Several tree pruning methods are implemented in the `DecisionTree` class to avoid overfitting.

  1. Reduced Error Pruning
  2. Pessimistic Pruning. Unlike most pruning methods, pessimistic pruning is normally a top-down algorithm that visits nodes starting from the root. Here I also implemented a bottom-up variant, which was presented in a USC lecture years ago (Machine Learning, CSCI-567).
  3. Minimum Error Pruning
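To make the first method concrete, here is a minimal sketch of reduced error pruning on a toy node structure (the `Node` class and helpers are illustrative, not the repository's actual API). Working bottom-up, each internal node is tentatively collapsed to a leaf predicting its majority class; the collapse is kept unless it lowers accuracy on a held-out validation set:

```python
class Node:
    """Toy decision-tree node; `prediction` is the majority class seen in training."""
    def __init__(self, feature=None, threshold=None, left=None, right=None,
                 prediction=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.prediction = prediction

    def is_leaf(self):
        return self.left is None and self.right is None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

def accuracy(root, X, y):
    return sum(predict(root, x) == t for x, t in zip(X, y)) / len(y)

def reduced_error_prune(node, root, X_val, y_val):
    """Bottom-up: prune the children first, then try collapsing this node."""
    if node.is_leaf():
        return
    reduced_error_prune(node.left, root, X_val, y_val)
    reduced_error_prune(node.right, root, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    left, right = node.left, node.right
    node.left = node.right = None              # tentatively collapse to a leaf
    if accuracy(root, X_val, y_val) < before:  # on a tie, keep the simpler tree
        node.left, node.right = left, right    # revert: pruning hurt validation accuracy

# A small tree whose right subtree contains a noisy, unhelpful split.
root = Node(feature=0, threshold=0.5, prediction=0,
            left=Node(prediction=0),
            right=Node(feature=0, threshold=0.8, prediction=1,
                       left=Node(prediction=1), right=Node(prediction=0)))

X_val = [(0.2,), (0.6,), (0.9,)]
y_val = [0, 1, 1]
reduced_error_prune(root, root, X_val, y_val)
print(root.right.is_leaf())  # True: the noisy split was pruned away
```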

### 1.3 Ensemble Methods

For now, I simply use the ensemble methods provided by Scikit-Learn.

  1. AdaBoost
  2. Bagging
  3. Random Forest
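A quick sketch of how the three Scikit-Learn ensembles can be run side by side (on a synthetic dataset here, since the meteorological data loading code is not shown):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for clf in (AdaBoostClassifier(n_estimators=50, random_state=0),
            BaggingClassifier(n_estimators=50, random_state=0),
            RandomForestClassifier(n_estimators=50, random_state=0)):
    clf.fit(X_tr, y_tr)
    scores[type(clf).__name__] = clf.score(X_te, y_te)
    print(type(clf).__name__, round(scores[type(clf).__name__], 3))
```

All three share the same `fit`/`predict`/`score` interface, so swapping the base dataset for the meteorological one only changes the loading step.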

### 1.4 QnA

Q: Why is it recommended not to prune the trees when training a random forest or bagging ensemble?
A: Pruning is mainly used to prevent overfitting. A random forest draws bootstrap samples (sampling with replacement) and also selects a random subset of features at each split, so the correlation between the individual trees (weak learners) is low; as a result, random forests generally perform well with fully grown trees. Bagging, meanwhile, reduces only variance, not bias (roughly, high bias corresponds to underfitting and high variance to overfitting; see the bias-variance tradeoff). We therefore want the individual trees to have low bias, and fully grown, overfitting trees fit that requirement well.
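The point can be checked empirically; in this illustrative sketch (synthetic data, hypothetical parameter choices), a forest of full-depth trees is compared against one whose trees are sharply depth-limited:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=1)

# Full-depth trees (max_depth=None is the default): each tree overfits its
# bootstrap sample, but averaging de-correlated trees brings the variance down.
full = cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=5).mean()

# Heavily restricted trees: each tree is biased, and bagging cannot remove
# that bias, so the ensemble typically scores lower.
shallow = cross_val_score(RandomForestClassifier(max_depth=2, random_state=1),
                          X, y, cv=5).mean()

print(round(full, 3), round(shallow, 3))
```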

## 2. Dataset

The dataset contains 40,000 entries of hourly meteorological data, drawn from the paper *Assessing Beijing's PM 2.5 pollution: severity, weather impact, APEC and winter heating* and from the China Meteorological Data Service Center.