This is an extended fastai2 version of my previous work. This project helps you interpret tabular models made with fastai2.
Examples of these methods are given for two datasets: the well-known Bulldozers dataset and football player transfer statistics from Transfermarkt. The corresponding interpretations are in the bulldozer and football example notebooks.
The main interpretation methods available are:
- Dendrogram -- calculates and visualizes correlations between features; the groups of correlated features it reveals can be used later
- Feature importance -- calculates and visualizes the relative importance of individual features, as well as of groups of correlated (connected) features determined earlier
- Partial Dependence -- shows how a particular value of a feature influences the dependent variable, i.e. in which direction we should move this feature to minimize or maximize the result
- Waterfall -- helps to visualize how the tabular model reached its conclusion in a particular case: how, and in which direction, each feature value moves the dependent variable
- Embeddings -- this chapter helps to visualize the embeddings learned by the model
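The feature importance idea can be illustrated with permutation importance, a common model-agnostic technique: shuffle one column (or a group of correlated columns together) across rows and measure how much the error grows. The sketch below is plain Python and does not use fastai; `toy_model`, the column names and the generated data are hypothetical stand-ins for a trained tabular learner and its validation set, and the project's own implementation may differ in details.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Model = Callable[[Row], float]


def rmse(model: Model, rows: List[Row], targets: List[float]) -> float:
    """Root mean squared error of `model` on `rows`."""
    errors = [(model(r) - t) ** 2 for r, t in zip(rows, targets)]
    return (sum(errors) / len(errors)) ** 0.5


def permutation_importance(model: Model, rows: List[Row], targets: List[float],
                           groups: Sequence[Sequence[str]], seed: int = 0) -> Dict[str, float]:
    """Increase in RMSE when each group of columns is shuffled across rows.

    Columns in the same group share one permutation, so a group of
    correlated features is scored as a whole.
    """
    rng = random.Random(seed)
    baseline = rmse(model, rows, targets)
    scores = {}
    for group in groups:
        perm = list(range(len(rows)))
        rng.shuffle(perm)
        # Row i takes this group's values from row perm[i]; other columns are unchanged.
        shuffled = [dict(r, **{c: rows[j][c] for c in group}) for r, j in zip(rows, perm)]
        scores[" + ".join(group)] = rmse(model, shuffled, targets) - baseline
    return scores


def toy_model(row: Row) -> float:
    # Hypothetical price model: driven strongly by age, weakly by hours, ignores state.
    return 100.0 - 5.0 * row["age"] + 0.01 * row["hours"]


data_rng = random.Random(42)
rows = [{"age": float(data_rng.randint(1, 15)),
         "hours": float(data_rng.randint(0, 1000)),
         "state": float(data_rng.randint(0, 3))} for _ in range(200)]
targets = [toy_model(r) for r in rows]

# ["age", "hours"] stands in for a group of correlated features found with the dendrogram.
importance = permutation_importance(toy_model, rows, targets,
                                    groups=[["age"], ["hours"], ["state"], ["age", "hours"]])
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {score:8.3f}")
```

A single shuffle is noisy; averaging the scores over several seeds gives more stable results. Since the toy model ignores `state`, shuffling it leaves the error unchanged and its importance is exactly zero.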
These five chapters work nicely with an algorithm based on Jeremy Howard's article. In short:
- We take some task (bulldozer sales) and build a model for it (fastai tabular model creation).
- Then we determine which features influence our value the most (feature importance); let's say we want to sell our bulldozer for as high a price as possible.
- Optionally, we divide some features into groups (dendrogram).
- Then we look at our task and, among the most important features, find those we can change in the real world (for example, the state in which we sell our bulldozer, or some other features; in fact, I know nothing about the bulldozer market in the US :( )
- After that we find the most useful value of this feature, either over the whole dataset (partial dependence) or in our particular case (waterfall). The latter also helps us determine which values drive the price up or down the most.
- With this information, and knowing what we can really change, we can optimize our bulldozer's sale price.
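The last steps of this algorithm (picking the most useful value of a feature we can actually change, over the whole dataset or for one particular case) can be sketched in plain Python. `partial_dependence` follows the standard technique: force the feature to each candidate value in every row and average the predictions. `best_value_for_row` changes only the single case we care about. `toy_price`, the `age`/`state` columns and the candidate values are hypothetical; in the project this role is played by the fastai tabular model.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, float]
Model = Callable[[Row], float]


def partial_dependence(model: Model, rows: List[Row], feature: str,
                       values: Iterable[float]) -> Dict[float, float]:
    """Average prediction over the whole dataset when `feature` is forced to each value."""
    return {v: mean(model(dict(r, **{feature: v})) for r in rows) for v in values}


def best_value_for_row(model: Model, row: Row, feature: str,
                       values: Iterable[float]) -> Tuple[float, float]:
    """Value of `feature` giving the highest prediction for this single row, and that prediction."""
    best = max(values, key=lambda v: model(dict(row, **{feature: v})))
    return best, model(dict(row, **{feature: best}))


def toy_price(row: Row) -> float:
    # Hypothetical price model: state 2 adds the most on average,
    # but for old machines (age > 10) state 0 adds even more.
    bonus = {0.0: 5.0, 1.0: 0.0, 2.0: 10.0}[row["state"]]
    if row["age"] > 10 and row["state"] == 0.0:
        bonus = 20.0
    return 100.0 - 3.0 * row["age"] + bonus


rows = [{"age": float(a), "state": 1.0} for a in range(1, 13)]
states = [0.0, 1.0, 2.0]

pdp = partial_dependence(toy_price, rows, "state", states)
print(pdp)  # → {0.0: 88.0, 1.0: 80.5, 2.0: 90.5}

old_machine = {"age": 12.0, "state": 1.0}
print(best_value_for_row(toy_price, old_machine, "state", states))  # → (0.0, 84.0)
```

The two answers differ because the toy model was built that way: over the whole dataset state `2.0` gives the highest average price, but for this old machine state `0.0` is better, which is why it can be worth looking at the particular case as well.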
This work is based on my previous notebook, which in turn was based on Jeremy Howard's lectures. Some parts are also inspired by Zachary Mueller's lectures, especially his tabular interpretation lesson.
Restrictions: I've tested it with regression models only. I don't think it will work for classification without some refactoring.