Dask Backward Feature Selection

Backward step-wise feature selection using Dask, scikit-learn compatible.

Scale out feature selection with distributed computing using Dask!

I created this because mlxtend's SequentialFeatureSelector does not use joblib in a Dask-compatible way.
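To make the idea concrete, here is a minimal, library-free sketch of greedy backward step-wise selection. It is not this package's implementation: `backward_select`, `score_fn`, and `toy_score` are hypothetical names used only for illustration. In practice `score_fn` would typically wrap a cross-validated score of an estimator trained on the given feature columns.

```python
from typing import Callable, Dict, Tuple

Subset = Tuple[int, ...]


def backward_select(
    n_features: int,
    score_fn: Callable[[Subset], float],
    k_features: int = 1,
) -> Dict[int, Tuple[Subset, float]]:
    """Greedy backward elimination.

    Returns, for each subset size visited, the best subset found and its score.
    """
    current: Subset = tuple(range(n_features))
    results = {len(current): (current, score_fn(current))}
    while len(current) > k_features:
        # Candidate subsets: drop exactly one feature from the current subset.
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        # Each candidate is scored independently of the others, so this is
        # the part of the loop that can be farmed out to a cluster.
        scored = [(score_fn(c), c) for c in candidates]
        best_score, best_subset = max(scored, key=lambda pair: pair[0])
        current = best_subset
        results[len(current)] = (current, best_score)
    return results


def toy_score(subset: Subset) -> float:
    # Hypothetical score: features 0 and 2 are useful,
    # and every feature kept costs a small penalty.
    useful = {0, 2}
    return len(useful & set(subset)) - 0.1 * len(subset)


results = backward_select(4, toy_score, k_features=1)
best_size = max(results, key=lambda size: results[size][1])
best_subset = results[best_size][0]
print(best_subset)
```

Because the candidates within one step do not depend on each other, the number of model fits per step grows with the number of remaining features, which is why spreading them across workers pays off on wide datasets.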

Install

pip install git+https://github.com/pr38/dask_backward_feature_selection

Example Usage

import numpy as np
import pandas as pd

from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import load_diabetes

from dask.distributed import Client, LocalCluster

from dask_backward_feature_selection import DaskBackwardFeatureSelector

# You should be using one of Dask's YARN or Kubernetes cluster deployments.
# If you are going to run this locally, you are better off using mlxtend's SequentialFeatureSelector.
cluster = LocalCluster(n_workers=3)
client = Client(cluster)

# load_boston was removed in scikit-learn 1.2; load_diabetes is another built-in regression dataset.
diabetes = load_diabetes()
X = diabetes['data']
y = diabetes['target']

dfs = DaskBackwardFeatureSelector(DecisionTreeRegressor(), client)
# Keyword arguments for DaskBackwardFeatureSelector:
# k_features: the size of the smallest feature combination DaskBackwardFeatureSelector will examine.
# cv: if an int, the number of cross-validation folds for each feature combination tested;
#     can also be a scikit-learn CV class.
# scoring: a string (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.get_scorer.html#sklearn.metrics.get_scorer)
#     or a scikit-learn scorer.
# scatter: if True, each thread in the cluster will keep a copy of the training data and estimator.

dfs.fit(X,y)

# Positions of the top-performing combination of features in the X matrix.
dfs.k_feature_idx_

# After fitting, DaskBackwardFeatureSelector can be treated as an estimator.
dfs.predict(X)


# DaskBackwardFeatureSelector can also act as a transformer.
dfs.transform(X, y)

# Finally, we can examine the best-performing feature combination at each step, for other use cases (e.g. the one-standard-error rule).
pd.DataFrame(dfs.metric_dict_)
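As an illustration of the one-standard-error rule mentioned above, the sketch below picks the smallest feature count whose mean cross-validation score is within one standard error of the best mean. The layout of `metric_dict_` is not documented here, so the input is a hypothetical dict mapping feature count to per-fold scores (higher is better); `one_standard_error_choice` and the numbers are made up for the example, and you would need to adapt the input to whatever `metric_dict_` actually contains.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List


def one_standard_error_choice(cv_scores_by_size: Dict[int, List[float]]) -> int:
    """Return the smallest feature count within one standard error of the best mean score."""
    summary = {}
    for size, scores in cv_scores_by_size.items():
        # Standard error of the mean across folds.
        std_err = stdev(scores) / sqrt(len(scores))
        summary[size] = (mean(scores), std_err)

    # Feature count with the highest mean score.
    best_size = max(summary, key=lambda size: summary[size][0])
    best_mean, best_se = summary[best_size]
    threshold = best_mean - best_se

    # Keep every feature count that scores at least the threshold,
    # then prefer the most parsimonious one.
    eligible = [size for size, (m, _) in summary.items() if m >= threshold]
    return min(eligible)


# Hypothetical per-fold scores, keyed by number of features kept.
cv_scores_by_size = {
    5: [0.80, 0.84, 0.82],
    4: [0.81, 0.85, 0.83],
    3: [0.80, 0.84, 0.83],
    2: [0.70, 0.74, 0.72],
}
chosen = one_standard_error_choice(cv_scores_by_size)
print(chosen)
```

Here the best mean score belongs to four features, but three features fall inside the one-standard-error band, so the rule prefers the smaller model.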
