Backward stepwise feature selection using Dask; scikit-learn compatible.
Scale out feature selection using distributed computing with Dask!
I created this because mlxtend's SequentialFeatureSelector does not use joblib in a Dask-compatible way.
pip install git+https://github.com/pr38/dask_backward_feature_selection
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import fetch_california_housing  # load_boston was removed in scikit-learn 1.2
from dask.distributed import Client, LocalCluster
from dask_backward_feature_selection import DaskBackwardFeatureSelector
# You should be using Dask's YARN or Kubernetes cluster deployments.
# If you are going to run this locally, you are better off using mlxtend's SequentialFeatureSelector.
cluster = LocalCluster(n_workers=3)
client = Client(cluster)
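# On a real cluster you would swap LocalCluster for one of Dask's managed
# deployments instead; a hedged sketch using dask-yarn (the environment
# path and worker count below are illustrative assumptions):
# from dask_yarn import YarnCluster
# cluster = YarnCluster(environment="environment.tar.gz", n_workers=8)
# client = Client(cluster)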
housing = fetch_california_housing()
X = housing['data']
y = housing['target']
dfs = DaskBackwardFeatureSelector(DecisionTreeRegressor(), client)
# Keyword arguments for DaskBackwardFeatureSelector (an example with them spelled out follows below):
# k_features: the smallest combination of features DaskBackwardFeatureSelector will examine.
# cv: if an int, the number of cross-validation folds used for each feature combination tested;
#     it can also be a scikit-learn CV splitter.
# scoring: a scorer name string (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.get_scorer.html)
#     or a scikit-learn scorer object.
# scatter: if True, each worker in the cluster keeps a copy of the training data and estimator.
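# A minimal sketch with these keyword arguments spelled out; the values
# below are illustrative, not recommendations:
dfs = DaskBackwardFeatureSelector(
    DecisionTreeRegressor(),
    client,
    k_features=3,  # stop once combinations of 3 features have been examined
    cv=5,  # 5-fold cross-validation for each feature combination
    scoring="neg_mean_squared_error",  # any scikit-learn scorer name
    scatter=True,  # keep a copy of the data and estimator on each worker
)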
dfs.fit(X, y)
# Positions of the top-performing combination of features in the X matrix.
dfs.k_feature_idx_
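# For instance, the selected columns can be pulled straight out of X with
# NumPy indexing (assuming k_feature_idx_ holds integer column positions,
# per the comment above):
X_selected = X[:, list(dfs.k_feature_idx_)]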
# We can treat DaskBackwardFeatureSelector as an estimator after training.
dfs.predict(X)
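# For example, the fitted selector's predictions can be scored with any
# standard scikit-learn metric:
from sklearn.metrics import r2_score
r2_score(y, dfs.predict(X))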
# DaskBackwardFeatureSelector can also act as a transformer.
dfs.transform(X, y)
# Finally, we can examine the best-performing feature combination at each step, for other use cases (e.g., the one-standard-error rule).
pd.DataFrame(dfs.metric_dict_)
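# A hedged sketch of the one-standard-error rule on top of metric_dict_.
# The keys 'avg_score' and 'std_err' below are assumptions about
# metric_dict_'s layout (mlxtend's get_metric_dict() uses these names);
# adjust them to whatever the DataFrame above actually contains:
# metrics = pd.DataFrame(dfs.metric_dict_).T  # one row per step (layout assumed)
# cutoff = metrics['avg_score'].max() - metrics.loc[metrics['avg_score'].idxmax(), 'std_err']
# within_one_se = metrics[metrics['avg_score'] >= cutoff]  # simplest models within one SE of the best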