OpenFE: An efficient automated feature generation tool

Forked from IIS-Li-Group/OpenFE; see also the original documentation.

Content

Simplified usage
Description of data operations
- Example GroupByThenRank
- Example CombineThenFreq
Core program flow
Core structure
Changed

Simplified usage

# shell, or prefix with "!" in a notebook cell
pip install OpenFE@git+https://github.com/qte77/OpenFE -qq

from openfe import OpenFE
ofe = OpenFE()
# ofep: dict of further keyword arguments for fit()
features = ofe.fit(data=train_x, label=train_y, train_index=index_col, **ofep)
ofe.new_features_list  # generated features, as returned by fit()
train_x, test_x = ofe.transform(train_x, test, features, n_jobs=n_jobs)
# only for testing after OpenFE is done, custom get_score() necessary
score = get_score(train_x, test_x, train_y, test_y)

Description of data operations ↥

Feature generation operators, grouped by whether they act on categorical or numerical features. OpenFE generates candidate features and ranks them by importance using lightgbm.LGBMRegressor or lightgbm.LGBMClassifier.

All: Freq
Numerical: Abs, Log, Sqrt, Square, Sigmoid, Round, Residual
Num2Num: Add, Subtract, Multiply, Divide, Max, Min
Cat2Num: GroupByThenMin, GroupByThenMax, GroupByThenMean, GroupByThenMedian, GroupByThenStd, GroupByThenRank
Cat2Cat: Combine, CombineThenFreq, GroupByThenNUnique
Symmetry: Add, Subtract, Multiply, Divide, Min, Max, Combine, CombineThenFreq
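
To make the operator names concrete, here are rough pandas equivalents for one operator from three of the groups (Freq, GroupByThenMean, Combine). The column names and data are made up, and these one-liners are an approximation for illustration, not OpenFE's own code.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],         # categorical
    "shop": ["s1", "s2", "s1", "s1", "s2"],    # categorical
    "price": [10.0, 20.0, 30.0, 50.0, 40.0],   # numerical
})

# Freq (All): how often each value occurs in its column.
df["freq_city"] = df["city"].map(df["city"].value_counts())

# GroupByThenMean (Cat2Num): mean of a numerical column within each category.
df["mean_price_by_city"] = df.groupby("city")["price"].transform("mean")

# Combine (Cat2Cat): cross two categorical columns into one.
df["city_shop"] = df["city"] + "_" + df["shop"]

print(df)
```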

Example GroupByThenRank ↥

Usage

df['flabel_new'] = df['flabel1'].groupby(df['flabel2']).rank(ascending=True, pct=True)

Source in OpenFE

elif self.name == 'GroupByThenRank':
    new_data = d1.groupby(d2).rank(ascending=True, pct=True)
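
A small worked example may help here. It applies the same groupby-then-rank call to a made-up frame; the data is hypothetical, and only the column names flabel1 and flabel2 are taken from the usage line above.

```python
import pandas as pd

df = pd.DataFrame({
    "flabel1": [10, 20, 30, 5, 15],          # numerical feature
    "flabel2": ["a", "a", "a", "b", "b"],    # categorical feature
})

# Rank flabel1 within each flabel2 group; pct=True scales ranks to (0, 1].
df["flabel_new"] = df["flabel1"].groupby(df["flabel2"]).rank(ascending=True, pct=True)

print(df)
```

Group a receives 1/3, 2/3 and 1, group b receives 1/2 and 1, so the new feature expresses where a value sits relative to the other rows of its own category.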

Example CombineThenFreq ↥

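See also pandas.DataFrame.combine. The OpenFE source below does not call it; it joins the two columns as strings and counts the combined values.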

Source in OpenFE

elif self.name == "CombineThenFreq":
    temp = d1.astype(str) + '_' + d2.astype(str)
    temp[d1.isna() | d2.isna()] = np.nan
    value_counts = temp.value_counts()
    value_counts.loc[np.nan] = np.nan
    new_data = temp.apply(lambda x: value_counts.loc[x])
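
The same idea can be tried on toy data. This sketch follows the source lines above for the combine step, but swaps the final lookup for Series.map, a shorter variant that should give the same result on this data; the input values are made up.

```python
import numpy as np
import pandas as pd

d1 = pd.Series(["x", "x", "y", None])
d2 = pd.Series(["p", "p", "q", "p"])

# Combine: join the two categorical values into one key per row.
temp = d1.astype(str) + "_" + d2.astype(str)
# Rows with a missing input must not become a real category such as "None_p".
temp[d1.isna() | d2.isna()] = np.nan

# ThenFreq: replace each key by its number of occurrences.
# Series.map leaves missing keys as NaN (value_counts drops NaN by default).
new_data = temp.map(temp.value_counts())

print(new_data)
```

The pair x/p occurs twice, y/q once, and the row with a missing value stays missing.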

Core program flow ↥

OpenFE() -> Obj
|-fit(data, label, metric) -> new_features_list
| |- get_init_score() -> init_metric
| |- stage1_select() -> return_results
| \- stage2_select() -> results
\-transform(X_train, X_test, new_features_list) -> _train, _test
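
The call order in the tree can also be read as a skeleton class. The sketch below is only that: the class name FlowSketch is hypothetical, every body is a placeholder, and the assumption that the stage 2 results become new_features_list is inferred from the tree rather than taken from OpenFE's code.

```python
class FlowSketch:
    """Call-order skeleton of the tree above; bodies are placeholders, not OpenFE code."""

    def __init__(self):
        self.calls = []               # records the order of calls
        self.new_features_list = []

    def get_init_score(self):
        self.calls.append("get_init_score")
        return "init_metric"          # placeholder value

    def stage1_select(self):
        self.calls.append("stage1_select")
        return ["candidate_1", "candidate_2"]   # placeholder return_results

    def stage2_select(self):
        self.calls.append("stage2_select")
        return ["candidate_1"]        # placeholder results

    def fit(self, data, label, metric=None):
        self.calls.append("fit")
        self.get_init_score()
        self.stage1_select()
        # Assumption: the stage 2 results are what fit() hands back.
        self.new_features_list = self.stage2_select()
        return self.new_features_list

    def transform(self, X_train, X_test, new_features_list):
        self.calls.append("transform")
        return X_train, X_test        # placeholder: data returned unchanged


sketch = FlowSketch()
features = sketch.fit(data=None, label=None)
_train, _test = sketch.transform(None, None, features)
print(" -> ".join(sketch.calls))
```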

Core structure ↥

root
|- examples
|- openfe
|   |- __init__.py
|   |- FeatureGenerator.py
|   |- FeatureSelector.py
|   |- openfe.py
|   \- utils.py
|- README.md
\- setup.py

Changed ↥

Added n_estimators to OpenFE, passed on to LGBM
Added sklearn.metrics.r2_score
Added more verbosity levels
Added code folding for structure
