Forked from IIS-Li-Group/OpenFE; see also the original documentation.
pip install OpenFE@git+https://github.com/qte77/OpenFE -qq
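from openfe import OpenFE
ofe = OpenFE()
# index_col: row indices used for training; ofep: user-defined dict of further fit() arguments
features = ofe.fit(data=train_x, label=train_y, train_index=index_col, **ofep)
ofe.new_features_list  # the generated features are also stored on the object
train_x, test_x = ofe.transform(train_x, test, features, n_jobs=n_jobs)
# evaluation only, after OpenFE has finished; get_score() is user-defined
score = get_score(train_x, test_x, train_y, test_y)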
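The usage above relies on a user-defined get_score(). A minimal sketch, assuming a regression task: this hypothetical get_score() fits ordinary least squares with NumPy (a stand-in for whatever model you actually evaluate with, e.g. LightGBM) and returns R² on the test set, the same quantity sklearn.metrics.r2_score computes.

```python
import numpy as np
import pandas as pd


def get_score(train_x: pd.DataFrame, test_x: pd.DataFrame,
              train_y: pd.Series, test_y: pd.Series) -> float:
    """Hypothetical get_score(): fit least squares on the numeric training
    columns and return R^2 on the test set.

    Assumes test_x has the same columns as train_x. A linear model is only a
    stand-in for the model you actually evaluate with.
    """
    cols = train_x.select_dtypes(include="number").columns
    # fill missing values with the training means, then add an intercept column
    means = train_x[cols].mean()
    X_train = np.column_stack([np.ones(len(train_x)),
                               train_x[cols].fillna(means).to_numpy(dtype=float)])
    X_test = np.column_stack([np.ones(len(test_x)),
                              test_x[cols].fillna(means).to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X_train, np.asarray(train_y, dtype=float), rcond=None)
    pred = X_test @ coef
    y = np.asarray(test_y, dtype=float)
    # R^2 = 1 - SS_res / SS_tot (undefined if test_y is constant)
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Because it only touches numeric columns, the sketch can score both the original and the transformed frames, which makes a before/after comparison straightforward.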
Description of data operations
Feature generation operators, grouped by input type (categorical and numerical). OpenFE creates candidate features with these operators and ranks them by importance using lightgbm.LGBMRegressor or lightgbm.LGBMClassifier, depending on the task.
- All: Freq
- Numerical: Abs, Log, Sqrt, Square, Sigmoid, Round, Residual
- Num2Num: Add, Subtract, Multiply, Divide, Max, Min
- Cat2Num: GroupByThenMin, GroupByThenMax, GroupByThenMean, GroupByThenMedian, GroupByThenStd, GroupByThenRank
- Cat2Cat: Combine, CombineThenFreq, GroupByThenNUnique
- Symmetry: Add, Subtract, Multiply, Divide, Min, Max, Combine, CombineThenFreq
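A few of the operators listed above, sketched in pandas to show what each produces. The function names and the edge-case handling (zeros for Log and Divide mapped to NaN, absolute value before Sqrt and Log) are assumptions and may differ from OpenFE's FeatureGenerator.py.

```python
import numpy as np
import pandas as pd

# Illustrative versions of some listed operators (hypothetical names).


def freq(d: pd.Series) -> pd.Series:
    """All -> Freq: replace each value by how often it occurs in the column."""
    return d.map(d.value_counts())


def log(d: pd.Series) -> pd.Series:
    """Numerical -> Log: log of the absolute value; zeros become NaN (assumption)."""
    return np.log(d.abs().replace(0, np.nan))


def sqrt(d: pd.Series) -> pd.Series:
    """Numerical -> Sqrt: square root of the absolute value (assumption)."""
    return np.sqrt(d.abs())


def sigmoid(d: pd.Series) -> pd.Series:
    """Numerical -> Sigmoid: squash values into (0, 1)."""
    return 1 / (1 + np.exp(-d))


def residual(d: pd.Series) -> pd.Series:
    """Numerical -> Residual: fractional part, d - floor(d)."""
    return d - np.floor(d)


# Num2Num operators as binary functions on two aligned Series
NUM2NUM = {
    "Add": lambda a, b: a + b,
    "Subtract": lambda a, b: a - b,
    "Multiply": lambda a, b: a * b,
    # division by zero is mapped to NaN instead of +/-inf (assumption)
    "Divide": lambda a, b: a / b.replace(0, np.nan),
    "Max": lambda a, b: np.maximum(a, b),
    "Min": lambda a, b: np.minimum(a, b),
}
```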
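Example GroupByThenRank
Ranks a numerical column within the groups defined by a categorical column; pct=True returns each rank as a fraction of the group size.
Usage
df['flabel_new'] = df.loc[:, 'flabel1'].groupby(df.loc[:, 'flabel2']).rank(ascending=True, pct=True)
Implementation (excerpt; d1 is the numerical column, d2 the categorical one)
elif self.name == 'GroupByThenRank':
    new_data = d1.groupby(d2).rank(ascending=True, pct=True)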
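Running the usage line above on a toy DataFrame (the column values are made up for illustration) shows what pct=True produces:

```python
import pandas as pd

# Toy frame: 'flabel1' is numerical, 'flabel2' is categorical.
df = pd.DataFrame({
    "flabel1": [10, 20, 30, 5, 15],
    "flabel2": ["x", "x", "x", "y", "y"],
})
# Rank 'flabel1' within each 'flabel2' group; pct=True divides each rank by the group size.
df["flabel_new"] = df["flabel1"].groupby(df["flabel2"]).rank(ascending=True, pct=True)
print(df["flabel_new"].round(3).tolist())  # [0.333, 0.667, 1.0, 0.5, 1.0]
```

Group x has three rows, so its ranks 1, 2, 3 become 1/3, 2/3 and 1; group y has two rows, giving 0.5 and 1. Ties would receive the average rank, which is the pandas default (method='average').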
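Example CombineThenFreq
Concatenates two categorical columns into a combined key and replaces each row by the frequency of its key; rows where either input is missing become NaN.
elif self.name == "CombineThenFreq":
    # combined key, e.g. 'a' and 'x' -> 'a_x'
    temp = d1.astype(str) + '_' + d2.astype(str)
    # astype(str) turns missing values into strings such as 'nan', so mask them explicitly
    temp[d1.isna() | d2.isna()] = np.nan
    value_counts = temp.value_counts()
    # missing keys map to a missing frequency
    value_counts.loc[np.nan] = np.nan
    new_data = temp.apply(lambda x: value_counts.loc[x])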
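The apply() call in the excerpt above looks up one key per row in Python. A minimal vectorized sketch of the same operator, assuming d1 and d2 are aligned pandas Series; the function name combine_then_freq is hypothetical. It should give the same values: counts for complete rows, NaN where either input is missing.

```python
import numpy as np
import pandas as pd


def combine_then_freq(d1: pd.Series, d2: pd.Series) -> pd.Series:
    """Hypothetical vectorized CombineThenFreq.

    Builds a combined key per row and replaces it by its frequency.
    """
    temp = d1.astype(str) + "_" + d2.astype(str)
    # mask combinations that involve a missing value (becomes NaN)
    temp = temp.mask(d1.isna() | d2.isna())
    # value_counts() drops NaN by default, so masked rows map to NaN
    return temp.map(temp.value_counts())
```

Series.map with a Series argument performs all lookups at once, which avoids the per-row Python call and the explicit NaN entry in value_counts.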
Core program flow
OpenFE() -> Obj
|-fit(data, label, metric) -> new_features_list
| |- get_init_score() -> init_metric
| |- stage1_select() -> return_results
| \- stage2_select() -> results
\-transform(X_train, X_test, new_features_list) -> _train, _test
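The flow above (fit() returning new_features_list, transform() applying it to both splits) can be illustrated with a toy class. This is not OpenFE's algorithm: the class name ToyFE is hypothetical, candidates are only pairwise products of numerical columns, and ranking by absolute Pearson correlation with the label stands in for get_init_score(), stage1_select() and stage2_select().

```python
import itertools

import numpy as np
import pandas as pd


class ToyFE:
    """Toy stand-in for the OpenFE() -> fit() -> transform() flow (hypothetical)."""

    def __init__(self, top_k: int = 1):
        self.top_k = top_k
        self.new_features_list: list[tuple[str, str]] = []

    def fit(self, data: pd.DataFrame, label: pd.Series) -> list[tuple[str, str]]:
        # candidate generation: every unordered pair of numerical columns
        cols = data.select_dtypes(include="number").columns
        candidates = list(itertools.combinations(cols, 2))
        # scoring stands in for stage1/stage2 selection
        scores = {}
        for a, b in candidates:
            score = (data[a] * data[b]).corr(label)
            scores[(a, b)] = 0.0 if pd.isna(score) else abs(score)
        ranked = sorted(candidates, key=lambda c: scores[c], reverse=True)
        self.new_features_list = ranked[: self.top_k]
        return self.new_features_list

    def transform(self, X_train: pd.DataFrame, X_test: pd.DataFrame,
                  new_features_list: list[tuple[str, str]]):
        # apply the same selected features to both splits without mutating inputs
        out = []
        for X in (X_train, X_test):
            X = X.copy()
            for a, b in new_features_list:
                X[f"{a}*{b}"] = X[a] * X[b]
            out.append(X)
        return out[0], out[1]
```

Keeping the selected features as a list on the object mirrors new_features_list above: selection happens once on the training data, and transform() only replays it.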
Core structure
root
|- examples
|- openfe
| |- __init__.py
| |- FeatureGenerator.py
| |- FeatureSelector.py
| |- openfe.py
| \- utils.py
|- README.md
\- setup.py
Changed
- Added an n_estimators parameter to OpenFE, passed on to the LightGBM models
- Added sklearn.metrics.r2_score as a metric
- Added more verbosity levels
- Added code folding for structure