forked from IIIS-Li-Group/OpenFE

OpenFE: automated feature generation with expert-level performance


qte77/OpenFE

 
 


OpenFE: An efficient automated feature generation tool

Forked from IIIS-Li-Group/OpenFE; see also the original documentation.



Simplified usage

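# Install first (shell or notebook cell):
#   pip install OpenFE@git+https://github.com/qte77/OpenFE -qq
from openfe import OpenFE

ofe = OpenFE()
# ofep: dict of additional keyword arguments forwarded to fit()
features = ofe.fit(data=train_x, label=train_y, train_index=index_col, **ofep)
ofe.new_features_list  # the features returned by fit()
# add the generated features to both the training and the test data
train_x, test_x = ofe.transform(train_x, test, features, n_jobs=n_jobs)
# only for testing after OpenFE is done, custom get_score() necessary
score = get_score(train_x, test_x, train_y, test_y)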

Description of data operations

Feature generation operators, grouped by the type of their inputs (categorical or numerical). OpenFE creates candidate features and ranks them by importance using lightgbm.LGBMRegressor or lightgbm.LGBMClassifier.

  • All: Freq
  • Numerical: Abs, Log, Sqrt, Square, Sigmoid, Round, Residual
  • Num2Num: Add, Subtract, Multiply, Divide, Max, Min
  • Cat2Num: GroupByThenMin, GroupByThenMax, GroupByThenMean, GroupByThenMedian, GroupByThenStd, GroupByThenRank
  • Cat2Cat: Combine, CombineThenFreq, GroupByThenNUnique
  • Symmetry: Add, Subtract, Multiply, Divide, Min, Max, Combine, CombineThenFreq
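
To make the operator families above more concrete, the following sketch computes one plain-pandas analogue for several of them (Freq, Divide, GroupByThenMean, Combine, GroupByThenNUnique). It reflects how the operator names are read here and is not taken from the OpenFE source; the DataFrame and all column names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: two categorical and two numerical columns.
df = pd.DataFrame({
    "city": ["x", "x", "y", "y"],
    "kind": ["a", "b", "a", "a"],
    "price": [1.0, 3.0, 4.0, 8.0],
    "qty": [2.0, 1.0, 4.0, 2.0],
})

# All / Freq: how often each value occurs in its column.
df["freq_city"] = df["city"].map(df["city"].value_counts())

# Num2Num / Divide: element-wise ratio of two numerical columns.
df["price_div_qty"] = df["price"] / df["qty"]

# Cat2Num / GroupByThenMean: mean of a numerical column within each category.
df["mean_price_by_city"] = df["price"].groupby(df["city"]).transform("mean")

# Cat2Cat / Combine: concatenate two categorical columns into one.
df["city_kind"] = df["city"] + "_" + df["kind"]

# Cat2Cat / GroupByThenNUnique: number of distinct values of one
# categorical column within each group of another.
df["nunique_kind_by_city"] = df["kind"].groupby(df["city"]).transform("nunique")

print(df)
```

Each new column has the same length and index as the input, which is what allows the generated features to be appended to the original data.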

Example GroupByThenRank

Usage

df['flabel_new'] = df.loc[:, 'flabel1'].groupby(df.loc[:, 'flabel2']).rank(ascending=True, pct=True)

Source in OpenFE

elif self.name == 'GroupByThenRank':
    new_data = d1.groupby(d2).rank(ascending=True, pct=True)
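
As a small worked example of the same call, the sketch below ranks a numerical column within each group of a categorical column. The data and the column names (`store`, `sales`) are hypothetical; only the `groupby(...).rank(ascending=True, pct=True)` call mirrors the source line above.

```python
import pandas as pd

# Hypothetical toy data: "store" plays the role of d2, "sales" of d1.
df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B"],
    "sales": [10, 30, 20, 5, 15],
})

# Percentile rank of each sales value within its own store.
# Store A: 10 -> 1/3, 20 -> 2/3, 30 -> 1.0; store B: 5 -> 0.5, 15 -> 1.0
df["sales_rank_in_store"] = (
    df["sales"].groupby(df["store"]).rank(ascending=True, pct=True)
)

print(df)
```

With `pct=True` the rank is divided by the group size, so values from groups of different sizes land on a comparable 0 to 1 scale.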

Example CombineThenFreq

Source in OpenFE

elif self.name == "CombineThenFreq":
    temp = d1.astype(str) + '_' + d2.astype(str)
    temp[d1.isna() | d2.isna()] = np.nan
    value_counts = temp.value_counts()
    value_counts.loc[np.nan] = np.nan
    new_data = temp.apply(lambda x: value_counts.loc[x])
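
The source excerpt above can be tried on toy data with a sketch like the following. It is a standalone variant, not the OpenFE code: the function name `combine_then_freq` is hypothetical, and the final lookup uses `Series.map` instead of `apply` with `value_counts.loc[x]`, which is assumed to give the same result while leaving missing combinations as NaN.

```python
import numpy as np
import pandas as pd


def combine_then_freq(d1: pd.Series, d2: pd.Series) -> pd.Series:
    """Count how often each (d1, d2) value pair occurs; NaN if either is missing."""
    # Join both columns into a single key such as "a_x".
    combined = d1.astype(str) + "_" + d2.astype(str)
    # A pair with a missing component is treated as missing altogether.
    combined[d1.isna() | d2.isna()] = np.nan
    # value_counts() ignores NaN; map() leaves unmatched (NaN) keys as NaN.
    return combined.map(combined.value_counts())


# Hypothetical toy data with one missing value in d1.
d1 = pd.Series(["a", "a", "b", None, "a"])
d2 = pd.Series(["x", "x", "y", "x", "y"])

freq = combine_then_freq(d1, d2)
print(freq)
```

Here the pair `a_x` occurs twice, `b_y` and `a_y` once each, and the row with the missing `d1` value stays missing.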

Core program flow

OpenFE() -> Obj
|-fit(data, label, metric) -> new_features_list
| |- get_init_score() -> init_metric
| |- stage1_select() -> return_results
| \- stage2_select() -> results
\-transform(X_train, X_test, new_features_list) -> _train, _test

Core structure

root
|- examples
|- openfe
|   |- __init__.py
|   |- FeatureGenerator.py
|   |- FeatureSelector.py
|   |- openfe.py
|   \- utils.py
|- README.md
\- setup.py

Changed

  • Added an n_estimators parameter to OpenFE that is passed on to LGBM
  • Added sklearn.metrics.r2_score
  • Added more verbosity levels
  • Added code folding for structure
