Files with _parallel
are the same as the normal, but use ProcessPoolExecutor
to run the code in parallel.
- autogluon_standalone_linear_ensemble_read_out: It reads data from a slurm output file and only change the is_plus item using the plus_or_minus functions. Prints the output of this as a json file. This allows for much quicker full runs, as only the plus_or_minus funcs need to be run.
- autogluon_semi-supervised: Pipeline utilizing Autogluon's built-in support for Semi-Supervised Learning.
- autogluon_standalone_calc_pymfe: Trains an Autogluon based Safeguard System Predictor.
- autogluon_standalone_load_model_pymfe: Load a TabularPredictor to run it for testing
- autogluon_standalone_pymfe_full_study_*: Run through the full dataset(s) mentioned with the full linear-ensembling pipeline. Uses Autogluon based Safeguard System.
- custom_datasets: the custom datasets as curated by Fusi et al. in "Probabilistic Matrix Factorization for Automated Machine Learning"
- pyme-runner: Creates a CSV file that can be used to train a safeguard system predictor from a given slurm_file.
- read_out_compare: Counts which run performed better given 2 slurm_output runs by counting which run won for each dataset.
- read_out_txt: Reads out the output from linear-ensembling runs and converts it into a defined datastructure (OpenMLDatasetResult).
- safeguard_perf: Calculate the mean accuracy of runs with and without the safeguard system. Also scatter plot the runs with and without the safeguard system.
- sklearn_linear_ensemble_read_out: It reads data from a slurm output file and only change the is_plus item using the plus_or_minus functions. Prints the output of this as a json file. This allows for much quicker full runs, as only the plus_or_minus funcs need to be run.
- sklearn_tree_calc_pymfe: Trains an Sklearn based RandomForestClassifier Safeguard System Predictor.
- sklearn_tree_debug_pymfe: Graph and explain the output from the RandomForestClassifier. Also is able to tune the hyperparameters of the RandomForestClassifier.
- sklearn_tree_pymfe_full_study: Run through the full dataset(s) selected with the full linear-ensembling pipeline. Uses Sklearn based Safeguard System.
- test_linear_ensemble: A test file for testing the linear ensembling process.
- Run
pymfe-runner.py -i <slurm_output file> -o <csv_file>
. Decide which metric for meta-features (MFE
) should be used, e.g. landmarking, model-based or a combination. - Run
autogluon_standalone_calc_pymfe.py -f <csv file>
to create the Autogluon model, which will be saved in "AutogluonModels". To change the eval model used, it can be imported and changed when calling themain
function. The default isrecall
. - Use this AutoGluon model in e.g.,
autogluon_standalone_pymfe_full_study_100.py
. Editslurm_full_study_pymfe-diff-cluster.sh
to include the file name of the model output in step 2. Don't forget to edit MFE to use the same metrics as inpymfe-runner.py
.
./autogluon_standalone_calc_pymfe.py -f csv_files/fullpymfe-model-based-cc18+100.csv
Usage:
autogluon_standalone_pymfe_full_study.py -f <saved model from autogluon>
.
- Run
pymfe-runner.py -i <slurm_output file> -o <csv_file>
. Decide which metric for meta-features (MFE
) should be used, e.g. landmarking, model-based or a combination. - Run
sklearn_tree_calc_pymfe.py -f <csv file> -o <output file>.joblib
to create the sklearn model, which will be saved in "SavedSklearnModels". - Use this sklearn model in
sklearn_tree_pymfe_full_study.py
. Editslurm_full_study_sklearn_pymfe-diff-cluster.sh
to include the file name of the model output in step 2. Don't forget to edit MFE to use the same metrics as inpymfe-runner.py
.
./sklearn_tree_calc_pymfe.py -f csv_files/fullpymfe-model-based-cc18+100.csv
Usage:
sklearn_tree_pymfe_full_study.py -f <saved model from sklearn>.joblib
or slurm_full_study_sklearn_pymfe-diff-cluster.sh
for a slurm version
If a slurm output file of the selected dataset and an AutoGluon/sklearn model already exist, and only the _full_study
etc.. needs to be done to calculate what the model will predict ("plus" or "minus"), then the co_ensemble_read_out.py
(sklearn) and autogluon_co_ensemble_read_out.py
(autogluon) can be used. The syntax is as follows: python3.10 co_ensemble_read_out.py -f <sklearn model> -s <slurm_output> -o <output file.json>
for sklearn, similar for autogluon. Instead of a regular formatted slurm_output file, this code returns a json version, as described in "Output Format.md".
co_ensemble_read_out.py
and autogluon_co_ensemble_read_out.py
also have a parallel versions, which run the same code but in parallel with the help of ProcessPoolExecutor
, which means that the GIL is not a problem.