
5. Evaluation

Abdurrahman Abul-Basher edited this page Jul 7, 2021 · 11 revisions

Overview

triUMPF can be evaluated using a pre-trained model (see Training). A pre-trained model ("triUMPF.pkl"), trained on Enzyme Commission (EC) number indices with embeddings ("biocyc205_tier23_9255_Xe.pkl") and on pathway indices ("biocyc205_tier23_9255_y.pkl"), is available in the Download files section of this wiki.

Note: Make sure to put the triUMPF/ source code (see Installing triUMPF) into the triUMPF_materials/ directory, as explained in the Download files section. Also create log/ and result/ folders in the same triUMPF_materials/ directory (skip result/ if you already created it during pathway prediction). The final structure should look like this:

triUMPF_materials/
    ├── objectset/
    │       └── ...
    ├── model/
    │       └── ...
    ├── dataset/
    │       └── ...
    ├── log/
    ├── result/
    │       └── ...
    └── triUMPF/
            └── ...
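The folder setup above can be scripted. The following is a minimal Python sketch, assuming it is run from the directory that contains triUMPF_materials/; prepare_materials is a hypothetical helper for illustration, not part of triUMPF. It creates log/ and result/ (leaving them untouched if they already exist) and reports which of the other expected folders are still missing.

```python
from pathlib import Path

# Folders this page expects directly under triUMPF_materials/.
EXPECTED_DIRS = ["objectset", "model", "dataset", "triUMPF"]
# Folders the user must create before evaluation.
CREATED_DIRS = ["log", "result"]


def prepare_materials(root="triUMPF_materials"):
    """Create log/ and result/ under root; return expected folders that are missing.

    Hypothetical helper for illustration only; it is not part of triUMPF.
    """
    root_path = Path(root)
    for name in CREATED_DIRS:
        # exist_ok=True makes repeated calls safe if the folder already exists.
        (root_path / name).mkdir(parents=True, exist_ok=True)
    # Report folders that still have to be downloaded or copied in by hand.
    return [name for name in EXPECTED_DIRS if not (root_path / name).is_dir()]
```

An empty list from prepare_materials("triUMPF_materials") means the layout matches the tree above; otherwise the returned names point at folders still to be fetched from the Download files section.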

For all experiments, open a terminal (on Linux and macOS) or an Anaconda command prompt (on Windows), navigate to the src/ folder in the triUMPF/ directory, and run the commands shown in the Examples section.

To display triUMPF's command-line options, run python main.py --help. The help text describes each option.


Input:

Two matrix files, [DATANAME]_X*.pkl and [DATANAME]_y.pkl, must be provided to evaluate a triUMPF model.

Note: Other feature files, such as "[DATANAME]_Xa.pkl" or "[DATANAME]_X.pkl", can also be used for evaluation, provided triUMPF was trained on the corresponding feature files.
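Before launching an evaluation it can help to confirm that the two input files describe the same samples. The sketch below assumes each .pkl file holds a single pickled matrix whose first dimension is the number of samples (for example a NumPy array or SciPy sparse matrix); that format is an assumption, not documented triUMPF behaviour, and check_input_pair is a hypothetical helper. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.

```python
import pickle


def n_rows(matrix):
    """Return the number of samples (rows) in a loaded matrix.

    Uses .shape when present (NumPy arrays, SciPy sparse matrices) and
    falls back to len() for plain Python sequences.
    """
    shape = getattr(matrix, "shape", None)
    return shape[0] if shape is not None else len(matrix)


def check_input_pair(x_path, y_path):
    """Load an X/y pickle pair and check that both have the same number of rows.

    Hypothetical helper; assumes each file contains one pickled matrix.
    """
    with open(x_path, "rb") as fx:
        X = pickle.load(fx)
    with open(y_path, "rb") as fy:
        y = pickle.load(fy)
    rows_x, rows_y = n_rows(X), n_rows(y)
    if rows_x != rows_y:
        raise ValueError(f"{x_path} has {rows_x} samples but {y_path} has {rows_y}")
    return rows_x
```

For example, check_input_pair("dataset/golden_Xe.pkl", "dataset/golden_y.pkl") would return the number of samples if the two files line up, and raise a ValueError otherwise.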

Command:

The general form of the command is shown below. It is a template that lists every flag, not a command to run as-is; see the Examples section for ready-to-run evaluation commands.

python main.py \
--evaluate \
--X-name "[DATANAME]_X*.pkl" \
--y-name "[DATANAME]_y.pkl" \
--dsname "[DATANAME]" \
--file-name "[FILENAME]" \
--model-name "triUMPF" \
--dspath "[absolute path to the dataset directory (e.g. dataset)]" \
--rspath "[absolute path to the result directory (e.g. result)]" \
--batch 50 \
--num-jobs 2
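If you prefer to drive the evaluation from Python rather than typing the flags by hand, the template can be filled in programmatically. This is a minimal sketch using only the flags documented on this page; build_eval_command and run_evaluation are hypothetical helpers, and feature_suffix is an illustrative parameter that selects which [DATANAME]_X*.pkl file to pass.

```python
import subprocess
import sys


def build_eval_command(dataname, file_name, model_name="triUMPF",
                       feature_suffix="Xe", dspath=None, rspath=None,
                       batch=50, num_jobs=2):
    """Fill in the evaluation template above and return it as an argv list.

    Hypothetical helper; only flags documented on this page are used.
    """
    cmd = [sys.executable, "main.py", "--evaluate",
           "--X-name", f"{dataname}_{feature_suffix}.pkl",
           "--y-name", f"{dataname}_y.pkl",
           "--dsname", dataname,
           "--file-name", file_name,
           "--model-name", model_name,
           "--batch", str(batch),
           "--num-jobs", str(num_jobs)]
    # Only pass the directory flags when explicit absolute paths are given.
    if dspath is not None:
        cmd += ["--dspath", dspath]
    if rspath is not None:
        cmd += ["--rspath", rspath]
    return cmd


def run_evaluation(cmd, src_dir="."):
    """Run the command from triUMPF/src/; raise CalledProcessError on failure."""
    subprocess.run(cmd, cwd=src_dir, check=True)
```

Passing a list to subprocess.run avoids shell quoting differences between Linux/macOS terminals and the Anaconda prompt. For example: run_evaluation(build_eval_command("golden", "triUMPF_golden"), src_dir="triUMPF_materials/triUMPF/src").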

Argument descriptions:

The table below summarizes all the command-line arguments that are specific to this framework:

Argument name | Description | Value
--evaluate | Evaluate the performance of triUMPF on the input dataset | False
--X-name | The input file name corresponding to EC number indices or any other feature file (see Advanced usage) | [DATANAME]_X*.pkl
--y-name | The input file name corresponding to pathway indices | [DATANAME]_y.pkl
--dsname | The name of the dataset being evaluated (e.g. golden or cami) | [DATANAME]
--file-name | The prefix of the output file name, without extension | [FILENAME]
--model-name | The name of the pre-trained model file, excluding the extension | triUMPF
--dspath | The path to the dataset directory | [Outside source code]
--rspath | The path to the directory where results are stored | [Outside source code]
--batch | Batch size | 50
--num-jobs | The number of parallel workers | 2

Output:

The output file generated after running the command is:

File | Description
[FILENAME]_scores.txt | A text file containing the model's performance scores on all evaluated samples
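The scores file is named after the --file-name value and written to the --rspath directory. The sketch below locates and reads it; scores_path and read_scores are hypothetical helpers. Because the internal layout of the file is not described on this page, the lines are returned as-is rather than parsed into individual metrics.

```python
from pathlib import Path


def scores_path(rspath, file_name):
    """Return the path of the scores file produced for a given --file-name."""
    return Path(rspath) / f"{file_name}_scores.txt"


def read_scores(rspath, file_name):
    """Return the non-empty lines of the scores file.

    Hypothetical helper; the file format is not documented here, so no
    parsing beyond stripping newlines is attempted.
    """
    path = scores_path(rspath, file_name)
    if not path.is_file():
        raise FileNotFoundError(f"No scores file at {path}; did the evaluation finish?")
    with path.open(encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if line.strip()]
```

For example, read_scores("triUMPF_materials/result", "triUMPF_golden") would return the contents of triUMPF_golden_scores.txt after Example 1 completes.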

Examples

Example 1:

To evaluate the performance of triUMPF on the golden dataset (golden_Xe.pkl and golden_y.pkl), run the following command:

Note: The --dsname flag must be set to the dataset name, which is "golden" in this case.

python main.py --evaluate --X-name "golden_Xe.pkl" --y-name "golden_y.pkl" --dsname "golden" --file-name "triUMPF_golden" --model-name "triUMPF" --num-jobs 2

After the command finishes, the output is saved to the result/ folder. A short description of the output is given in the Output table above. The resulting folder tree looks like this:

triUMPF_materials/
    ├── objectset/
    │       └── ...
    ├── model/
    │       ├── triUMPF.pkl
    │       └── ...
    ├── dataset/
    │       └── ...
    ├── log/
    ├── result/
    │       ├── triUMPF_golden_scores.txt
    │       └── ...
    └── triUMPF/
            └── ...

Example 2:

To evaluate the performance of triUMPF on the cami dataset (cami_Xe.pkl and cami_y.pkl), run the following command:

Note: The --dsname flag must be set to the dataset name, which is "cami" in this case.

python main.py --evaluate --X-name "cami_Xe.pkl" --y-name "cami_y.pkl" --dsname "cami" --file-name "triUMPF_cami" --model-name "triUMPF" --num-jobs 2

After the command finishes, the output is saved to the result/ folder. A short description of the output is given in the Output table above. The resulting folder tree looks like this:

triUMPF_materials/
    ├── objectset/
    │       └── ...
    ├── model/
    │       ├── triUMPF.pkl
    │       └── ...
    ├── dataset/
    │       └── ...
    ├── log/
    ├── result/
    │       ├── triUMPF_cami_scores.txt
    │       └── ...
    └── triUMPF/
            └── ...
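Examples 1 and 2 differ only in the dataset name, so both evaluations can be run in one go. This is a minimal sketch, assuming golden_Xe.pkl, golden_y.pkl, cami_Xe.pkl and cami_y.pkl are in the dataset/ folder and that the script is pointed at triUMPF/src/; example_command and evaluate_all are hypothetical helpers that reproduce the example command lines exactly.

```python
import subprocess
import sys

# Datasets used in Examples 1 and 2.
DATASETS = ["golden", "cami"]


def example_command(dsname, model_name="triUMPF", num_jobs=2):
    """Return the Example 1/2 command line for one dataset as an argv list."""
    return [sys.executable, "main.py", "--evaluate",
            "--X-name", f"{dsname}_Xe.pkl",
            "--y-name", f"{dsname}_y.pkl",
            "--dsname", dsname,
            "--file-name", f"triUMPF_{dsname}",
            "--model-name", model_name,
            "--num-jobs", str(num_jobs)]


def evaluate_all(src_dir=".", datasets=DATASETS):
    """Evaluate each dataset in turn; stop at the first run that fails."""
    for dsname in datasets:
        subprocess.run(example_command(dsname), cwd=src_dir, check=True)
```

The runs are deliberately sequential: each one already starts --num-jobs parallel workers, and distinct --file-name values keep triUMPF_golden_scores.txt and triUMPF_cami_scores.txt from overwriting each other.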
