5. Evaluation
triUMPF can be evaluated using a pre-trained model (see Training). A pre-trained model ("triUMPF.pkl"), trained on Enzyme Commission (EC) number indices with embedding ("biocyc205_tier23_9255_Xe.pkl") and pathway indices ("biocyc205_tier23_9255_y.pkl"), is available in the Download files section of this wiki.
Note: Make sure to put the source code triUMPF/ (see Installing triUMPF) into the triUMPF_materials/ directory as explained in the Download files section. Additionally, create log/ and result/ folders in the same triUMPF_materials/ directory, if you have not already created them during pathway prediction. The final structure should look like this:
triUMPF_materials/
├── objectset/
│ └── ...
├── model/
│ └── ...
├── dataset/
│ └── ...
├── result/
│ └── ...
└── triUMPF/
└── ...
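The folder setup described in the note above can also be scripted. The following is a minimal sketch and not part of triUMPF itself: the helper name `prepare_layout` is hypothetical, and the folder names are taken from the note and the tree shown above.

```python
from pathlib import Path

# Folders that should already exist after following the Download files section.
EXPECTED = ["objectset", "model", "dataset", "triUMPF"]
# Folders the note asks you to create yourself.
TO_CREATE = ["log", "result"]


def prepare_layout(root):
    """Create log/ and result/ under root; return expected folders that are missing."""
    root = Path(root)
    for name in TO_CREATE:
        # exist_ok=True leaves folders created earlier (e.g. during prediction) untouched.
        (root / name).mkdir(parents=True, exist_ok=True)
    return [name for name in EXPECTED if not (root / name).is_dir()]


# Example call (the path is a placeholder for your own location):
# missing = prepare_layout("/path/to/triUMPF_materials")
```

An empty return value means all folders shown in the tree are in place. Any names that are returned still need to be downloaded or installed as described in the sections referenced above.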
For all experiments, using a terminal (on Linux and macOS) or an Anaconda command prompt (on Windows), navigate to the src/ folder in the triUMPF/ directory and then run the commands as shown in the Examples section.
To display triUMPF's running options, use: python main.py --help. The help output should be self-explanatory.
Two matrix files, [DATANAME]_X*.pkl and [DATANAME]_y.pkl, must be provided to evaluate a triUMPF model.
Note: Data files such as "[DATANAME]_Xa.pkl", "[DATANAME]_X.pkl", etc. can be used for evaluation, provided triUMPF was trained using the corresponding files.
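Before starting an evaluation, it can help to confirm that the two matrix files belong together. The sketch below is an illustration only and makes assumptions that this page does not state: that each .pkl file holds a single object readable with one `pickle.load` call, and that rows correspond to samples. The helper names `n_rows` and `check_pair` are hypothetical. Only unpickle files from a source you trust.

```python
import pickle
from pathlib import Path


def n_rows(obj):
    """Number of rows: .shape[0] for array-like objects, len() otherwise."""
    shape = getattr(obj, "shape", None)
    return shape[0] if shape is not None else len(obj)


def check_pair(dspath, x_name, y_name):
    """Load the X and y pickles and confirm they have the same number of rows.

    Assumption: each file contains exactly one pickled object.
    """
    with open(Path(dspath) / x_name, "rb") as fh:
        X = pickle.load(fh)
    with open(Path(dspath) / y_name, "rb") as fh:
        y = pickle.load(fh)
    if n_rows(X) != n_rows(y):
        raise ValueError(
            f"{x_name} has {n_rows(X)} rows but {y_name} has {n_rows(y)}"
        )
    return n_rows(X)
```

If the files store their contents differently, the loading step needs to be adapted. The row-count comparison itself only relies on the objects having a `shape` attribute or a length.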
The basic command is shown below. Do not use it as-is to run the evaluation step; it only illustrates all the available flags. See the Examples section below for how to run evaluation.
python main.py \
--evaluate \
--X-name "[DATANAME]_X*.pkl" \
--y-name "[DATANAME]_y.pkl" \
--file-name "[FILENAME]" \
--model-name "triUMPF.pkl" \
--dspath "[absolute path to the dataset directory (e.g. dataset)]" \
--rspath "[absolute path to the result directory (e.g. result)]" \
--batch 50 \
--num-jobs 2
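When the same evaluation is run on several datasets, assembling the argument list in a script avoids quoting mistakes. The following is a minimal sketch that uses only the flags listed in the command above. The function name `build_evaluate_command` is hypothetical, and all values passed to it are placeholders.

```python
import shlex


def build_evaluate_command(x_name, y_name, file_name, model_name,
                           dspath, rspath, batch=50, num_jobs=2):
    """Return the argument list for an evaluation run of main.py."""
    return [
        "python", "main.py",
        "--evaluate",
        "--X-name", x_name,
        "--y-name", y_name,
        "--file-name", file_name,
        "--model-name", model_name,
        "--dspath", str(dspath),
        "--rspath", str(rspath),
        "--batch", str(batch),        # command-line arguments must be strings
        "--num-jobs", str(num_jobs),
    ]


# Example with placeholder paths; shlex.join renders a copy-pasteable shell line.
cmd = build_evaluate_command(
    "golden_Xe.pkl", "golden_y.pkl", "triUMPF_golden", "triUMPF",
    "/abs/path/to/dataset", "/abs/path/to/result",
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, cwd=..., check=True)`, with `cwd` set to the src/ folder, so that main.py is resolved the same way as when the command is typed in a terminal.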
The table below summarizes all the command-line arguments that are specific to this framework:
Argument name | Description | Value |
---|---|---|
--evaluate | To evaluate the performance of triUMPF on the input dataset | False |
--X-name | The input file name corresponding to EC number indices or any other feature files (see Advanced usage) | [DATANAME]_X*.pkl |
--y-name | The input file name corresponding to pathway indices | [DATANAME]_y.pkl |
--file-name | The names of input preprocessed files (without extension) | [FILENAME] |
--model-name | The name of the model excluding any **EXTENSION** | triUMPF.pkl |
--dspath | The path to the datasets | [Outside source code] |
--rspath | The path to store results | [Outside source code] |
--batch | Batch size | 50 |
--num-jobs | The number of parallel workers | 2 |
The output file generated after running the command is:
File | Description |
---|---|
[FILENAME]_scores.txt | A text file containing model performance scores for all samples used |
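After a run, the scores file can be located from the values passed to --rspath and --file-name. The sketch below assumes the file is written directly inside the result directory under the name shown in the table. It makes no assumption about the layout of the scores inside the file and returns the raw text. The helper names `scores_path` and `read_scores` are hypothetical.

```python
from pathlib import Path


def scores_path(rspath, file_name):
    """Expected location of the scores file for a given --file-name value."""
    return Path(rspath) / f"{file_name}_scores.txt"


def read_scores(rspath, file_name):
    """Return the raw text of the scores file, or raise if the run produced none."""
    path = scores_path(rspath, file_name)
    if not path.is_file():
        raise FileNotFoundError(f"No scores file found at {path}")
    return path.read_text()


# Example call (the path is a placeholder for your own result directory):
# print(read_scores("/abs/path/to/result", "triUMPF_golden"))
```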
To evaluate the performance of triUMPF on the golden dataset (golden_Xe.pkl and golden_y.pkl), run the following command:
Note: The flag --dsname must include the name of the dataset, which is "golden" in this case.
python main.py --evaluate --X-name "golden_Xe.pkl" --y-name "golden_y.pkl" --dsname "golden" --file-name "triUMPF_golden" --model-name "triUMPF" --num-jobs 2
After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The tree structure for the folder with the output will look like this:
triUMPF_materials/
├── objectset/
│ └── ...
├── model/
│ ├── triUMPF.pkl
│ └── ...
├── dataset/
│ └── ...
├── result/
│ ├── triUMPF_golden_scores.txt
│ └── ...
└── triUMPF/
└── ...
To evaluate the performance of triUMPF on the cami dataset (cami_Xe.pkl and cami_y.pkl), run the following command:
Note: The flag --dsname must include the name of the dataset, which is "cami" in this case.
python main.py --evaluate --X-name "cami_Xe.pkl" --y-name "cami_y.pkl" --dsname "cami" --file-name "triUMPF_cami" --model-name "triUMPF" --num-jobs 2
After running the command, the output will be saved to the result/ folder. A short description of the output is given in the table above. The tree structure for the folder with the output will look like this:
triUMPF_materials/
├── objectset/
│ └── ...
├── model/
│ ├── triUMPF.pkl
│ └── ...
├── dataset/
│ └── ...
├── result/
│ ├── triUMPF_cami_scores.txt
│ └── ...
└── triUMPF/
└── ...