The Drug Response Prediction 2022 project at the Computational Biology and Artificial Intelligence (COMBINE) Laboratory, McGill University.
In the console, type the following command.
git clone
CONDA is the preferred tool for managing dependency packages. With the project root /MTDRP/ as the current working directory, run the following command in the console to create a new environment and install all required packages.
conda create --name env --file ./requirements.txt
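Activate the newly created CONDA environment (named env in the command above) by running conda activate env.
Download the preprocessed dataset (not yet disclosed; it will be made available in the future), unzip it, and merge the folder into ./data/DRP2022_preprocessed.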
The dataset consists of multiple .csv files. This operation extracts the numerical values from them and creates PyTorch dataset objects for easy training and testing.
A particular set of folds (for cross-validation) must be chosen, optionally together with an additional data preprocessing rule. In the example below (see 2.2.3), the first fold (index 0) of cl_fold and zero-mean standardization are used to create PyTorch datasets.
It is possible and easy to define a new 2nd-stage preprocessing method in ./datahandlers/ (see 2.2.2). Min-max normalization and zero-mean standardization rules are provided initially.
Every preprocessing method should be packaged as a class that inherits from datahandlers.dataset_handler.PreprocessRule and implements its preprocess() interface, which returns a list containing two torch.Tensor objects, for training and testing respectively.
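As a rough illustration of that contract, the sketch below defines a hypothetical rule, MaxAbsScaling, which divides each feature column by its largest absolute value in the training data. The PreprocessRule class here is only a local stand-in so the snippet runs on its own, and the preprocess(train, test) signature is an assumption; check datahandlers.dataset_handler for the real interface before adapting it.

```python
import torch


class PreprocessRule:
    """Local stand-in for datahandlers.dataset_handler.PreprocessRule (assumed shape)."""

    def preprocess(self, train, test):
        raise NotImplementedError


class MaxAbsScaling(PreprocessRule):
    """Hypothetical rule: divide each column by its max absolute training value."""

    def preprocess(self, train, test):
        # Statistics come from the training split only, so no test information leaks in.
        scale = train.abs().max(dim=0).values
        scale = scale.clamp_min(1e-12)  # guard against all-zero columns
        # Return [train, test], matching the two-tensor list described above.
        return [train / scale, test / scale]


# Toy data: 3 training rows and 1 test row, 2 features each.
train = torch.tensor([[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
test = torch.tensor([[1.0, 40.0]])
train_scaled, test_scaled = MaxAbsScaling().preprocess(train, test)
```

Because the scale is fitted on the training split alone, test values may fall outside [-1, 1]; that is expected and avoids leaking test statistics into training.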
In the Python console.
>>> from datahandlers.dataset_handler import DRPGeneralDataset
>>> from datahandlers.custom_preprocess_rules import Standardization
>>> GDSC = DRPGeneralDataset()
>>> GDSC.load_from_csv('GDSC',
>>> train, test = GDSC.get_fold('cl_fold', 0, preprocess=Standardization(), save=True)
>>> print(len(train), len(test))
259386 66319
In the above example, passing save=True saves all tensor files (.pt) under ./tensors/Standardization/GDSC/cl_fold0/. It is recommended to do so.
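The saved .pt files can be read back with torch.load. The helper below is a minimal sketch, not part of the project: load_saved_tensors is a hypothetical name, and because the file names written by get_fold are not documented here, it simply loads every .pt file it finds in a folder. The demonstration uses a temporary directory and a made-up file name.

```python
import tempfile
from pathlib import Path

import torch


def load_saved_tensors(folder):
    """Load every .pt file in `folder` into a dict keyed by file name (without suffix)."""
    return {p.stem: torch.load(p) for p in sorted(Path(folder).glob("*.pt"))}


# Demonstration with a throwaway folder; "example.pt" is a made-up file name.
with tempfile.TemporaryDirectory() as tmp:
    torch.save(torch.arange(3), Path(tmp) / "example.pt")
    loaded = load_saved_tensors(tmp)
```

In the project, the folder argument would presumably be the directory named above, ./tensors/Standardization/GDSC/cl_fold0/.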
In the console, type the following command with arguments source_path, batch_size, epochs and lr (the learning rate).
python --source_path ./tensors/Standardization/GDSC/cl_fold0/ --batch_size 20 --epochs 100 --lr 1e-4
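For readers who want to see how the four arguments might be consumed, here is a hedged sketch using Python's standard argparse module. It is not the project's training script: build_parser is a hypothetical helper, and the defaults shown are assumptions taken from the example command rather than from the code.

```python
import argparse


def build_parser():
    """Build a parser for the four arguments named above (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--source_path", required=True,
                        help="folder containing the saved .pt tensors")
    parser.add_argument("--batch_size", type=int, default=20)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--lr", type=float, default=1e-4, help="learning rate")
    return parser


# Parse the same argument values as the example command.
args = build_parser().parse_args([
    "--source_path", "./tensors/Standardization/GDSC/cl_fold0/",
    "--batch_size", "20",
    "--epochs", "100",
    "--lr", "1e-4",
])
```

Declaring type=int and type=float means the values arrive already converted, so args.lr can be passed straight to an optimizer.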