Added some hints to Readme about GPU running, Added a missing cuda_visibility command
EhsanMashhadi · Jan 8, 2022 · decced3
1 parent 61e6ff0
commit decced3
Showing 3 changed files with 11 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -30,6 +30,7 @@ You can find the paper here: https://arxiv.org/abs/2103.11626
 ### Running Simple LSTM Experiments
 1. Install [OpenNMT-py](https://github.com/OpenNMT/OpenNMT-py)
     - `pip install OpenNMT-py==2.2.0`
+    - If the installed PyTorch and CUDA versions conflict, follow the [PyTorch installation guide](https://pytorch.org/get-started/locally/) to pick a matching build.
 2. Preprocess the MSR data
     - `bash ./scripts/simple-lstm/build_vocab.sh`
 3. Train the model
@@ -49,5 +50,10 @@ You can find the paper here: https://arxiv.org/abs/2103.11626
 4. Evaluate the model
     - `bash ./scripts/simple-lstm/legacy/test.sh`
 
-### How to run all of experiments?
+### How to run all experiments?
    - You can change the values of the `size` and `type` variables in the script files to run different experiments (large | small, unique | repetition).
+
+### Have trouble running on GPU?
+1. Check the `CUDA` and `PyTorch` compatibility
+2. Set `CUDA_VISIBLE_DEVICES`, `gpu_ranks`, and `world_size` in all scripts to match the number of GPUs you have.
+3. To run on CPU instead, remove the `gpu_ranks` and `world_size` options from all scripts.
diff --git a/scripts/simple-lstm/legacy/train.sh b/scripts/simple-lstm/legacy/train.sh
@@ -4,5 +4,6 @@ size=small  # Can be: small OR large
 type=unique  # Can be: repetition OR unique
 output_dir=./saved_models/simple-lstm-legacy/$type/$size
 
-export CUDA_VISIBLE_DEVICES=0,1,2,3
+export CUDA_VISIBLE_DEVICES=0,1,2,3 # Adjust to the GPUs you have, and change `world_size` and `gpu_ranks` to match.
+# To run on CPU, delete the `world_size` and `gpu_ranks` options (training will take much longer).
 onmt_train -data $output_dir/final -world_size 4 -gpu_ranks 0 1 2 3 -encoder_type brnn -enc_layers 2 -decoder_type rnn -dec_layers 2 -rnn_size 256 -global_attention general -batch_size 32 -word_vec_size 256 -bridge -copy_attn -reuse_copy_attn -train_steps 20000 -save_checkpoint_steps 5000 -valid_steps 1000 -save_model $output_dir/final-model
diff --git a/scripts/simple-lstm/train.sh b/scripts/simple-lstm/train.sh
@@ -4,5 +4,6 @@ size=small  # Can be: small OR large
 type=unique  # Can be: repetition OR unique
 data_config=./scripts/simple-lstm/${type}_${size}_data.yaml
 output_dir=./saved_models/simple-lstm/$type/$size
-
+export CUDA_VISIBLE_DEVICES=0,1,2,3 # Adjust to the GPUs you have, and change `world_size` and `gpu_ranks` to match.
+# To run on CPU, delete the `world_size` and `gpu_ranks` options (training will take much longer).
 onmt_train -config $data_config -share_vocab -src_vocab $output_dir/final.vocab -world_size 4 -gpu_ranks 0 1 2 3 -encoder_type brnn -enc_layers 2 -decoder_type rnn -dec_layers 2 -rnn_size 256 -global_attention general -batch_size 32 -word_vec_size 256 -bridge -copy_attn -reuse_copy_attn -train_steps 20000 -save_checkpoint_steps 5000 -valid_steps 1000 -save_model $output_dir/final-model
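
The comments added in both `train.sh` files ask the reader to keep `CUDA_VISIBLE_DEVICES`, `-world_size`, and `-gpu_ranks` in sync by hand. A minimal sketch of deriving the last two from the first is shown here. It is not part of this commit: the variable names `visible_devices` and `gpu_args` are hypothetical, and it assumes the ranks are numbered `0..N-1` over the visible devices, which is how CUDA renumbers devices exposed through `CUDA_VISIBLE_DEVICES`.

```shell
#!/bin/sh
# Hypothetical helper, not part of the committed scripts.
# Set visible_devices to "" to get a CPU run with no GPU options.
visible_devices="0,1,2,3"
export CUDA_VISIBLE_DEVICES="$visible_devices"

# Count the comma-separated device IDs and build the rank list 0..N-1.
world_size=0
gpu_ranks=""
old_ifs=$IFS
IFS=','
for _dev in $visible_devices; do
    gpu_ranks="${gpu_ranks:+$gpu_ranks }$world_size"
    world_size=$((world_size + 1))
done
IFS=$old_ifs

# Emit the GPU options only when at least one device is listed.
if [ "$world_size" -gt 0 ]; then
    gpu_args="-world_size $world_size -gpu_ranks $gpu_ranks"
else
    gpu_args=""
fi

# Print the command line instead of running it; the remaining
# onmt_train options from train.sh would go in place of the placeholder.
echo "onmt_train <other options> $gpu_args"
```

With this approach only the `visible_devices` line needs editing when moving between machines; in a real script, `$gpu_args` would be passed unquoted to `onmt_train` so that it splits into separate arguments.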