Errors In PyTorch - Multi-GPU #5

WilliamJudge94 · 2021-10-07T16:39:18Z

To get accurate training with the multi-GPU setup in PyTorch, the batch size must be kept the same. See the explanation here: https://discuss.pytorch.org/t/accuracy-difference-on-multi-gpu-with-nn-dataparallel/65481/12
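As a rough illustration of why the batch size matters here: `nn.DataParallel` scatters each input batch along its first dimension across the GPUs, so each replica only processes a slice of it. The helper below, `per_gpu_batch_sizes`, is hypothetical and not PyTorch's internal code. It assumes `torch.chunk`-style splitting (chunks of `ceil(batch / n_gpus)` samples, with a smaller last chunk).

```python
import math


def per_gpu_batch_sizes(global_batch_size, n_gpus):
    """Hypothetical helper: approximate how one batch is sliced across GPUs.

    Assumes torch.chunk-style splitting along dim 0: every chunk holds
    ceil(global_batch_size / n_gpus) samples except possibly the last,
    and fewer than n_gpus chunks may be produced for very small batches.
    """
    chunk = math.ceil(global_batch_size / n_gpus)
    sizes = []
    remaining = global_batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes


if __name__ == "__main__":
    # A DataLoader batch of 64 on 4 GPUs leaves 16 samples per replica
    print(per_gpu_batch_sizes(64, 4))
```

With a DataLoader `batch_size` of 64 on 4 GPUs, each replica sees 16 samples. Layers that compute per-batch statistics, such as BatchNorm under `nn.DataParallel`, therefore compute them over these smaller slices.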

For saving the model on a multi-GPU setup, the code must be changed to the following (for more information, see https://discuss.pytorch.org/t/save-checkpoints-trained-on-multi-gpus-for-load-on-single-gpu/97881/9):

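```python
import os

import torch
import torch.nn as nn

NGPUS = torch.cuda.device_count()  # assumed definition; set elsewhere in the original script

# Update the saved model whenever the validation loss reaches a new minimum
def update_saved_model(model, path):
    os.makedirs(path, exist_ok=True)
    # Remove the previously saved best model
    for f in os.listdir(path):
        os.remove(os.path.join(path, f))
    save_path = os.path.join(path, 'best_model.pth')  # works with or without a trailing slash
    if NGPUS > 1:
        # nn.DataParallel wraps the network: save the inner module's state_dict
        # so the checkpoint keys carry no "module." prefix and load on a single GPU
        if isinstance(model, nn.DataParallel):
            torch.save(model.module.state_dict(), save_path)
        else:
            torch.save(model.state_dict(), save_path)
    else:
        # Single GPU: the whole model object is pickled, as in the original code
        torch.save(model, save_path)
```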
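The counterpart is loading such a checkpoint on a single GPU or CPU. The following is a minimal sketch. `load_state_dict_single_device` is a hypothetical helper, not part of PyTorch. It assumes the file holds a `state_dict`, as written by the `NGPUS > 1` branch, and not the whole pickled model from the single-GPU branch. It also strips the `module.` prefix that `nn.DataParallel` adds to every key, so it still works if a checkpoint was saved from the wrapper directly.

```python
import os
import tempfile
from collections import OrderedDict

import torch
import torch.nn as nn


def load_state_dict_single_device(model, checkpoint_path, device="cpu"):
    """Hypothetical helper: load a saved state_dict into an unwrapped model.

    Accepts checkpoints saved from model.module (plain keys) as well as
    checkpoints saved from an nn.DataParallel wrapper ("module."-prefixed keys).
    """
    # map_location remaps tensors saved on cuda:N onto the target device
    state_dict = torch.load(checkpoint_path, map_location=device)
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Drop the "module." prefix that nn.DataParallel adds to every key
        if key.startswith("module."):
            key = key[len("module."):]
        cleaned[key] = value
    model.load_state_dict(cleaned)
    return model.to(device)


if __name__ == "__main__":
    # Round trip: save from a DataParallel wrapper, load into a plain module
    src = nn.Linear(4, 2)
    wrapped = nn.DataParallel(src)  # state_dict keys become "module.weight", "module.bias"
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "best_model.pth")
        torch.save(wrapped.state_dict(), ckpt)
        dst = load_state_dict_single_device(nn.Linear(4, 2), ckpt)
        print(torch.equal(dst.weight, src.weight))
```

In a training script this would be called as, for example, `load_state_dict_single_device(MyNet(), os.path.join(path, 'best_model.pth'))`, where `MyNet` is a placeholder for your own model class.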
