There is a very nasty problem involving the PyTorch DataLoader. In the log below, notice that the dataloader delay is ~6 s while the overall epoch takes only ~8 s. TensorFlow 2 takes only ~3 s per epoch with the same settings (TODO: write a TF2 timing test).
Log: (in branch pytorch_dataset_hanging_problem)
[*] Start training
***> dataloader delay: 5.8484
>> end of iteration (time interval: 0.0408) )
[2020-04-06 22:22:14]
>> epoch 1 loss: 0.045038 (lr: 1.00e-03)
***> dataloader delay: 6.1289
>> end of iteration (time interval: 0.0431) )
[2020-04-06 22:22:26]
>> epoch 2 loss: 0.001379 (lr: 1.00e-03)
Possible scenarios:
At the start of every epoch, the DataLoader performs a heavy operation, e.g. re-spawning worker processes and thereby re-copying each worker's resources.
Iteration does not start until every worker has completed one batch (batch size = 32).
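If the first scenario is the cause, newer PyTorch versions (1.7+) offer persistent_workers=True, which keeps worker processes alive between epochs instead of respawning them at every iter(loader). A minimal sketch with toy data; the dataset and settings here are illustrative, not the project's real configuration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data standing in for the real dataset.
dataset = TensorDataset(torch.zeros(256, 8))

# persistent_workers requires PyTorch >= 1.7 and num_workers > 0.
# Workers (and their copies of the dataset) survive across epochs,
# which should remove the per-epoch startup delay seen in the log.
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=2,
    persistent_workers=True,
)
```

The training loop is unchanged; workers are spawned once before the first epoch and reused afterwards.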
Useful links:
https://discuss.pytorch.org/t/dataloader-with-num-workers-1-hangs-every-epoch/20323/9
https://discuss.pytorch.org/t/dataloader-resets-dataset-state/27960/4
https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader