When I implement data parallelism myself, I get the following error:
Traceback (most recent call last):
File "/home/user01/hzz/MLIC++/train.py", line 158, in <module>
main()
File "/home/user01/hzz/MLIC++/train.py", line 119, in main
current_step = train_one_epoch(
File "/home/user01/hzz/MLIC++/utils/training.py", line 18, in train_one_epoch
out_net = model(d)
File "/home/user01/anaconda3/envs/tcm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user01/anaconda3/envs/tcm310/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/user01/anaconda3/envs/tcm310/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/user01/anaconda3/envs/tcm310/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/user01/anaconda3/envs/tcm310/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/user01/anaconda3/envs/tcm310/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/user01/anaconda3/envs/tcm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user01/hzz/MLIC++/models/mlicpp.py", line 93, in forward
self.update_resolutions(x.size(2) // 16, x.size(3) // 16)
File "/home/user01/hzz/MLIC++/models/mlicpp.py", line 191, in update_resolutions
self.local_context[i].update_resolution(H, W, next(self.parameters()).device, mask=None)
StopIteration
Could the author open-source the train_one_epoch_ddp implementation?
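A possible cause, stated as an assumption rather than something the author has confirmed: since PyTorch 1.5, the replicas that `nn.DataParallel` creates no longer expose their weights through `parameters()`, so `next(self.parameters())` in `update_resolutions` has nothing to yield and raises `StopIteration`. The sketch below reproduces that behaviour on CPU with a parameter-free module and shows one common workaround, reading the device from the input tensor instead. `ParamFreeReplica` and `DeviceFromInput` are hypothetical stand-ins for illustration, not MLIC++ code.

```python
import torch
import torch.nn as nn


class ParamFreeReplica(nn.Module):
    """Hypothetical stand-in for a DataParallel replica: parameters() yields nothing."""

    def forward(self, x):
        # Fragile pattern from the traceback: fails when parameters() is empty.
        return next(self.parameters()).device


class DeviceFromInput(nn.Module):
    """Hypothetical workaround: take the device from the input tensor."""

    def forward(self, x):
        # The input is already on the replica's device, so no parameter lookup is needed.
        return x.device


x = torch.zeros(1, 3, 32, 32)

try:
    ParamFreeReplica()(x)
    raised = False
except StopIteration:
    raised = True

device = DeviceFromInput()(x)
print(raised, device)
```

If this is indeed the cause, passing `x.device` from `forward` into `update_resolutions` might avoid the error under `nn.DataParallel`. `DistributedDataParallel` runs one full copy of the model per process, so the original lookup should not hit this problem there.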