CUDA error: device-side assert triggered when "max_c_len" is set to 1000 (larger than the default value 384) #6

Open
zhanhl316 opened this issue Oct 19, 2020 · 1 comment


zhanhl316 commented Oct 19, 2020

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
Epoch: 55% | 11/20 [00:22<00:18, 2.07s/it]
Traceback (most recent call last):
File "main.py", line 136, in <module>
main(args)
File "main.py", line 51, in main
metric_dict, bleu, rouge_1, rouge_2, _ = eval_vae(epoch, args, trainer, eval_data)
File "/home/codes/Info-HCVAE/vae/eval.py", line 74, in eval_vae
posterior_z_prob = trainer.generate_posterior(c_ids, q_ids, a_ids)
File "/home/codes/Info-HCVAE/vae/trainer.py", line 49, in generate_posterior
_, _, zq, _, za = self.vae.posterior_encoder(c_ids, q_ids, a_ids)
File "/home/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/codes/Info-HCVAE/vae/models.py", line 199, in forward
c_hs, c_state = self.encoder(c_embeddings, c_lengths)
File "/home/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/codes/Info-HCVAE/vae/models.py", line 146, in forward
batch_first=True, enforce_sorted=False)
File "/home/.local/lib/python3.7/site-packages/torch/nn/utils/rnn.py", line 223, in pack_padded_sequence
lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered

Do you have any idea what causes this error? Thank you! The only value I changed is "max_c_len" (from the default 384 to 1000), so the error seems to be triggered by increasing "max_c_len".

@seanie12
Owner

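The upper bound of max_c_len is 512, because we use pretrained BERT for the encoders.

BERT's positional embeddings only cover sequences up to length 512, so it does not work for any length strictly greater than 512. With a longer context, the position indices run past the end of the embedding table, which is what the `srcIndex < srcSelectDimSize` assertion reports.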
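
To make the limit concrete, here is a minimal sketch of a guard that could run before training starts. It is not part of this repository: `validate_max_c_len`, `first_invalid_position`, and `BERT_MAX_POSITIONS` are hypothetical names, and the value 512 assumes a standard pretrained BERT checkpoint.

```python
# Assumption: standard pretrained BERT checkpoints ship a positional-embedding
# table with 512 rows, so valid position ids are 0..511.
BERT_MAX_POSITIONS = 512


def validate_max_c_len(max_c_len, max_positions=BERT_MAX_POSITIONS):
    """Hypothetical helper: reject context lengths BERT cannot index."""
    if max_c_len < 1:
        raise ValueError(f"max_c_len must be positive, got {max_c_len}")
    if max_c_len > max_positions:
        raise ValueError(
            f"max_c_len={max_c_len} exceeds the {max_positions} positions "
            "available in BERT's positional embeddings"
        )
    return max_c_len


def first_invalid_position(seq_len, max_positions=BERT_MAX_POSITIONS):
    """Return the first position id that falls outside the table, or None."""
    # Position ids run from 0 to seq_len - 1; the table has rows 0..max_positions - 1.
    return max_positions if seq_len > max_positions else None
```

With max_c_len set to 1000, `first_invalid_position(1000)` returns 512, the first index that the embedding lookup cannot serve. Failing early with a `ValueError` is easier to read than the device-side assert, which CUDA reports asynchronously and therefore often attributes to a later, unrelated line (here `pack_padded_sequence`). Running on CPU or with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually moves the error closer to the lookup that actually failed.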
