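A small code suggestion:
pt_seq_length=1024
So in transformer.py, in the for loop where the decoder emits the inference result step by step,
for i in range(self.args.pt_seq_length):
i runs from 0 to 1023,
and the position embedding should likewise cover positions 0 to 1023.
However, in the step pt_hs = self.decode(pt_seq, memory, mask, pos_embed, 'pt'), the pt_seq passed into decode already has length 7 at i=0 (presumably because of a prompt), so the position index can go out of bounds.
When it does, the out-of-bounds access raises this error: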
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling
cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())
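
To illustrate the mismatch described above, here is a minimal sketch in plain Python. It is not the repository's code: the function name `max_position_index` and the constants are hypothetical, and it assumes each loop iteration embeds the whole current sequence and then appends exactly one token. Under that assumption, looping `pt_seq_length` times from a 7-token prompt reaches position indices beyond 1023, while shortening the loop by the prompt length stays in range.

```python
# Hypothetical stand-in for the decoding loop; names are illustrative only.
PT_SEQ_LENGTH = 1024  # position-embedding table size (valid indices 0..1023)
PROMPT_LENGTH = 7     # length of pt_seq at i=0, as reported in this issue


def max_position_index(num_steps, prompt_length):
    """Return the largest position index looked up, assuming each step
    embeds the whole current sequence and then appends one token."""
    seq_len = prompt_length
    largest = -1
    for _ in range(num_steps):
        largest = max(largest, seq_len - 1)  # positions 0..seq_len-1 are embedded
        seq_len += 1                         # one new token per step (assumption)
    return largest


# Looping over the full pt_seq_length overruns the embedding table.
unsafe = max_position_index(PT_SEQ_LENGTH, PROMPT_LENGTH)

# Shortening the loop by the prompt length keeps every index in range.
safe = max_position_index(PT_SEQ_LENGTH - PROMPT_LENGTH + 1, PROMPT_LENGTH)

print(unsafe >= PT_SEQ_LENGTH, safe < PT_SEQ_LENGTH)
```

One possible fix along these lines would be to bound the loop by the remaining capacity rather than by `pt_seq_length` itself, or to enlarge the position embedding to cover the prompt. To confirm the diagnosis, rerunning on CPU or with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually gives a clearer index error than the cuBLAS message.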
Finally, thank you to the author for sharing this work; I learned a lot from studying it.
Best wishes~