Describe the bug
While running classification on marie-3.0.30, the run fails with the following exception:
UserWarning: Plan failed with a CuDNNError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED Exception raised from run_conv_plan at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:374 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f8b260d0897 in /opt/venv/lib/python3.10/site-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0xdf1dcb (0x7f8ad5ec0dcb in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) frame #2: <unknown function> + 0x106b1d7 (0x7f8ad613a1d7 in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) frame #3: <unknown function> + 0x106b84b (0x7f8ad613a84b in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) frame #4: <unknown function> + 0x104e2f2 (0x7f8ad611d2f2 in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) frame #5: at::native::cudnn_convolution(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, bool) + 0x53f (0x7f8ad611dcbf in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) frame #6: <unknown function> + 0x32c1b6e (0x7f8ad8390b6e in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) frame #7: <unknown function> + 0x32d9321 (0x7f8ad83a8321 in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so) frame #8: at::_ops::cudnn_convolution::call(at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::SymInt, bool, bool, bool) + 0x2bb (0x7f8b0f7b6c2b in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #9: at::native::_convolution(at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long, bool, bool, bool, bool) + 0x13cb (0x7f8b0e9f180b in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #10: <unknown 
function> + 0x2e0089f (0x7f8b0fb7f89f in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #11: <unknown function> + 0x2e071fc (0x7f8b0fb861fc in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #12: at::_ops::_convolution::call(at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, bool, c10::ArrayRef<c10::SymInt>, c10::SymInt, bool, bool, bool, bool) + 0x344 (0x7f8b0f2c86f4 in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #13: at::native::convolution(at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x3b8 (0x7f8b0e9e4e88 in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #14: <unknown function> + 0x2e0013c (0x7f8b0fb7f13c in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #15: <unknown function> + 0x2e07068 (0x7f8b0fb86068 in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #16: at::_ops::convolution::call(at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, bool, c10::ArrayRef<c10::SymInt>, c10::SymInt) + 0x2d4 (0x7f8b0f2c74f4 in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #17: <unknown function> + 0x19bd900 (0x7f8b0e73c900 in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #18: at::native::conv2d_symint(at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::SymInt) + 0x16b (0x7f8b0e9e876b in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #19: <unknown function> + 0x2ff96c3 (0x7f8b0fd786c3 in 
/opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #20: <unknown function> + 0x2ff995d (0x7f8b0fd7895d in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #21: at::_ops::conv2d::call(at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::ArrayRef<c10::SymInt>, c10::SymInt) + 0x26e (0x7f8b0f8eb95e in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so) frame #22: <unknown function> + 0x68541d (0x7f8b24d7e41d in /opt/venv/lib/python3.10/site-packages/torch/lib/libtorch_python.so) frame #23: <unknown function> + 0x15a10e (0x56325ed7c10e in /opt/venv/bin/python3) frame #24: _PyObject_MakeTpCall + 0x25b (0x56325ed72a7b in /opt/venv/bin/python3) frame #25: _PyEval_EvalFrameDefault + 0x6a79 (0x56325ed6b629 in /opt/venv/bin/python3) frame #26: <unknown function> + 0x16893e (0x56325ed8a93e in /opt/venv/bin/python3) frame #27: _PyEval_EvalFrameDefault + 0x2a27 (0x56325ed675d7 in /opt/venv/bin/python3) frame #28: <unknown function> + 0x16893e (0x56325ed8a93e in /opt/venv/bin/python3) frame #29: _PyEval_EvalFrameDefault + 0x2a27 (0x56325ed675d7 in /opt/venv/bin/python3) frame #30: _PyObject_FastCallDictTstate + 0xc4 (0x56325ed71c14 in /opt/venv/bin/python3) frame #31: _PyObject_Call_Prepend + 0x5c (0x56325ed8786c in /opt/venv/bin/python3) frame #32: <unknown function> + 0x280700 (0x56325eea2700 in /opt/venv/bin/python3) frame #33: _PyObject_MakeTpCall + 0x25b (0x56325ed72a7b in /opt/venv/bin/python3) frame #34: _PyEval_EvalFrameDefault + 0x64e6 (0x56325ed6b096 in /opt/venv/bin/python3) frame #35: <unknown function> + 0x16893e (0x56325ed8a93e in /opt/venv/bin/python3) frame #36: _PyEval_EvalFrameDefault + 0x2a27 (0x56325ed675d7 in /opt/venv/bin/python3) frame #37: <unknown function> + 0x16893e (0x56325ed8a93e in /opt/venv/bin/python3) frame #38: _PyEval_EvalFrameDefault + 0x2a27 (0x56325ed675d7 in /opt/venv/bin/python3) frame #39: 
_PyObject_FastCallDictTstate + 0xc4 (0x56325ed71c14 in /opt/venv/bin/python3) frame #40: _PyObject_Call_Prepend + 0x5c (0x56325ed8786c in /opt/venv/bin/python3) frame #41: <unknown function> + 0x280700 (0x56325eea2700 in /opt/venv/bin/python3) frame #42: _PyObject_MakeTpCall + 0x25b (0x56325ed72a7b in /opt/venv/bin/python3) frame #43: _PyEval_EvalFrameDefault + 0x6a79 (0x56325ed6b629 in /opt/venv/bin/python3) frame #44: <unknown function> + 0x1687f1 (0x56325ed8a7f1 in /opt/venv/bin/python3) frame #45: _PyEval_EvalFrameDefault + 0x614a (0x56325ed6acfa in /opt/venv/bin/python3) frame #46: <unknown function> + 0x16893e (0x56325ed8a93e in /opt/venv/bin/python3) frame #47: _PyEval_EvalFrameDefault + 0x2a27 (0x56325ed675d7 in /opt/venv/bin/python3) frame #48: <unknown function> + 0x16893e (0x56325ed8a93e in /opt/venv/bin/python3) frame #49: _PyEval_EvalFrameDefault + 0x2a27 (0x56325ed675d7 in /opt/venv/bin/python3) frame #50: _PyObject_FastCallDictTstate + 0xc4 (0x56325ed71c14 in /opt/venv/bin/python3) frame #51: _PyObject_Call_Prepend + 0x5c (0x56325ed8786c in /opt/venv/bin/python3) frame #52: <unknown function> + 0x280700 (0x56325eea2700 in /opt/venv/bin/python3) frame #53: _PyObject_MakeTpCall + 0x25b (0x56325ed72a7b in /opt/venv/bin/python3) frame #54: _PyEval_EvalFrameDefault + 0x6a79 (0x56325ed6b629 in /opt/venv/bin/python3) frame #55: <unknown function> + 0x1687f1 (0x56325ed8a7f1 in /opt/venv/bin/python3) frame #56: _PyEval_EvalFrameDefault + 0x198c (0x56325ed6653c in /opt/venv/bin/python3) frame #57: _PyObject_FastCallDictTstate + 0xc4 (0x56325ed71c14 in /opt/venv/bin/python3) frame #58: _PyObject_Call_Prepend + 0x5c (0x56325ed8786c in /opt/venv/bin/python3) frame #59: <unknown function> + 0x280700 (0x56325eea2700 in /opt/venv/bin/python3) frame #60: _PyObject_MakeTpCall + 0x25b (0x56325ed72a7b in /opt/venv/bin/python3) frame #61: _PyEval_EvalFrameDefault + 0x6a79 (0x56325ed6b629 in /opt/venv/bin/python3) frame #62: _PyFunction_Vectorcall + 0x7c (0x56325ed7c9fc 
in /opt/venv/bin/python3) frame #63: _PyEval_EvalFrameDefault + 0x8ac (0x56325ed6545c in /opt/venv/bin/python3)
(Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:855.)
(raised from /opt/venv/lib/python3.10/site-packages/detectron2/layers/wrappers.py:142)
ERROR marie@37 Error in PSM_SPARSE_STEP : FIND was unable to find an engine to execute this computation after trying 6 plans.
[10/03/24 16:28:30] ERROR marie@37 Error in PSM_SPARSE_STEP : CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO marie@37 Refinement step 1 : No change in image
ERROR marie@37 CUDA error: unspecified launch failure [10/03/24 16:28:30]
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR marie@37 Extract error
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/marie/boxes/dit/ulim_dit_box_processor.py", line 811, in extract_bounding_boxes
    raise ex
  File "/opt/venv/lib/python3.10/site-packages/marie/boxes/dit/ulim_dit_box_processor.py", line 692, in extract_bounding_boxes
    bboxes, polys, scores, lines_bboxes, classes = self.psm_sparse(
  File "/opt/venv/lib/python3.10/site-packages/marie/boxes/dit/ulim_dit_box_processor.py", line 640, in psm_sparse
    torch_gc()
  File "/opt/venv/lib/python3.10/site-packages/marie/models/utils.py", line 106, in torch_gc
    torch.cuda.empty_cache()
  File "/opt/venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 162, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
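Since the error message suggests passing CUDA_LAUNCH_BLOCKING=1, here is a minimal sketch of one way to rerun a failing command with that variable set, so that kernel launches are synchronous and the reported stack trace is more likely to point at the failing call. The helper name `run_with_blocking_launches` is hypothetical, and the child command below is only a stand-in that prints the variable; the actual marie entry point is not shown in this report and would need to be substituted.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(cmd):
    """Run `cmd` in a child process with CUDA_LAUNCH_BLOCKING=1 set.

    The variable has to be in the environment before the CUDA runtime is
    initialized, so setting it for a fresh child process is the simplest
    way to make sure it takes effect.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in child command (assumption): replace with the real command that
# reproduces the failure.
result = run_with_blocking_launches(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
)
print(result.stdout.strip())
```

Running with synchronous launches is slower, so this is only meant for reproducing the failure, not for normal operation.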