triton-inference-server cannot be started #129

Open
tuninger opened this issue May 29, 2024 · 0 comments

tuninger commented May 29, 2024

NAME                                       READY   STATUS             RESTARTS      AGE
jupyter-notebook-server-5f785cd7c8-x8qd6   1/1     Running            0             45m
llm-playground-7d8c999487-fgmj5            1/1     Running            0             45m
milvu-etcd-7cf545456f-m8q9m                1/1     Running            0             45m
milvus-minio-7ff64c76f-4njkz               1/1     Running            0             45m
milvus-standalone-7479bf9ddd-n6s6f         1/1     Running            0             45m
query-router-65c6f864ff-fstkb              1/1     Running            0             45m
triton-inference-server-7cd84c8f4b-wzsk9   0/1     CrashLoopBackOff   8 (18s ago)   23m

[triton-inference-server-7cd84c8f4b-wzsk9:30 :0:30] Caught signal 7 (Bus error: nonexistent physical address)
backtrace (tid: 30)
0 0x0000000000042520 __sigaction() ???:0
1 0x000000000001678b uct_iface_mp_chunk_alloc_inner() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_mem.c:469
2 0x000000000001678b uct_iface_mp_chunk_alloc() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_mem.c:443
3 0x000000000005407b ucs_mpool_grow() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucs/datastruct/mpool.c:266
4 0x00000000000542c9 ucs_mpool_get_grow() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucs/datastruct/mpool.c:312
5 0x000000000001b488 uct_mm_iface_t_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/sm/mm/base/mm_iface.c:822
6 0x000000000001b9f2 uct_mm_iface_t_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/sm/mm/base/mm_iface.c:720
7 0x0000000000014f02 uct_iface_open() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/uct/base/uct_md.c:284
8 0x000000000004a017 ucp_worker_iface_open() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:1357
9 0x000000000004afe0 ucp_worker_add_resource_ifaces() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:1101
10 0x000000000004d2db ucp_worker_create() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ucx-d799cfdd2293cc72206aa8188deb8e7d22c82c9f/src/ucp/core/ucp_worker.c:2441
11 0x000000000000702f mca_pml_ucx_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/ucx/pml_ucx.c:306
12 0x00000000000093a5 mca_pml_ucx_component_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/ucx/pml_ucx_component.c:136
13 0x00000000000c7022 mca_pml_base_select() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mca/pml/base/pml_base_select.c:127
14 0x00000000000d01c9 ompi_mpi_init() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/runtime/ompi_mpi_init.c:647
15 0x0000000000075899 PMPI_Init_thread() /build-result/src/hpcx-v2.15-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.17-x86_64/ompi-5980bac63337537bf34556f95ee7511778de44a5/ompi/mpi/c/profile/pinit_thread.c:69
16 0x00000000000327a8 __pyx_f_6mpi4py_3MPI_bootstrap() /tmp/pip-install-05lukizf/mpi4py_8cc4cad65d414a8995a9d1c890fac173/src/mpi4py.MPI.c:8115
17 0x00000000000327a8 __pyx_pymod_exec_MPI() /tmp/pip-install-05lukizf/mpi4py_8cc4cad65d414a8995a9d1c890fac173/src/mpi4py.MPI.c:176976
18 0x000000000023b2d3 PyModule_ExecDef() ???:0
19 0x000000000023bda0 PyInit__thread() ???:0
20 0x000000000015f854 PyObject_GenericGetAttr() ???:0
21 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
22 0x000000000016070c _PyFunction_Vectorcall() ???:0
23 0x000000000014e8a2 _PyEval_EvalFrameDefault() ???:0
24 0x000000000016070c _PyFunction_Vectorcall() ???:0
25 0x0000000000148f52 _PyEval_EvalFrameDefault() ???:0
26 0x000000000016070c _PyFunction_Vectorcall() ???:0
27 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
28 0x000000000016070c _PyFunction_Vectorcall() ???:0
29 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
30 0x000000000016070c _PyFunction_Vectorcall() ???:0
31 0x000000000015fb24 PyObject_CallFunctionObjArgs() ???:0
32 0x000000000023f4af _PyObject_CallMethodIdObjArgs() ???:0
33 0x00000000001740ca PyImport_ImportModuleLevelObject() ???:0
34 0x0000000000184458 PyImport_Import() ???:0
35 0x000000000015fe0e PyObject_CallFunctionObjArgs() ???:0
36 0x000000000016f12b PyObject_Call() ???:0
37 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
38 0x000000000016070c _PyFunction_Vectorcall() ???:0
39 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
40 0x000000000016070c _PyFunction_Vectorcall() ???:0
41 0x000000000015fb24 PyObject_CallFunctionObjArgs() ???:0
42 0x000000000023f4af _PyObject_CallMethodIdObjArgs() ???:0
43 0x0000000000174cda PyImport_ImportModuleLevelObject() ???:0
44 0x000000000014b9e5 _PyEval_EvalFrameDefault() ???:0
45 0x0000000000239e56 PyEval_EvalCode() ???:0
46 0x0000000000239cf6 PyEval_EvalCode() ???:0
47 0x000000000023fb0d PyFrozenSet_New() ???:0
48 0x0000000000160969 PyCell_New() ???:0
49 0x000000000014b2c1 _PyEval_EvalFrameDefault() ???:0
50 0x000000000016070c _PyFunction_Vectorcall() ???:0
51 0x000000000014e8a2 _PyEval_EvalFrameDefault() ???:0
52 0x000000000016070c _PyFunction_Vectorcall() ???:0
53 0x0000000000148f52 _PyEval_EvalFrameDefault() ???:0
54 0x000000000016070c _PyFunction_Vectorcall() ???:0
55 0x0000000000148e0d _PyEval_EvalFrameDefault() ???:0
56 0x000000000016070c _PyFunction_Vectorcall() ???:0

[triton-inference-server-7cd84c8f4b-wzsk9:00030] *** Process received signal ***
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Signal: Bus error (7)
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Signal code: (-6)
[triton-inference-server-7cd84c8f4b-wzsk9:00030] Failing at address: 0x1e
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f9d7caa7520]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 1] /opt/hpcx/ucx/lib/libuct.so.0(uct_iface_mp_chunk_alloc+0x7b)[0x7f9d3689178b]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 2] /opt/hpcx/ucx/lib/libucs.so.0(ucs_mpool_grow+0x7b)[0x7f9d3691607b]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 3] /opt/hpcx/ucx/lib/libucs.so.0(ucs_mpool_get_grow+0x19)[0x7f9d369162c9]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 4] /opt/hpcx/ucx/lib/libuct.so.0(+0x1b488)[0x7f9d36896488]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 5] /opt/hpcx/ucx/lib/libuct.so.0(uct_mm_iface_t_new+0xb2)[0x7f9d368969f2]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 6] /opt/hpcx/ucx/lib/libuct.so.0(uct_iface_open+0xe2)[0x7f9d3688ff02]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 7] /opt/hpcx/ucx/lib/libucp.so.0(ucp_worker_iface_open+0x317)[0x7f9d36a93017]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 8] /opt/hpcx/ucx/lib/libucp.so.0(+0x4afe0)[0x7f9d36a93fe0]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [ 9] /opt/hpcx/ucx/lib/libucp.so.0(ucp_worker_create+0x7cb)[0x7f9d36a962db]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [10] /opt/hpcx/ompi/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_init+0x9f)[0x7f9d36b2f02f]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [11] /opt/hpcx/ompi/lib/openmpi/mca_pml_ucx.so(+0x93a5)[0x7f9d36b313a5]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [12] /opt/hpcx/ompi/lib/libmpi.so.40(mca_pml_base_select+0x1e2)[0x7f9c1bc35022]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [13] /opt/hpcx/ompi/lib/libmpi.so.40(ompi_mpi_init+0x6c9)[0x7f9c1bc3e1c9]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [14] /opt/hpcx/ompi/lib/libmpi.so.40(PMPI_Init_thread+0x79)[0x7f9c1bbe3899]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [15] /usr/local/lib/python3.10/dist-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x327a8)[0x7f9c1bcbf7a8]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [16] /usr/bin/python3(PyModule_ExecDef+0x73)[0x55f3c471e2d3]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [17] /usr/bin/python3(+0x23bda0)[0x55f3c471eda0]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [18] /usr/bin/python3(+0x15f854)[0x55f3c4642854]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [19] /usr/bin/python3(_PyEval_EvalFrameDefault+0x2b71)[0x55f3c462e2c1]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [20] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [21] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6152)[0x55f3c46318a2]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [22] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [23] /usr/bin/python3(_PyEval_EvalFrameDefault+0x802)[0x55f3c462bf52]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [24] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [25] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6bd)[0x55f3c462be0d]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [26] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [27] /usr/bin/python3(_PyEval_EvalFrameDefault+0x6bd)[0x55f3c462be0d]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [28] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55f3c464370c]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] [29] /usr/bin/python3(+0x15fb24)[0x55f3c4642b24]
[triton-inference-server-7cd84c8f4b-wzsk9:00030] *** End of error message ***
[23] May 29 04:16:21 [ ERROR] - main - TensorRT conversion returned a non-zero exit code.
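
For anyone triaging this: the crashing frames (`uct_mm_iface_t_init`, `ucs_mpool_grow`, `uct_iface_mp_chunk_alloc`) are in UCX's shared-memory transport, and a bus error at that point is commonly associated with the container's `/dev/shm` being too small. That is an assumption, not something confirmed in this report. Below is a minimal Python sketch for checking the mount from inside the pod; the `shm_report` helper and the 1 GiB threshold are hypothetical and chosen only for illustration.

```python
import os
import shutil


def shm_report(path="/dev/shm", min_free_bytes=1 << 30):
    """Return (exists, free_bytes, ok) for a shared-memory mount.

    `min_free_bytes` is an illustrative threshold (1 GiB by default),
    not a documented requirement of Triton, Open MPI, or UCX.
    """
    if not os.path.isdir(path):
        # The mount is missing entirely; report it as not usable.
        return (False, 0, False)
    usage = shutil.disk_usage(path)
    return (True, usage.free, usage.free >= min_free_bytes)


if __name__ == "__main__":
    exists, free, ok = shm_report()
    print(f"/dev/shm exists={exists} free={free / 2**20:.0f} MiB ok={ok}")
```

If the reported free space is only a few tens of MiB, the shared-memory size of the pod may be worth checking before looking further into the TensorRT conversion step.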
