NCCL on Kubernetes #58
nccl-k8s.yaml
on: schedule
build-mpi-operator-compatible-base / build-mpi-operator-compatible-base
1m 43s
Matrix: nccl-test
Annotations
4 errors and 1 warning
nccl-test (all_gather_perf_mpi)
The self-hosted runner: eks-mfpzq-runner-5g6p6 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
nccl-test (broadcast_perf_mpi)
The self-hosted runner: eks-mfpzq-runner-khdp5 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
nccl-test (reduce_scatter_perf_mpi)
The self-hosted runner: eks-mfpzq-runner-f6mxn lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
nccl-test (all_reduce_perf_mpi)
The job was canceled because "broadcast_perf_mpi" failed.
nccl-test (all_reduce_perf_mpi)
Runner eks-mfpzq-runner-spklg did not respond to a cancelation request with 00:05:00.
Artifacts
Produced during runtime
Name | Size |
---|---|
artifact-mpi-operator-compatible-base-build-amd64 | 638 Bytes |