Upgrade seqio to upstream to work around the MaxText dependency issue for inference #59
nccl-k8s.yaml
on: pull_request
build-mpi-operator-compatible-base / build-mpi-operator-compatible-base
1m 44s
Matrix: nccl-test
Annotations
6 errors
nccl-test (broadcast_perf_mpi)
The self-hosted runner: eks-mfpzq-runner-bgvwp lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
nccl-test (broadcast_perf_mpi)
The operation was canceled.
nccl-test (all_gather_perf_mpi)
The job was canceled because "all_reduce_perf_mpi" failed.
nccl-test (reduce_scatter_perf_mpi)
The self-hosted runner: eks-mfpzq-runner-2k4pl lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
nccl-test (reduce_scatter_perf_mpi)
The operation was canceled.
nccl-test (all_reduce_perf_mpi)
The job was canceled because "broadcast_perf_mpi" failed.
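The "job was canceled because ... failed" annotations above are consistent with GitHub Actions' default matrix behavior, where `strategy.fail-fast` is true and one failing matrix job cancels its siblings. The fragment below is a minimal sketch of how a matrix like this one might opt out of that behavior, so that a single lost runner does not mask the results of the other NCCL tests. It is not the contents of nccl-k8s.yaml, which this page does not show: the matrix key `test`, the `runs-on` label, and the placeholder step are all assumptions made for illustration.

```yaml
# Hypothetical excerpt; the real nccl-k8s.yaml is not reproduced on this page.
jobs:
  nccl-test:
    runs-on: self-hosted          # assumed label for the EKS-hosted runners
    strategy:
      # Default is true: any failed matrix job cancels the in-progress and
      # queued siblings, which would explain the cancellation annotations above.
      fail-fast: false
      matrix:
        test:                     # assumed key name; values are taken from the annotations
          - all_gather_perf_mpi
          - all_reduce_perf_mpi
          - broadcast_perf_mpi
          - reduce_scatter_perf_mpi
    steps:
      # Placeholder step standing in for the real test invocation.
      - run: echo "running ${{ matrix.test }}"
```

With `fail-fast: false`, each test reports its own pass or failure, which would make it easier to tell a runner that really lost communication from one that was only cancelled as a side effect.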