diff --git a/ods_ci/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot b/ods_ci/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot
index c72735772..fadfd6a77 100644
--- a/ods_ci/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot
+++ b/ods_ci/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot
@@ -47,31 +47,54 @@ Run Training operator KFTO error handling test with AMD ROCm image
     ...       TrainingOperator
     Run Training Operator KFTO Test    TestPyTorchJobFailureWithROCm    ${ROCM_TRAINING_IMAGE}
 
-Run Training operator KFTO_MNIST multi-node CPU test with NVIDIA CUDA image
-    [Documentation]    Run Go KFTO_MNIST multi-node CPU test for Training operator using PyTorch job with NVIDIA CUDA image
+Run Training operator KFTO_MNIST multi-node single-CPU test with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node single-CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster nodes with at least 1 CPU each
     [Tags]    RHOAIENG-16556
     ...       Sanity
     ...       DistributedWorkloads
     ...       Training
     ...       TrainingOperator
-    Run Training Operator KFTO Test    TestPyTorchJobMnistCpu    ${CUDA_TRAINING_IMAGE}
+    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeSingleCpu    ${CUDA_TRAINING_IMAGE}
 
-Run Training operator KFTO_MNIST multi-node test with NVIDIA CUDA image
-    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with NVIDIA CUDA image
+Run Training operator KFTO_MNIST multi-node multi-CPU test with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node multi-CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster nodes with 2 CPUs each
+    [Tags]    RHOAIENG-16556
+    ...       Tier1
+    ...       DistributedWorkloads
+    ...       Training
+    ...       TrainingOperator
+    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiCpu    ${CUDA_TRAINING_IMAGE}
+
+Run Training operator KFTO_MNIST multi-node single-GPU test with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node single-GPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster nodes with 1 GPU each
     [Tags]    Resources-GPU    NVIDIA-GPUs
     ...       RHOAIENG-16556
     ...       Tier1
     ...       DistributedWorkloads
     ...       Training
     ...       TrainingOperator
-    Run Training Operator KFTO Test    TestPyTorchJobMnistWithCuda    ${CUDA_TRAINING_IMAGE}
+    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeSingleGpuWithCuda    ${CUDA_TRAINING_IMAGE}
 
-Run Training operator KFTO_MNIST multi-node test with AMD ROCm image
-    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with AMD ROCm image
+Run Training operator KFTO_MNIST multi-node single-GPU test with AMD ROCm image
+    [Documentation]    Run Go KFTO_MNIST multi-node single-GPU test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster nodes with 1 GPU each
     [Tags]    Resources-GPU    AMD-GPUs    ROCm
     ...       RHOAIENG-16556
     ...       Tier1
     ...       DistributedWorkloads
     ...       Training
     ...       TrainingOperator
-    Run Training Operator KFTO Test    TestPyTorchJobMnistWithROCm    ${ROCM_TRAINING_IMAGE}
+    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeSingleGpuWithROCm    ${ROCM_TRAINING_IMAGE}
+
+Run Training operator KFTO_MNIST multi-node multi-GPU test with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node multi-GPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster nodes with 2 GPUs each
+    [Tags]    Kfto-MultiNodeMultiGpu
+    ...       Training
+    ...       TrainingOperator
+    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiGpuWithCuda    ${CUDA_TRAINING_IMAGE}
+
+Run Training operator KFTO_MNIST multi-node multi-GPU test with AMD ROCm image
+    [Documentation]    Run Go KFTO_MNIST multi-node multi-GPU test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster nodes with 2 GPUs each
+    [Tags]    Kfto-MultiNodeMultiGpu
+    ...       Training
+    ...       TrainingOperator
+    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiGpuWithROCm    ${ROCM_TRAINING_IMAGE}