Update KFTO multi-node test names according to recent updates in orig… #2164

abhijeet-dhumal · 2025-01-09T06:50:37Z

Update KFTO multi-node test names according to recent updates in original test names

Related to : opendatahub-io/distributed-workloads#299

...i/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot

@@ -48,30 +48,30 @@
    Run Training Operator KFTO Test    TestPyTorchJobFailureWithROCm    ${ROCM_TRAINING_IMAGE}

 Run Training operator KFTO_MNIST multi-node CPU test with NVIDIA CUDA image
-    [Documentation]    Run Go KFTO_MNIST multi-node CPU test for Training operator using PyTorch job with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 CPUs each


...i/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot


 Run Training operator KFTO_MNIST multi-node test with NVIDIA CUDA image
-    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 1 GPUs each


...i/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot


 Run Training operator KFTO_MNIST multi-node test with AMD ROCm image
-    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with AMD ROCm image
+    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with AMD ROCm image  - It requires 2 cluster-nodes with 1 GPUs each


github-actions · 2025-01-09T06:55:25Z

Robot Results

| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % |
|---|---|---|---|---|
| 594 | 0 | 0 | 594 | 100 |

ChughShilpa · 2025-01-09T07:00:26Z

What about the other 2 test scenarios,
TestPyTorchJobMnistMultiNodeMultiGpuWithCuda and TestPyTorchJobMnistMultiNodeMultiGpuWithROCm?
Will you add them in another PR?

abhijeet-dhumal · 2025-01-09T07:17:43Z

@ChughShilpa Actually, the remaining multi-node/multi-GPU tests require 2 cluster nodes with at least 2 GPUs each (a GPU instance such as g4dn.12xlarge, which has 4 NVIDIA T4 GPUs), and I'm not sure whether those will be available during QG tests.
Even with this prerequisite, is it OK to add these tests here?
cc: @sutaakar

sutaakar · 2025-01-09T08:27:14Z

We can add the tests to ODS CI; we just can't run them as part of QG, only as part of our own jobs.

ChughShilpa · 2025-01-09T08:35:57Z

The g4dn.12xlarge instance is used in qe-jenkins, and we also have the Resources-2GPUS tag, which can be used for this requirement. The only thing is that we might need to inform the devtestops team about it.

sonarqubecloud · 2025-01-09T12:49:47Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

...i/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot

+    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeWithROCm    ${ROCM_TRAINING_IMAGE}
+
+Run Training operator KFTO_MNIST multi-node multi-gpu test with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 GPUs each


...i/tests/Tests/0600__distributed_workloads/0602__training/test-run-training-stack-tests.robot

+    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiGpuWithCuda    ${CUDA_TRAINING_IMAGE}
+
+Run Training operator KFTO_MNIST multi-node multi-gpu test with AMD ROCm image
+    [Documentation]    Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with AMD ROCm image  - It requires 2 cluster-nodes with 2 GPUs each
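For readers following the resource discussion above, here is a minimal sketch of how one of the new multi-node multi-GPU test cases could be marked so that it only runs on suitably sized clusters. The test case name, documentation text, Go test name and `${CUDA_TRAINING_IMAGE}` variable come from the diff, and the `Resources-2GPUS` tag name comes from the review comments. Applying that tag through `[Tags]` is an assumption, not something the diff shows. The image value is a placeholder and the keyword body is a hypothetical stub that stands in for the suite's real keyword so the file can run on its own.

```robotframework
*** Variables ***
# Placeholder only; the real value is defined elsewhere in the test suite.
${CUDA_TRAINING_IMAGE}    example.invalid/cuda-training:placeholder

*** Test Cases ***
Run Training operator KFTO_MNIST multi-node multi-gpu test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 GPUs each
    # Assumption: the existing Resources-2GPUS tag is attached here so that
    # jobs without enough GPUs can exclude this test case.
    [Tags]    Resources-2GPUS
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiGpuWithCuda    ${CUDA_TRAINING_IMAGE}

*** Keywords ***
Run Training Operator KFTO Test
    [Documentation]    Hypothetical stub; the real keyword runs the named Go test.
    [Arguments]    ${test_name}    ${image}
    Log    Would run ${test_name} using image ${image}
```

With a tag like this in place, a job on a cluster with enough GPUs could select the test with `robot --include Resources-2GPUS`, while other jobs could skip it with `--exclude Resources-2GPUS`.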


openshift-ci · 2025-01-09T13:08:04Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: abhijeet-dhumal, sutaakar

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Commit 1c4c2be: Update KFTO multi-node test names according to recent updates in original test names

abhijeet-dhumal requested review from sutaakar and ChughShilpa January 9, 2025 06:50

github-advanced-security bot found potential problems Jan 9, 2025

Commit d8d75d4: Add KFTO pytorch multi-node multi-gpu tests for GPUs with AMD ROCm and NVIDIA Cuda

github-advanced-security bot found potential problems Jan 9, 2025

sutaakar approved these changes Jan 9, 2025

openshift-ci bot assigned sutaakar Jan 9, 2025

openshift-ci bot added the lgtm label Jan 9, 2025
