Update KFTO multi-node test names according to recent updates in orig… #2164
base: master
@@ -48,30 +48,30 @@
     Run Training Operator KFTO Test    TestPyTorchJobFailureWithROCm    ${ROCM_TRAINING_IMAGE}

 Run Training operator KFTO_MNIST multi-node CPU test with NVIDIA CUDA image
-    [Documentation]    Run Go KFTO_MNIST multi-node CPU test for Training operator using PyTorch job with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node CPU test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 CPUs each
Check warning (Code scanning / Robocop): Line is too long ({{ line_length }}/{{ allowed_length }})

 Run Training operator KFTO_MNIST multi-node test with NVIDIA CUDA image
-    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with NVIDIA CUDA image
+    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 1 GPUs each
Check warning (Code scanning / Robocop): Line is too long ({{ line_length }}/{{ allowed_length }})

 Run Training operator KFTO_MNIST multi-node test with AMD ROCm image
-    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with AMD ROCm image
+    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster-nodes with 1 GPUs each
Check warning (Code scanning / Robocop): Line is too long ({{ line_length }}/{{ allowed_length }})
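One way to address the Robocop "line is too long" warnings is to split each `[Documentation]` value across rows with Robot Framework's `...` continuation marker. The following is a minimal sketch for the AMD ROCm multi-node case only. It assumes that this test case calls the `Run Training Operator KFTO Test` keyword with `TestPyTorchJobMnistMultiNodeWithROCm` and `${ROCM_TRAINING_IMAGE}`, and that the keyword and variable are defined elsewhere in the suite; the wording of the split rows is illustrative, not the PR's actual change.

```robotframework
*** Test Cases ***
Run Training operator KFTO_MNIST multi-node test with AMD ROCm image
    # Continuation rows ("...") keep every physical line short.
    [Documentation]    Run Go KFTO_MNIST multi-node test for Training operator
    ...    using PyTorch job with AMD ROCm image.
    ...    It requires 2 cluster-nodes with 1 GPU each.
    # Assumption: this keyword call belongs to this test case; it is defined outside this sketch.
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeWithROCm    ${ROCM_TRAINING_IMAGE}
```

The same pattern would apply to the other `[Documentation]` lines flagged in this PR.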
What about the other 2 test scenarios?
@ChughShilpa Actually, the remaining MultiNode/MultiGPU tests require 2 cluster-nodes with a minimum of 2 GPUs each (GPU instance like g4dn.12xlarge - A100 GPUs), and I'm not sure whether those will be available during QG tests.
We can add the tests to ODS CI; we just can't run them as part of QG, only as part of our own jobs.
g4dn.12xlarge instance is used in qe-jenkins, and we also have
Quality Gate passed
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeWithROCm    ${ROCM_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node multi-gpu test with NVIDIA CUDA image
    [Documentation]    Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with NVIDIA CUDA image - It requires 2 cluster-nodes with 2 GPUs each
Check warning (Code scanning / Robocop): Line is too long ({{ line_length }}/{{ allowed_length }})
    Run Training Operator KFTO Test    TestPyTorchJobMnistMultiNodeMultiGpuWithCuda    ${CUDA_TRAINING_IMAGE}

Run Training operator KFTO_MNIST multi-node multi-gpu test with AMD ROCm image
    [Documentation]    Run Go KFTO_MNIST multi-node multi-gpu test for Training operator using PyTorch job with AMD ROCm image - It requires 2 cluster-nodes with 2 GPUs each
Check warning (Code scanning / Robocop): Line is too long ({{ line_length }}/{{ allowed_length }})
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: abhijeet-dhumal, sutaakar
Update KFTO multi-node test names according to recent updates in original test names
Related to: opendatahub-io/distributed-workloads#299