~NGC release testing #16

Manually triggered April 5, 2024 03:56
Status Failure
Total duration 6h 1m 41s
Artifacts 18

ngc-release-testing.yaml

on: workflow_dispatch
Matrix: test-maxtext / maxtext-multinode
Waiting for pending jobs
Matrix: test-maxtext / single-process-multi-device
Waiting for pending jobs
Matrix: test-jax / run-unit-test
Matrix: test-levanter / run-unit-test
Waiting for pending jobs
Matrix: test-rosetta-pax / rosetta-pax-multi-node-te
Matrix: test-rosetta-pax / rosetta-pax-multi-node
Matrix: test-rosetta-pax / rosetta-pax-single-node-dropout-te
Matrix: test-rosetta-pax / single-process-evaluation-te
Matrix: test-rosetta-pax / single-process-multi-device-te
test-jax / runner / launch-slurm-runner (5h 34m)
test-levanter / runner / launch-slurm-runner
test-maxtext / summary
test-maxtext / metrics
test-rosetta-pax / summary (0s)
test-rosetta-pax / metrics (0s)
test-maxtext / sitrep / sitrep
test-rosetta-pax / sitrep / sitrep (9s)
test-maxtext / outcome
test-rosetta-pax / outcome (0s)
finalize / workflow-badge (6s)
finalize / report (8s)
finalize / upload-badge (8s)
finalize / publish-badge (4s)

Annotations

9 errors
test-jax / jax-V100-unit-test: Process completed with exit code 1.
test-jax / jax-A100-unit-test: Process completed with exit code 1.
test-rosetta-pax / single-process-multi-device-te (1, 1, 2, 4): The job running on runner GitHub Actions 142 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / single-process-multi-device-te (1, 1, 2, 4): The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node-te (16DP1FSDP1TP1PP_TE, 1, 16, 1, 1, 4): The job running on runner GitHub Actions 447 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node-te (5B_fused_attn_1, 1, 1, 8, 1, 2, --model-type 5B --enable-fused-attn): The job running on runner GitHub Actions 327 has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / outcome: Process completed with exit code 1.

Artifacts

Produced during runtime
Name                                                     Status   Size
artifact-final-report                                    Expired  550 Bytes
artifact-rosetta-pax-mgmn-test                           Expired  714 Bytes
artifact-workflow-metadata                               Expired  267 Bytes
jax-unit-test-A100                                       Expired  16.7 KB
jax-unit-test-V100                                       Expired  17.5 KB
rosetta-pax-8564753851-1DP1FSDP1TP1PP_TE                 Expired  88.2 KB
rosetta-pax-8564753851-1DP8FSDP1TP1PP_TE                 Expired  325 KB
rosetta-pax-8564753851-2DP1FSDP1TP4PP                    Expired  289 KB
rosetta-pax-8564753851-2DP1FSDP2TP4PP                    Expired  521 KB
rosetta-pax-8564753851-4DP1FSDP2TP1PP                    Expired  361 KB
rosetta-pax-8564753851-4DP1FSDP2TP1PP_TE                 Expired  322 KB
rosetta-pax-8564753851-5B_fused_attn_0                   Expired  392 KB
rosetta-pax-8564753851-8DP1FSDP1TP1PP                    Expired  364 KB
rosetta-pax-8564753851-8DP1FSDP1TP1PP_TE                 Expired  329 KB
rosetta-pax-8564753851-8DP1FSDP1TP1PP_eval_TE            Expired  76.3 KB
rosetta-pax-8564753851-8DP1FSDP1TP1PP_single_process_TE  Expired  105 KB
rosetta-pax-8564753851-8DP_TE_dropout                    Expired  324 KB
rosetta-pax-8564753851-LLaMA_eval_TE                     Expired  224 KB