Actions: NVIDIA/JAX-Toolbox

NCCL on Kubernetes

123 workflow runs

Update the dockerfile base image so that we can support NCCL
NCCL on Kubernetes #123: Pull request #1248 synchronize by Steboss
January 15, 2025 11:14 7m 10s sbosisio/cuda-dl-base
Update the dockerfile base image so that we can support NCCL
NCCL on Kubernetes #122: Pull request #1248 synchronize by Steboss
January 15, 2025 09:43 7m 27s sbosisio/cuda-dl-base
Update the dockerfile base image so that we can support NCCL
NCCL on Kubernetes #121: Pull request #1248 synchronize by Steboss
January 15, 2025 09:36 6m 39s sbosisio/cuda-dl-base
NCCL on Kubernetes
NCCL on Kubernetes #120: Scheduled
January 15, 2025 08:36 7m 3s main
Delete .git for internal repos
NCCL on Kubernetes #119: Pull request #1252 opened by DwarKapex
January 15, 2025 04:13 7m 22s 25.01-devel-clean-internal-git
Update ml_dtypes to the latest version after pip-finalize.
NCCL on Kubernetes #118: Pull request #1250 opened by DwarKapex
January 15, 2025 00:27 7m 18s 25.01-devel-ml-dtypes-050
Update the dockerfile base image so that we can support NCCL
NCCL on Kubernetes #117: Pull request #1248 synchronize by Steboss
January 14, 2025 15:33 7m 19s sbosisio/cuda-dl-base
Update the dockerfile base image so that we can support NCCL
NCCL on Kubernetes #115: Pull request #1248 opened by Steboss
January 14, 2025 11:42 7m 10s sbosisio/cuda-dl-base
NCCL on Kubernetes
NCCL on Kubernetes #114: Scheduled
January 14, 2025 08:36 7m 19s main
Remove V100 from test environment
NCCL on Kubernetes #113: Pull request #1238 synchronize by DwarKapex
January 13, 2025 19:45 10m 3s vkozlov/remove-v100
[nsys-jax] Add ratio of hidden communication time to total communication time
NCCL on Kubernetes #112: Pull request #1241 synchronize by sfvaroglu
January 13, 2025 18:47 10m 42s sevin/comm_time
[nsys-jax] Add ratio of hidden communication time to total communication time
NCCL on Kubernetes #111: Pull request #1241 synchronize by sfvaroglu
January 13, 2025 18:44 4m 3s sevin/comm_time
[nsys-jax] Add ratio of hidden communication time to total communication time
NCCL on Kubernetes #110: Pull request #1241 synchronize by sfvaroglu
January 13, 2025 18:30 7m 14s sevin/comm_time
[nsys-jax] Add ratio of hidden communication time to total communication time
NCCL on Kubernetes #109: Pull request #1241 synchronize by sfvaroglu
January 13, 2025 18:15 9m 27s sevin/comm_time
CI: run MaxText tests on AWS with NGC release candidate images
NCCL on Kubernetes #108: Pull request #1237 synchronize by olupton
January 13, 2025 17:35 7m 20s olupton/eks-maxtext-25.01
NCCL on Kubernetes
NCCL on Kubernetes #107: Scheduled
January 13, 2025 08:37 7m 1s main
Add MACE training example
NCCL on Kubernetes #106: Pull request #1192 synchronize by mariogeiger
January 12, 2025 21:26 19s mariogeiger:mace
NCCL on Kubernetes
NCCL on Kubernetes #105: Scheduled
January 12, 2025 08:35 7m 18s main
Replace deprecated flag xla_gpu_graph_level.
NCCL on Kubernetes #104: Pull request #1244 opened by sergachev
January 11, 2025 21:56 12m 2s cuda_graph_flag
NCCL on Kubernetes
NCCL on Kubernetes #103: Scheduled
January 11, 2025 08:35 7m 24s main
Remove V100 from test environment
NCCL on Kubernetes #102: Pull request #1238 synchronize by DwarKapex
January 10, 2025 22:30 8m 48s vkozlov/remove-v100
Remove V100 from test environment
NCCL on Kubernetes #101: Pull request #1238 synchronize by DwarKapex
January 10, 2025 21:57 8m 2s vkozlov/remove-v100
Remove V100 from test environment
NCCL on Kubernetes #100: Pull request #1238 synchronize by DwarKapex
January 10, 2025 21:57 20s vkozlov/remove-v100
Remove V100 from test environment
NCCL on Kubernetes #99: Pull request #1238 synchronize by DwarKapex
January 10, 2025 21:57 32s vkozlov/remove-v100