Kueue + InstaSlice Integration Demo

This demo uses a Kind cluster with fake MIG-enabled GPUs and InstaSlice running in emulator mode to confirm that InstaSlice allocates MIG slices for queued pods only after Kueue admits them.

Create a Kind cluster:

kind create cluster

Deploy cert-manager:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.yaml

Wait for cert-manager to be ready.
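
One way to script this wait, rather than polling by hand, is `kubectl wait`. The sketch below assumes the manifest above installs its deployments into the `cert-manager` namespace. The helper name `wait_for_deployments` and the 180s default timeout are illustrative choices, not part of this repository.

```shell
# Wait until every Deployment in a namespace reports the Available condition.
# Usage: wait_for_deployments <namespace> [timeout]
# The helper name and the default timeout are assumptions made for this sketch.
wait_for_deployments() {
  ns="$1"
  timeout="${2:-180s}"
  kubectl wait --for=condition=Available deployment --all -n "$ns" --timeout="$timeout"
}

# Example (assumes cert-manager lives in the "cert-manager" namespace):
# wait_for_deployments cert-manager
```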

Deploy InstaSlice in emulator mode using Kueue-enabled images:

IMG=quay.io/tardieu/instaslicev2-controller:kueue IMG_DMST=quay.io/tardieu/instaslicev2-daemonset:kueue make deploy-emulated

Wait for InstaSlice to be ready.

Add fake GPU capacity to the cluster:

kubectl apply -f test/e2e/resources/instaslice-fake-capacity.yaml

Deploy Kueue v0.8.1:

kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.8.1/manifests.yaml
kubectl patch cm -n kueue-system kueue-manager-config --patch-file docs/kueue/kueue-manager-config.yaml
kubectl rollout restart -n kueue-system deployment kueue-controller-manager

The provided kueue-manager-config.yaml enables Kueue's opt-in pod integration and adds instaslice.redhat.com/ and instaslice.redhat.com/accelerator-memory-quota to Kueue's excludeResourcePrefixes.

Wait for Kueue to be ready.
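
Because the controller manager was just restarted, `kubectl rollout status` is a convenient way to block until the new pod is up. This is a minimal sketch using the namespace and deployment name from the commands above. The helper name and the timeout default are assumptions.

```shell
# Block until the restarted Kueue controller manager has finished rolling out.
# Usage: wait_for_kueue [timeout]   (the 180s default is an arbitrary choice)
wait_for_kueue() {
  kubectl rollout status -n kueue-system deployment/kueue-controller-manager \
    --timeout="${1:-180s}"
}

# Example:
# wait_for_kueue
```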

Configure a default flavor, a cluster queue, and a local queue in the default namespace with a quota of 3 nvidia.com/mig-1g.5gb slices:

kubectl apply -f docs/kueue/sample-queues.yaml
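
For orientation, a queue setup with this shape could look roughly like the sketch below, written against Kueue's `v1beta1` API. The object names (`default-flavor`, `cluster-queue`, `user-queue`) and the helper `render_queues` are hypothetical; the actual contents of docs/kueue/sample-queues.yaml may differ.

```shell
# Print a hypothetical queue configuration: one flavor, one cluster queue with
# a quota of 3 MIG slices, and one local queue in the default namespace.
# All object names here are placeholders, not taken from sample-queues.yaml.
render_queues() {
  cat <<'EOF'
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/mig-1g.5gb"]
    flavors:
    - name: default-flavor
      resources:
      - name: nvidia.com/mig-1g.5gb
        nominalQuota: 3
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default
  name: user-queue
spec:
  clusterQueue: cluster-queue
EOF
}

# Example:
# render_queues | kubectl apply -f -
```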

Queue 7 pods:

kubectl apply -f docs/kueue/sample-pods.yaml
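
With the pod integration, Kueue manages a pod when it carries the `kueue.x-k8s.io/queue-name` label. The sketch below generates pods of that shape, each requesting one slice. The queue name, image, command, and the helper `render_pods` are assumptions for illustration; the real docs/kueue/sample-pods.yaml may differ.

```shell
# Print manifests for pods p1..pN, each labeled for a Kueue local queue and
# requesting one nvidia.com/mig-1g.5gb slice.
# Usage: render_pods <count> [local-queue-name]
# "user-queue", the busybox image, and the sleep command are placeholders.
render_pods() {
  n="$1"
  queue="${2:-user-queue}"
  i=1
  while [ "$i" -le "$n" ]; do
    cat <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: p$i
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: $queue
spec:
  restartPolicy: Never
  containers:
  - name: main
    image: busybox
    command: ["sleep", "20"]
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1
EOF
    i=$((i + 1))
  done
}

# Example:
# render_pods 7 | kubectl apply -f -
```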

Check that at most 3 pods are running at a time:

kubectl get pods
NAME   READY   STATUS            RESTARTS   AGE
p1     0/1     SchedulingGated   0          15s
p2     1/1     Running           0          15s
p3     1/1     Running           0          15s
p4     1/1     Running           0          15s
p5     0/1     SchedulingGated   0          15s
p6     0/1     SchedulingGated   0          15s
p7     0/1     SchedulingGated   0          15s
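
To check the limit without reading the table by eye, the running pods can be counted from the same output. This is a small sketch; `count_running` is a hypothetical helper, and it assumes the default `kubectl get pods` column layout shown above, where STATUS is the third field.

```shell
# Count pods whose STATUS column (third field) is "Running".
# Reads `kubectl get pods` output from stdin; a header line is harmless
# because its third field is "STATUS", not "Running".
count_running() {
  awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Example:
# kubectl get pods | count_running
```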

Confirm that InstaSlice has not created slices for all 7 pods ahead of time; the node capacity should list only 3 nvidia.com/mig-1g.5gb slices:

kubectl get node kind-control-plane -o json | jq .status.capacity
{
  "cpu": "8",
  "ephemeral-storage": "102625208Ki",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "hugepages-32Mi": "0",
  "hugepages-64Ki": "0",
  "memory": "16351912Ki",
  "instaslice.redhat.com/accelerator-memory-quota": "80Gi",
  "nvidia.com/mig-1g.5gb": "3",
  "instaslice.redhat.com/358bb6d7-b65b-4a0c-9585-2567c1ce89e2": "1",
  "instaslice.redhat.com/358d2198-eab4-4ac8-9e25-5c7b67187dac": "1",
  "instaslice.redhat.com/79fcac9e-3be1-4fc2-892c-78238c2c405c": "1",
  "instaslice.redhat.com/99ba54ca-dfcd-4942-a770-6e144d69fd9b": "1",
  "pods": "110"
}
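
If `jq` is not installed, the slice count alone can be read with kubectl's built-in JSONPath output, escaping the dots in the resource name. This is a sketch; `mig_capacity` is a hypothetical helper, and the node name defaults to the one used above.

```shell
# Print the node's advertised nvidia.com/mig-1g.5gb capacity.
# Usage: mig_capacity [node-name]
# Dots inside the resource name are escaped so JSONPath treats it as one key.
mig_capacity() {
  kubectl get node "${1:-kind-control-plane}" \
    -o jsonpath='{.status.capacity.nvidia\.com/mig-1g\.5gb}'
}

# Example:
# mig_capacity
```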

To clean up, delete the Kind cluster:

kind delete cluster