The following setup uses a Kind cluster with fake MIG-enabled GPUs and InstaSlice running in emulator mode to confirm that InstaSlice allocates MIG slices for queued pods only after Kueue has admitted them.
Create a Kind cluster:
kind create cluster
Deploy cert-manager:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.yaml
Wait for cert-manager to be ready.
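One way to block until cert-manager is up is to use `kubectl wait` on its deployments. The sketch below assumes the manifest installs into the `cert-manager` namespace (its default); the helper name `wait_for_deployments` and the 300s default timeout are illustrative and not part of this repository.

```shell
# Hypothetical helper: block until every deployment in a namespace is Available.
# Arguments: namespace, optional timeout (defaults to 300s, an arbitrary choice).
wait_for_deployments() {
  ns="$1"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Available deployment --all \
    -n "$ns" --timeout="$timeout"
}

# Usage (assumes cert-manager runs in the cert-manager namespace):
#   wait_for_deployments cert-manager
```

The same helper works for any namespace, for instance `kueue-system` later in this walkthrough.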
Deploy InstaSlice in emulator mode using Kueue-enabled images:
IMG=quay.io/tardieu/instaslicev2-controller:kueue IMG_DMST=quay.io/tardieu/instaslicev2-daemonset:kueue make deploy-emulated
Wait for InstaSlice to be ready.
Add fake GPU capacity to the cluster:
kubectl apply -f test/e2e/resources/instaslice-fake-capacity.yaml
Deploy Kueue v0.8.1:
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.8.1/manifests.yaml
kubectl patch cm -n kueue-system kueue-manager-config --patch-file docs/kueue/kueue-manager-config.yaml
kubectl rollout restart -n kueue-system deployment kueue-controller-manager
The provided kueue-manager-config.yaml enables the optional, opt-in pod integration and adds instaslice.redhat.com/ and instaslice.redhat.com/accelerator-memory-quota to Kueue's excludeResourcePrefixes.
Wait for Kueue to be ready.
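Because the previous step restarts the controller, `kubectl rollout status` is a convenient readiness check: it returns once the new rollout has completed. This is a minimal sketch; the function name `wait_for_kueue` and the 300s default timeout are illustrative choices.

```shell
# Hypothetical helper: wait for the restarted Kueue controller to finish rolling out.
# Optional argument: timeout (defaults to 300s, an arbitrary choice).
wait_for_kueue() {
  kubectl rollout status -n kueue-system \
    deployment/kueue-controller-manager --timeout="${1:-300s}"
}

# Usage:
#   wait_for_kueue
```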
Configure a default flavor, a cluster queue, and a local queue in the default namespace with a quota of 3 nvidia.com/mig-1g.5gb slices:
kubectl apply -f docs/kueue/sample-queues.yaml
Queue 7 pods:
kubectl apply -f docs/kueue/sample-pods.yaml
Check that at most 3 pods are running at a time:
kubectl get pods
NAME   READY   STATUS            RESTARTS   AGE
p1     0/1     SchedulingGated   0          15s
p2     1/1     Running           0          15s
p3     1/1     Running           0          15s
p4     1/1     Running           0          15s
p5     0/1     SchedulingGated   0          15s
p6     0/1     SchedulingGated   0          15s
p7     0/1     SchedulingGated   0          15s
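To check the limit without reading the table by eye, the STATUS column can be tallied. The helper below is a hypothetical sketch: it reads `kubectl get pods` output on standard input and assumes the default column layout shown above, with STATUS in the third column.

```shell
# Hypothetical helper: count pods whose STATUS column equals the given value.
# Reads `kubectl get pods` output on stdin and skips the header row.
count_pods_with_status() {
  awk -v want="$1" 'NR > 1 && $3 == want { n++ } END { print n + 0 }'
}

# Usage:
#   kubectl get pods | count_pods_with_status Running
#   kubectl get pods | count_pods_with_status SchedulingGated
```

For the listing above, the first command would print 3 and the second 4.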
Confirm that InstaSlice has not created all 7 slices ahead of time; the node capacity should report only 3 nvidia.com/mig-1g.5gb slices:
kubectl get node kind-control-plane -o json | jq .status.capacity
{
  "cpu": "8",
  "ephemeral-storage": "102625208Ki",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "hugepages-32Mi": "0",
  "hugepages-64Ki": "0",
  "memory": "16351912Ki",
  "instaslice.redhat.com/accelerator-memory-quota": "80Gi",
  "nvidia.com/mig-1g.5gb": "3",
  "instaslice.redhat.com/358bb6d7-b65b-4a0c-9585-2567c1ce89e2": "1",
  "instaslice.redhat.com/358d2198-eab4-4ac8-9e25-5c7b67187dac": "1",
  "instaslice.redhat.com/79fcac9e-3be1-4fc2-892c-78238c2c405c": "1",
  "instaslice.redhat.com/99ba54ca-dfcd-4942-a770-6e144d69fd9b": "1",
  "pods": "110"
}
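To read just the slice count rather than scanning the whole object, the jq filter can be narrowed to a single key. The helper below is a hypothetical sketch: it expects the node JSON on standard input and prints 0 when the resource is absent.

```shell
# Hypothetical helper: print the node's capacity for one extended resource.
# Reads node JSON (kubectl get node ... -o json) on stdin; prints 0 if the key is missing.
node_capacity() {
  jq -r --arg res "$1" '.status.capacity[$res] // "0"'
}

# Usage:
#   kubectl get node kind-control-plane -o json | node_capacity nvidia.com/mig-1g.5gb
```

With the capacity shown above this prints 3, matching the queue quota rather than the 7 queued pods.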
To clean up, delete the Kind cluster:
kind delete cluster