Automate Kubernetes cluster cost optimisations

hsbc/cost-manager

cost-manager

cost-manager is a Kubernetes controller manager that manages controllers to automate cost optimisations.

Controllers

Here we provide details of the various controllers supported by cost-manager.

spot-migrator

Spot VMs are unused compute capacity that many cloud providers offer at significantly reduced cost (e.g. on GCP, spot VMs provide a 60-91% discount). Since spot VM availability can fluctuate, it is common to configure workloads to run on spot VMs but to fall back to on-demand VMs when spot VMs are unavailable. However, once workloads are running on on-demand VMs, nothing prompts them to migrate to spot VMs, even when spot VMs are available.

To improve spot VM utilisation, spot-migrator periodically attempts to migrate workloads from on-demand VMs to spot VMs by draining on-demand Nodes to force cluster scale up. This relies on the fact that the cluster autoscaler attempts to expand the least expensive possible node group, taking into account the reduced cost of spot VMs. If an on-demand VM is added to the cluster, spot-migrator assumes that no more spot VMs are currently available and waits for the next migration attempt (currently every hour). If no on-demand VMs were added, spot-migrator continues to drain on-demand VMs until none are left in the cluster and all workloads are running on spot VMs. Node draining respects PodDisruptionBudgets to ensure that workloads are migrated whilst maintaining desired levels of availability.
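
The drain loop described above can be sketched roughly as follows. This is a minimal illustration and not the cost-manager implementation: the `Cluster` interface, its `Drain` method, `MigrateOnce` and `fakeCluster` are all hypothetical names, and the hourly scheduling and PodDisruptionBudget handling are left out.

```go
package main

import "fmt"

// Cluster is a hypothetical abstraction used only for this sketch; it is not
// part of the cost-manager API.
type Cluster interface {
	// OnDemandNodes lists the on-demand nodes currently in the cluster.
	OnDemandNodes() []string
	// Drain drains the node and reports whether the resulting scale up
	// added a new on-demand node.
	Drain(node string) (onDemandAdded bool, err error)
}

// MigrateOnce drains on-demand nodes one at a time. It stops when no
// on-demand nodes remain, or when a drain causes a new on-demand node to be
// added, which is taken as a sign that spot capacity is exhausted.
// It returns the number of nodes drained.
func MigrateOnce(c Cluster) (int, error) {
	drained := 0
	for {
		nodes := c.OnDemandNodes()
		if len(nodes) == 0 {
			return drained, nil
		}
		added, err := c.Drain(nodes[0])
		if err != nil {
			return drained, err
		}
		drained++
		if added {
			return drained, nil
		}
	}
}

// fakeCluster simulates a cluster with a fixed amount of spare spot capacity.
type fakeCluster struct {
	onDemand     []string
	spotCapacity int
}

func (f *fakeCluster) OnDemandNodes() []string { return f.onDemand }

func (f *fakeCluster) Drain(node string) (bool, error) {
	// Remove the drained node from the simulated cluster.
	remaining := make([]string, 0, len(f.onDemand))
	for _, n := range f.onDemand {
		if n != node {
			remaining = append(remaining, n)
		}
	}
	f.onDemand = remaining
	// With spare spot capacity, the displaced workloads land on a spot node.
	if f.spotCapacity > 0 {
		f.spotCapacity--
		return false, nil
	}
	// Otherwise the simulated autoscaler falls back to an on-demand node.
	f.onDemand = append(f.onDemand, node+"-replacement")
	return true, nil
}

func main() {
	c := &fakeCluster{onDemand: []string{"a", "b", "c"}, spotCapacity: 2}
	drained, err := MigrateOnce(c)
	fmt.Println(drained, err, c.onDemand)
}
```

In this simulation the loop stops as soon as a replacement on-demand node appears, leaving the remaining work for the next attempt.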

Currently only GKE Standard clusters are supported. To allow spot-migrator to migrate workloads to spot VMs with fallback to on-demand VMs, your cluster must be running at least one on-demand node pool and at least one spot node pool.

apiVersion: cost-manager.io/v1alpha1
kind: CostManagerConfiguration
controllers:
- spot-migrator
cloudProvider:
  name: gcp

pod-safe-to-evict-annotator

Certain types of Pods can prevent the cluster autoscaler from removing a Node (e.g. Pods in the kube-system Namespace that do not have a PodDisruptionBudget), leading to more Nodes in the cluster than necessary. This can be particularly problematic for workloads that cluster operators do not control and that can have a high number of replicas, such as kube-dns or the Konnectivity agent, which are typically installed by cloud providers.

To allow the cluster autoscaler to evict all Pods that have not been explicitly marked as unsafe for eviction, pod-safe-to-evict-annotator adds the cluster-autoscaler.kubernetes.io/safe-to-evict: "true" annotation to all Pods that have not already been annotated; note that PodDisruptionBudgets can still be used to maintain desired levels of availability.
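
The "annotate only if not already annotated" rule can be illustrated with a small sketch. It works on a plain annotations map rather than real Pod objects, and `WithSafeToEvict` is a hypothetical helper name, not a function from cost-manager.

```go
package main

import "fmt"

// safeToEvictKey is the annotation key named in the text above.
const safeToEvictKey = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// WithSafeToEvict returns a copy of annotations with safe-to-evict set to
// "true" unless the key is already present. An existing value, including an
// explicit "false", is left untouched. The bool reports whether the
// annotation was added.
func WithSafeToEvict(annotations map[string]string) (map[string]string, bool) {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	if _, ok := out[safeToEvictKey]; ok {
		return out, false
	}
	out[safeToEvictKey] = "true"
	return out, true
}

func main() {
	// A Pod with no annotations gets the annotation added.
	added, changed := WithSafeToEvict(nil)
	fmt.Println(added[safeToEvictKey], changed)

	// A Pod explicitly marked as unsafe for eviction is left alone.
	kept, changed := WithSafeToEvict(map[string]string{safeToEvictKey: "false"})
	fmt.Println(kept[safeToEvictKey], changed)
}
```

Returning a copy rather than mutating the input keeps the sketch free of side effects; a real controller would instead patch the Pod through the Kubernetes API.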

apiVersion: cost-manager.io/v1alpha1
kind: CostManagerConfiguration
controllers:
- pod-safe-to-evict-annotator
podSafeToEvictAnnotator:
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: In
      values:
      - kube-system
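
The namespaceSelector above matches on the kubernetes.io/metadata.name label, which Kubernetes sets on each Namespace to the Namespace's own name, so this configuration restricts the controller to kube-system. The sketch below illustrates how an `In` expression is evaluated against a Namespace's labels. `Requirement` and `MatchesAll` are hypothetical names for illustration only; a real controller would more likely rely on the selector helpers in k8s.io/apimachinery.

```go
package main

import "fmt"

// Requirement is a simplified stand-in for a matchExpressions entry with
// operator "In"; other operators are not modelled in this sketch.
type Requirement struct {
	Key    string
	Values []string
}

// MatchesAll reports whether labels satisfy every requirement: each key must
// be present and its value must be one of the listed values. An empty list
// of requirements matches everything.
func MatchesAll(labels map[string]string, reqs []Requirement) bool {
	for _, r := range reqs {
		value, ok := labels[r.Key]
		if !ok {
			return false
		}
		found := false
		for _, want := range r.Values {
			if value == want {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	reqs := []Requirement{
		{Key: "kubernetes.io/metadata.name", Values: []string{"kube-system"}},
	}
	kubeSystem := map[string]string{"kubernetes.io/metadata.name": "kube-system"}
	defaultNS := map[string]string{"kubernetes.io/metadata.name": "default"}
	fmt.Println(MatchesAll(kubeSystem, reqs))
	fmt.Println(MatchesAll(defaultNS, reqs))
}
```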

Installation

You can install cost-manager into a GKE cluster with Workload Identity enabled as follows:

NAMESPACE="cost-manager"
kubectl get namespace "$NAMESPACE" || kubectl create namespace "$NAMESPACE"
LATEST_RELEASE_TAG="$(curl -s https://api.github.com/repos/hsbc/cost-manager/releases/latest | jq -r .tag_name)"
# GCP service account bound to the roles/compute.instanceAdmin role
GCP_SERVICE_ACCOUNT_EMAIL_ADDRESS="cost-manager@example.iam.gserviceaccount.com"
cat <<EOF > values.yaml
image:
  tag: $LATEST_RELEASE_TAG
config:
  apiVersion: cost-manager.io/v1alpha1
  kind: CostManagerConfiguration
  controllers:
  - spot-migrator
  - pod-safe-to-evict-annotator
  cloudProvider:
    name: gcp
  podSafeToEvictAnnotator:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
        - kube-system
serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: $GCP_SERVICE_ACCOUNT_EMAIL_ADDRESS
EOF
helm template ./charts/cost-manager -n "$NAMESPACE" -f values.yaml | kubectl apply -f -

Testing

Build the Docker image and run the E2E tests using kind:

make image e2e

Roadmap

See ROADMAP.md for details.

Contributing

Contributions are greatly appreciated. The project follows the typical GitHub pull request model. See CONTRIBUTING.md for more details.