GCE Ingress controller FAQ

This page contains general FAQ for the GCE Ingress controller.

How do I deploy an Ingress controller?

On GCP (either GCE or GKE), every Kubernetes cluster has an Ingress controller running on the master; no deployment is necessary. You can deploy a second, different (i.e., non-GCE) controller, like this. If you wish to deploy a GCE controller as a pod in your cluster, make sure to turn down the existing auto-deployed Ingress controller as shown in this example.

I created an Ingress and nothing happens, now what?

Please check the following:

  1. Output of kubectl describe, as shown here
  2. Do your Services all have a NodePort?
  3. Do your Services either serve an HTTP status code 200 on /, or have a readiness probe as described in this section?
  4. Do you have enough GCP quota?
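
The first two checks can be gathered in one go. The following is a minimal sketch, not part of the controller: `ingress_diagnose` is a hypothetical helper name, and the namespace argument defaulting to `default` is an assumption.

```shell
# Hypothetical helper: collects the evidence for checks 1 and 2 above.
# Usage: ingress_diagnose <ingress-name> <service-name> [namespace]
ingress_diagnose() (
  ing=$1
  svc=$2
  ns=${3:-default}
  # Check 1: events and backend state as reported for the Ingress.
  kubectl describe ingress "$ing" --namespace="$ns"
  # Check 2: one nodePort per line; empty lines mean the port has no NodePort.
  kubectl get svc "$svc" --namespace="$ns" \
    -o jsonpath='{range .spec.ports[*]}{.nodePort}{"\n"}{end}'
)
```

For example, `ingress_diagnose web web-svc` prints the Ingress description followed by the node ports of `web-svc`.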

What are the cloud resources created for a single Ingress?

Terminology:

The pipeline is as follows:

Global Forwarding Rule -> TargetHTTPProxy
        |                                  \                               Instance Group (us-east1)
    Static IP                               URL Map - Backend Service(s) - Instance Group (us-central1)
        |                                  /                               ...
Global Forwarding Rule -> TargetHTTPSProxy
                            SSL cert

In addition to this pipeline:

  • Each Backend Service requires an HTTP or HTTPS health check to the NodePort of the Service
  • Each port on the Backend Service has a matching port on the Instance Group
  • Each port on the Backend Service is exposed through a firewall-rule open to the GCE LB IP ranges (130.211.0.0/22 and 35.191.0.0/16)
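
When reading application or node logs, it can help to tell whether a client address belongs to the GCE LB ranges listed above. A minimal sketch in POSIX shell; the function names are hypothetical and the two ranges are the ones quoted in this section.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# in_cidr <ip> <network> <prefix-length>: succeeds if ip is inside the network.
in_cidr() (
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
)

# Succeeds if the address falls in 130.211.0.0/22 or 35.191.0.0/16.
is_gce_lb_ip() {
  in_cidr "$1" 130.211.0.0 22 || in_cidr "$1" 35.191.0.0 16
}
```

Usage: `is_gce_lb_ip 130.211.1.5 && echo "health check or LB traffic"`.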

The Ingress controller events complain about quota, how do I increase it?

GLBC is not aware of your GCE quota. As of this writing, users get 3 GCE Backend Services by default. If you plan on creating Ingresses for multiple Kubernetes Services, remember that each one requires a backend service, and request quota accordingly. Should you fail to do so, the controller will poll periodically and grab the first free backend service slot it finds. You can view your quota:

$ gcloud compute project-info describe --project myproject

See GCE documentation for how to request more.
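
The describe output is long, so it can be narrowed to the Backend Services entry. This is a sketch that assumes the default YAML output lists each quota as consecutive `limit`, `metric`, and `usage` lines; `backend_service_quota` is a hypothetical helper name.

```shell
# Print the limit/metric/usage lines for the BACKEND_SERVICES quota.
# Usage: backend_service_quota <project-id>
backend_service_quota() {
  gcloud compute project-info describe --project "$1" |
    grep -B 1 -A 1 'metric: BACKEND_SERVICES$'
}
```

If the line order in your gcloud version differs, widen the `-B`/`-A` context until both `limit` and `usage` are shown.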

Why does the Ingress need a different instance group than the GKE cluster?

The controller removes Kubernetes nodes that are NotReady from the lb instance group, and adds them back once they are Ready. We cannot simply rely on health checks to achieve this for a few reasons.

First, older Kubernetes versions (<=1.3) did not mark endpoints on unreachable nodes as NotReady. This means that if the Kubelet didn't heartbeat for 10s, the node was marked NotReady, but there was no other signal at the Service level to stop routing requests to endpoints on that node. Later Kubernetes versions handle this a little better: if the Kubelet doesn't heartbeat for 10s the node is marked NotReady, and if it stays NotReady for 40s all of its endpoints are marked NotReady. So it is still advantageous to pull the node out of the GCE LB Instance Group in 10s, because we save 30s of bad requests.

Second, continuing to send requests to NotReady nodes is not a great idea. The NotReady condition is an aggregate of various factors. For example, a NotReady node might still pass health checks but have the wrong nodePort to endpoint mappings. The health check will pass as long as something returns an HTTP 200.

Why does the cloud console show 0/N healthy instances?

Some nodes are reporting negatively on the GCE HTTP health check. Please check the following:

  1. Try to access any node-ip:node-port/health-check-url
  2. Try to access any public-ip:node-port/health-check-url
  3. Make sure you have a firewall-rule allowing access to the GCE LB IP range (created by the Ingress controller on your behalf)
  4. Make sure the right NodePort is opened in the Backend Service, and consequently, plugged into the lb instance group
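
Checks 1 and 2 above amount to requesting the health check URL and looking at the status code. A minimal sketch using curl; `probe_health_check` is a hypothetical helper, and the 5 second timeout is an arbitrary choice.

```shell
# Print the HTTP status code returned by <ip>:<node-port><path>.
# Usage: probe_health_check <node-or-public-ip> <node-port> [path]
probe_health_check() {
  curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}\n' "http://$1:$2${3:-/}"
}
```

A healthy node should answer 200 here, for example with `probe_health_check 10.240.0.5 30301 /healthz` (address, port, and path are placeholders).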

Can I configure GCE health checks through the Ingress?

Currently health checks are not exposed through the Ingress resource; they're handled at the node level by Kubernetes daemons (kube-proxy and the kubelet). However, the GCE L7 lb still requires an HTTP(S) health check to measure node health. By default, this health check points at / on the nodePort associated with a given backend. Note that the purpose of this health check is NOT to determine when endpoint pods are overloaded, but rather to detect when a given node is incapable of proxying requests for the Service:nodePort altogether. Overloaded endpoints are removed from the working set of a Service via readiness probes conducted by the kubelet.

If / doesn't work for your application, you can have the Ingress controller program the GCE health check to point at a readiness probe as shown in this example.

We plan to surface health checks through the API soon.

Why does my Ingress have an ephemeral ip?

GCE has a concept of ephemeral and static IPs. A production website would always want a static IP, while ephemeral IPs are cheaper (both in terms of quota and cost), and are therefore better suited for experimentation.

  • Creating an HTTP Ingress (i.e., an Ingress without a TLS section) allocates an ephemeral IP for 2 reasons:
    • we want to encourage secure defaults
    • static-ips have limited quota and pure HTTP ingress is often used for testing
  • Creating an Ingress with a TLS section allocates a static IP
  • Modifying an Ingress and adding a TLS section allocates a static IP, but the IP will change. This is a beta limitation.
  • You can promote an ephemeral to a static IP by hand, if required.
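
Promoting by hand can be sketched as follows: read the IP currently assigned to the Ingress, then reserve it under a name. `promote_ingress_ip` and its arguments are hypothetical; the sketch assumes the Ingress already has an IP in its status and that the address is global, as for the HTTP(S) load balancer.

```shell
# Reserve the Ingress's current ephemeral IP as a static global address.
# Usage: promote_ingress_ip <ingress-name> <address-name> [namespace]
promote_ingress_ip() (
  ing=$1
  name=$2
  ns=${3:-default}
  ip=$(kubectl get ingress "$ing" --namespace="$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  # Stop early if the load balancer has not been assigned an IP yet.
  [ -n "$ip" ] || { echo "Ingress $ing has no IP yet" >&2; exit 1; }
  gcloud compute addresses create "$name" --addresses "$ip" --global
)
```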

Can I pre-allocate a static-ip?

Yes, please see this example.

Does updating a Kubernetes secret update the GCE TLS certs?

Yes, expect O(30s) delay.

The controller should create a second SSL certificate suffixed with -1 and atomically swap it with the SSL certificate in your target proxy, then delete the obsolete SSL certificate.

Can I tune the loadbalancing algorithm?

Right now, a kube-proxy NodePort service is a necessary condition for Ingress on GCP. This is because the cloud LB doesn't understand how to route directly to your pods. Incorporating kube-proxy and cloud lb algorithms so they cooperate toward a common goal is still a work in progress. If you really want fine-grained control over the algorithm, you should deploy the nginx controller.

Is there a maximum number of Endpoints I can add to the Ingress?

This limit is directly related to the maximum number of endpoints allowed in a Kubernetes cluster, not the HTTP LB configuration, since the HTTP LB sends packets to VMs. Ingress is not yet supported on single zone clusters of size > 1000 nodes (issue). If you'd like to use Ingress on a large cluster, spread it across 2 or more zones such that no single zone contains more than 1000 nodes. This is because there is a limit to the number of instances one can add to a single GCE Instance Group. In a multi-zone cluster, each zone gets its own instance group.
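
To see whether any zone exceeds the limit, count nodes per zone. A minimal sketch: `zones_over_limit` is a hypothetical helper that reads one zone name per node on stdin, and the zone label in the usage comment is an assumption that depends on your Kubernetes version.

```shell
# Report zones whose node count exceeds the limit (default 1000).
# stdin: one zone name per node. Usage: ... | zones_over_limit [limit]
zones_over_limit() {
  sort | uniq -c |
    awk -v limit="${1:-1000}" '$1 > limit { print $2 ": " $1 " nodes" }'
}

# Possible usage (the label name is an assumption and varies by version):
#   kubectl get nodes --no-headers -L failure-domain.beta.kubernetes.io/zone |
#     awk '{ print $NF }' | zones_over_limit
```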

How do I match GCE resources to Kubernetes Services?

The format followed for creating resources in the cloud is: k8s-<resource-name>-<nodeport>--<cluster-hash>, where nodeport is the output of

$ kubectl get svc <svcname> --template '{{range $i, $e := .spec.ports}}{{$e.nodePort}},{{end}}'

cluster-hash is the output of:

$ kubectl get configmap -o yaml --namespace=kube-system | grep -i " data:" -A 1
  data:
    uid: cad4ee813812f808

and resource-name is a short prefix for one of the resources mentioned here (e.g., be for backends, hc for health checks). If a given resource is not tied to a single node-port, its name will not include a node port.
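
Given a node port and the cluster hash, the matching cloud resources can be picked out of any gcloud listing. A minimal sketch: `match_gce_names` is a hypothetical filter, and it accepts one or more dashes before the hash so it does not depend on the exact separator.

```shell
# Keep only names of the form k8s-<resource-name>-<nodeport>-...<cluster-hash>.
# stdin: resource names. Usage: ... | match_gce_names <nodeport> <cluster-hash>
match_gce_names() {
  grep -E -- "^k8s-[a-z]+-$1-+$2\$"
}

# Possible usage:
#   gcloud compute backend-services list --format='value(name)' |
#     match_gce_names 30301 cad4ee813812f808
```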

Can I change the cluster UID?

The Ingress controller configures itself to add the UID it stores in a configmap in the kube-system namespace.

$ kubectl --namespace=kube-system get configmaps
NAME          DATA      AGE
ingress-uid   1         12d

$ kubectl --namespace=kube-system get configmaps -o yaml
apiVersion: v1
items:
- apiVersion: v1
  data:
    uid: UID
  kind: ConfigMap
...

You can pick a different UID, but this requires you to:

  1. Delete existing Ingresses
  2. Edit the configmap using kubectl edit
  3. Recreate the same Ingress

After step 3 the Ingress should come up using the new UID as the suffix of all cloud resources. You can't simply change the UID if you have existing Ingresses, because renaming a cloud resource requires a delete/create cycle that the Ingress controller does not currently automate. Note that the UID in step 1 might be an empty string, if you had a working Ingress before upgrading to Kubernetes 1.3.

A note on setting the UID: The Ingress controller uses the token -- to split a machine generated prefix from the UID itself. If the user supplied UID is found to contain -- the controller will take the token after the last --, and use an empty string if it ends with --. For example, if you insert foo--bar as the UID, the controller will assume bar is the UID. You can either edit the configmap and set the UID to bar to match the controller, or delete existing Ingresses as described above, and reset it to a string bereft of --.
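
The splitting rule described above can be reproduced with shell parameter expansion, which is handy for predicting what the controller will use before editing the configmap. `effective_uid` is a hypothetical helper that only illustrates the rule; it is not the controller's code.

```shell
# Print the part of a user-supplied UID after the last "--".
# A UID without "--" is printed unchanged; one ending in "--" yields an empty line.
effective_uid() {
  printf '%s\n' "${1##*--}"
}
```

For example, `effective_uid foo--bar` prints `bar`.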

Why do I need a default backend?

All GCE URL maps require at least one default backend, which handles all requests that don't match a host/path. In Ingress, the default backend is optional, since the resource is cross-platform and not all platforms require a default backend. If you don't specify one in your yaml, the GCE ingress controller will inject the default-http-backend Service that runs in the kube-system namespace as the default backend for the GCE HTTP lb allocated for that Ingress resource.

Some caveats concerning the default backend:

  • It is the only Backend Service that doesn't directly map to a user specified NodePort Service
  • It's created when the first Ingress is created, and deleted when the last Ingress is deleted, since we don't want to waste quota if the user is not going to need L7 loadbalancing through Ingress
  • It has an HTTP health check pointing at /healthz, not the default /, because / serves a 404 by design

How does Ingress work across 2 GCE clusters?

See kubemci documentation.

I shut down a cluster without deleting all Ingresses, how do I manually clean up?

If you kill a cluster without first deleting Ingresses, the resources will leak. If you find yourself in such a situation, you can delete the resources by hand:

  1. Navigate to the cloud console and click on the "Networking" tab, then choose "LoadBalancing"
  2. Find the loadbalancer you'd like to delete; it should have a name formatted as: k8s-um-ns-name--UUID
  3. Delete it, checking the boxes to also cascade the deletion down to associated resources (e.g., backend-services)
  4. Switch to the "Compute Engine" tab, then choose "Instance Groups"
  5. Delete the Instance Group allocated for the leaked Ingress; it should have a name formatted as: k8s-ig-UUID
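
Before deleting anything, it can help to list what was leaked from the command line. This is a sketch under the assumption that leaked resource names start with `k8s-` and end with `--` followed by the cluster UID; `list_leaked_resources` is a hypothetical helper and it only lists, it does not delete.

```shell
# List GCE resources whose names end in --<uid>, one "<kind>/<name>" per line.
# Usage: list_leaked_resources <cluster-uid>
list_leaked_resources() (
  uid=$1
  for kind in forwarding-rules target-http-proxies target-https-proxies \
              url-maps ssl-certificates backend-services health-checks \
              http-health-checks instance-groups firewall-rules addresses; do
    gcloud compute "$kind" list \
      --filter="name~'^k8s-.*--${uid}\$'" --format='value(name)' |
      sed "s|^|$kind/|"
  done
)
```

Review the output carefully: if another live cluster shares the project, make sure the UID is the one from the cluster you shut down.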

We plan to fix this soon.

How do I disable the GCE Ingress controller?

As of Kubernetes 1.3, GLBC runs as a static pod on the master. If you want to disable it, you have 3 options:

Soft disable

Option 1. Have it no-op for an Ingress resource based on the ingress.class annotation as shown here. This can also be used to use one of the other Ingress controllers at the same time as the GCE controller.
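
Setting the annotation on an existing Ingress might look like the sketch below. `set_ingress_class` is a hypothetical helper; the annotation key is `kubernetes.io/ingress.class`, and the class value to use (for example `nginx`) depends on which controller should pick up the Ingress.

```shell
# Set the ingress class annotation on an existing Ingress.
# Usage: set_ingress_class <ingress-name> <class> [namespace]
set_ingress_class() {
  kubectl annotate ingress "$1" --namespace="${3:-default}" --overwrite \
    "kubernetes.io/ingress.class=$2"
}
```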

Hard disable

Option 2. SSH into the GCE master node and delete the GLBC manifest file found at /etc/kubernetes/manifests/glbc.manifest.

Option 3. Disable the addon in GKE via gcloud:

Disabling GCE ingress on cluster creation

Disable the addon in GKE at cluster bring-up time through the disable-addons flag:

gcloud container clusters create mycluster --network "default" --num-nodes 1 \
--machine-type n1-standard-2 \
--zone $ZONE \
--disk-size 50 \
--scopes storage-full \
--disable-addons HttpLoadBalancing

Disabling GCE ingress in an existing cluster

Disable the addon in GKE for an existing cluster through the update-addons flag:

gcloud container clusters update mycluster --update-addons HttpLoadBalancing=DISABLED
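
To confirm the change took effect, the addon state can be read back from the cluster description. A sketch, assuming the state is exposed under `addonsConfig.httpLoadBalancing.disabled` as in the GKE API; `http_lb_addon_disabled` is a hypothetical helper name.

```shell
# Print whether the HttpLoadBalancing addon is disabled for a cluster.
# Usage: http_lb_addon_disabled <cluster-name> <zone>
http_lb_addon_disabled() {
  gcloud container clusters describe "$1" --zone "$2" \
    --format='value(addonsConfig.httpLoadBalancing.disabled)'
}
```

An empty result would suggest the addon is still enabled, since the field is only set when the addon is disabled.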

What GCE resources are shared between Ingresses?

Every Ingress creates a pipeline of GCE cloud resources behind an IP. Some of these are shared between Ingresses out of necessity, while some are shared because there was no perceived need for duplication (all resources consume quota and usually cost money).

Shared:

  • Backend Services: because of low quota and high reuse. A single Service in a Kubernetes cluster has one NodePort, common throughout the cluster. GCE has a hard limit on the number of allowed Backend Services, so if multiple Ingresses all point to a single Service, that creates a single Backend Service in GCE pointing to that Service's NodePort.

  • Instance Group: since an instance can only be part of a single loadbalanced Instance Group, these must be shared. There is 1 Ingress Instance Group per zone containing Kubernetes nodes.

  • Health Checks: currently the health checks point at the NodePort of a Backend Service. They don't need to be shared, but they are since Backend Services are shared.

  • Firewall rule: There is a single firewall rule that covers health check traffic from the range of GCE loadbalancer IPs to the entire NodePort range.

Unique:

Currently, a single Ingress on GCE creates a unique IP and URL Map. In this model the following resources cannot be shared:

  • URL Map
  • Target HTTP(S) Proxies
  • SSL Certificates
  • Static-ip
  • Forwarding rules

How do I debug a controller spinloop?

The most likely cause of a controller spin loop is some form of GCE validation failure, eg:

  • It's trying to delete a Backend Service already in use, say in a URL Map
  • It's trying to add an Instance to more than 1 loadbalanced Instance Group
  • It's trying to flip the loadbalancing algorithm on a Backend Service to RATE, when some other Backend Service is pointing at the same Instance Group and asking for UTILIZATION

In all such cases, the work queue will put a single key (ingress namespace/name) that's getting continuously re-queued into exponential backoff. However, currently the Informers that watch the Kubernetes API are set up to periodically resync, so even though a particular key is in backoff, we might end up syncing all other keys every, say, 10m, which might trigger the same validation error condition when syncing a shared resource.

Creating an Internal Load Balancer without existing ingress

How the GCE ingress controller Works
To assemble an L7 Load Balancer, the ingress controller creates an unmanaged instance-group named k8s-ig--{UID} and adds every known minion node to the group. For every service specified in all ingresses, a Backend Service is created to point to that instance group.

How the Internal Load Balancer Works
K8s does not yet assemble ILBs for you, but you can manually create one via the GCP Console. The ILB is composed of a regional forwarding rule and a regional Backend Service. Similar to the L7 LB, the Backend Service points to an unmanaged instance-group containing your K8s nodes.

The Complication
GCP will only allow one load balanced unmanaged instance-group for a given instance. If you manually created an instance group named something like my-kubernetes-group containing all your nodes and put an ILB in front of it, then you will probably encounter a GCP error when setting up an ingress resource. The controller doesn't know to use your my-kubernetes-group group and will create its own. Unfortunately, it won't be able to add any nodes to that group because they already belong to the ILB group.

As mentioned before, the instance group name is composed of a hard-coded prefix k8s-ig-- and a cluster-specific UID. The ingress controller will check the K8s configmap for an existing UID value at process start. If it doesn't exist, the controller will create one randomly and update the configmap.

Solutions

Want an ILB and Ingress?
If you plan on creating both ingresses and internal load balancers, simply create the ingress resource first, then use the GCP Console to create an ILB pointing to the existing instance group.

Want just an ILB for now, ingress maybe later?
Retrieve the UID via configmap, create an instance-group per used zone, then add all respective nodes to the group.

# Build the instance group name from the UID stored in the configmap
GROUPNAME=$(kubectl get configmaps ingress-uid -o jsonpath='k8s-ig--{.data.uid}' --namespace=kube-system)

# Create an instance group in every zone where you have nodes. If you use GKE, this is probably a single zone.
gcloud compute instance-groups unmanaged create "$GROUPNAME" --zone {ZONE}

# List your nodes
kubectl get nodes

# Add minion nodes that exist in zone X to the instance group in zone X. (Do not add the master!)
gcloud compute instance-groups unmanaged add-instances "$GROUPNAME" --zone {ZONE} --instances=A,B,C...

You can now follow the GCP Console wizard for creating an internal load balancer and point to the k8s-ig--{UID} instance group.

Can I use websockets?

Yes!
The GCP HTTP(S) Load Balancer supports websockets. You do not need to change your HTTP server or Kubernetes deployment. You will need to manually configure the created Backend Service's timeout setting. This value is interpreted as the max connection duration. The default value of 30 seconds is probably too small for your needs. You can increase it to the supported maximum: 86400 (a day) through the GCP Console or the gcloud CLI.
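
With the gcloud CLI, raising the timeout might look like the following sketch. `set_backend_timeout` is a hypothetical helper; the Backend Service name is whatever the controller created for your Service, and the sketch assumes a global Backend Service as used by the HTTP(S) load balancer.

```shell
# Set the timeout (in seconds) on a global Backend Service; defaults to 86400.
# Usage: set_backend_timeout <backend-service-name> [seconds]
set_backend_timeout() {
  gcloud compute backend-services update "$1" --global --timeout="${2:-86400}"
}
```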

View the example.