
aws-eks: cdk should validate cluster version and kubectl layer version #24580

Open
trondhindenes opened this issue Mar 11, 2023 · 18 comments
Labels: @aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service), bug (This issue is a bug.), effort/small (Small work item, less than a day of effort), p2

Comments

@trondhindenes

trondhindenes commented Mar 11, 2023

Describe the bug

Ever since we upgraded from Kubernetes 1.21 to newer versions, we've been getting lots of odd errors that appear to stem from kubectl layer incompatibilities, such as:

3:40:15 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-KubernetesResource | clusterAwsAuthmanifestB57F2A94
Received response status [FAILED] from custom resource. Message returned: Error: b'configmap/aws-auth configured\nerror: error retrieving RESTMappings to prune: invalid resource extensions/v1bet
a1, Kind=Ingress, Namespaced=true: no matches for kind "Ingress" in version "extensions/v1beta1"\n'

It would be much better if cdk validated the kubectl layer version against the intended Kubernetes version when synthesizing, so that these issues didn't occur.

Expected Behavior

cdk should error out, informing me that the selected cluster version doesn't match the configured kubectl layer.

Current Behavior

No validation occurs, which leads to lots of errors when trying to change the cluster later

Reproduction Steps

  • create a cluster on version 1.23 (see the sketch below)
  • make a change, such as adding a node group
  • witness the layer error described above
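
A minimal TypeScript sketch of such a stack (illustrative only; construct IDs are assumed, and no kubectlLayer is passed, so the kubectl 1.20 bundled with aws-cdk-lib is used):

import { Stack, StackProps } from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';
import { Construct } from 'constructs';

export class ReproStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // 1.23 cluster without kubectlLayer: the custom resource handler falls
    // back to the kubectl 1.20 bundled with aws-cdk-lib.
    const cluster = new eks.Cluster(this, 'Cluster', {
      version: eks.KubernetesVersion.V1_23,
    });

    // Any later change (e.g. adding a node group) re-runs the kubectl handler
    // and can surface the RESTMappings/prune error shown above.
    cluster.addNodegroupCapacity('ExtraNodes', { minSize: 1, maxSize: 2 });
  }
}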

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.67.0

Framework Version

2.66.1

Node.js Version

v18.14.2

OS

Ubuntu

Language

Python

Language Version

3.9

Other information

No response

@trondhindenes trondhindenes added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 11, 2023
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Mar 11, 2023
@trondhindenes
Author

From another issue, it looks like the library in some cases prints a warning:

You created a cluster with Kubernetes Version 1.23 without specifying the kubectlLayer property

But I've never seen that warning. Was it removed in a newer version maybe?
IMHO it needs to be easy to build rock-solid clusters with cdk.

@pahud
Contributor

pahud commented Mar 13, 2023

According to the document:

The version of kubectl used must be compatible with the Kubernetes version of the cluster. kubectl is supported within one minor version (older or newer) of Kubernetes (see Kubernetes version skew policy). Only version 1.20 of kubectl is available in aws-cdk-lib. If you need a different version, you will need to use one of the @aws-cdk/lambda-layer-kubectl-vXY packages.

But I agree with you, we should probably implement a check to avoid potential errors like that.

I am making this a p2 feature request and any PR would be appreciated!

@ShankarDhandapani

ShankarDhandapani commented May 8, 2023

@pahud Regarding your reply above: when I try to use the @aws-cdk/lambda-layer-kubectl-v25 package with @aws-quickstart/eks-blueprints in GenericClusterProvider, setting the kubectlLayer property fails with:

Type 'typeof KubectlV25Layer' is missing the following properties from type 'ILayerVersion': layerVersionArn, addPermission, stack, env, and 2 more.

Below is the code:

...
import { KubectlV25Layer } from "@aws-cdk/lambda-layer-kubectl-v25";
...
const clusterProvider = new EksBlueprint.GenericClusterProvider({
      version: this.props.version,
      kubectlLayer: KubectlV25Layer,
      vpcSubnets: [{ subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS }],
      managedNodeGroups: [
        {
          id: `${id}-nodegroup`,
          minSize: 1,
          maxSize: 2,
          instanceTypes: config.InstanceTypes.map(
            (instance_type) => new ec2.InstanceType(instance_type)
          ),
        },
      ],
    });
...

CC: @menakakarichiyappakumar

@baizele

baizele commented May 10, 2023

@ShankarDhandapani looks like you need to instantiate it like:

const kubectl = new KubectlV25Layer(this, 'KubectlLayer');
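
For context, a minimal sketch of the corrected wiring (assumptions: the standard @aws-quickstart/eks-blueprints import and a 1.25 cluster; the point is that an instance, which implements lambda.ILayerVersion, is passed rather than the class):

import * as blueprints from '@aws-quickstart/eks-blueprints';
import * as eks from 'aws-cdk-lib/aws-eks';
import { KubectlV25Layer } from '@aws-cdk/lambda-layer-kubectl-v25';

// Instantiate once within the construct scope; the instance implements
// lambda.ILayerVersion, which is what kubectlLayer expects.
const kubectlLayer = new KubectlV25Layer(this, 'KubectlLayer');

const clusterProvider = new blueprints.GenericClusterProvider({
  version: eks.KubernetesVersion.V1_25,
  kubectlLayer,
  // ...remaining props (vpcSubnets, managedNodeGroups) as in the snippet above
});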

@jesseadams

I am currently struggling with the same issue.

@Kilowhisky

This solution does not seem to apply to v2 of the AWS CDK.

@pahud
Contributor

pahud commented Jul 12, 2023

We could probably add the validation here:

if (semver.gte(kubectlVersion, '1.22.0') && !props.kubectlLayer) {
  Annotations.of(this).addWarning(`You created a cluster with Kubernetes Version ${props.version.version} without specifying the kubectlLayer property. This may cause failures as the kubectl version provided with aws-cdk-lib is 1.20, which is only guaranteed to be compatible with Kubernetes versions 1.19-1.21. Please provide a kubectlLayer from @aws-cdk/lambda-layer-kubectl-v${kubectlVersion.minor}.`);
}

I guess the challenge is that lambda.ILayerVersion does not expose the kubectl version as an attribute, so it's not easy to compare the two.
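
One illustrative direction (purely a sketch; neither a kubectlVersion attribute nor this helper exists in aws-cdk-lib today): have the versioned layer packages expose the kubectl version they bundle, so the cluster construct can enforce the skew policy at synth time.

import * as semver from 'semver';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Hypothetical: a layer that advertises the kubectl version it bundles.
interface IVersionedKubectlLayer extends lambda.ILayerVersion {
  readonly kubectlVersion: string; // e.g. '1.25.0'
}

// Hypothetical synth-time check following the +/-1 minor version skew policy.
function validateKubectlLayer(clusterVersion: string, layer: IVersionedKubectlLayer): void {
  const cluster = semver.coerce(clusterVersion);
  const kubectl = semver.coerce(layer.kubectlVersion);
  if (!cluster || !kubectl) {
    throw new Error('could not parse cluster or kubectl version');
  }
  if (cluster.major !== kubectl.major || Math.abs(cluster.minor - kubectl.minor) > 1) {
    throw new Error(
      `kubectl layer ${layer.kubectlVersion} is outside the supported skew for ` +
      `cluster version ${clusterVersion}; use @aws-cdk/lambda-layer-kubectl-v${cluster.minor}.`,
    );
  }
}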

@ravi-vk8679

ravi-vk8679 commented Jul 29, 2023

Thanks for starting this thread. I was running into the same issue, but I was able to fix it following the suggestions posted here.

I am using CDK v2 and I see that my kubectl version is at its latest, but I don't know why cdk is not validating the kubectl version. Is anyone working on fixing this? Any idea when this issue will be fixed so that the matching kubectlLayer is chosen based on the Kubernetes version provided?

I imported the KubectlLambdaLayer package from here.

import { KubectlV26Layer } from '@aws-cdk/lambda-layer-kubectl-v26';

kubectlLayer: new KubectlV26Layer(this, 'KubectlLayer'),

@graydenshand
Contributor

I've seen this error several times while attempting to update resources created with cluster.add_manifest().

It appears CloudFormation is attempting to use an API version that doesn't match what is actually deployed, e.g. attempting to use batch/v1beta1 rather than batch/v1.

Full error response:

Received response status [FAILED] from custom resource. Message returned: Error: b'serviceaccount/user created\nerror: error retrieving RESTMappings to prune: invalid resource batch/v1beta1, Kind=CronJob, Namespaced=true: no matches for kind "CronJob" in version "batch/v1beta1"\n' Logs: /aws/lambda/Application-awscdka-Handler886CB40B-q8TSqd5FvHp8 at invokeUserFunction (/var/task/framework.js:2:6) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async onEvent (/var/task/framework.js:1:369) at async Runtime.handler (/var/task/cfn-response.js:1:1573) (RequestId: 1ffd3898-6f7f-49a7-b97d-83518c0dc5fe)

When it occurs, it leaves the stack in an UPDATE_ROLLBACK_FAILED state and there is no way to stabilize the stack again. I've had to destroy and recreate my entire cluster every time.

Running Kubernetes 1.29.

@benjamin-at-greensky

I've deployed a 1.29 EKS cluster via cdk, specifying kubectlLayer as KubectlV29Layer() when creating the cluster, and I'm having the same issue as @graydenshand: the only way to get changes applied is to destroy and deploy again. This blocks just about any management of the cluster.

From the lambda kubectl layer logs:

[ERROR] Exception: b'service/serviceXYZ configured\nerror: error retrieving RESTMappings to prune: invalid resource batch/v1beta1, Kind=CronJob, Namespaced=true: no matches for kind "CronJob" in version "batch/v1beta1"\n' Traceback (most recent call last): File "/var/task/index.py", line 14, in handler return apply_handler(event, context) File "/var/task/apply/__init__.py", line 69, in apply_handler kubectl('apply', manifest_file, *kubectl_opts) File "/var/task/apply/__init__.py", line 91, in kubectl raise Exception(output)

@kriscoleman

We are experiencing the same problem

To make matters worse for us, it appears that KubectlV29 was never released in the Go CDK lib at cdklabs/awscdk-kubectl-go, leaving us with few options to resolve this gracefully.

https://github.com/cdklabs/awscdk-kubectl-go/commits/kubectl.29

@pahud
Contributor

pahud commented May 21, 2024

@graydenshand @benjamin-at-greensky

Are you able to reproduce this issue for us? For example, after initially creating a 1.29 cluster with the kubectl v29 layer, what could cause this error afterwards?

@pahud
Contributor

pahud commented May 21, 2024

@kriscoleman Can you create a new issue and provide your Go CDK code snippet in the issue description?

@benjamin-at-greensky

benjamin-at-greensky commented Jun 4, 2024

@pahud I have been able to reproduce this by deploying a fresh EKS cluster with kubectlLayer set to v29 and then redeploying a helm chart with updated values.

import { KubectlV29Layer } from '@aws-cdk/lambda-layer-kubectl-v29';

const clusterProps: GsEksClusterProps = {
  // ...
  kubectlLayer: new KubectlV29Layer(this, 'KubectlLayer'),
  // ...
};

this.cluster = new eks.Cluster(this, 'EksCluster', {
  // ...
  kubectlLayer: clusterProps.kubectlLayer,
  // ...
});

After this, I make an update to the cdk app that deploys a helm chart (for example, I was redeploying one with some annotations on an ingress). I then receive this error when running cdk deploy:

10:06:27 AM | UPDATE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | Clustermanifestrep...63A40109
Received response status [FAILED] from custom resource. Message returned: Error: b'configmap/start-override configured\nerror: error retrieving RESTMappings to prune: invalid resource bat
ch/v1beta1, Kind=CronJob, Namespaced=true: no matches for kind "CronJob" in version "batch/v1beta1"\n'

I have no CronJobs deployed to the cluster:

$ kubectl get cronjob -A
No resources found

$ kubectl api-resources | grep cronjob
cronjobs                          cj           batch/v1                          true         CronJob

It is worth mentioning that the helm chart I'm deploying has no references to batch/v1beta1 anywhere.

@dilshanonline

dilshanonline commented Jun 13, 2024

I had the same issue, and defining

from aws_cdk.lambda_layer_kubectl_v28 import KubectlV28Layer

cluster = eks.Cluster(
    self,
    'EksCluster',
    version=eks.KubernetesVersion.V1_28,
    kubectl_layer=KubectlV28Layer(self, "KubectlLayer"),
)

solved my issue.

@tchcxp

tchcxp commented Aug 15, 2024

I am using @aws-cdk/lambda-layer-kubectl-v30 with KubernetesVersion.V1_30 and I hit the same issue as @graydenshand and @benjamin-at-greensky mentioned above when updating resources. The only workaround is to delete and re-create the application and related resources, which is not viable in a production environment.

4:02:20 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-KubernetesResource    | ImportedClusterman...aDployment5DA7DFEB
Received response status [FAILED] from custom resource. Message returned: Error: b'deployment.apps/********** configured\nerror: error retrieving RESTMappings to prune: invalid resource batch/v1beta1, Kind=CronJob, Namespaced=true: no matches for kind "CronJob" in version "batch/v1beta1"\n'

Can someone please look into this issue? It's been a while, and it is effectively blocking us from using EKS at the moment.

@tchcxp

tchcxp commented Aug 18, 2024

Re the suggestion above (pinning KubernetesVersion.V1_28 together with KubectlV28Layer): I tried creating a new cluster on 1.28 with KubectlV28Layer, but I still got the same error.

@kkandori

kkandori commented Aug 27, 2024

@tchcxp
The issue occurs because of the kubectl version used in the handler Lambda. It seems your cluster is imported, which leads to the following error:

UPDATE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | ImportedClusterman...aDployment5DA7DFEB
Received response status [FAILED] from custom resource. Message returned: Error: b'deployment.apps/********** configured\nerror: error retrieving RESTMappings to prune: invalid resource batch/v1beta1, Kind=CronJob, Namespaced=true: no matches for kind "CronJob" in version "batch/v1beta1"\n'

By default, if you don't specify the layer, it will fall back to kubectl 1.20.

To resolve this, you need to set the kubectl layer again:

eks.Cluster.fromClusterAttributes(this, 'ImportedCluster', {
  clusterName: clusterName,
  kubectlRoleArn: kubectlRoleArn,
  // ...other attributes
  kubectlLayer: new KubectlV28Layer(this, 'kubectl-v28-layer'), // <-- set the layer to match the cluster version
});

This should address the issue.
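
For completeness, a usage sketch (clusterName and kubectlRoleArn assumed to be defined as above) showing the imported cluster handle being used for the kind of manifest update that previously failed:

import * as eks from 'aws-cdk-lib/aws-eks';
import { KubectlV28Layer } from '@aws-cdk/lambda-layer-kubectl-v28';

const imported = eks.Cluster.fromClusterAttributes(this, 'ImportedCluster', {
  clusterName: clusterName,
  kubectlRoleArn: kubectlRoleArn,
  kubectlLayer: new KubectlV28Layer(this, 'KubectlV28Layer'),
});

// Manifest updates now run through the kubectl 1.28 handler instead of the
// bundled 1.20 default, avoiding the batch/v1beta1 prune error.
imported.addManifest('AppConfig', {
  apiVersion: 'v1',
  kind: 'ConfigMap',
  metadata: { name: 'app-config' },
  data: { key: 'value' },
});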
