linkerd check --proxy CLI command isn't working #13306

Open
tstraley opened this issue Nov 11, 2024 · 7 comments

@tstraley

tstraley commented Nov 11, 2024

What is the issue?

Running linkerd check --proxy performs the same checks as a plain linkerd check: only the control-plane checks run.

I cannot get the data-plane checks to run at all, even when scoping the command to a namespace with the -n flag.

How can it be reproduced?

linkerd check --proxy

Logs, error output, etc

example:

$ linkerd check --proxy -n main --wait 5s --verbose
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
DEBU[0002] Skipping check: cluster networks contains all node podCIDRs. Reason: skipping check because the nodes aren't exposing podCIDR 
√ cluster networks contains all pods
√ cluster networks contains all services

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
× control plane CustomResourceDefinitions exist
    missing egressnetworks.policy.linkerd.io
    see https://linkerd.io/2/checks/#l5d-existence-crd for hints

linkerd-jaeger
--------------
√ linkerd-jaeger extension Namespace exists
√ jaeger extension pods are injected
√ jaeger injector pods are running
‼ jaeger extension proxies are healthy
    Some pods do not have the current trust bundle and must be restarted:
	* jaeger-injector-765ccfbb5-dmc2r
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-healthy for hints
‼ jaeger extension proxies are up-to-date
    some proxies are not running the current version:
	* jaeger-injector-765ccfbb5-dmc2r (edge-24.10.2)
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cp-version for hints
‼ jaeger extension proxies and cli versions match
    jaeger-injector-765ccfbb5-dmc2r running edge-24.10.2 but cli running edge-24.10.5
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cli-version for hints

Status check results are ×

Output of linkerd check -o short:

$ linkerd check -o short --wait 5s
linkerd-config
--------------
× control plane CustomResourceDefinitions exist
    missing egressnetworks.policy.linkerd.io
    see https://linkerd.io/2/checks/#l5d-existence-crd for hints

linkerd-jaeger
--------------
‼ jaeger extension proxies are healthy
    Some pods do not have the current trust bundle and must be restarted:
	* jaeger-injector-765ccfbb5-dmc2r
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-healthy for hints
‼ jaeger extension proxies are up-to-date
    some proxies are not running the current version:
	* jaeger-injector-765ccfbb5-dmc2r (edge-24.10.2)
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cp-version for hints
‼ jaeger extension proxies and cli versions match
    jaeger-injector-765ccfbb5-dmc2r running edge-24.10.2 but cli running edge-24.10.5
    see https://linkerd.io/2/checks/#l5d-jaeger-proxy-cli-version for hints

Status check results are ×

Environment

K8s 1.30 (AWS EKS)

$ linkerd version
Client version: edge-24.10.5
Server version: edge-24.10.2

Possible solution

No response

Additional context

This check is essential when rotating the trust anchor cert bundle. Without it, we cannot readily identify whether any meshed pods still need to be restarted to pick up the latest trust anchor. See https://linkerd.io/2-edge/tasks/manually-rotating-control-plane-tls-credentials/

Would you like to work on fixing this bug?

no

@tstraley tstraley added the bug label Nov 11, 2024
@MicahSee
Contributor

I will take this!

@MicahSee
Contributor

@tstraley I just ran your exact command against a different app namespace (bin/linkerd check --proxy -n emojivoto --wait 5s --verbose) and I get the expected result.

You should see two additional sections in the output with the --proxy flag added. These are linkerd-data-plane and linkerd-identity-data-plane. Can you confirm that you don't see these?

@MicahSee
Contributor

Ahh actually looks like the troublesome line in your output is:

× control plane CustomResourceDefinitions exist
    missing egressnetworks.policy.linkerd.io
    see https://linkerd.io/2/checks/#l5d-existence-crd for hints

The CustomResourceDefinitions check is marked as fatal. Because it fails on your setup, no further checks are run, so you never reach the additional checks enabled by the --proxy flag. That is why the output looks the same for both commands: they fail at the same place.
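
To illustrate the behavior described above, here is a minimal sketch of a check runner in which a fatal failure stops the run before later checks are reached. This is not Linkerd's actual implementation: the function names (check_crds, check_data_plane, run_checks) and the "severity:function" argument convention are invented purely for illustration.

```sh
#!/bin/sh
# Hypothetical check runner; NOT how the linkerd CLI is implemented.

check_crds()       { return 1; }   # simulates the failing CRD check
check_data_plane() { return 0; }   # stands in for the checks --proxy adds

# run_checks takes "severity:function" pairs and runs them in order.
# A failing check with severity "fatal" aborts the whole run.
run_checks() {
    for spec in "$@"; do
        severity=${spec%%:*}   # text before the first colon
        fn=${spec#*:}          # text after the first colon
        if "$fn"; then
            echo "ok   $fn"
        elif [ "$severity" = fatal ]; then
            echo "FAIL $fn (fatal, skipping remaining checks)"
            return 1
        else
            echo "warn $fn"
        fi
    done
    return 0
}

# The data-plane check is never reached because the CRD check is fatal.
run_checks fatal:check_crds warn:check_data_plane || echo "run stopped early"
```

If the first check were listed as warn:check_crds instead, the loop would continue and check_data_plane would run.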

@tstraley
Author

tstraley commented Nov 25, 2024

@MicahSee thanks for your response.

Am I wrong in thinking that in the past, the --proxy flag resulted in only executing the data-plane checks and not these control-plane checks?

The documentation around this flag still states that it is used to only run data-plane checks https://linkerd.io/2.16/reference/cli/check/

Even if control-plane checks are expected to be run now, why would it stop after a failure like this instead of proceeding to run the data-plane checks I asked for?


I guess I'm also curious why the missing CRD is a fatal check in the first place, given that the version of linkerd installed in our cluster is edge-24.10.2, which doesn't have the egressnetworks.policy.linkerd.io CRD (it wasn't added until 24.10.4). I've even tried various permutations of --expected-version and --cli-version-override, which don't seem to help. But this concern isn't really relevant to this bug.

@tstraley
Author

tstraley commented Dec 5, 2024

@MicahSee - just wanted to check in on this and make sure you saw my reply above. Thanks.

@kflynn
Member

kflynn commented Dec 5, 2024

The EgressNetwork CRD showed up in edge-24.10.4. Your control plane is running edge-24.10.2, so it has no EgressNetwork CRD, but your linkerd CLI is from edge-24.10.5, so it looks for EgressNetwork. In general, the CLI expects that you're running the same CLI version as control plane version, so the edge-24.10.5 CLI treats the missing EgressNetwork as an error. The simplest way to manage that is to run the edge-24.10.2 CLI, to match your control plane (since the CLI is a standalone binary, this isn't too hard -- check in ~/.linkerd2/bin and you may already have it, even).

I'll have to look back at history for the --proxy check, but my understanding is that we always have to check some of the control-plane basics to get a sense of whether there's any possibility of being able to check the data plane...
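
The CLI/control-plane version mismatch described in this comment can be spotted from the linkerd version output quoted earlier in the thread. The following is a small sketch, assuming the output keeps the "Client version:" / "Server version:" line format shown in the report; the helper name version_of is made up for this example.

```sh
#!/bin/sh
# version_of LABEL: reads `linkerd version`-style text on stdin and prints
# the version that follows "LABEL version:" (LABEL is Client or Server).
version_of() {
    sed -n "s/^$1 version: *//p"
}

# Sample copied from the issue report. Against a live cluster this would
# instead be: output=$(linkerd version)
output='Client version: edge-24.10.5
Server version: edge-24.10.2'

client=$(printf '%s\n' "$output" | version_of Client)
server=$(printf '%s\n' "$output" | version_of Server)

if [ "$client" = "$server" ]; then
    echo "CLI and control plane match: $client"
else
    echo "mismatch: CLI $client, control plane $server"
fi
```

With the sample from the report this takes the mismatch branch, which is the situation that makes the newer CLI look for a CRD the older control plane does not have.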

@tstraley
Author

tstraley commented Dec 9, 2024

> The simplest way to manage that is to run the edge-24.10.2 CLI

I was able to find where I could manually get this version of the CLI, but are there docs anywhere describing how to install a specific version? All I found was https://linkerd.io/2.16/getting-started/#step-1-install-the-cli which just installs the latest available.
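
Following the earlier hint to look in ~/.linkerd2/bin, here is a rough sketch for checking whether a CLI binary for a given version is already present locally. It assumes binaries in that directory are named linkerd-<version>; that naming is an assumption, not something confirmed in this thread, so check your own directory listing first. The helper name find_cli is hypothetical.

```sh
#!/bin/sh
# find_cli VERSION [BIN_DIR]
# Prints the path of an executable for VERSION if one exists in BIN_DIR
# (default: ~/.linkerd2/bin) and returns non-zero otherwise.
# ASSUMPTION: binaries are stored as "linkerd-<version>".
find_cli() {
    version=$1
    bin_dir=${2:-"$HOME/.linkerd2/bin"}
    candidate="$bin_dir/linkerd-$version"
    if [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        return 1
    fi
}

if cli=$(find_cli edge-24.10.2); then
    echo "found $cli"
    # The matching binary could then be invoked directly, for example:
    # "$cli" check --proxy -n main
else
    echo "no edge-24.10.2 binary found locally"
fi
```

This only locates a binary that is already on disk; it does not download anything.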

> I'll have to look back at history for the --proxy check, but my understanding is that we always have to check some of the control-plane basics to get a sense of whether there's any possibility of being able to check the data plane.

Thanks. I'm not clear why checking CRDs or the jaeger extension would count as necessary control-plane basics, but if that is the case, it seems like the docs and CLI should not say that --proxy runs only data-plane checks.
