secrets-store dropping connection to registrar, liveness-probe after 30-45sec of deployment, only works correctly after container is restarted #620
Comments
Hello @jddexxx, thanks for reporting this. Also, could you try installing the driver and provider in the kube-system namespace (see the sketch below)?
Another area to look at is OOM kills or abnormal process failures. I found Azure/secrets-store-csi-driver-provider-azure#328, a similar report against the Azure provider.
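For reference, installing the driver into kube-system with the upstream Helm chart looks roughly like this (a sketch; the release name is an assumption):

```sh
# Sketch: installing the driver into kube-system via the upstream Helm chart.
# The release name "csi-secrets-store" is assumed.
helm repo add secrets-store-csi-driver \
  https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts
helm install csi-secrets-store secrets-store-csi-driver/secrets-store-csi-driver \
  --namespace kube-system
```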
@nilekhc Running the driver and provider in kube-system has the same result (and the same logs). Description of the pod (after moving it into kube-system):
`top` output:
Both nodes are reporting no pressure of any kind:
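One way to check those pressure conditions directly, for anyone reproducing this (a sketch):

```sh
# Sketch: inspect the node condition table; MemoryPressure, DiskPressure,
# and PIDPressure should all report False on a healthy node.
kubectl describe nodes | grep -A 6 'Conditions:'
```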
Does the driver perhaps get unregistered when the liveness-probe restarts the secrets-store container? If so, that would explain why the driver doesn't show up, but not why the disconnect occurs in the first place. Here are the logs in the interim, before the liveness-probe causes the secrets-store to restart:
`top` during that time shows the CSI driver using a little more CPU:
but there is still no pressure.
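One way to test the unregistration theory above is to compare kubelet's per-node registration state and the registrar's own logs across a probe-triggered restart; a sketch, with the DaemonSet and container names assumed from the upstream manifests:

```sh
# Sketch: list which CSI drivers each kubelet currently has registered.
kubectl get csinode -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.drivers[*].name}{"\n"}{end}'

# Sketch: registrar logs from the driver DaemonSet (names assumed:
# DaemonSet "csi-secrets-store", container "node-driver-registrar").
kubectl logs -n kube-system ds/csi-secrets-store -c node-driver-registrar
```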
@jddexxx Thank you for the details. This looks very similar to kubernetes-csi/node-driver-registrar#139: the registrar doesn't retry the connection once it is lost, so a connection retry facility there might help.
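To confirm which container the liveness probe is actually restarting, per-container restart counts can be pulled like this (a sketch; the label selector is assumed from the chart defaults):

```sh
# Sketch: per-container restart counts for the driver pods.
kubectl get pods -n kube-system -l app=secrets-store-csi-driver \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.containerStatuses[*].restartCount}{"\n"}{end}'
```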
I can confirm that after simply restarting the pod by deleting it and letting the replicaset run it again (not redeploying), it works. I'll keep a note in the stack deployment manifest that this will likely need to be manually restarted after each deployment. I didn't try this originally, as I assumed restarting would be the same as redeploying. Thank you for your help. I will not close this issue for now, as this is not a proper solution: it seems very odd that it would only work after a restart. Perhaps some connection-retry facility would help, as you mentioned.
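The restart workaround described above boils down to something like this (a sketch; the label selector is an assumption):

```sh
# Sketch: delete the driver pods and let the controller recreate them.
kubectl delete pod -n kube-system -l app=secrets-store-csi-driver
```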
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What steps did you take and what happened:
When spinning up the project, the CSI driver cannot be found.
The namespace for the SecretProviderClass is the same as the project's namespace (apps).
The driver and provider are running on all nodes, and the project runs on those same nodes.
The driver and provider have been up and running in the same namespace.
The driver and SecretProviderClass both exist, with no suspicious entries in their logs (see the sketch after this list for one way to verify this).
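A sketch of verifying those pieces (the resource names are taken from this report; the pod label selector is an assumption):

```sh
# Sketch: confirm the CSIDriver object, the SecretProviderClass, and the
# driver pods are all present. The label selector is assumed.
kubectl get csidriver secrets-store.csi.k8s.io
kubectl get secretproviderclass -n apps
kubectl get pods -n apps -l app=secrets-store-csi-driver -o wide
```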
Project describe output (clipped):
What did you expect to happen:
The driver is found and able to be used.
Anything else you would like to add:
The drivers and providers running:
Logs for secrets-store:
Logs for csi-driver:
CSIDriver list:
SecretProviderClass correctly exists:
Provider log:
SecretProviderClass config (via pulumi):
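The pulumi config itself is clipped above; expressed as raw YAML, an equivalent AWS SecretProviderClass would look roughly like this (the class name, the secret name, and the v1alpha1 apiVersion are all assumptions):

```sh
# Hypothetical equivalent of the clipped pulumi config, applied as raw YAML.
kubectl apply -n apps -f - <<'EOF'
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: app-secrets                     # assumed name
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "app/credentials"   # assumed Secrets Manager secret
        objectType: "secretsmanager"
EOF
```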
Secrets-related YAML for the app:
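The app-side wiring is likewise clipped; a minimal sketch of a pod consuming the class above (all names are assumptions):

```sh
# Hypothetical pod mounting secrets through the CSI driver.
kubectl apply -n apps -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: secrets-store-demo              # assumed name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: secrets-store
          mountPath: /mnt/secrets-store
          readOnly: true
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: app-secrets   # matches the sketch above
EOF
```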
Which provider are you using:
AWS, although the provider part seems to be irrelevant until the CSIDriver is picked up.
Environment:
Kubernetes version (use `kubectl version`):