Advance csi-node-driver-registrar version to 1.1.0 #248
Conversation
Note that this commit also has a side artifact in the form of a /sys mount change made by the kustomize pass in a file that is not relevant to this PR.
So the missing parts were a side effect of generating and adding files in overlapping branches.
Joy came too early: the first impression that "it works now" was based on observing that the pod did not enter the CrashLoop state. Instead, the driver-registrar container keeps retrying without a timeout and never reaches a functional state.
The problem we see in this deployment trial is similar to what is reported here: SELinux=enabled has been pointed out as a trigger for the connection failure.
qOlev Kartau <notifications@github.com> writes:
In a sense the new semantics are not so good, because they hide the "cannot
connect to socket" problem.
It would be worthwhile to file a feature request:
- implement a readiness probe for the sidecar
- return "ready" only once connected to the driver
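The feature request above could be sketched roughly like this. This is a minimal Go sketch under stated assumptions, not the sidecar's actual code: the handler path, the socket path, and the in-memory probe helper are all hypothetical, chosen only to show "ready" being reported after the driver connection exists.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// connected flips to true once the registrar has reached the driver socket.
var connected atomic.Bool

// readyz answers a readiness probe: 200 only after the connection to the
// CSI driver socket has been established, 503 while still retrying.
func readyz(w http.ResponseWriter, r *http.Request) {
	if connected.Load() {
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, "ok")
		return
	}
	http.Error(w, "not connected to driver socket", http.StatusServiceUnavailable)
}

// probe issues one in-memory request against readyz and returns the
// HTTP status code, standing in for the kubelet's periodic probe.
func probe() int {
	rec := httptest.NewRecorder()
	readyz(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	fmt.Println("before connect:", probe()) // 503
	connected.Store(true)                   // simulate the driver socket coming up
	fmt.Println("after connect:", probe())  // 200
}
```

With such a probe, an indefinitely retrying sidecar would still surface the "cannot connect to socket" problem to Kubernetes instead of hiding it.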
But how come only driver-registrar has a problem with that?
Force-pushed from 610ec49 to 0206d95
Force-pushed from 0206d95 to f2b0d82
Although the deployment issue which prompted this trial turned out to be caused by a different misconfiguration, we can still consider this PR an independent contribution.
Looks good. I also verified with the Kubernetes-CSI WG that csi-node-driver-registrar is indeed compatible with 1.13, and that the resulting merge leaves the deployment files in a consistent state (merge manually, run make test-kustomize).
One deployment trial shows that driver deployment fails with version 1.0.2 but succeeds with 1.1.0.
In the failing case, node-driver-registrar fails to connect to the local CSI socket, times out after 60s, and causes the pod to exit and CrashLoop. For a reason that is still not explained, this scenario repeats multiple times.
There is an explanation for why the connection can take longer the first time: the node driver is in turn waiting to register with the controller, and the controller is still in its starting stage.
But such a wait should not happen on the second start of the node pod.
In 1.1.0 the timeout and exit were removed, and driver-registrar keeps trying. That seems to help in the current deployment case.
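The behavior change described above can be sketched as follows. This is a Go sketch with hypothetical helper names and a simulated dial function, not the registrar's actual implementation; it only contrasts a bounded retry (1.0.2-style, which exits and lets the pod CrashLoop) with an unbounded retry (1.1.0-style).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// connectAfter simulates dialing the CSI socket: it fails until the
// driver has "come up" after readyAt attempts (a hypothetical threshold).
func connectAfter(readyAt int) func() error {
	attempt := 0
	return func() error {
		attempt++
		if attempt >= readyAt {
			return nil
		}
		return errors.New("connect: no such file or directory")
	}
}

// retryWithDeadline models the 1.0.2 behavior: give up after maxAttempts,
// which makes the container exit and the pod enter CrashLoopBackOff.
func retryWithDeadline(dial func() error, interval time.Duration, maxAttempts int) error {
	for i := 0; i < maxAttempts; i++ {
		if err := dial(); err == nil {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("timed out waiting for driver socket")
}

// retryForever models the 1.1.0 behavior: keep trying until connected,
// leaving failure reporting to something like a readiness probe.
func retryForever(dial func() error, interval time.Duration) {
	for dial() != nil {
		time.Sleep(interval)
	}
}

func main() {
	// The driver needs 5 attempts to come up; the deadline allows only 3.
	err := retryWithDeadline(connectAfter(5), time.Millisecond, 3)
	fmt.Println("1.0.2-style:", err)

	// The unbounded loop simply waits the driver out.
	retryForever(connectAfter(5), time.Millisecond)
	fmt.Println("1.1.0-style: connected")
}
```

This also shows why the wait on a second pod start is surprising: once the controller is already registered, the simulated dial should succeed on an early attempt, well inside the old 60s deadline.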