bring up Kubernetes deployment without restarts #182

pohly · 2019-03-01T09:28:07Z

When I tested the CSI 1.0 support in #177 (eab77d7), the pods came up eventually, but only after some intermittent errors:

$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
my-csi-app              1/1     Running   0          6m30s
pmem-csi-bqdwp          2/2     Running   11         10m
pmem-csi-controller-0   4/4     Running   15         10m
pmem-csi-h7dfn          2/2     Running   11         10m
pmem-csi-w4p2f          2/2     Running   9          10m

I've not caught it in the logs, but I think the node pmem-driver was restarting because the registry wasn't up yet.

This is confusing. In the sidecar containers, we chose the approach of waiting forever for a peer to show up, with regular logging while in that wait loop. The pmem driver should do the same. See kubernetes-csi/csi-lib-utils#11.

okartau · 2019-03-04T11:47:50Z

Do the startup-phase errors and retries behave differently on the first run than on later runs, i.e. what is the impact of the first fetch of the Docker images?
Also, does the csi-1.0 branch behave differently from the latest devel branch, i.e. has the 1.0 work introduced more related issues? I don't recall seeing such high restart counts in the devel branch recently.
I will run some trials as well.

avalluri · 2019-03-04T11:50:22Z

If I'm not wrong, this has existed in the driver for a long time and is not specific to the 1.0 changes. I can take this task.

Instead of exiting with an error, the driver has to wait and retry the 'node controller' registration until the registry server is up and the registration succeeds. FIXES: intel#182 Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>

avalluri · 2019-03-04T14:07:17Z

With change #184, the driver deployment looks like this:

$ kubectl get po
NAME                    READY   STATUS    RESTARTS   AGE
pmem-csi-4d6xj          2/2     Running   0          107s
pmem-csi-controller-0   4/4     Running   0          107s
pmem-csi-g5v9h          2/2     Running   0          107s
pmem-csi-tszbz          2/2     Running   0          107s

avalluri self-assigned this Mar 4, 2019

avalluri mentioned this issue Mar 4, 2019

Keep retry registering node controller with RegistryServer #184

Merged

avalluri closed this as completed in #184 Mar 11, 2019
