This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

bring up Kubernetes deployment without restarts #182

Closed
pohly opened this issue Mar 1, 2019 · 3 comments

pohly commented Mar 1, 2019

When I tested the CSI 1.0 support in #177 (eab77d7), the pods came up eventually, but only after some intermittent errors:

$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
my-csi-app              1/1     Running   0          6m30s
pmem-csi-bqdwp          2/2     Running   11         10m
pmem-csi-controller-0   4/4     Running   15         10m
pmem-csi-h7dfn          2/2     Running   11         10m
pmem-csi-w4p2f          2/2     Running   9          10m

I've not caught it in the logs, but I think the node pmem-driver was restarting because the registry wasn't up yet.

This is confusing. In the sidecar containers, we chose the approach of waiting forever for a peer to show up, with regular logging while in that wait loop. The pmem driver should do the same. See kubernetes-csi/csi-lib-utils#11.

okartau commented Mar 4, 2019

Do the startup errors and retries differ between the first deployment and later ones, i.e. how much of this is caused by the initial pull of the Docker images?
Also, does the csi-1.0 branch behave differently from the latest devel branch, i.e. did the 1.0 work introduce additional related issues? I don't recall seeing restart counts that high on the devel branch recently.
I will run some trials as well.

avalluri commented Mar 4, 2019

If I am not wrong, this has existed in the driver for a long time and is not specific to the 1.0 changes. I can take this task.

avalluri self-assigned this Mar 4, 2019
avalluri added a commit to avalluri/pmem-CSI that referenced this issue Mar 4, 2019
Instead of exiting with an error, the driver has to wait and retry the 'node controller'
registration until the registry server is up and the registration succeeds.

FIXES: intel#182

Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>

avalluri commented Mar 4, 2019

With change #184, the driver deployment looks like this:

$ kubectl get po
NAME                    READY   STATUS    RESTARTS   AGE
pmem-csi-4d6xj          2/2     Running   0          107s
pmem-csi-controller-0   4/4     Running   0          107s
pmem-csi-g5v9h          2/2     Running   0          107s
pmem-csi-tszbz          2/2     Running   0          107s
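
A check like this can be automated. The sketch below parses the default table output of `kubectl get pods` and reports pods with a nonzero restart count. It assumes the column layout shown in this thread (NAME, READY, STATUS, RESTARTS, AGE); `restartedPods` is a hypothetical helper, not part of kubectl or client-go.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// restartedPods parses `kubectl get pods` table output and returns a map
// from pod name to restart count for every pod that restarted at least once.
// Assumption: RESTARTS is the fourth whitespace-separated column.
func restartedPods(table string) (map[string]int, error) {
	restarted := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(table))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Skip blank or short lines and the header row.
		if len(fields) < 4 || fields[0] == "NAME" {
			continue
		}
		n, err := strconv.Atoi(fields[3])
		if err != nil {
			return nil, fmt.Errorf("pod %s: bad restart count %q: %w", fields[0], fields[3], err)
		}
		if n > 0 {
			restarted[fields[0]] = n
		}
	}
	return restarted, scanner.Err()
}

func main() {
	// Sample input taken from the first listing in this issue.
	const table = `NAME                    READY   STATUS    RESTARTS   AGE
my-csi-app              1/1     Running   0          6m30s
pmem-csi-bqdwp          2/2     Running   11         10m
pmem-csi-controller-0   4/4     Running   15         10m
`
	pods, err := restartedPods(table)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for name, n := range pods {
		fmt.Printf("%s restarted %d times\n", name, n)
	}
}
```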

avalluri added a commit to avalluri/pmem-CSI that referenced this issue Mar 6, 2019
avalluri added a commit to avalluri/pmem-CSI that referenced this issue Mar 7, 2019
avalluri added a commit to avalluri/pmem-CSI that referenced this issue Mar 8, 2019
avalluri added a commit that referenced this issue Mar 11, 2019
Instead of exiting with an error, the driver has to wait and retry the 'node controller'
registration until the registry server is up and the registration succeeds.

FIXES: #182

Signed-off-by: Amarnath Valluri <amarnath.valluri@intel.com>