ETCD-681: Add etcd-backup-server container within separate daemonset #1354

Open
wants to merge 7 commits into
base: master

Conversation

Elbehery
Contributor

@Elbehery Elbehery commented Oct 9, 2024

resolves https://issues.redhat.com/browse/ETCD-681

  • Reacts to deltas in the CR
  • Adds the error handling back, along with a test for it

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 9, 2024
@openshift-ci-robot

openshift-ci-robot commented Oct 9, 2024

@Elbehery: This pull request references ETCD-681 which is a valid jira issue.

In response to this:

resolves https://issues.redhat.com/browse/ETCD-681

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Elbehery
Contributor Author

Elbehery commented Oct 9, 2024

/hold

still WIP

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 9, 2024
@openshift-ci openshift-ci bot requested review from dusk125 and tjungblu October 9, 2024 21:48
Contributor

openshift-ci bot commented Oct 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Elbehery

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2024
@Elbehery Elbehery force-pushed the add_backup-server_deployment branch 3 times, most recently from 0d1d456 to 7a70831 Compare October 10, 2024 08:30
@Elbehery Elbehery force-pushed the add_backup-server_deployment branch from 7a70831 to 99f31b8 Compare October 10, 2024 10:12
@Elbehery Elbehery changed the title ETCD-681: Add etcd-backup-server container within separate deployment ETCD-681: Add etcd-backup-server container within separate daemonset Oct 10, 2024
@Elbehery
Contributor Author

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 10, 2024
@Elbehery Elbehery force-pushed the add_backup-server_deployment branch from 125c590 to 3b86c6a Compare October 10, 2024 21:33
@Elbehery
Contributor Author

/retest-required

@Elbehery Elbehery force-pushed the add_backup-server_deployment branch 4 times, most recently from d5e259b to d874214 Compare October 11, 2024 00:01
@Elbehery
Contributor Author

/retest-required

@Elbehery Elbehery force-pushed the add_backup-server_deployment branch from cb84961 to 4d33aa7 Compare October 11, 2024 09:33
@Elbehery
Contributor Author

Tested this PR on top of a 4.18.0-0.ci-2024-10-11-065556 OCP cluster.

CR used:

apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: default
spec:
  etcd:
    schedule: "* * * * *"
    timeZone: "UTC"
    retentionPolicy:
      retentionType: RetentionNumber
      retentionNumber:
        maxNumberOfBackups: 3

Backups are being taken on each master node, and only the three most recent are retained:

pod/backup-server-daemon-set-nzx9p

melbeher@melbeher-mac Downloads % oc rsh -n openshift-etcd  pod/backup-server-daemon-set-nzx9p
Defaulted container "etcd-backup-server" out of: etcd-backup-server, init-env (init)
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:36 2024-10-11_173600
sh-5.1# 
sh-5.1# 
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:36 2024-10-11_173600
sh-5.1# 
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:36 2024-10-11_173600
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:36 2024-10-11_173600
drwxr-xr-x. 2 root root 96 Oct 11 17:37 2024-10-11_173700
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:37 2024-10-11_173700
drwxr-xr-x. 2 root root 96 Oct 11 17:38 2024-10-11_173800
drwxr-xr-x. 2 root root 96 Oct 11 17:39 2024-10-11_173900
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:40 2024-10-11_174000
drwxr-xr-x. 2 root root 96 Oct 11 17:41 2024-10-11_174100
drwxr-xr-x. 2 root root 96 Oct 11 17:42 2024-10-11_174200
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:45 2024-10-11_174500
drwxr-xr-x. 2 root root 96 Oct 11 17:46 2024-10-11_174600
drwxr-xr-x. 2 root root 96 Oct 11 17:47 2024-10-11_174700

pod/backup-server-daemon-set-f65np

melbeher@melbeher-mac Downloads % oc rsh -n openshift-etcd  pod/backup-server-daemon-set-f65np
Defaulted container "etcd-backup-server" out of: etcd-backup-server, init-env (init)
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:41 2024-10-11_174100
drwxr-xr-x. 2 root root 96 Oct 11 17:42 2024-10-11_174200
drwxr-xr-x. 2 root root 96 Oct 11 17:43 2024-10-11_174300
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:45 2024-10-11_174500
drwxr-xr-x. 2 root root 96 Oct 11 17:46 2024-10-11_174600
drwxr-xr-x. 2 root root 96 Oct 11 17:47 2024-10-11_174700

pod/backup-server-daemon-set-cbmvm

melbeher@melbeher-mac Downloads % oc rsh -n openshift-etcd  pod/backup-server-daemon-set-cbmvm
Defaulted container "etcd-backup-server" out of: etcd-backup-server, init-env (init)
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:41 2024-10-11_174100
drwxr-xr-x. 2 root root 96 Oct 11 17:42 2024-10-11_174200
drwxr-xr-x. 2 root root 96 Oct 11 17:43 2024-10-11_174300
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:42 2024-10-11_174200
drwxr-xr-x. 2 root root 96 Oct 11 17:43 2024-10-11_174300
drwxr-xr-x. 2 root root 96 Oct 11 17:44 2024-10-11_174400
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 11 17:45 2024-10-11_174500
drwxr-xr-x. 2 root root 96 Oct 11 17:46 2024-10-11_174600
drwxr-xr-x. 2 root root 96 Oct 11 17:47 2024-10-11_174700
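
The listings above show the RetentionNumber policy at work: a new timestamped directory appears every minute and the oldest one disappears once more than three exist. As a rough illustration only, the same pruning rule can be sketched in bash. This is not the backup-server's actual implementation; the helper name `prune_backups` is hypothetical, and the sketch assumes directory names always follow the `YYYY-MM-DD_HHMMSS` pattern seen in the listings, so that sorted order equals chronological order.

```shell
#!/bin/bash
# Hypothetical helper (not part of cluster-etcd-operator): keep only the
# newest "$2" backup directories under "$1" and delete the rest.
prune_backups() {
  local dir="$1" keep="$2"
  local -a all=()
  local path i

  # Glob expansion is sorted, and the timestamped names sort chronologically,
  # so the array ends up ordered oldest-first.
  for path in "$dir"/????-??-??_??????; do
    if [[ -d "$path" ]]; then
      all+=("$path")
    fi
  done

  # Remove the oldest entries until only "$keep" remain.
  local excess=$(( ${#all[@]} - keep ))
  for (( i = 0; i < excess; i++ )); do
    rm -rf -- "${all[i]}"
  done
}

# Demo in a throwaway directory: four backups exist, three are kept.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir"/2024-10-11_17{36,37,38,39}00
prune_backups "$demo_dir" 3
ls "$demo_dir"
rm -rf -- "$demo_dir"
```

With `keep` set to 3, the demo leaves the 17:37, 17:38 and 17:39 directories, matching the shape of the listings above.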

@Elbehery
Contributor Author

During testing, I had trouble constructing the correct per-node ETCDCTL_KEY and ETCDCTL_CERT file names.

I therefore chose to use an init-container that builds the correct paths, writes them as environment variables to a shared file, and exposes them to the etcd-backup-server container.

The manifest used for testing:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: etcd-auto-backup
  name: backup-server-daemon-set
  namespace: openshift-etcd
spec:
  selector:
    matchLabels:
      app: etcd-auto-backup
  template:
    metadata:
      labels:
        app: etcd-auto-backup
    spec:
      initContainers:
      - name: init-env
        image: stakater/base-alpine
        command:
        - /bin/bash
        - -c
        - |
          #!/bin/bash
          ETCDCTL_KEY="/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-NODE_NAME.key"
          ETCDCTL_CERT="/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-NODE_NAME.crt"
          currentNodeName=$NODE_NAME
          subStringToReplace="NODE_NAME"
          new_ETCDCTL_KEY=${ETCDCTL_KEY/$subStringToReplace/$currentNodeName}
          new_ETCDCTL_CERT=${ETCDCTL_CERT/$subStringToReplace/$currentNodeName}
          echo "ETCDCTL_KEY=$new_ETCDCTL_KEY" >> /shared/env-vars.sh
          echo "ETCDCTL_CERT=$new_ETCDCTL_CERT" >> /shared/env-vars.sh
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        volumeMounts:
        - name: shared-data
          mountPath: /shared
      containers:
      - command:
        - /bin/bash
        - -c
        - |
          #!/bin/bash
          set -o allexport
          if [[ -f /shared/env-vars.sh ]]; then
            source /shared/env-vars.sh
          fi
          exec cluster-etcd-operator backup-server \
          --enabled=true \
          --timezone=UTC \
          --schedule="* * * * *" \
          --type=RetentionNumber \
          --maxNumberOfBackups=3 \
          --endpoints=10.0.125.23:2379,10.0.76.40:2379,10.0.48.236:2379 \
          --backupPath=/var/lib/etcd-auto-backup
        env:
        - name: NODE_ip_10_0_76_40_us_west_1_compute_internal_ETCD_NAME
          value: ip-10-0-76-40.us-west-1.compute.internal
        - name: NODE_ip_10_0_48_236_us_west_1_compute_internal_IP
          value: 10.0.48.236
        - name: ETCD_CIPHER_SUITES
          value: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
        - name: ETCD_DATA_DIR
          value: /var/lib/etcd
        - name: ETCD_SOCKET_REUSE_ADDRESS
          value: "true"
        - name: ETCD_IMAGE
          value: registry.build09.ci.openshift.org/ci-ln-7m2pl6t/stable@sha256:a5ffe3489a5c049cb2bae31ba55fa7e3a7654d93d833a78f6c0506d2d7c1b272
        - name: ETCDCTL_API
          value: "3"
        - name: NODE_ip_10_0_76_40_us_west_1_compute_internal_IP
          value: 10.0.76.40
        - name: ALL_ETCD_ENDPOINTS
          value: https://10.0.125.23:2379,https://10.0.48.236:2379,https://10.0.76.40:2379
        - name: ETCD_INITIAL_CLUSTER_STATE
          value: existing
        - name: ETCD_QUOTA_BACKEND_BYTES
          value: "8589934592"
        - name: NODE_ip_10_0_125_23_us_west_1_compute_internal_ETCD_URL_HOST
          value: 10.0.125.23
        - name: NODE_ip_10_0_76_40_us_west_1_compute_internal_ETCD_URL_HOST
          value: 10.0.76.40
        - name: ETCD_ENABLE_PPROF
          value: "true"
        - name: ETCD_EXPERIMENTAL_MAX_LEARNERS
          value: "3"
        - name: ETCD_EXPERIMENTAL_WATCH_PROGRESS_NOTIFY_INTERVAL
          value: 5s
        - name: ETCDCTL_CACERT
          value: /etc/kubernetes/static-pod-certs/configmaps/etcd-all-bundles/server-ca-bundle.crt
        - name: ETCD_EXPERIMENTAL_WARNING_APPLY_DURATION
          value: 200ms
        - name: NODE_ip_10_0_48_236_us_west_1_compute_internal_ETCD_URL_HOST
          value: 10.0.48.236
        - name: ETCD_HEARTBEAT_INTERVAL
          value: "100"
        - name: ETCDCTL_ENDPOINTS
          value: https://10.0.125.23:2379,https://10.0.48.236:2379,https://10.0.76.40:2379
        - name: ETCD_ELECTION_TIMEOUT
          value: "1000"
        - name: NODE_ip_10_0_48_236_us_west_1_compute_internal_ETCD_NAME
          value: ip-10-0-48-236.us-west-1.compute.internal
        - name: NODE_ip_10_0_125_23_us_west_1_compute_internal_IP
          value: 10.0.125.23
        - name: NODE_ip_10_0_125_23_us_west_1_compute_internal_ETCD_NAME
          value: ip-10-0-125-23.us-west-1.compute.internal
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: registry.build09.ci.openshift.org/ci-ln-7m2pl6t/stable@sha256:07725b1d583f4bd4afcbe13121b57c857db961e377cad4f5345b864f7ba4f08e
        imagePullPolicy: IfNotPresent
        name: etcd-backup-server
        resources: {}
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/lib/etcd
          name: data-dir
        - mountPath: /etc/kubernetes
          name: config-dir
        - mountPath: /var/lib/etcd-auto-backup
          name: etcd-auto-backup-dir
        - mountPath: /etc/kubernetes/static-pod-certs
          name: cert-dir
        - name: shared-data
          mountPath: /shared
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      volumes:
      - hostPath:
          path: /var/lib/etcd
          type: ""
        name: data-dir
      - hostPath:
          path: /etc/kubernetes
          type: ""
        name: config-dir
      - hostPath:
          path: /var/lib/etcd-auto-backup
          type: ""
        name: etcd-auto-backup-dir
      - hostPath:
          path: /etc/kubernetes/static-pod-resources/etcd-certs
          type: ""
        name: cert-dir
      - name: shared-data
        emptyDir: {}
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate

Also, none of these ENVs are needed; the init-container alone is enough.
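
The init-container in the manifest builds the per-node certificate paths by replacing a NODE_NAME placeholder. A slightly more direct way to express the same idea is to interpolate the node name straight into the path. The sketch below is an illustration under assumptions, not the code in this PR: the function name `etcd_peer_cert_env` is hypothetical, and the certificate directory and file naming are copied from the manifest above.

```shell
#!/bin/bash
# Hypothetical helper: print the ETCDCTL_KEY / ETCDCTL_CERT assignments for
# the node name given as "$1". Directory and file naming follow the test
# manifest; they are not verified against other cluster versions.
etcd_peer_cert_env() {
  local node="$1"
  local cert_dir="/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs"
  printf 'ETCDCTL_KEY=%s\n' "${cert_dir}/etcd-peer-${node}.key"
  printf 'ETCDCTL_CERT=%s\n' "${cert_dir}/etcd-peer-${node}.crt"
}

# Example with one of the node names from the manifest.
etcd_peer_cert_env "ip-10-0-76-40.us-west-1.compute.internal"

# In an init-container this could be written to the shared file, e.g.:
#   etcd_peer_cert_env "$NODE_NAME" > /shared/env-vars.sh
```

Writing with `>` rather than `>>` would keep the file from accumulating duplicate lines if the init-container runs more than once in the same pod, since an emptyDir volume lives as long as the pod does.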

@Elbehery
Contributor Author

/test unit

@Elbehery
Contributor Author

Elbehery commented Oct 12, 2024

Result of the final test

CR used:

apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: default
spec:
  etcd:
    schedule: "* * * * *"
    timeZone: "UTC"
    retentionPolicy:
      retentionType: RetentionNumber
      retentionNumber:
        maxNumberOfBackups: 3

Result from openshift-etcd namespace

oc get all -n openshift-etcd                                                              
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
NAME                                                               READY   STATUS      RESTARTS   AGE
pod/backup-server-daemon-set-555xg                                 1/1     Running     1          45m
pod/backup-server-daemon-set-85ptk                                 1/1     Running     1          45m
pod/backup-server-daemon-set-mcbqc                                 1/1     Running     1          45m

NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/etcd   ClusterIP   172.30.154.25   <none>        2379/TCP,9979/TCP   86m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
daemonset.apps/backup-server-daemon-set   3         3         3       3            3           node-role.kubernetes.io/master=   45m

Comment

  • The result is a very stable deployment of the backup pods.
  • Only one restart took place on each of the three pods, and in every case it was caused by a node issue; see below.

oc describe pod/backup-server-daemon-set-555xg -n openshift-etcd

Events:
  Type     Reason           Age                 From               Message
  ----     ------           ----                ----               -------
  Normal   Scheduled        48m                 default-scheduler  Successfully assigned openshift-etcd/backup-server-daemon-set-555xg to ip-10-0-103-219.us-west-2.compute.internal
  Normal   AddedInterface   48m                 multus             Add eth0 [10.128.0.89/23] from ovn-kubernetes
  Normal   Pulled           48m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          48m                 kubelet            Created container init-env
  Normal   Started          48m                 kubelet            Started container init-env
  Normal   Pulled           48m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          48m                 kubelet            Created container etcd-backup-server
  Normal   Started          48m                 kubelet            Started container etcd-backup-server
  Warning  NodeNotReady     32m                 node-controller    Node is not ready
  Warning  FailedMount      30m (x6 over 31m)   kubelet            MountVolume.SetUp failed for volume "kube-api-access-ctrmh" : [object "openshift-etcd"/"kube-root-ca.crt" not registered, object "openshift-etcd"/"openshift-service-ca.crt" not registered]
  Warning  NetworkNotReady  30m (x10 over 31m)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
  Normal   AddedInterface   30m                 multus             Add eth0 [10.128.0.89/23] from ovn-kubernetes
  Normal   Pulled           30m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          30m                 kubelet            Created container init-env
  Normal   Started          30m                 kubelet            Started container init-env
  Normal   Pulled           30m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          30m                 kubelet            Created container etcd-backup-server
  Normal   Started          30m                 kubelet            Started container etcd-backup-server

oc describe pod/backup-server-daemon-set-85ptk -n openshift-etcd

Events:
  Type     Reason           Age                 From               Message
  ----     ------           ----                ----               -------
  Normal   Scheduled        50m                 default-scheduler  Successfully assigned openshift-etcd/backup-server-daemon-set-85ptk to ip-10-0-121-90.us-west-2.compute.internal
  Normal   AddedInterface   50m                 multus             Add eth0 [10.129.0.65/23] from ovn-kubernetes
  Normal   Pulled           50m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          50m                 kubelet            Created container init-env
  Normal   Started          50m                 kubelet            Started container init-env
  Normal   Pulled           50m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          50m                 kubelet            Created container etcd-backup-server
  Normal   Started          50m                 kubelet            Started container etcd-backup-server
  Warning  NodeNotReady     41m                 node-controller    Node is not ready
  Warning  FailedMount      38m (x6 over 39m)   kubelet            MountVolume.SetUp failed for volume "kube-api-access-dttcb" : [object "openshift-etcd"/"kube-root-ca.crt" not registered, object "openshift-etcd"/"openshift-service-ca.crt" not registered]
  Warning  NetworkNotReady  38m (x10 over 39m)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
  Normal   AddedInterface   38m                 multus             Add eth0 [10.129.0.65/23] from ovn-kubernetes
  Normal   Pulled           38m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          38m                 kubelet            Created container init-env
  Normal   Started          38m                 kubelet            Started container init-env
  Normal   Pulled           38m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          38m                 kubelet            Created container etcd-backup-server
  Normal   Started          38m                 kubelet            Started container etcd-backup-server

oc describe pod/backup-server-daemon-set-mcbqc -n openshift-etcd

Events:
  Type     Reason           Age                 From               Message
  ----     ------           ----                ----               -------
  Normal   Scheduled        52m                 default-scheduler  Successfully assigned openshift-etcd/backup-server-daemon-set-mcbqc to ip-10-0-20-183.us-west-2.compute.internal
  Normal   AddedInterface   52m                 multus             Add eth0 [10.130.0.77/23] from ovn-kubernetes
  Normal   Pulled           52m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          52m                 kubelet            Created container init-env
  Normal   Started          52m                 kubelet            Started container init-env
  Normal   Pulled           52m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          52m                 kubelet            Created container etcd-backup-server
  Normal   Started          52m                 kubelet            Started container etcd-backup-server
  Warning  NodeNotReady     47m                 node-controller    Node is not ready
  Warning  FailedMount      45m (x6 over 45m)   kubelet            MountVolume.SetUp failed for volume "kube-api-access-pbcx4" : [object "openshift-etcd"/"kube-root-ca.crt" not registered, object "openshift-etcd"/"openshift-service-ca.crt" not registered]
  Warning  NetworkNotReady  45m (x10 over 45m)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
  Normal   AddedInterface   45m                 multus             Add eth0 [10.130.0.77/23] from ovn-kubernetes
  Normal   Pulled           45m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          45m                 kubelet            Created container init-env
  Normal   Started          45m                 kubelet            Started container init-env
  Normal   Pulled           45m                 kubelet            Container image "registry.build09.ci.openshift.org/ci-ln-8x2i3jk/stable@sha256:38755ec1b503a120c70a314e947a7080dd936830ba4d5ae972005fea03e3858e" already present on machine
  Normal   Created          45m                 kubelet            Created container etcd-backup-server
  Normal   Started          45m                 kubelet            Started container etcd-backup-server

@Elbehery
Contributor Author

oc rsh -n openshift-etcd pod/backup-server-daemon-set-555xg

Defaulted container "etcd-backup-server" out of: etcd-backup-server, init-env (init)
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 12 18:40 2024-10-12_184000
drwxr-xr-x. 2 root root 96 Oct 12 18:41 2024-10-12_184100
drwxr-xr-x. 2 root root 96 Oct 12 18:42 2024-10-12_184200
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 12 18:56 2024-10-12_185600
drwxr-xr-x. 2 root root 96 Oct 12 18:57 2024-10-12_185700
drwxr-xr-x. 2 root root 96 Oct 12 18:58 2024-10-12_185800

oc rsh -n openshift-etcd pod/backup-server-daemon-set-85ptk

Defaulted container "etcd-backup-server" out of: etcd-backup-server, init-env (init)
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 12 18:41 2024-10-12_184100
drwxr-xr-x. 2 root root 96 Oct 12 18:42 2024-10-12_184200
drwxr-xr-x. 2 root root 96 Oct 12 18:43 2024-10-12_184300
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 12 18:57 2024-10-12_185700
drwxr-xr-x. 2 root root 96 Oct 12 18:58 2024-10-12_185800
drwxr-xr-x. 2 root root 96 Oct 12 18:59 2024-10-12_185900

oc rsh -n openshift-etcd pod/backup-server-daemon-set-mcbqc

Defaulted container "etcd-backup-server" out of: etcd-backup-server, init-env (init)
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 12 18:59 2024-10-12_185900
drwxr-xr-x. 2 root root 96 Oct 12 19:00 2024-10-12_190000
drwxr-xr-x. 2 root root 96 Oct 12 19:01 2024-10-12_190100
sh-5.1# 
sh-5.1# ls -l /var/lib/etcd-auto-backup/
total 0
drwxr-xr-x. 2 root root 96 Oct 12 19:00 2024-10-12_190000
drwxr-xr-x. 2 root root 96 Oct 12 19:01 2024-10-12_190100
drwxr-xr-x. 2 root root 96 Oct 12 19:02 2024-10-12_190200

@Elbehery Elbehery force-pushed the add_backup-server_deployment branch 2 times, most recently from f6ef487 to 011991c Compare December 15, 2024 11:49
@Elbehery
Contributor Author

/retest-required

@Elbehery
Contributor Author

/retest-required

@Elbehery
Contributor Author

/retest-required

@Elbehery Elbehery force-pushed the add_backup-server_deployment branch 2 times, most recently from 0e4093a to 7706a91 Compare December 15, 2024 20:36
@Elbehery
Contributor Author

/retest-required

@Elbehery
Contributor Author

/retest-required

@Elbehery Elbehery force-pushed the add_backup-server_deployment branch from 7706a91 to 3f289fc Compare January 1, 2025 15:01
@Elbehery
Contributor Author

Elbehery commented Jan 1, 2025

/retest-required

@Elbehery Elbehery force-pushed the add_backup-server_deployment branch from 5403ca0 to 62d9bb5 Compare January 2, 2025 08:28
@Elbehery Elbehery force-pushed the add_backup-server_deployment branch from 62d9bb5 to ac9675c Compare January 2, 2025 08:58
@Elbehery
Contributor Author

Elbehery commented Jan 2, 2025

/retest-required

@Elbehery
Contributor Author

Elbehery commented Jan 2, 2025

/retest-required

@@ -121,38 +161,14 @@ func (c *PeriodicBackupController) sync(ctx context.Context, _ factory.SyncConte
}
}

if defaultFound {
mirrorPods, err := c.podLister.List(labels.Set{"app": "etcd"}.AsSelector())
if !defaultFound {
Member

Do we need if !defaultFound here? When the default DS is found we already continue, so any DS that reaches this if is not the default one anyway.

@Elbehery
Contributor Author

Elbehery commented Jan 3, 2025

/assign @wking
/assign @hexfusion
/assign @deads2k

@Elbehery Elbehery force-pushed the add_backup-server_deployment branch from 25bb204 to 1f1b1a7 Compare January 5, 2025 20:48
@Elbehery
Contributor Author

Elbehery commented Jan 6, 2025

/retest-required

Contributor

openshift-ci bot commented Jan 6, 2025

@Elbehery: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                         Commit    Details   Required   Rerun command
ci/prow/e2e-metal-ovn-ha-cert-rotation-shutdown   1f1b1a7   link      false      /test e2e-metal-ovn-ha-cert-rotation-shutdown
ci/prow/e2e-aws-ovn-single-node                   1f1b1a7   link      true       /test e2e-aws-ovn-single-node
ci/prow/e2e-aws-etcd-certrotation                 1f1b1a7   link      false      /test e2e-aws-etcd-certrotation
ci/prow/e2e-aws-etcd-recovery                     1f1b1a7   link      false      /test e2e-aws-etcd-recovery
ci/prow/e2e-metal-ovn-sno-cert-rotation-shutdown  1f1b1a7   link      false      /test e2e-metal-ovn-sno-cert-rotation-shutdown

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.

10 participants