Helm: Redpanda controller sidecar not respecting nameOverride or fullNameOverride #1536

Open
ngalanis930 opened this issue Sep 19, 2024 · 2 comments

ngalanis930 commented Sep 19, 2024

What happened?

It seems that the Redpanda controller sidecar does not respect nameOverride and/or fullnameOverride.

########################
Current Helm config for nameOverride and fullnameOverride:
nameOverride: stggke01-redpanda
fullnameOverride: stggke01-redpanda

##########################
Current Behavior
##########################
The controller requests brokers using hostnames that do not exist:

Request error, trying another node: Get "https://redpanda-1.stggke01-redpanda.redpanda.svc.cluster.local:9644/v1/cluster/health_overview": dial tcp: lookup redpanda-1.stggke01-redpanda.redpanda.svc.cluster.local on 172.18.0.10:53: no such host

What did you expect to happen?

##########################
Expected behavior
##########################
The controller should be able to discover/request the actual brokers' names:
stggke01-redpanda-1.stggke01-redpanda.redpanda.svc.cluster.local:9644
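
To make the mismatch concrete, here is a minimal sketch of how a per-broker admin URL is composed from a StatefulSet pod name and its headless Service. The function name `broker_admin_url` and its parameters are hypothetical and are not taken from the controller's source. The only things assumed are the standard Kubernetes conventions (pods named `<statefulset>-<ordinal>`, DNS names of the form `<pod>.<service>.<namespace>.svc.<clusterDomain>`) and that the StatefulSet is named after the chart's resolved full name, which is what the expected hostname above implies.

```python
def broker_admin_url(pod_prefix, ordinal, service, namespace,
                     cluster_domain="cluster.local", port=9644):
    """Compose the admin API URL for one StatefulSet pod behind a headless Service.

    Hypothetical helper for illustration only; not the controller's actual code.
    """
    host = f"{pod_prefix}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
    return f"https://{host}:{port}/v1/cluster/health_overview"


# What the sidecar appears to request: the pod prefix looks like the
# release name ("redpanda"), while the service part follows the override.
observed = broker_admin_url("redpanda", 1, "stggke01-redpanda", "redpanda")

# What would resolve: the pod prefix follows fullnameOverride as well.
expected = broker_admin_url("stggke01-redpanda", 1, "stggke01-redpanda", "redpanda")

print(observed)
print(expected)
```

Only the pod prefix differs between the two URLs. The service, namespace and cluster domain already match the deployed resources.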

How can we reproduce it (as minimally and precisely as possible)? Please include the values file.

$ helm get values redpanda -n redpanda --all
COMPUTED VALUES:
affinity: {}
auditLogging:
  clientMaxBufferSize: 16777216
  enabled: false
  enabledEventTypes: null
  excludedPrincipals: null
  excludedTopics: null
  listener: internal
  partitions: 12
  queueDrainIntervalMs: 500
  queueMaxBufferSizePerShard: 1048576
  replicationFactor: null
auth:
  sasl:
    bootstrapUser:
      mechanism: SCRAM-SHA-512
    enabled: true
    mechanism: SCRAM-SHA-512
    secretRef: redpanda-users
    users:
    - mechanism: SCRAM-SHA-512
      name: xxxxxxxxxxxxxxxxxx
      password: xxxxxxxxxxxxxxxxxxxxx
clusterDomain: cluster.local
commonLabels:
  app: redpanda
config:
  cluster: {}
  node:
    crash_loop_limit: 5
  pandaproxy_client: {}
  rpk: {}
  schema_registry_client: {}
  tunable:
    compacted_log_segment_size: 67108864
    kafka_connection_rate_limit: 1000
    log_segment_size_max: 268435456
    log_segment_size_min: 16777216
    max_compacted_log_segment_size: 536870912
connectors:
  deployment:
    create: false
  enabled: false
  test:
    create: false
console:
  affinity: {}
  annotations: {}
  automountServiceAccountToken: true
  autoscaling:
    enabled: false
    maxReplicas: 100
    minReplicas: 1
    targetCPUUtilizationPercentage: 80
  commonLabels: {}
  config: {}
  configmap:
    create: false
  console:
    config: {}
  deployment:
    create: false
  enabled: true
  enterprise:
    licenseSecretRef:
      key: ""
      name: ""
  extraContainers: []
  extraEnv: []
  extraEnvFrom: []
  extraVolumeMounts: []
  extraVolumes: []
  fullnameOverride: ""
  global: {}
  image:
    pullPolicy: IfNotPresent
    registry: docker.redpanda.com
    repository: redpandadata/console
    tag: ""
  imagePullSecrets: []
  ingress:
    annotations: {}
    className: null
    enabled: false
    hosts:
    - host: chart-example.local
      paths:
      - path: /
        pathType: ImplementationSpecific
    tls: []
  initContainers:
    extraInitContainers: ""
  livenessProbe:
    failureThreshold: 3
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  nameOverride: ""
  nodeSelector: {}
  podAnnotations: {}
  podLabels:
    name: redpanda-console
  podSecurityContext:
    fsGroup: 99
    runAsUser: 99
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
    successThreshold: 1
    timeoutSeconds: 1
  replicaCount: 1
  resources:
    limits:
      cpu: 500m
      memory: 1024Mi
    requests:
      cpu: 500m
      memory: 512Mi
  secret:
    create: false
    enterprise: {}
    kafka: {}
    login:
      github: {}
      google: {}
      jwtSecret: ""
      oidc: {}
      okta: {}
    redpanda:
      adminApi: {}
  secretMounts: []
  securityContext:
    runAsNonRoot: true
  service:
    annotations: {}
    port: 8080
    targetPort: null
    type: ClusterIP
  serviceAccount:
    annotations: {}
    automountServiceAccountToken: true
    create: true
    name: ""
  strategy: {}
  tests:
    enabled: false
  tolerations: []
  topologySpreadConstraints: []
enterprise:
  license: ""
  licenseSecretRef: {}
external:
  enabled: true
  service:
    enabled: false
  type: NodePort
fullnameOverride: stggke01-redpanda
image:
  pullPolicy: IfNotPresent
  repository: docker.redpanda.com/redpandadata/redpanda
  tag: v24.1.16
imagePullSecrets: []
license_key: ""
license_secret_ref: {}
listeners:
  admin:
    external:
      default:
        advertisedPorts:
        - 31644
        enabled: false
        port: 9645
        tls:
          cert: external
          enabled: true
    port: 9644
    tls:
      cert: default
      enabled: true
      requireClientAuth: false
  http:
    authenticationMethod: null
    enabled: true
    external:
      default:
        advertisedPorts:
        - 32000
        authenticationMethod: null
        enabled: false
        port: 8083
        tls:
          cert: external
          enabled: true
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8082
    tls:
      cert: default
      enabled: true
      requireClientAuth: false
  kafka:
    authenticationMethod: null
    external:
      default:
        advertisedPorts:
        - 31092
        authenticationMethod: null
        enabled: false
        port: 9094
        tls:
          cert: external
          enabled: false
    port: 9093
    tls:
      cert: default
      enabled: false
      requireClientAuth: false
  rpc:
    port: 33145
    tls:
      cert: default
      enabled: true
      requireClientAuth: false
  schemaRegistry:
    authenticationMethod: null
    enabled: true
    external:
      default:
        advertisedPorts:
        - 32001
        authenticationMethod: null
        enabled: false
        port: 8084
        tls:
          cert: external
          enabled: true
          requireClientAuth: false
    kafkaEndpoint: default
    port: 8081
    tls:
      cert: default
      enabled: true
      requireClientAuth: false
logging:
  logLevel: info
  usageStats:
    enabled: false
monitoring:
  enabled: false
  labels: {}
  scrapeInterval: 30s
nameOverride: stggke01-redpanda
nodeSelector:
  node: redpanda
post_install_job:
  affinity: {}
  enabled: true
  labels:
    name: redpanda-post-install-job
  podTemplate:
    annotations: {}
    labels:
      role: job
    spec:
      containers:
      - env: []
        name: post-install
        securityContext: {}
      securityContext: {}
  resources: {}
post_upgrade_job:
  affinity: {}
  enabled: true
  labels:
    name: redpanda-post-upgrade-job
  podTemplate:
    annotations: {}
    labels:
      role: job
    spec:
      containers:
      - env: []
        name: post-upgrade
        securityContext: {}
      securityContext: {}
rackAwareness:
  enabled: false
  nodeAnnotation: topology.kubernetes.io/zone
rbac:
  annotations: {}
  enabled: true
resources:
  cpu:
    cores: 2
  memory:
    container:
      max: 10Gi
      min: 10Gi
serviceAccount:
  annotations: {}
  create: true
  name: ""
statefulset:
  additionalRedpandaCmdFlags: []
  additionalSelectorLabels: {}
  annotations: {}
  budget:
    maxUnavailable: 1
  extraVolumeMounts: ""
  extraVolumes: ""
  initContainerImage:
    repository: harbor.persado.com/hub/library/busybox
    tag: latest
  initContainers:
    configurator:
      extraVolumeMounts: ""
      resources:
        limits:
          cpu: 2
          memory: 1024Mi
        requests:
          cpu: 1
          memory: 512Mi
    extraInitContainers: ""
    fsValidator:
      enabled: false
      expectedFS: xfs
      extraVolumeMounts: ""
      resources: {}
    setDataDirOwnership:
      enabled: true
      extraVolumeMounts: ""
      resources:
        limits:
          cpu: 1
          memory: 512Mi
        requests:
          cpu: 1
          memory: 512Mi
    setTieredStorageCacheDirOwnership:
      extraVolumeMounts: ""
      resources: {}
    tuning:
      extraVolumeMounts: ""
      resources:
        limits:
          cpu: 2
          memory: 1024Mi
        requests:
          cpu: 1
          memory: 512Mi
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 10
    periodSeconds: 10
  nodeSelector:
    node: redpanda
  podAffinity: {}
  podAntiAffinity:
    custom: {}
    topologyKey: kubernetes.io/hostname
    type: hard
    weight: 100
  podTemplate:
    annotations:
      prometheus.io/path: public_metrics
      prometheus.io/port: "9644"
    labels: {}
    spec:
      containers:
      - env: []
        name: redpanda
        securityContext: {}
      securityContext: {}
  priorityClassName: ""
  readinessProbe:
    failureThreshold: 3
    initialDelaySeconds: 1
    periodSeconds: 10
    successThreshold: 1
  replicas: 3
  securityContext:
    fsGroup: 101
    fsGroupChangePolicy: OnRootMismatch
    runAsUser: 101
  sideCars:
    configWatcher:
      enabled: true
      extraVolumeMounts: ""
      resources:
        limits:
          cpu: 100m
          memory: 100Mi
        requests:
          cpu: 100m
          memory: 100Mi
      securityContext: {}
    controllers:
      createRBAC: true
      enabled: true
      healthProbeAddress: :8085
      image:
        repository: docker.redpanda.com/redpandadata/redpanda-operator
        tag: v2.1.27-24.1.11
      metricsAddress: :9082
      resources:
        limits:
          cpu: 500m
          memory: 512Mi
        requests:
          cpu: 100m
          memory: 100Mi
      run:
      - all
      securityContext: {}
  startupProbe:
    failureThreshold: 120
    initialDelaySeconds: 1
    periodSeconds: 10
  terminationGracePeriodSeconds: 90
  tolerations:
  - effect: NoSchedule
    key: app
    operator: Equal
    value: redpanda
  topologySpreadConstraints:
  - labelSelector: null
    matchLabels:
      app.kubernetes.io/component: stggke01-redpanda-statefulset
    maxSkew: 1
    topologyKey: redpanda
    whenUnsatisfiable: DoNotSchedule
  updateStrategy:
    type: RollingUpdate
storage:
  hostPath: /redpanda
  persistentVolume:
    annotations: {}
    enabled: false
    labels: {}
    nameOverwrite: ""
    size: 20Gi
    storageClass: local-path
  tiered:
    config:
      cloud_storage_cache_size: 5368709120
      cloud_storage_enable_remote_read: true
      cloud_storage_enable_remote_write: true
      cloud_storage_enabled: false
    credentialsSecretRef:
      accessKey:
        configurationKey: cloud_storage_access_key
      secretKey:
        configurationKey: cloud_storage_secret_key
    hostPath: ""
    mountType: emptyDir
    persistentVolume:
      annotations: {}
      labels: {}
      storageClass: ""
tests:
  enabled: false
tls:
  certs:
    default:
      caEnabled: true
      duration: 87600h
    external:
      caEnabled: true
      duration: 87600h
  enabled: false
tolerations:
- effect: NoSchedule
  key: app
  operator: Equal
  value: redpanda
tuning:
  tune_aio_events: true

Anything else we need to know?

Works fine without overrides.

Which are the affected charts?

Redpanda

Chart Version(s)

$ helm -n redpanda list 
NAME    	NAMESPACE	REVISION	UPDATED                              	STATUS  	CHART         	APP VERSION
redpanda	redpanda 	1       	2024-09-19 18:27:41.060257 +0300 EEST	deployed	redpanda-5.9.4	v24.1.16

Cloud provider

Applied in GKE v1.29.6

JIRA Link: K8S-371

ngalanis930 commented Sep 24, 2024

I ran another test, deploying Redpanda with the command below:

helm install redpanda2 --namespace redpanda -f values.yaml ./

The error from the controller is now different:

stggke01-redpanda-1 redpanda-controllers Request error, trying another node: Get "https://redpanda2-0.stggke01-redpanda.redpanda.svc.cluster.local:9644/v1/cluster/health_overview": dial tcp: lookup redpanda2-0.stggke01-redpanda.redpanda.svc.cluster.local on 172.18.0.10:53: no such host

So the controller appears to take the Helm release name and use it as the broker pod name in its URL, instead of the name set by the overrides.
Is this expected behaviour?
It would be great if the controller built the broker URL from the overrides, to avoid confusion between different deployments and environments.
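
For reference, the sketch below mirrors the `fullname` helper that `helm create` scaffolds into new charts. Whether the Redpanda chart's own helper matches it exactly is an assumption; the Python function `helm_fullname` is a hypothetical re-implementation for illustration. Under that convention, `fullnameOverride` wins over the release name, which is why the brokers are named `stggke01-redpanda-N` for both the `redpanda` and `redpanda2` releases.

```python
def helm_fullname(release_name, chart_name, name_override="", fullname_override=""):
    """Resolve a resource name the way the default `helm create` helper does.

    Hypothetical Python port for illustration; the Redpanda chart's helper
    may differ in detail.
    """
    if fullname_override:
        # An explicit full name replaces everything else.
        return fullname_override[:63].removesuffix("-")
    name = name_override or chart_name
    if name in release_name:
        # Avoid names like "redpanda-redpanda".
        return release_name[:63].removesuffix("-")
    return f"{release_name}-{name}"[:63].removesuffix("-")


# Both installs from this issue resolve to the same overridden name.
first = helm_fullname("redpanda", "redpanda",
                      name_override="stggke01-redpanda",
                      fullname_override="stggke01-redpanda")
second = helm_fullname("redpanda2", "redpanda",
                       name_override="stggke01-redpanda",
                       fullname_override="stggke01-redpanda")

# Without overrides the release name is used, matching "works fine without overrides".
no_override = helm_fullname("redpanda2", "redpanda")

print(first, second, no_override)
```

A component that derives broker hostnames from the release name alone would agree with this helper only when no overrides are set, which is consistent with the two errors reported here.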

@ngalanis930
Author

Is there any update on this one?
We want to upgrade our other Redpanda deployments, which are on chart version v4, and use this controller with them, so we are currently blocked.
