[BUG] apecloud mysql switchover failed · Issue #9470 · apecloud/kubeblocks · GitHub
[BUG] apecloud mysql switchover failed #9470
Closed
@JashBook

Description


Describe the bug
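Promoting a new leader for an ApeCloud MySQL cluster with kbcli cluster promote fails. The OpsRequest ends in Failed, and the switchover Job's pod exits with code 7 right after logging its request to the syncer switchover endpoint (http://192.168.0.7:3601/v1.0/switchover).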

kbcli version
Kubernetes: v1.30.4-vke.4
KubeBlocks: 0.9.4
kbcli: 0.9.4

To Reproduce
Steps to reproduce the behavior:

  1. Create a cluster:
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: apemysql-dnedvj
  namespace: default
spec:
  terminationPolicy: Delete
  componentSpecs:
    - name: mysql
      componentDef: apecloud-mysql
      replicas: 3
      resources:
        requests:
          cpu: 100m
          memory: 0.5Gi
        limits:
          cpu: 100m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
kbcli cluster list-instances apemysql-dnedvj
NAME                      NAMESPACE   CLUSTER           COMPONENT   STATUS    ROLE       ACCESSMODE   AZ               CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                      CREATED-TIME                 
apemysql-dnedvj-mysql-0   default     apemysql-dnedvj   mysql       Running   leader     <none>       cn-guangzhou-b   100m / 100m          512Mi / 512Mi           data:20Gi   192.168.0.6/192.168.0.6   Jun 20,2025 10:22 UTC+0800   
apemysql-dnedvj-mysql-1   default     apemysql-dnedvj   mysql       Running   follower   <none>       cn-guangzhou-b   100m / 100m          512Mi / 512Mi           data:20Gi   192.168.0.6/192.168.0.6   Jun 20,2025 10:23 UTC+0800   
apemysql-dnedvj-mysql-2   default     apemysql-dnedvj   mysql       Running   follower   <none>       cn-guangzhou-b   100m / 100m          512Mi / 512Mi           data:20Gi   192.168.0.6/192.168.0.6   Jun 20,2025 10:23 UTC+0800   
  2. Trigger a switchover:
kbcli cluster promote apemysql-dnedvj --auto-approve --force=true --component mysql
  3. See error:
➜  ~ kubectl get ops 
NAME                           TYPE     CLUSTER           STATUS   PROGRESS   AGE
apemysql-dnedvj-custom-kmhrz   Custom   apemysql-dnedvj   Failed   1/1        16m
➜  ~ kubectl get pod |grep apemysql-dnedvj
apemysql-dnedvj-mysql-0                                4/4     Running   0          22m
apemysql-dnedvj-mysql-1                                4/4     Running   0          21m
apemysql-dnedvj-mysql-2                                4/4     Running   0          21m
b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-6vb8j   0/1     Error     0          16m

kubectl get job 
NAME                                             STATUS   COMPLETIONS   DURATION   AGE
b71aa765-apemysql-dnedvj-cu-mysql-switchover-0   Failed   0/1           17m        17m

Job YAML:

kubectl get job b71aa765-apemysql-dnedvj-cu-mysql-switchover-0 -oyaml
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2025-06-20T02:28:25Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: kubeblocks
    ops.kubeblocks.io/ops-name: apemysql-dnedvj-custom-kmhrz
    opsrequest.kubeblocks.io/action-name: switchover
  name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
  namespace: default
  ownerReferences:
  - apiVersion: apps.kubeblocks.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: OpsRequest
    name: apemysql-dnedvj-custom-kmhrz
    uid: b71aa765-cb86-402f-b4ca-1474ab21cc36
  resourceVersion: "41759674"
  uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
spec:
  backoffLimit: 0
  completionMode: NonIndexed
  completions: 1
  manualSelector: false
  parallelism: 1
  podReplacementPolicy: TerminatingOrFailed
  selector:
    matchLabels:
      batch.kubernetes.io/controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        batch.kubernetes.io/controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
        batch.kubernetes.io/job-name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
        controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
        job-name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
    spec:
      containers:
      - command:
        - sh
        - -c
        - |
          set -e
          # do switchover
          port=${SYNCER_SERVICE_PORT:-3601}
          url="http://${TARGET_POD_IP}:${port}/v1.0/switchover"
          metadata=${metadata}
          if [ -z ${metadata} ]; then
            params="{\"parameters\": {\"primary\":\"${primary}\",\"candidate\":\"${candidate}\"}}"
          else
            params="{\"parameters\": {\"primary\":\"${primary}\",\"candidate\":\"${candidate}\"}, \"metadata\": ${metadata}}"
          fi
          echo "curl ${url}, parameters: ${params}"
          res=`curl -s -X POST -H 'Content-Type: application/json' "${url}" -d "${params}"`
          echo "curl result: ${res}"

          # check if switchover successfully.
          echo "INFO: start to check if switchover successfully, timeout is 60s"
          executedUnix=$(date +%s)
          while true; do
            sleep 5
            if [ ! -z ${candidate} ]; then
               # if candidate specified, only check it
               role=$(kubectl get pod ${candidate} -ojson | jq -r '.metadata.labels["kubeblocks.io/role"]')
               if [ "$role" == "primary" ] || [ "$role" == "leader" ] || [ "$role" == "master" ]; then
                  echo "INFO: switchover successfully, ${candidate} is ${role}"
                  exit 0
               fi
            else
              # check if the candidate instance has been promote to primary
              pods=$(kubectl get pod -l apps.kubeblocks.io/component-name=${KB_COMP_NAME},app.kubernetes.io/instance=${KB_CLUSTER_NAME} | awk 'NR > 1 {print $1}')
              for podName in ${pods}; do
                 if [ "${podName}" != "${primary}" ];then
                   role=$(kubectl get pod ${podName} -ojson | jq -r '.metadata.labels["kubeblocks.io/role"]')
                   if [ "$role" == "primary" ] || [ "$role" == "leader" ] || [ "$role" == "master" ]; then
                      echo "INFO: switchover successfully, ${podName} is ${role}"
                      exit 0
                   fi
                 fi
              done
            fi
            currentUnix=$(date +%s)
            diff_time=$((${currentUnix}-${executedUnix}))
            if [ ${diff_time} -ge 60 ]; then
              echo "ERROR: switchover failed."
              exit 1
            fi
          done
        env:
        - name: KB_OPS_NAME
          value: apemysql-dnedvj-custom-kmhrz
        - name: KB_OPS_NAMESPACE
          value: default
        - name: KB_CLUSTER_NAME
          value: apemysql-dnedvj
        - name: KB_COMP_NAME
          value: mysql
        - name: KB_CLUSTER_COMP_NAME
          value: apemysql-dnedvj-mysql
        - name: KB_COMP_REPLICAS
          value: "3"
        - name: KB_COMP_HEADLESS_SVC_NAME
          value: apemysql-dnedvj-mysql-headless
        - name: TARGET_POD_IP
          value: 192.168.0.7
        - name: primary
          value: apemysql-dnedvj-mysql-0
        - name: candidate
        image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
        imagePullPolicy: IfNotPresent
        name: switchover
        resources:
          limits:
            cpu: "0"
            memory: "0"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /scripts
          name: ops-utils
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |
          cp /usr/bin/kubectl /scripts/kubectl;
          echo '/scripts/kubectl -n "${KB_OPS_NAMESPACE}" patch opsrequests.apps.kubeblocks.io "${KB_OPS_NAME}" --subresource=status --type=merge --patch "{\"status\":{\"extras\":$1}}"' >/scripts/patch-extras-status.sh
        image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
        imagePullPolicy: IfNotPresent
        name: ops-utils
        resources:
          limits:
            cpu: "0"
            memory: "0"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /scripts
          name: ops-utils
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kb-apemysql-dnedvj
      serviceAccountName: kb-apemysql-dnedvj
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: kb-data
        operator: Equal
        value: "true"
      volumes:
      - emptyDir: {}
        name: ops-utils
status:
  conditions:
  - lastProbeTime: "2025-06-20T02:28:30Z"
    lastTransitionTime: "2025-06-20T02:28:30Z"
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
  failed: 1
  ready: 0
  startTime: "2025-06-20T02:28:25Z"
  terminating: 0
  uncountedTerminatedPods: {}
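The container script first builds a JSON body from the primary and candidate environment variables and POSTs it to the syncer on TARGET_POD_IP (port 3601 unless SYNCER_SERVICE_PORT is set). Because candidate is declared without a value in this Job, the body carries an empty candidate, which is what the pod log below shows. The sketch reproduces only the no-metadata branch of that string construction as a standalone function; build_switchover_params is a hypothetical name, not part of KubeBlocks, and, like the original, it does not JSON-escape its inputs:

```shell
#!/bin/sh
# Rebuild the request body the switchover script sends when ${metadata}
# is empty. $1 = primary pod name, $2 = candidate pod name (may be "").
# Hypothetical helper for illustration; values are not JSON-escaped.
build_switchover_params() {
  printf '{"parameters": {"primary":"%s","candidate":"%s"}}\n' "$1" "$2"
}

# Same inputs as this Job: primary set, candidate empty.
build_switchover_params apemysql-dnedvj-mysql-0 ""
# prints: {"parameters": {"primary":"apemysql-dnedvj-mysql-0","candidate":""}}
```

With a non-empty candidate, the script's polling loop would watch only that pod; with an empty one, it accepts any non-primary pod whose kubeblocks.io/role label becomes primary, leader, or master.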

Pod YAML:

kubectl get pod b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-6vb8j -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    vke.volcengine.com/cello-pod-evict-policy: allow
  creationTimestamp: "2025-06-20T02:28:25Z"
  generateName: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-
  labels:
    batch.kubernetes.io/controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
    batch.kubernetes.io/job-name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
    controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
    job-name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
  name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-6vb8j
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
    uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
  resourceVersion: "41759673"
  uid: 619288c7-7694-4953-8d1c-fed3f8160f0b
spec:
  containers:
  - command:
    - sh
    - -c
    - "set -e\n# do switchover\nport=${SYNCER_SERVICE_PORT:-3601}\nurl=\"http://${TARGET_POD_IP}:${port}/v1.0/switchover\"
      \nmetadata=${metadata}\nif [ -z ${metadata} ]; then\n  params=\"{\\\"parameters\\\":
      {\\\"primary\\\":\\\"${primary}\\\",\\\"candidate\\\":\\\"${candidate}\\\"}}\"\nelse\n
      \ params=\"{\\\"parameters\\\": {\\\"primary\\\":\\\"${primary}\\\",\\\"candidate\\\":\\\"${candidate}\\\"},
      \\\"metadata\\\": ${metadata}}\"\nfi\necho \"curl ${url}, parameters: ${params}\"\nres=`curl
      -s -X POST -H 'Content-Type: application/json' \"${url}\" -d \"${params}\"`\necho
      \"curl result: ${res}\"\n\n# check if switchover successfully.\necho \"INFO:
      start to check if switchover successfully, timeout is 60s\"\nexecutedUnix=$(date
      +%s)\nwhile true; do\n  sleep 5\n  if [ ! -z ${candidate} ]; then\n     # if
      candidate specified, only check it\n     role=$(kubectl get pod ${candidate}
      -ojson | jq -r '.metadata.labels[\"kubeblocks.io/role\"]')\n     if [ \"$role\"
      == \"primary\" ] || [ \"$role\" == \"leader\" ] || [ \"$role\" == \"master\"
      ]; then\n        echo \"INFO: switchover successfully, ${candidate} is ${role}\"\n
      \       exit 0\n     fi\n  else\n    # check if the candidate instance has been
      promote to primary\n    pods=$(kubectl get pod -l apps.kubeblocks.io/component-name=${KB_COMP_NAME},app.kubernetes.io/instance=${KB_CLUSTER_NAME}
      | awk 'NR > 1 {print $1}')\n    for podName in ${pods}; do\n       if [ \"${podName}\"
      != \"${primary}\" ];then\n         role=$(kubectl get pod ${podName} -ojson
      | jq -r '.metadata.labels[\"kubeblocks.io/role\"]')\n         if [ \"$role\"
      == \"primary\" ] || [ \"$role\" == \"leader\" ] || [ \"$role\" == \"master\"
      ]; then\n            echo \"INFO: switchover successfully, ${podName} is ${role}\"\n
      \           exit 0\n         fi\n       fi\n    done\n  fi\n  currentUnix=$(date
      +%s)\n  diff_time=$((${currentUnix}-${executedUnix}))\n  if [ ${diff_time} -ge
      60 ]; then\n    echo \"ERROR: switchover failed.\"\n    exit 1\n  fi\ndone\n"
    env:
    - name: KB_OPS_NAME
      value: apemysql-dnedvj-custom-kmhrz
    - name: KB_OPS_NAMESPACE
      value: default
    - name: KB_CLUSTER_NAME
      value: apemysql-dnedvj
    - name: KB_COMP_NAME
      value: mysql
    - name: KB_CLUSTER_COMP_NAME
      value: apemysql-dnedvj-mysql
    - name: KB_COMP_REPLICAS
      value: "3"
    - name: KB_COMP_HEADLESS_SVC_NAME
      value: apemysql-dnedvj-mysql-headless
    - name: TARGET_POD_IP
      value: 192.168.0.7
    - name: primary
      value: apemysql-dnedvj-mysql-0
    - name: candidate
    image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
    imagePullPolicy: IfNotPresent
    name: switchover
    resources:
      limits:
        cpu: "0"
        memory: "0"
        vke.volcengine.com/eni-ip: "1"
      requests:
        cpu: "0"
        memory: "0"
        vke.volcengine.com/eni-ip: "1"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /scripts
      name: ops-utils
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-mnhnr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - sh
    - -c
    - |
      cp /usr/bin/kubectl /scripts/kubectl;
      echo '/scripts/kubectl -n "${KB_OPS_NAMESPACE}" patch opsrequests.apps.kubeblocks.io "${KB_OPS_NAME}" --subresource=status --type=merge --patch "{\"status\":{\"extras\":$1}}"' >/scripts/patch-extras-status.sh
    image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
    imagePullPolicy: IfNotPresent
    name: ops-utils
    resources:
      limits:
        cpu: "0"
        memory: "0"
      requests:
        cpu: "0"
        memory: "0"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /scripts
      name: ops-utils
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-mnhnr
      readOnly: true
  nodeName: 192.168.0.43
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: kb-apemysql-dnedvj
  serviceAccountName: kb-apemysql-dnedvj
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: kb-data
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: ops-utils
  - name: kube-api-access-mnhnr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:29Z"
    status: "False"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:27Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:25Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:25Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:25Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://e3672bef6a0af7584bbb6ed0159bde4d819d60b6ca6e522563ffdfc8ccee3d9e
    image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
    imageID: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools@sha256:68643125a4ffc56bfe4463bf67908f51a236617ea10f0dfb9c2d4500022512c4
    lastState: {}
    name: switchover
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://e3672bef6a0af7584bbb6ed0159bde4d819d60b6ca6e522563ffdfc8ccee3d9e
        exitCode: 7
        finishedAt: "2025-06-20T02:28:27Z"
        reason: Error
        startedAt: "2025-06-20T02:28:27Z"
  hostIP: 192.168.0.43
  hostIPs:
  - ip: 192.168.0.43
  initContainerStatuses:
  - containerID: containerd://db1b1690fe5f305a03602b7a3ac7afd90deb743b50aa634fd21ad6674474a4aa
    image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
    imageID: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools@sha256:68643125a4ffc56bfe4463bf67908f51a236617ea10f0dfb9c2d4500022512c4
    lastState: {}
    name: ops-utils
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://db1b1690fe5f305a03602b7a3ac7afd90deb743b50aa634fd21ad6674474a4aa
        exitCode: 0
        finishedAt: "2025-06-20T02:28:27Z"
        reason: Completed
        startedAt: "2025-06-20T02:28:27Z"
  phase: Failed
  podIP: 192.168.0.48
  podIPs:
  - ip: 192.168.0.48
  qosClass: BestEffort
  startTime: "2025-06-20T02:28:25Z"
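The switchover container terminated with exitCode: 7 within the same second it started. In the script, the command between the first log line and the missing "curl result:" line is the curl call, and curl exit status 7 means it failed to connect to the host, i.e. no TCP connection to 192.168.0.7:3601 could be opened from this pod. Since curl runs with -s, its own error message is suppressed, so the failure reason does not appear in the log. A small helper for reading such exit codes; the function name is hypothetical and only a few codes from the curl documentation are covered:

```shell
#!/bin/sh
# Translate a few well-known curl exit codes (see "EXIT CODES" in the
# curl man page) into short descriptions. Hypothetical helper.
describe_curl_exit() {
  case "$1" in
    0)  echo "success" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out" ;;
    *)  echo "other curl error ($1)" ;;
  esac
}

describe_curl_exit 7   # prints: failed to connect to host
```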

Pod logs:

kubectl logs b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-6vb8j
Defaulted container "switchover" out of: switchover, ops-utils (init)
curl http://192.168.0.7:3601/v1.0/switchover, parameters: {"parameters": {"primary":"apemysql-dnedvj-mysql-0","candidate":""}}
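Only the first echo reached the log; the "curl result:" line never appears. That follows from the script starting with set -e: a variable assignment whose value comes from a failing command substitution takes that command's exit status, so the shell stops at the curl line and the container exits with curl's own code. A cluster-free demonstration, with a subshell running exit 7 standing in for the failing curl (run_like_switchover is a hypothetical name):

```shell
#!/bin/sh
# Mimic the start of the switchover script: under `set -e`, the assignment
# res=$(...) fails with the substitution's status, so the second echo never runs.
run_like_switchover() {
  sh -c 'set -e
echo "curl ..."
res=$(exit 7)
echo "curl result: ${res}"'
}

# "|| status=$?" keeps this caller running even if it uses set -e itself.
status=0
output=$(run_like_switchover) || status=$?
echo "exit status: ${status}"   # prints: exit status: 7
echo "output: ${output}"        # prints: output: curl ...
```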

Expected behavior
The switchover should complete: the OpsRequest succeeds and one of the followers (apemysql-dnedvj-mysql-1 or apemysql-dnedvj-mysql-2) becomes the leader.


Labels: kind/bug
