Description
Describe the bug
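Running kbcli cluster promote on a 3-replica apecloud-mysql cluster creates a switchover OpsRequest that ends in status Failed. The switchover Job's container starts and terminates within the same second with exit code 7. Its log stops right after the curl request to http://192.168.0.7:3601/v1.0/switchover is printed, and the "curl result:" line never appears.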
kbcli version
Kubernetes: v1.30.4-vke.4
KubeBlocks: 0.9.4
kbcli: 0.9.4
To Reproduce
Steps to reproduce the behavior:
- create cluster
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: apemysql-dnedvj
  namespace: default
spec:
  terminationPolicy: Delete
  componentSpecs:
    - name: mysql
      componentDef: apecloud-mysql
      replicas: 3
      resources:
        requests:
          cpu: 100m
          memory: 0.5Gi
        limits:
          cpu: 100m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
kbcli cluster list-instances apemysql-dnedvj
NAME                      NAMESPACE   CLUSTER           COMPONENT   STATUS    ROLE       ACCESSMODE   AZ               CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                      CREATED-TIME
apemysql-dnedvj-mysql-0   default     apemysql-dnedvj   mysql       Running   leader     <none>       cn-guangzhou-b   100m / 100m          512Mi / 512Mi           data:20Gi   192.168.0.6/192.168.0.6   Jun 20,2025 10:22 UTC+0800
apemysql-dnedvj-mysql-1   default     apemysql-dnedvj   mysql       Running   follower   <none>       cn-guangzhou-b   100m / 100m          512Mi / 512Mi           data:20Gi   192.168.0.6/192.168.0.6   Jun 20,2025 10:23 UTC+0800
apemysql-dnedvj-mysql-2   default     apemysql-dnedvj   mysql       Running   follower   <none>       cn-guangzhou-b   100m / 100m          512Mi / 512Mi           data:20Gi   192.168.0.6/192.168.0.6   Jun 20,2025 10:23 UTC+0800
- switchover
kbcli cluster promote apemysql-dnedvj --auto-approve --force=true --component mysql
- See error
➜ ~ kubectl get ops
NAME                           TYPE     CLUSTER           STATUS   PROGRESS   AGE
apemysql-dnedvj-custom-kmhrz   Custom   apemysql-dnedvj   Failed   1/1        16m
➜ ~ kubectl get pod |grep apemysql-dnedvj
apemysql-dnedvj-mysql-0                                4/4   Running   0   22m
apemysql-dnedvj-mysql-1                                4/4   Running   0   21m
apemysql-dnedvj-mysql-2                                4/4   Running   0   21m
b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-6vb8j   0/1   Error     0   16m
kubectl get job
NAME                                             STATUS   COMPLETIONS   DURATION   AGE
b71aa765-apemysql-dnedvj-cu-mysql-switchover-0   Failed   0/1           17m        17m
job yaml
kubectl get job b71aa765-apemysql-dnedvj-cu-mysql-switchover-0 -oyaml
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2025-06-20T02:28:25Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: kubeblocks
    ops.kubeblocks.io/ops-name: apemysql-dnedvj-custom-kmhrz
    opsrequest.kubeblocks.io/action-name: switchover
  name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
  namespace: default
  ownerReferences:
  - apiVersion: apps.kubeblocks.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: OpsRequest
    name: apemysql-dnedvj-custom-kmhrz
    uid: b71aa765-cb86-402f-b4ca-1474ab21cc36
  resourceVersion: "41759674"
  uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
spec:
  backoffLimit: 0
  completionMode: NonIndexed
  completions: 1
  manualSelector: false
  parallelism: 1
  podReplacementPolicy: TerminatingOrFailed
  selector:
    matchLabels:
      batch.kubernetes.io/controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        batch.kubernetes.io/controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
        batch.kubernetes.io/job-name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
        controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
        job-name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
    spec:
      containers:
      - command:
        - sh
        - -c
        - "set -e\n# do switchover\nport=${SYNCER_SERVICE_PORT:-3601}\nurl=\"http://${TARGET_POD_IP}:${port}/v1.0/switchover\"
          \nmetadata=${metadata}\nif [ -z ${metadata} ]; then\n params=\"{\\\"parameters\\\":
          {\\\"primary\\\":\\\"${primary}\\\",\\\"candidate\\\":\\\"${candidate}\\\"}}\"\nelse\n
          \ params=\"{\\\"parameters\\\": {\\\"primary\\\":\\\"${primary}\\\",\\\"candidate\\\":\\\"${candidate}\\\"},
          \\\"metadata\\\": ${metadata}}\"\nfi\necho \"curl ${url}, parameters: ${params}\"\nres=`curl
          -s -X POST -H 'Content-Type: application/json' \"${url}\" -d \"${params}\"`\necho
          \"curl result: ${res}\"\n\n# check if switchover successfully.\necho \"INFO:
          start to check if switchover successfully, timeout is 60s\"\nexecutedUnix=$(date
          +%s)\nwhile true; do\n sleep 5\n if [ ! -z ${candidate} ]; then\n #
          if candidate specified, only check it\n role=$(kubectl get pod ${candidate}
          -ojson | jq -r '.metadata.labels[\"kubeblocks.io/role\"]')\n if [ \"$role\"
          == \"primary\" ] || [ \"$role\" == \"leader\" ] || [ \"$role\" == \"master\"
          ]; then\n echo \"INFO: switchover successfully, ${candidate} is ${role}\"\n
          \ exit 0\n fi\n else\n # check if the candidate instance has
          been promote to primary\n pods=$(kubectl get pod -l apps.kubeblocks.io/component-name=${KB_COMP_NAME},app.kubernetes.io/instance=${KB_CLUSTER_NAME}
          | awk 'NR > 1 {print $1}')\n for podName in ${pods}; do\n if [
          \"${podName}\" != \"${primary}\" ];then\n role=$(kubectl get pod
          ${podName} -ojson | jq -r '.metadata.labels[\"kubeblocks.io/role\"]')\n
          \ if [ \"$role\" == \"primary\" ] || [ \"$role\" == \"leader\" ]
          || [ \"$role\" == \"master\" ]; then\n echo \"INFO: switchover
          successfully, ${podName} is ${role}\"\n exit 0\n fi\n
          \ fi\n done\n fi\n currentUnix=$(date +%s)\n diff_time=$((${currentUnix}-${executedUnix}))\n
          \ if [ ${diff_time} -ge 60 ]; then\n echo \"ERROR: switchover failed.\"\n
          \ exit 1\n fi\ndone\n"
        env:
        - name: KB_OPS_NAME
          value: apemysql-dnedvj-custom-kmhrz
        - name: KB_OPS_NAMESPACE
          value: default
        - name: KB_CLUSTER_NAME
          value: apemysql-dnedvj
        - name: KB_COMP_NAME
          value: mysql
        - name: KB_CLUSTER_COMP_NAME
          value: apemysql-dnedvj-mysql
        - name: KB_COMP_REPLICAS
          value: "3"
        - name: KB_COMP_HEADLESS_SVC_NAME
          value: apemysql-dnedvj-mysql-headless
        - name: TARGET_POD_IP
          value: 192.168.0.7
        - name: primary
          value: apemysql-dnedvj-mysql-0
        - name: candidate
        image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
        imagePullPolicy: IfNotPresent
        name: switchover
        resources:
          limits:
            cpu: "0"
            memory: "0"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /scripts
          name: ops-utils
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - sh
        - -c
        - |
          cp /usr/bin/kubectl /scripts/kubectl;
          echo '/scripts/kubectl -n "${KB_OPS_NAMESPACE}" patch opsrequests.apps.kubeblocks.io "${KB_OPS_NAME}" --subresource=status --type=merge --patch "{\"status\":{\"extras\":$1}}"' >/scripts/patch-extras-status.sh
        image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
        imagePullPolicy: IfNotPresent
        name: ops-utils
        resources:
          limits:
            cpu: "0"
            memory: "0"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /scripts
          name: ops-utils
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: kb-apemysql-dnedvj
      serviceAccountName: kb-apemysql-dnedvj
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: kb-data
        operator: Equal
        value: "true"
      volumes:
      - emptyDir: {}
        name: ops-utils
status:
  conditions:
  - lastProbeTime: "2025-06-20T02:28:30Z"
    lastTransitionTime: "2025-06-20T02:28:30Z"
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
  failed: 1
  ready: 0
  startTime: "2025-06-20T02:28:25Z"
  terminating: 0
  uncountedTerminatedPods: {}
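The escaped command string in the Job spec is hard to read, so here is a minimal sketch of its request-building part, decoded by hand. The function names build_url and build_params are hypothetical wrappers added for illustration and are not in the original script. The curl call itself is left out so the snippet runs offline. Variable values are copied from the Job's env section, where candidate has no value.

```shell
#!/bin/sh
# Hand-decoded from the Job's escaped command string (request part only).
# build_url / build_params are illustrative wrappers, not names from the script.

build_url() {
  # Same default as the script: port 3601 unless SYNCER_SERVICE_PORT is set.
  port=${SYNCER_SERVICE_PORT:-3601}
  printf '%s\n' "http://${TARGET_POD_IP}:${port}/v1.0/switchover"
}

build_params() {
  # The script only adds a "metadata" field when ${metadata} is non-empty.
  if [ -z "${metadata}" ]; then
    printf '%s\n' "{\"parameters\": {\"primary\":\"${primary}\",\"candidate\":\"${candidate}\"}}"
  else
    printf '%s\n' "{\"parameters\": {\"primary\":\"${primary}\",\"candidate\":\"${candidate}\"}, \"metadata\": ${metadata}}"
  fi
}

# Values taken from the Job's env section; candidate is empty there.
TARGET_POD_IP=192.168.0.7
primary=apemysql-dnedvj-mysql-0
candidate=""
metadata=""
unset SYNCER_SERVICE_PORT

# Mirrors the echo the script performs just before calling curl.
echo "curl $(build_url), parameters: $(build_params)"
```

With these values the final echo reproduces the only line the failing pod logged, which suggests the script got as far as the curl call and no further.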
pod yaml
kubectl get pod b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-6vb8j -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    vke.volcengine.com/cello-pod-evict-policy: allow
  creationTimestamp: "2025-06-20T02:28:25Z"
  generateName: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-
  labels:
    batch.kubernetes.io/controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
    batch.kubernetes.io/job-name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
    controller-uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
    job-name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
  name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-6vb8j
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: b71aa765-apemysql-dnedvj-cu-mysql-switchover-0
    uid: 4c9bb41d-5362-4e63-ba73-bcf9f8276cb6
  resourceVersion: "41759673"
  uid: 619288c7-7694-4953-8d1c-fed3f8160f0b
spec:
  containers:
  - command:
    - sh
    - -c
    - "set -e\n# do switchover\nport=${SYNCER_SERVICE_PORT:-3601}\nurl=\"http://${TARGET_POD_IP}:${port}/v1.0/switchover\"
      \nmetadata=${metadata}\nif [ -z ${metadata} ]; then\n params=\"{\\\"parameters\\\":
      {\\\"primary\\\":\\\"${primary}\\\",\\\"candidate\\\":\\\"${candidate}\\\"}}\"\nelse\n
      \ params=\"{\\\"parameters\\\": {\\\"primary\\\":\\\"${primary}\\\",\\\"candidate\\\":\\\"${candidate}\\\"},
      \\\"metadata\\\": ${metadata}}\"\nfi\necho \"curl ${url}, parameters: ${params}\"\nres=`curl
      -s -X POST -H 'Content-Type: application/json' \"${url}\" -d \"${params}\"`\necho
      \"curl result: ${res}\"\n\n# check if switchover successfully.\necho \"INFO:
      start to check if switchover successfully, timeout is 60s\"\nexecutedUnix=$(date
      +%s)\nwhile true; do\n sleep 5\n if [ ! -z ${candidate} ]; then\n # if
      candidate specified, only check it\n role=$(kubectl get pod ${candidate}
      -ojson | jq -r '.metadata.labels[\"kubeblocks.io/role\"]')\n if [ \"$role\"
      == \"primary\" ] || [ \"$role\" == \"leader\" ] || [ \"$role\" == \"master\"
      ]; then\n echo \"INFO: switchover successfully, ${candidate} is ${role}\"\n
      \ exit 0\n fi\n else\n # check if the candidate instance has been
      promote to primary\n pods=$(kubectl get pod -l apps.kubeblocks.io/component-name=${KB_COMP_NAME},app.kubernetes.io/instance=${KB_CLUSTER_NAME}
      | awk 'NR > 1 {print $1}')\n for podName in ${pods}; do\n if [ \"${podName}\"
      != \"${primary}\" ];then\n role=$(kubectl get pod ${podName} -ojson
      | jq -r '.metadata.labels[\"kubeblocks.io/role\"]')\n if [ \"$role\"
      == \"primary\" ] || [ \"$role\" == \"leader\" ] || [ \"$role\" == \"master\"
      ]; then\n echo \"INFO: switchover successfully, ${podName} is ${role}\"\n
      \ exit 0\n fi\n fi\n done\n fi\n currentUnix=$(date
      +%s)\n diff_time=$((${currentUnix}-${executedUnix}))\n if [ ${diff_time} -ge
      60 ]; then\n echo \"ERROR: switchover failed.\"\n exit 1\n fi\ndone\n"
    env:
    - name: KB_OPS_NAME
      value: apemysql-dnedvj-custom-kmhrz
    - name: KB_OPS_NAMESPACE
      value: default
    - name: KB_CLUSTER_NAME
      value: apemysql-dnedvj
    - name: KB_COMP_NAME
      value: mysql
    - name: KB_CLUSTER_COMP_NAME
      value: apemysql-dnedvj-mysql
    - name: KB_COMP_REPLICAS
      value: "3"
    - name: KB_COMP_HEADLESS_SVC_NAME
      value: apemysql-dnedvj-mysql-headless
    - name: TARGET_POD_IP
      value: 192.168.0.7
    - name: primary
      value: apemysql-dnedvj-mysql-0
    - name: candidate
    image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
    imagePullPolicy: IfNotPresent
    name: switchover
    resources:
      limits:
        cpu: "0"
        memory: "0"
        vke.volcengine.com/eni-ip: "1"
      requests:
        cpu: "0"
        memory: "0"
        vke.volcengine.com/eni-ip: "1"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /scripts
      name: ops-utils
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-mnhnr
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - command:
    - sh
    - -c
    - |
      cp /usr/bin/kubectl /scripts/kubectl;
      echo '/scripts/kubectl -n "${KB_OPS_NAMESPACE}" patch opsrequests.apps.kubeblocks.io "${KB_OPS_NAME}" --subresource=status --type=merge --patch "{\"status\":{\"extras\":$1}}"' >/scripts/patch-extras-status.sh
    image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
    imagePullPolicy: IfNotPresent
    name: ops-utils
    resources:
      limits:
        cpu: "0"
        memory: "0"
      requests:
        cpu: "0"
        memory: "0"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /scripts
      name: ops-utils
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-mnhnr
      readOnly: true
  nodeName: 192.168.0.43
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: kb-apemysql-dnedvj
  serviceAccountName: kb-apemysql-dnedvj
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: kb-data
    operator: Equal
    value: "true"
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: ops-utils
  - name: kube-api-access-mnhnr
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:29Z"
    status: "False"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:27Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:25Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:25Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-06-20T02:28:25Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://e3672bef6a0af7584bbb6ed0159bde4d819d60b6ca6e522563ffdfc8ccee3d9e
    image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
    imageID: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools@sha256:68643125a4ffc56bfe4463bf67908f51a236617ea10f0dfb9c2d4500022512c4
    lastState: {}
    name: switchover
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://e3672bef6a0af7584bbb6ed0159bde4d819d60b6ca6e522563ffdfc8ccee3d9e
        exitCode: 7
        finishedAt: "2025-06-20T02:28:27Z"
        reason: Error
        startedAt: "2025-06-20T02:28:27Z"
  hostIP: 192.168.0.43
  hostIPs:
  - ip: 192.168.0.43
  initContainerStatuses:
  - containerID: containerd://db1b1690fe5f305a03602b7a3ac7afd90deb743b50aa634fd21ad6674474a4aa
    image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools:0.9.4
    imageID: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/kubeblocks-tools@sha256:68643125a4ffc56bfe4463bf67908f51a236617ea10f0dfb9c2d4500022512c4
    lastState: {}
    name: ops-utils
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://db1b1690fe5f305a03602b7a3ac7afd90deb743b50aa634fd21ad6674474a4aa
        exitCode: 0
        finishedAt: "2025-06-20T02:28:27Z"
        reason: Completed
        startedAt: "2025-06-20T02:28:27Z"
  phase: Failed
  podIP: 192.168.0.48
  podIPs:
  - ip: 192.168.0.48
  qosClass: BestEffort
  startTime: "2025-06-20T02:28:25Z"
logs pod
kubectl logs b71aa765-apemysql-dnedvj-cu-mysql-switchover-0-6vb8j
Defaulted container "switchover" out of: switchover, ops-utils (init)
curl http://192.168.0.7:3601/v1.0/switchover, parameters: {"parameters": {"primary":"apemysql-dnedvj-mysql-0","candidate":""}}
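The log ends right after the curl echo, and the container terminated with exit code 7. One plausible reading, not confirmed by the report, is that the curl call inside the backticks failed and, because the script runs under set -e, the shell exited with curl's own status. curl documents exit code 7 as "failed to connect to host". The sketch below reproduces only that shell mechanism, using a stand-in command that exits with 7 instead of a real curl call. The function name run_switchover_call is hypothetical.

```shell
#!/bin/sh
# Demonstrates the suspected exit path: under `set -e`, a failing command
# substitution in a plain assignment aborts the script with that command's
# exit status, so the following echo never runs.
# run_switchover_call is an illustrative name, not part of KubeBlocks.

run_switchover_call() {
  sh -c '
    set -e
    # Stand-in for the curl call: a command that exits with status 7.
    res=`sh -c "exit 7"`
    echo "curl result: ${res}"
  '
}

status=0
run_switchover_call || status=$?
echo "exit code: ${status}"
```

If this reading is right, the 60s role-check loop was never reached, which would match the container starting and finishing within the same second. Checking whether 192.168.0.7:3601 is reachable from the Job pod's network would be a reasonable next step.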
Expected behavior
The switchover OpsRequest succeeds and one of the followers (apemysql-dnedvj-mysql-1 or apemysql-dnedvj-mysql-2) is promoted to leader.