CN117076180B

CN117076180B - Information processing method, device, equipment and computer readable storage medium

Info

Publication number: CN117076180B
Application number: CN202311130250.2A
Authority: CN
Inventors: 李崇良; 程康; 杨旭荣
Original assignee: Sangfor Technologies Co Ltd
Current assignee: Sangfor Technologies Co Ltd
Priority date: 2023-09-04
Filing date: 2023-09-04
Publication date: 2024-05-28
Anticipated expiration: 2043-09-04
Also published as: CN117076180A

Abstract

The embodiment of the application discloses an information processing method, which comprises the following steps: when access to a first storage volume on a service node fails, mounting a second storage volume to a first binding node based on a mounting process in a working state, wherein the mounting process runs on a mounting node; when the first binding node has mounted the second storage volume, binding the service node and the first binding node through the mounting node, and sharing the first storage volume to the second storage volume of the first binding node through sharing propagation; and accessing the first storage volume in the first binding node through the mounting process. The embodiment of the application also discloses an information processing device, equipment and a computer readable storage medium. The embodiment of the application can speed up fault recovery and shorten fault recovery time.

Description

Information processing method, device, equipment and computer readable storage medium

Technical Field

The present application relates to the field of virtualization technologies, and in particular, to an information processing method, an information processing device, an information processing apparatus, and a computer readable storage medium.

Background

Currently, a storage volume on a service node is accessed through a Filesystem in Userspace (FUSE) sub-mount process. When the FUSE sub-mount process exits abnormally and the storage volume on the service node can no longer be accessed, whether because the service node is abnormal or because the service node is normal but the storage volume is not mounted, a fault recovery scheme based on a liveness probe (Liveness Probe) or a fault recovery scheme based on Pod reconstruction can be adopted. In the scheme based on Liveness Probe, a Liveness Probe configured in the service node performs health checks on the storage volume of the service node; when it detects that the storage volume is unavailable because the FUSE sub-mount process exited abnormally, the CSI Driver must promptly and automatically pull up the FUSE sub-mount process again (i.e. the mount point must be self-healing), and the service node is restarted to mount the storage volume again, thereby completing fault recovery. In the scheme based on Pod reconstruction, a target Pod performs health checks on the storage volume of each service Pod and, when it finds that a storage volume is unavailable, rebuilds a Pod to drive the CSI Driver to derive the FUSE sub-mount process again, thereby completing fault recovery. However, for the scheme based on Pod reconstruction, Kubernetes does not support a mechanism by which a target Pod performs health checks on the storage volumes of each service Pod; for the scheme based on Liveness Probe, most CSI Drivers cannot automatically pull up the FUSE sub-mount process, and even when they can, recovery still requires restarting the service node. As a result, the fault recovery schemes in the related art suffer from long fault recovery times and slow fault recovery.

Disclosure of Invention

In order to solve the above technical problems, it is desirable in the embodiments of the present application to provide an information processing method, apparatus, device, and computer readable storage medium, which can solve the problems of longer fault recovery time and slower fault recovery speed in the fault recovery scheme in the related art.

The technical scheme of the application is realized as follows:

An information processing method, the method comprising:

under the condition that the access of the first storage volume on the service node fails, mounting the second storage volume to the first binding node based on a mounting process in a working state; wherein the mounting process runs on a mounting node;

Binding the service node and the first binding node through the mounting node under the condition that the second storage volume is mounted by the first binding node, and sharing the first storage volume to the second storage volume of the first binding node through sharing propagation;

accessing the first storage volume in the first binding node through the mount process.

In the above solution, in the case that the access of the first storage volume on the service node fails, the mounting the second storage volume to the first binding node based on the mounting process in the working state includes:

Determining that the first storage volume on the service node fails to access under the condition that the first binding node is abnormal, or the second storage volume is not mounted on the first binding node, or the mounting node is abnormal;

and controlling the mounting node to restart through the proxy node in the container arranging node so as to restore the mounting process to a working state.

In the above scheme, the mounting the second storage volume to the first binding node based on the mounting process in the working state includes:

under the condition that the first binding node is monitored to be abnormal, a second binding node is established based on the mounting node through a target interface;

and mounting the second storage volume to the second binding node based on the mounting process.

In the above solution, before the mounting the second storage volume to the first binding node based on the mounting process in the working state in the case that the access of the first storage volume on the service node fails, the method includes:

Determining a first number of third storage volumes on the service node if the service node anomaly is detected;

and processing the third storage volume based on the first number and the target number.

In the above solution, the processing the third storage volume based on the first number and the target number includes:

determining a second number based on the first number and the target number, if the first number is greater than the target number;

deleting the second number of third storage volumes on the service node by the mount node.

In the above scheme, the method further comprises:

And under the condition that the service node is not mounted with the first storage volume, re-binding the service node and the first binding node to a successful binding state through the mounting node so as to share the first storage volume to a second storage volume of the first binding node through sharing propagation.

In the above scheme, the method further comprises:

acquiring a mounting node with an association relation with the service node under the condition that the service node does not exist;

And deleting the mounting node with the association relation with the service node based on the target driving container in the container arrangement node through the target interface.

An information processing apparatus, the apparatus comprising:

The processing unit is used for mounting the second storage volume to the first binding node based on the mounting process in the working state under the condition that the access of the first storage volume on the service node fails; wherein the mounting process runs on a mounting node;

The sharing unit is used for binding the service node and the first binding node through the mounting node under the condition that the second storage volume is mounted on the first binding node, and sharing the first storage volume to the second storage volume of the first binding node through sharing propagation;

And the access unit is used for accessing the first storage volume in the first binding node through the mounting process.

An information processing apparatus, the apparatus comprising: a processor, a memory, and a communication bus;

The communication bus is used for realizing communication connection between the processor and the memory;

the processor is configured to execute the information processing program stored in the memory to implement the steps of the information processing method as described above.

A computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps of the information processing method as described above.

In the information processing method, device, equipment and computer readable storage medium provided by the embodiments of the application, when access to the first storage volume on the service node fails, the second storage volume is mounted to the first binding node based on a mounting process in a working state, the mounting process running on the mounting node; then, when the first binding node has mounted the second storage volume, the service node and the first binding node are bound through the mounting node, and the first storage volume is shared to the second storage volume of the first binding node through sharing propagation; the first storage volume in the first binding node is then accessed through the mounting process. Because the mounting process runs in the mounting node rather than in the target drive container as in the related art, when the mounting process exits abnormally and access to the first storage volume on the service node fails, the mounting process can be restored to the working state by restarting the mounting node. And because the service node and the first binding node are bind-mounted, the first storage volume of the service node can be mapped into the second storage volume of the first binding node, so that the first storage volume of the service node is accessed by accessing the first storage volume in the first binding node, without restarting the service node or creating a new target node to drive the target drive container to derive the mounting process again as in the related art. This speeds up fault recovery and shortens fault recovery time.

Drawings

Fig. 1 is a schematic flow chart of an information processing method according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of binding a service node and a first binding node in an information processing method according to an embodiment of the present application;

Fig. 3 is a schematic flow chart of another information processing method according to an embodiment of the present application;

Fig. 4 is a schematic flow chart of monitoring a service node and a first binding node in an information processing method according to an embodiment of the present application;

Fig. 5 is a schematic flow chart of cleaning a mounting node in an information processing method according to an embodiment of the present application;

Fig. 6 is a schematic flow chart of yet another information processing method according to an embodiment of the present application;

Fig. 7 is a schematic diagram of accessing a first storage volume in an information processing method according to an embodiment of the present application;

Fig. 8 is a schematic diagram of fault recovery in an information processing method according to an embodiment of the present application;

Fig. 9 is another schematic diagram of accessing a first storage volume in an information processing method according to an embodiment of the present application;

Fig. 10 is a schematic flow chart of creating a mounting node in an information processing method according to an embodiment of the present application;

Fig. 11 is a schematic flow chart of initializing a first binding node in an information processing method according to an embodiment of the present application;

Fig. 12 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application;

Fig. 13 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

Before describing the embodiments of the present application, technical terms in the present application are explained as follows:

Kubernetes: abbreviated as K8s, an open source system for automatically deploying, scaling and managing containerized applications.

Container Storage Interface (CSI): a specification that aims to establish an industry-standard interface through which a container orchestration system can expose arbitrary storage systems to its container workloads.

Container storage interface driver (CSI Driver): a plug-in developed according to the CSI specification that provides a specific storage system to container workloads, typically supplied by a third-party storage vendor.

Filesystem in Userspace (FUSE): a software interface for Unix-like computer operating systems that enables non-privileged users to create their own file systems without editing kernel code; Linux currently supports it through a kernel module.

An embodiment of the present application provides an information processing method, which may be applied to an information processing apparatus, as shown with reference to fig. 1, including the steps of:

Step 101, when access to the first storage volume on the service node fails, mounting the second storage volume to the first binding node based on the mounting process in the working state.

Wherein the mount process runs on the mount node.

In the embodiment of the application, the first storage volume is the storage volume mounted on the service node, and the second storage volume is the storage volume mounted on the first binding node. Access to the first storage volume on the service node may fail for any of the following reasons: the service node is abnormal; the service node has not mounted the first storage volume; the first binding node is abnormal; the first binding node has not mounted the second storage volume; or the mounting node is abnormal. The mounting node refers to a node dedicated to performing persistent volume (PersistentVolume, PV) mount tasks and may be represented by a Mount Pod; mounting nodes are dynamically created by the CSI Driver in K8s through a target interface.

In one possible implementation, the mounting process may be a FUSE sub-mount process, and the target interface may be the NodePublishVolume interface.

Step 102, binding the service node and the first binding node through the mounting node under the condition that the second storage volume is mounted on the first binding node, and sharing the first storage volume to the second storage volume of the first binding node through sharing propagation.

In the embodiment of the application, the first binding node having mounted the second storage volume indicates that the first binding node (i.e. the intermediate mount point) is ready, and the service node having mounted the first storage volume indicates that the service node is ready. When the first binding node is ready, the first binding node and the service node can be bind-mounted (bind mount) through the mounting node, after which the service node shares (i.e. maps) its own first storage volume onto the second storage volume of the first binding node by using mount propagation (Mount Propagation). Note that mount propagation is configured as HostToContainer.
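As a rough illustration of the HostToContainer setting mentioned above, the following sketch builds the volume-related part of a service Pod manifest as a Python dictionary. The `mountPropagation` field and its `HostToContainer` value are standard Kubernetes; the container name, image, volume name, claim name and mount path are hypothetical placeholders and do not come from the application.

```python
import json

def service_pod_volume_spec(claim_name: str, mount_path: str) -> dict:
    """Build the volume-related part of a service Pod spec (illustrative only)."""
    return {
        "containers": [{
            "name": "biz-app",                # hypothetical container name
            "image": "example/biz:latest",    # hypothetical image
            "volumeMounts": [{
                "name": "data",               # hypothetical volume name
                "mountPath": mount_path,
                # Mounts made later on the host side of this volume become
                # visible inside the container without restarting it.
                "mountPropagation": "HostToContainer",
            }],
        }],
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": claim_name},
        }],
    }

spec = service_pod_volume_spec("data-pvc", "/data")  # hypothetical names
print(json.dumps(spec, indent=2))
```

With this setting, a bind mount performed later on the host path backing the volume is expected to appear inside the running container, which is what allows recovery without restarting the service node.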

In the embodiment of the application, the main container fuse-main of the mounting node (i.e. the Mount Pod) runs the mounting process to mount the second storage volume to the first binding node. After the main container fuse-main starts, a postStart hook can execute the command <csi-driver-binary> bind-mount <stageMountDir> <podMountDir> to bind-mount the intermediate mount point (i.e. stageMountDir) to the service node (i.e. podMountDir). The flow by which the mounting node binds the service node and the first binding node is shown in fig. 2. A counter is initialized to count the attempts to bind-mount the intermediate mount point to the mount point of the service node. When the count is greater than or equal to 60, the wait has timed out and the bind mount fails. When the count is less than 60, whether the intermediate mount point has mounted the second storage volume (i.e. the file system) is determined through the code mount.New("").IsLikelyNotMountPoint(stageMountDir). If the intermediate mount point has mounted the second storage volume (i.e. the intermediate mount point is ready), the intermediate mount point is bind-mounted to the service node through the code mount -o bind stageMountDir podMountDir. If it has not, the flow waits 1 second and checks the count again; while the count is still less than 60, it continues to check whether the intermediate mount point has mounted the second storage volume.
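The wait-and-bind loop of fig. 2 can be sketched as follows. This is a minimal model under stated assumptions, not the implementation of the application: the readiness check and the bind mount are passed in as callables (hypothetical stand-ins for the IsLikelyNotMountPoint check and the mount -o bind command), so that the control flow can be exercised without touching a real file system. Only the limit of 60 attempts and the 1-second wait come from the text.

```python
import time
from typing import Callable, List

MAX_ATTEMPTS = 60     # from the text: a count of 60 means the wait timed out
WAIT_SECONDS = 1.0    # from the text: wait 1 second between checks

def bind_mount_when_ready(
    is_mounted: Callable[[], bool],
    bind_mount: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the bind mount has been performed, False on timeout."""
    attempts = 0
    while attempts < MAX_ATTEMPTS:
        if is_mounted():          # intermediate mount point is ready
            bind_mount()          # e.g. mount -o bind stageMountDir podMountDir
            return True
        sleep(WAIT_SECONDS)
        attempts += 1
    return False                  # waited too long: bind mount fails

# Demo with stand-ins: the intermediate mount point is ready on the third check.
checks = iter([False, False, True])
calls: List[str] = []
ok = bind_mount_when_ready(
    lambda: next(checks),
    lambda: calls.append("bind"),
    sleep=lambda seconds: None,   # skip real sleeping in the demo
)
print(ok, calls)
```

Injecting `sleep` keeps the sketch testable; a real hook would use the default `time.sleep`.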

It should be noted that a startup probe (Startup Probe) may be configured to probe the first binding node and the service node at startup, so as to detect whether the first binding node and the service node are ready.

Step 103, accessing the first storage volume in the first binding node through the mounting process.

In the embodiment of the application, after the service node shares the first storage volume of the service node to the second storage volume of the first binding node, the first storage volume in the first binding node can be directly accessed through the mounting process.

According to the information processing method provided by the embodiment of the application, because the mounting process runs in the mounting node rather than in the target drive container as in the related art, when the mounting process exits abnormally and access to the first storage volume on the service node fails, the mounting process can be restored to the working state by restarting the mounting node. And because the service node and the first binding node are bind-mounted, the first storage volume of the service node can be mapped into the second storage volume of the first binding node, so that the first storage volume of the service node is accessed by accessing the first storage volume in the first binding node, without restarting the service node or creating a new target node to drive the target drive container to derive the mounting process again as in the related art. This speeds up fault recovery and shortens fault recovery time.

Based on the foregoing embodiments, an embodiment of the present application provides an information processing method, as shown with reference to fig. 3, including the steps of:

Step 201, the information processing device determines that the first storage volume on the service node fails to access when it is detected that the first binding node is abnormal, or the first binding node does not mount the second storage volume, or the mounting node is abnormal.

In the embodiment of the application, whether the mounting process is abnormal can be determined by monitoring whether the first binding node is abnormal and whether the first binding node has mounted the second storage volume. Specifically, the mounting process is determined to be abnormal when the first binding node is abnormal, and also when the first binding node has not mounted the second storage volume.

In the embodiment of the application, the first storage volume on the service node cannot be accessed under the condition that the first binding node is abnormal, or the second storage volume is not mounted on the first binding node, or the mounting node is abnormal.

In a possible implementation manner, referring to fig. 4, before the first binding node (i.e. the intermediate mount point) is monitored, the information of the intermediate mount point can be obtained through the code notMnt, err = mount.New("").IsLikelyNotMountPoint(stageMountDir), and whether the intermediate mount point exists is then judged. When the intermediate mount point does not exist, it can be created through the code os.MkdirAll(stageMountDir, os.FileMode(0750)). When the intermediate mount point exists, whether it is damaged (i.e. whether it is abnormal) is monitored. When the intermediate mount point is damaged, it can be unmounted through the code umount stageMountDir, and at this point it can be determined that access to the first storage volume on the service node fails. When the first binding node is monitored to be normal (i.e. the intermediate mount point is not damaged), whether the intermediate mount point has mounted the file system (i.e. the second storage volume) is judged. When it has, the intermediate mount point is normal and the mount point information of the service node can then be obtained; when it has not, it can be determined that access to the first storage volume on the service node fails.
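The decision sequence for the intermediate mount point can be sketched as a pure function. This is only one reading of the fig. 4 description: the `MountPointState` fields and the `Action` names are hypothetical, and the sketch returns a single decision per check rather than performing any mount or unmount.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    CREATE_DIR = "create the intermediate mount point directory"
    UNMOUNT_AND_FAIL = "unmount the damaged mount point; access has failed"
    FAIL_NOT_MOUNTED = "file system not mounted; access has failed"
    CHECK_SERVICE_NODE = "intermediate mount point is normal; check the service node"

@dataclass
class MountPointState:
    exists: bool     # does stageMountDir exist?
    damaged: bool    # is the mount point damaged (abnormal)?
    mounted: bool    # is the second storage volume (file system) mounted?

def check_intermediate_mount_point(state: MountPointState) -> Action:
    """Mirror the order of checks described for the intermediate mount point."""
    if not state.exists:
        return Action.CREATE_DIR
    if state.damaged:
        return Action.UNMOUNT_AND_FAIL
    if not state.mounted:
        return Action.FAIL_NOT_MOUNTED
    return Action.CHECK_SERVICE_NODE

healthy = MountPointState(exists=True, damaged=False, mounted=True)
print(check_intermediate_mount_point(healthy).value)
```

Keeping the decision separate from the side effects makes the ordering of the checks explicit: existence first, then damage, then whether the file system is mounted.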

Step 202, the information processing device controls, through the proxy node in the container arrangement node, the mounting node to restart, so as to restore the mounting process to the working state.

In the embodiment of the application, after it is monitored that the first binding node is abnormal, that the first binding node has not mounted the second storage volume, or that the mounting node is abnormal, the proxy node can be prompted to restart the mounting node, and the mounting node then restarts the mounting process (i.e. the mounting process is restored to the working state). In one possible implementation, the proxy node may be the kubelet.

It should be noted that, after step 202, either step 203 or steps 204 to 205 may be performed:

Step 203, under the condition that the first binding node is not mounted with the second storage volume or the mounting node is abnormal, the information processing device mounts the second storage volume to the first binding node based on the mounting process in the working state.

Wherein the mount process runs on the mount node.

In the embodiment of the application, under the condition that the first binding node is not mounted with the second storage volume or the mounting node is abnormal, the mounting node is controlled to restart through the proxy node so as to restore the mounting process to the working state, and then the mounting process restoring the working state can mount the second storage volume to the first binding node.

In one possible implementation, the second storage volume may refer to /var/lib/csi/volumes/<uniqueID>.

Step 204, under the condition that the first binding node is abnormal, the information processing device creates a second binding node based on the mounting node through the target interface.

Wherein the second binding node is a binding node having the same function as the first binding node, and the target interface may refer to an interface that may create the binding node.

In the embodiment of the application, under the condition that the first binding node is monitored to be abnormal, the first binding node can be deleted through the target interface, and then a second binding node is newly built.

Step 205, the information processing apparatus mounts the second storage volume to the second binding node based on the mounting process.

In the embodiment of the application, after the second binding node is newly established, the second storage volume can be mounted on the newly established second binding node through the mounting process.

It should be noted that, step 206 may be performed after both step 203 and step 205:

step 206, in the case that the first binding node mounts the second storage volume, the information processing device binds the service node and the first binding node through the mounting node, and shares the first storage volume to the second storage volume of the first binding node through sharing propagation.

Step 207, the information processing apparatus accesses the first storage volume in the first binding node through the mount process.
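The branching among steps 201 to 207 can be summarised in a small sketch. It only models which steps follow from which monitored condition; the function name and the step descriptions are illustrative, and nothing here restarts or mounts anything.

```python
from typing import List

def recovery_steps(
    binding_node_abnormal: bool,
    second_volume_mounted: bool,
    mounting_node_abnormal: bool,
) -> List[str]:
    """List the steps taken for a given combination of monitored conditions."""
    access_failed = (
        binding_node_abnormal
        or not second_volume_mounted
        or mounting_node_abnormal
    )
    if not access_failed:          # step 201 found nothing wrong
        return []
    steps = ["202: restart the mounting node through the proxy node"]
    if binding_node_abnormal:
        steps.append("204: create a second binding node through the target interface")
        steps.append("205: mount the second storage volume to the second binding node")
    else:
        steps.append("203: mount the second storage volume to the first binding node")
    steps.append("206: bind the service node and share the first storage volume")
    steps.append("207: access the first storage volume through the mounting process")
    return steps

for step in recovery_steps(False, False, True):
    print(step)
```

Note that in every branch the service node itself is never restarted; only the mounting node is.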

Based on the foregoing embodiment, the foregoing embodiment may further include the steps of:

Step 208, in the case that the service node does not exist, the information processing device acquires the mounting node having an association relationship with the service node.

In the embodiment of the application, when the service node using the first storage volume is deleted, the target drive container firstly acquires the mounting node with the association relation with the service node corresponding to the first storage volume, and deletes the mounting node with the association relation with the service node corresponding to the first storage volume through the target interface.

Step 209, the information processing device deletes the mounting node having the association relation with the service node based on the target driving container in the container arrangement node through the target interface.

In the embodiment of the present application, the target drive container is completely isolated from the mounting node and is completely stateless; the mounting node having the association relation with the service node is deleted through the target interface based on the target drive container in the container arrangement node. In one possible implementation, the target drive container may be the CSI Driver.

It should be noted that the CSI Driver also periodically checks all Mount Pods, that is, checks whether any Mount Pod remains undeleted after its service node has been deleted. Any such residual Mount Pod can be deleted through the cleaning flow shown in fig. 5. First, the Mount Pod list (i.e. mountPods) is obtained through the application programming interface (Application Programming Interface, API) of K8s, which can be implemented by the command kubectl get pods -n kube-system --field-selector spec.nodeName=<CSI DRIVER --selector app.kubernetes.io/type=Mount-Pod, whose result is stored in mountPods. A counter is then initialized and used to traverse mountPods. While mountPods has not been fully traversed, the code mountPodName := mountPods[i].Name obtains a Mount Pod name, and the code bizPodID = mountPods[i].Annotations["Mount-pod.pod"] obtains, from the annotations of the Mount Pod, the identity (ID) of the service node associated with that Mount Pod. It is then judged whether a service node whose uid field equals bizPodID exists. If not, the Mount Pod is deleted through the code kubectl delete pods -n kube-system <mountPodName>. The next Mount Pod name is then obtained from mountPods, until all Mount Pods in mountPods have been traversed and the cleaning is finished.
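The selection logic of this cleaning flow can be sketched as follows, assuming the Mount Pod list and the set of existing service Pod uids have already been fetched. The `MountPod` class and function name are hypothetical, and the sketch only decides which Mount Pods to delete; it issues no kubectl or API calls. Skipping Mount Pods that lack the annotation is a conservative assumption of this sketch, not something the text specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ANNOTATION_KEY = "Mount-pod.pod"   # annotation key as quoted in the text

@dataclass
class MountPod:
    name: str
    annotations: Dict[str, str] = field(default_factory=dict)

def find_residual_mount_pods(
    mount_pods: List[MountPod],
    live_service_pod_uids: Set[str],
) -> List[str]:
    """Return names of Mount Pods whose associated service Pod no longer exists."""
    residual = []
    for pod in mount_pods:                       # traverse mountPods
        biz_pod_id = pod.annotations.get(ANNOTATION_KEY)
        if biz_pod_id is None:
            continue                             # assumption: cannot attribute, so keep
        if biz_pod_id not in live_service_pod_uids:
            residual.append(pod.name)            # candidate for deletion
    return residual

pods = [
    MountPod("mount-a", {ANNOTATION_KEY: "uid-1"}),
    MountPod("mount-b", {ANNOTATION_KEY: "uid-2"}),
    MountPod("mount-c"),
]
to_delete = find_residual_mount_pods(pods, live_service_pod_uids={"uid-1"})
print(to_delete)
```

Each returned name would then be passed to the delete command described above.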

It should be noted that, since the CSI Driver is completely stateless and completely isolated from the Mount Pod, when the CSI Driver fails it can be restarted by the proxy node (i.e. the kubelet), that is, restored to the working state.

Based on the foregoing embodiments, an embodiment of the present application provides an information processing method, as shown with reference to fig. 6, including the steps of:

In step 301, in case of monitoring that the service node is abnormal, the information processing apparatus determines a first number of third storage volumes on the service node.

Wherein the third storage volume refers to the total storage volume (i.e., file system) mounted on the service node, and the first number refers to the total number of storage volumes mounted on the service node.

In the embodiment of the application, the service node can be monitored through a Liveness Probe; as shown in fig. 4, whether the mount point of the service node is damaged can be judged through a code mount. When the Liveness Probe detects that the service node is abnormal (i.e. the probe fails), that is, when the mount point of the service node is damaged as shown in fig. 4, the number of all storage volumes mounted on the service node is counted to obtain the first number (count) shown in fig. 4. In one possible implementation, the Liveness Probe may execute the command <csi-driver-binary> check <podMountDir> to monitor the service node.

It should be noted that, as shown in fig. 4, before judging whether the mount point of the service node is damaged, the mount point information of the service node can be obtained through the code notMnt, err = mount.New("").IsLikelyNotMountPoint(podMountDir), and whether the mount point of the service node exists is then judged. If it does not exist, it can be created through the code os.MkdirAll(podMountDir, os.FileMode(0750)). If it exists, whether the mount point of the service node is damaged is judged.

It should be noted that the Liveness Probe is executed only after the startup probe (Startup Probe) has succeeded, which prevents the first binding node and the service node (i.e. the container) from being killed by the Liveness Probe because the mounting process starts too slowly.

Step 302, the information processing apparatus processes the third storage volume based on the first number and the target number.

Wherein the target number may be determined based on historical experimental data.

In the embodiment of the application, the first quantity and the target quantity can be compared in size, and then the third storage volume is processed based on the comparison result; in one possible implementation, the target number may be set to 1.

It should be noted that, step 302 may be implemented in the following manner:

In step 302a1, the information processing apparatus determines a second number based on the first number and the target number, in a case where the first number is larger than the target number.

In the embodiment of the present application, the second number is the number of third storage volumes that need to be deleted; and under the condition that the comparison result represents that the first quantity is larger than the target quantity, subtracting operation can be carried out on the first quantity and the target quantity to obtain a second quantity.

Step 302a2, the information processing apparatus deletes, through the mounting node, the second number of third storage volumes on the service node.

In the embodiment of the application, when the service node is repaired, third storage volumes mounted on the service node are deleted so that useless mounts are cleaned up. It should be noted that not all of the third storage volumes mounted on the service node can be deleted during this cleanup; otherwise mount propagation (Mount Propagation) stops taking effect, and the service node has to be restarted before mount propagation takes effect again. Therefore, at least one third storage volume mounted on the service node must be retained during cleanup, which is the purpose of setting the target number.

In one possible implementation, as shown in FIG. 4, when the first number (count) is greater than 1, the second number may be obtained as count-1, after which the second number of third storage volumes on the service node are deleted (i.e., the third storage volumes on the service node are unmounted until only the target number remains).
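The arithmetic of steps 302a1 and 302a2 can be illustrated with a short Go sketch. The function name volumesToDelete is hypothetical; the sketch only assumes, as stated above, that the second number is the first number minus the target number when the first number is larger, and that nothing is deleted otherwise.

```go
package main

import "fmt"

// volumesToDelete returns the "second number": how many third storage
// volumes to unmount so that targetCount of them remain mounted.
// The function name is hypothetical.
func volumesToDelete(firstCount, targetCount int) int {
	if firstCount > targetCount {
		return firstCount - targetCount
	}
	// At or below the target: keep everything so mount propagation
	// stays effective.
	return 0
}

func main() {
	fmt.Println(volumesToDelete(3, 1)) // 2
	fmt.Println(volumesToDelete(1, 1)) // 0
}
```

With the target number set to 1, three mounted volumes lead to two deletions, and a single mounted volume is always retained.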

Step 303, in the case that the access of the first storage volume on the service node fails, the information processing apparatus mounts the second storage volume to the first binding node based on the mounting process in the working state.

Wherein the mount process runs on the mount node.

Step 304, in the case that the first binding node has mounted the second storage volume and the service node has not mounted the first storage volume, the information processing apparatus re-binds the service node and the first binding node to a binding success state through the mounting node, so as to share the first storage volume to the second storage volume of the first binding node through sharing propagation.

In the embodiment of the present application, when the first binding node has mounted the second storage volume, referring to fig. 4, the active probe (Liveness Probe) may monitor whether the service node has mounted the first storage volume (i.e., the file system). When it is monitored that the service node has not mounted the first storage volume (i.e., the detection fails), the mounting node may re-bind the service node and the first binding node to the binding success state (i.e., bind-mount the intermediate node) through the command mount -o bind stageMountDir podMountDir, after which the first storage volume may be shared to the second storage volume of the first binding node through mount propagation. When it is monitored that the service node has mounted the first storage volume, the mounting point of the service node is normal. It should be noted that, referring to fig. 4, when the first number (count) is less than or equal to 1, the intermediate node needs to be bind-mounted.
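The bind-mount command mentioned above can be assembled from Go as in the following sketch. The helper name bindMountCmd and the example paths are hypothetical; the sketch only builds the command and does not run it, since an actual bind mount requires root privileges on a Linux host.

```go
package main

import (
	"fmt"
	"os/exec"
)

// bindMountCmd builds (but does not run) the command
// "mount -o bind <stageMountDir> <podMountDir>".
// The helper name is hypothetical.
func bindMountCmd(stageMountDir, podMountDir string) *exec.Cmd {
	return exec.Command("mount", "-o", "bind", stageMountDir, podMountDir)
}

func main() {
	// Example paths are assumptions for illustration only.
	cmd := bindMountCmd("/var/lib/csi/volumes/example", "/tmp/example-target")
	fmt.Println(cmd.Args)
	// To actually perform the bind mount (root required): err := cmd.Run()
}
```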

Step 305, the information processing apparatus accesses the first storage volume in the first binding node through the mount process.

It should be noted that fig. 7 shows a process of accessing a first storage volume (FilesystemVolume) of a service node in the related art. Specifically, the agent node (Kubelet) in a K8s node (K8s Node) accesses the container storage interface node server (i.e., CSI Node Server) through Google remote procedure call (Google Remote Procedure Call, gRPC); a mount process (i.e., a FUSE sub-mount process) is then created (i.e., forked) in the target drive container (i.e., CSI Driver) in the CSI Node Server, and the first storage volume on the service node is accessed through the FUSE sub-mount process. However, when the first storage volume on the service node is accessed through the FUSE sub-mount process, the first storage volume may become inaccessible because the service node is abnormal, or because the service node is normal but has not mounted the first storage volume, which causes the FUSE sub-mount process to exit abnormally. In this case, the related art may adopt the Liveness Probe-based fault recovery scheme; alternatively, the related art may adopt the Pod reconstruction-based fault recovery scheme shown in fig. 8, that is, a mechanism in which a target node (target Pod) independently checks the health of the storage volume on each service node (i.e., service Pod), and a Pod is rebuilt to drive the CSI Driver to re-derive the FUSE sub-mount process when the storage volume is found to be unavailable, thereby completing the fault recovery. However, the fault recovery schemes in the related art require either that the CSI Driver has a mounting point self-healing capability or that the target Pod is able to independently check the health of the service Pod.

Based on this, in the information processing method provided in the embodiment of the present application, as shown in fig. 9, when the service Pod uses the first storage volume (i.e., FilesystemVolume), Kubelet controls the CSI Driver in the CSI Node Server to dynamically create a Mount Pod for executing the mount task and a first binding node (i.e., an intermediate mount point); the mount process (i.e., the FUSE sub-mount process) runs in the Mount Pod. The Mount Pod created by the CSI Driver then mounts the second storage volume to the intermediate mount point, the CSI Driver bind-mounts the intermediate mount point to the service node, and HostToContainer mount propagation is then used to map the service node to the intermediate mount point, so that the mounting node does not need to be restarted when mounting point self-healing is performed later.

In the embodiment of the present application, a specific process of creating a Mount Pod may be shown in fig. 10. A unique identification (ID) is generated and used to derive the name of the Mount Pod and the intermediate mount point, which may be represented by the code uniqueID := md5(<service Pod ID>:<volume ID>); mountPodName := mount-pod-<uniqueID>; stageMountDir = /var/lib/csi/volumes/<uniqueID>; podMountDir = req.GetTargetPath(). Whether the Mount Pod exists is then determined through the code namespace := "kube-system"; exsitingMountPod := k8sClient.GetPod(namespace, mountPodName); exsitingMountPod != nil. When the Mount Pod does not exist, configuration information of the Mount Pod is generated, the Mount Pod is created based on that configuration information, and the process then waits for the service node to successfully mount the first storage volume (i.e., the service node is ready); when the Mount Pod exists, the first storage volume has already been mounted by the service node, and repeated mounting is not needed.
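The naming scheme above can be made concrete with a small Go sketch. The struct mountPodNames, the function deriveNames and the sample identifiers are hypothetical; the sketch assumes the MD5 digest is rendered as lowercase hexadecimal, which the text does not specify.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// mountPodNames groups the values derived from one service Pod and volume.
// The type and field names are hypothetical.
type mountPodNames struct {
	UniqueID      string
	MountPodName  string
	StageMountDir string
}

// deriveNames follows uniqueID := md5(<service Pod ID>:<volume ID>).
// Hex encoding of the digest is an assumption of this sketch.
func deriveNames(servicePodID, volumeID string) mountPodNames {
	sum := md5.Sum([]byte(servicePodID + ":" + volumeID))
	id := hex.EncodeToString(sum[:])
	return mountPodNames{
		UniqueID:      id,
		MountPodName:  "mount-pod-" + id,
		StageMountDir: "/var/lib/csi/volumes/" + id,
	}
}

func main() {
	// Sample identifiers are placeholders, not values from the application.
	n := deriveNames("service-pod-1", "volume-1")
	fmt.Println(n.MountPodName)
	fmt.Println(n.StageMountDir)
}
```

Because the identifier is a pure function of the service Pod ID and the volume ID, repeating the derivation yields the same Mount Pod name, which is what allows the existence check to detect an already created Mount Pod.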

In the embodiment of the application, after the intermediate mount point (i.e., the first binding node) is created, the intermediate mount point also needs to be initialized. The initialization may be performed by having an init container of the mounting node execute the command <csi-driver-binary> init-stage-dir <stageMountDir>, where csi-driver-binary is the binary file of the CSI Driver and stageMountDir is the intermediate mount point. Specifically, in the initialization process, as shown in fig. 11, it may first be determined whether the intermediate mount point exists. When the intermediate mount point exists, it is determined whether the intermediate mount point has mounted the second storage volume: if it has, the second storage volume mounted on the intermediate mount point is unmounted to complete the initialization; if it has not, the intermediate mount point is already in an initialized state and the initialization is complete. When the intermediate mount point does not exist, the intermediate mount point is created to complete the initialization.
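The three branches of the initialization flow in fig. 11 can be summarized as a decision function. The following Go sketch is illustrative only: the type stageDirAction, its constants and the function planInitStageDir are hypothetical names, and the sketch decides which action to take without performing any file system operation.

```go
package main

import "fmt"

// stageDirAction names the step needed to initialize the intermediate
// mount point. The type and constants are hypothetical.
type stageDirAction string

const (
	actionCreate  stageDirAction = "create"  // mount point missing
	actionUnmount stageDirAction = "unmount" // stale second storage volume mounted
	actionNone    stageDirAction = "none"    // already in an initialized state
)

// planInitStageDir maps the two observations of fig. 11 to an action.
func planInitStageDir(exists, mounted bool) stageDirAction {
	if !exists {
		return actionCreate
	}
	if mounted {
		return actionUnmount
	}
	return actionNone
}

func main() {
	fmt.Println(planInitStageDir(false, false)) // create
	fmt.Println(planInitStageDir(true, true))   // unmount
	fmt.Println(planInitStageDir(true, false))  // none
}
```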

In other embodiments of the present application, the Liveness Probe-based fault recovery scheme in the related art (scheme one), the Pod reconstruction-based fault recovery scheme in the related art (scheme two), and the fault recovery scheme provided by the embodiments of the present application (i.e., the high-availability scheme based on Mount Pod, bind mount and HostToContainer mount propagation) may be tested. The test is described as follows:

1. Business scenario: configuration files, including some service configurations of the system background, etc.; front-end page static files, including static files such as html, css, js and png; platform customization files, where the system user can customize the login page background image, the brand LOGO and the product LOGO according to their own requirements; work order attachments, where the user can submit work orders to the administrator for processing, such as data restoration and performance optimization, and upload attachments as supplementary explanation.

2. Test scenario: a fault is simulated by executing kill commands to kill, respectively, the mount process (FUSE sub-mount process) and the CSI Driver container main process, for 50 service nodes (i.e., service Pods) that run on one K8s worker node and use a storage volume of a certain CSI Driver.

3. Configuration description: in the Liveness Probe-based fault recovery scheme, several lines of Liveness Probe configuration need to be added to the service Pod to perform the CSI Volume health check (livenessProbe: exec: command: - "ls /var/lib"; failureThreshold: 1; initialDelaySeconds: 10; periodSeconds: 5; successThreshold: 1; timeoutSeconds: 3); in the Pod reconstruction-based fault recovery scheme, the configuration parameters can be determined according to the implementation effect of the specific scheme; in the high-availability scheme based on Mount Pod, bind mount and HostToContainer mount propagation, mountPropagation: HostToContainer is configured.

4. Test results: in scheme one, fault recovery takes 5 seconds after an abnormal exit of the FUSE sub-mount process and 251 seconds after an abnormal exit of the CSI Driver container; in scheme two, fault recovery takes 10 seconds after an abnormal exit of the FUSE sub-mount process and 330 seconds after an abnormal exit of the CSI Driver container; in the high-availability scheme based on Mount Pod, bind mount and HostToContainer mount propagation, fault recovery takes 3 seconds after an abnormal exit of the FUSE sub-mount process and 5 seconds after an abnormal exit of the CSI Driver container. The fault recovery scheme based on Mount Pod, bind mount and HostToContainer mount propagation therefore has a certain universality, is transparent to the service, and is a fast CSI Driver fault recovery scheme whose recovery time is on the order of seconds. The configuration description above shows that the fault recovery scheme provided by the embodiment of the application has little influence on the service: only one line of fixed configuration is needed, and even this line may become unnecessary as K8s develops. Meanwhile, the fault recovery scheme provided by the embodiment of the application greatly shortens the CSI Driver fault recovery time, reduces the influence of CSI Driver faults on services, and greatly improves the high availability of the CSI Driver.

Meanwhile, because the application uses an independent Mount Pod to run the mount process, the CSI Driver and the FUSE sub-mount process can be upgraded independently, and the Mount Pod and the FUSE sub-mount process can also be upgraded together; upgrading the CSI Driver does not affect the service, and upgrading the mount process affects the service only briefly; each mount process can be observed independently, so observability is stronger; and the isolation between mount processes is stronger, so fine-grained resource control can be performed.

Based on the foregoing embodiments, an embodiment of the present application provides an information processing apparatus that can be applied to the information processing method provided by the embodiments corresponding to fig. 1, 3, and 6, and referring to fig. 12, the information processing apparatus 4 may include: a processing unit 41, a sharing unit 42 and an access unit 43, wherein:

a processing unit 41, configured to mount, in case of failure of access to the first storage volume on the service node, the second storage volume to the first binding node based on the mounting process in the working state; wherein, the mounting process runs on the mounting node;

a sharing unit 42, configured to bind the service node and the first binding node through the mounting node when the second storage volume is mounted on the first binding node, and share the first storage volume to the second storage volume of the first binding node through sharing propagation;

an accessing unit 43, configured to access the first storage volume in the first binding node through the mount process.

In other embodiments of the application, the processing unit 41 is further configured to perform the following steps:

under the condition that the first binding node is monitored to be abnormal, or the second storage volume is not mounted on the first binding node, or the mounting node is abnormal, determining that the first storage volume on the service node fails to access;

controlling the mounting node to restart through the proxy node in the container arrangement node so as to restore the mounting process to a working state;

under the condition that the first binding node is monitored to be abnormal, establishing a second binding node based on the mounting node through the target interface;

processing the third storage volume based on the first number and the target number;

acquiring a mounting node with an association relation with a service node under the condition that the service node does not exist.

It should be noted that, the specific implementation process of the steps executed by each unit in this embodiment may refer to the implementation process in the information processing method provided in the embodiment corresponding to fig. 1, fig. 3, and fig. 6, which is not described herein again.

According to the information processing device provided by the embodiment of the application, as the mounting process runs in the mounting node instead of the target drive container as in the related art, when the mounting process abnormally exits to cause the access failure of the first storage volume on the service node, the mounting process can be restored to the working state by restarting the mounting node; and because the service node and the first binding node are bound and mounted, the first storage volume of the service node can be mapped into the second storage volume of the first binding node, and then the access to the first storage volume of the service node is realized by accessing the first storage volume of the first binding node instead of restarting the service node or newly creating a target node to drive the target drive container to derive the mounting process again as in the related art, thereby improving the speed of fault recovery and shortening the time of fault recovery.

Based on the foregoing embodiments, an embodiment of the present application provides an information processing apparatus that can be applied to the information processing method provided by the embodiments corresponding to fig. 1, 3, and 6, and referring to fig. 13, the information processing apparatus 5 may include: a processor 51, a memory 52 and a communication bus 53, wherein:

a communication bus 53 for enabling communication connection between the processor 51 and the memory 52;

The processor 51 is configured to execute an information processing program in the memory 52 to realize the steps of:

Under the condition that the access of the first storage volume on the service node fails, mounting the second storage volume to the first binding node based on a mounting process in a working state; wherein, the mounting process runs on the mounting node;

Under the condition that the first binding node mounts the second storage volume, binding the service node and the first binding node through the mounting node, and sharing the first storage volume to the second storage volume of the first binding node through sharing propagation;

The first storage volume in the first binding node is accessed through the mount process.

In other embodiments of the present application, the processor 51 is configured to execute the information processing program in the memory 52 to mount the second storage volume to the first binding node based on the mount process in the working state in the case that the access of the first storage volume on the service node fails, so as to implement the following steps:

In other embodiments of the present application, the processor 51 is configured to execute the information processing program in the memory 52 to mount the second storage volume to the first binding node based on the mount process in the working state, so as to implement the following steps:

In other embodiments of the present application, before the processor 51 executes the information processing program in the memory 52 to mount the second storage volume to the first binding node based on the mounting process in the working state in the case that the access of the first storage volume on the service node fails, the following steps may be further implemented:

In other embodiments of the present application, the processor 51 is configured to execute the information processing program in the memory 52 to process the third storage volume based on the first number and the target number to implement the following steps:

In other embodiments of the present application, the processor 51 is configured to execute the information processing program in the memory 52, and the following steps may be implemented:

In other embodiments of the present application, the processor 51 is configured to execute an information processing program in the memory 52 to implement the following steps:

It should be noted that, in the specific implementation process of the steps executed by the processor in this embodiment, reference may be made to the implementation process in the information processing method provided in the embodiment corresponding to fig. 1, fig. 3 and fig. 6, which is not repeated here.

According to the information processing device provided by the embodiment of the application, since the mounting process runs in the mounting node instead of the target drive container as in the related art, when the mounting process abnormally exits to cause the access failure of the first storage volume on the service node, the mounting process can be restored to the working state by restarting the mounting node; and because the service node and the first binding node are bound and mounted, the first storage volume of the service node can be mapped into the second storage volume of the first binding node, and then the access to the first storage volume of the service node is realized by accessing the first storage volume of the first binding node instead of restarting the service node or newly creating a target node to drive the target drive container to derive the mounting process again as in the related art, thereby improving the speed of fault recovery and shortening the time of fault recovery.

Based on the foregoing embodiments, embodiments of the present application provide a computer-readable storage medium storing one or more programs executable by one or more processors to implement steps in the information processing method provided by the corresponding embodiments of fig. 1, 3, and 6.

The computer readable storage medium may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a ferromagnetic random access memory (Ferromagnetic Random Access Memory, FRAM), a flash memory (Flash Memory), a magnetic surface memory, an optical disk, or a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), or the like; it may also be any of various electronic devices, such as mobile phones, computers, tablet devices and personal digital assistants, that include one or any combination of the above-mentioned memories.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims

1. An information processing method, characterized in that the method comprises:

Under the condition that the access of the first storage volume on the service node fails, mounting the second storage volume to the first binding node based on a mounting process in a working state; wherein the mounting process runs on a mounting node; the first storage volume and the second storage volume are different storage volumes;

accessing the first storage volume in the first binding node through the mount process;

Under the condition that the access of the first storage volume on the service node fails, before the second storage volume is mounted to the first binding node based on the mounting process in the working state, the method further comprises the following steps:

Determining a first number of third storage volumes on the service node if the service node anomaly is detected; the third storage volume is a total storage volume mounted on the service node; the first number is a total number of storage volumes mounted on the service node;

2. The method of claim 1, wherein the mounting the second storage volume to the first binding node based on the mounting process in the operational state in the event of a failure of access to the first storage volume on the service node comprises:

3. The method of claim 2, wherein the mounting the second storage volume to the first binding node based on the mounting process in an operational state comprises:

4. The method of claim 1, wherein the processing the third storage volume based on the first number and the target number comprises:

5. The method according to claim 1, wherein the method further comprises:

6. The method according to claim 1, wherein the method further comprises:

7. An information processing apparatus, characterized in that the apparatus comprises:

The processing unit is used for mounting the second storage volume to the first binding node based on the mounting process in the working state under the condition that the access of the first storage volume on the service node fails; wherein the mounting process runs on a mounting node; the first storage volume and the second storage volume are different storage volumes;

an access unit, configured to access, through the mount process, the first storage volume in the first binding node;

8. An information processing apparatus, characterized in that the apparatus comprises: a processor, a memory, and a communication bus;

the processor is configured to execute an information processing program stored in the memory to realize the steps of the information processing method according to any one of claims 1 to 6.

9. A computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps of the information processing method of any one of claims 1 to 6.