WO2022083690A1

WO2022083690A1 - Data management method and apparatus, and device, computer storage medium and program

Info

Publication number: WO2022083690A1
Application number: PCT/CN2021/125290
Authority: WO
Inventors: 焦宏宇 (Jiao Hongyu); 邱路达 (Qiu Luda); 丁易元 (Ding Yiyuan); 张若君 (Zhang Ruojun); 孙芮 (Sun Rui); 邓虹雨 (Deng Hongyu)
Original assignee: 深圳前海微众银行股份有限公司 (Shenzhen Qianhai WeBank Co., Ltd.)
Priority date: 2020-10-22
Filing date: 2021-10-21
Publication date: 2022-04-28
Also published as: CN112311596B; CN112311596A

Abstract

Provided are a data management method and apparatus, and an electronic device and a computer storage medium. The method comprises: establishing a data storage architecture, wherein the data storage architecture comprises a parent cluster and a plurality of child clusters, each child cluster in the plurality of child clusters comprises one master node and at least one slave node, the nodes of the parent cluster are the master nodes of the plurality of child clusters, and the nodes of the parent cluster comprise one master node of the parent cluster and at least one slave node of the parent cluster; and writing data into each node of each child cluster on the basis of the data storage architecture.

Description

Data management method, apparatus, device, computer storage medium and program

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on, and claims priority to, Chinese patent application No. 202011139659.7, filed on October 22, 2020 and entitled "Data Management Method, Apparatus, Equipment and Computer Storage Medium", the entire contents of which are incorporated herein by reference.

Technical Field

This application relates to distributed architecture technology in financial technology (Fintech), and in particular, but not exclusively, to a data management method, apparatus, electronic device, computer storage medium and computer program.

Background

With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology. However, because of the security and real-time requirements of the financial industry, higher demands are also placed on these technologies.

At present, in the field of financial technology, data management can be implemented on a single-layer distributed architecture, which is not suitable for super-large distributed clusters. Data management in a distributed cluster presupposes that a master node is determined through cluster election. However, when cluster election is performed on a single-layer distributed architecture, every node must send election messages to all other nodes in the cluster, which easily causes a message storm: most of the traffic is occupied by election packets, which impairs the transmission of normal packets.

SUMMARY OF THE INVENTION

The embodiments of the present application provide a data management method, apparatus, electronic device, computer storage medium and computer program, which can solve the prior-art problem of message storms caused by electing a master node.

The technical solutions of the embodiments of the present application are implemented as follows:

The embodiment of the present application provides a data management method, the method includes:

Establishing a data storage architecture, where the data storage architecture includes a parent cluster (Parent Cluster) and a plurality of sub-clusters (Child Cluster); each of the sub-clusters includes one master node and at least one slave node; the nodes of the parent cluster are the master nodes of the sub-clusters, and include one master node of the parent cluster and at least one slave node of the parent cluster;

Based on the data storage architecture, data is written to each node of each sub-cluster.

In some embodiments of the present application, establishing a data storage architecture includes:

Determining the master node and the health index of each sub-cluster;

Selecting, based on the health index of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster;

The data storage architecture is established based on the master node of each sub-cluster and the master node of the parent cluster.

In some embodiments of the present application, the method further includes:

The health index of the master node of each sub-cluster is obtained based on at least one of the following: the number of failures, the duration of at least one failure, and the time elapsed since the last failure.

It can be seen that the embodiments of the present application can accurately obtain the health index of the master node of each sub-cluster from fault information, which helps to select a healthier node among the sub-cluster master nodes as the master node of the parent cluster.

In some embodiments of the present application, selecting, based on the health index of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster includes:

Selecting, based on the health index and a first score value of the master node of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster, where the first score value is a preconfigured value.

It can be seen that, in the embodiments of the present application, the master node of the parent cluster is not selected solely on the basis of node IDs; instead, it can be selected by jointly considering the score corresponding to the node ID and the health index. This helps to reduce frequent replacement of the master node caused by nodes with larger ID scores frequently joining the cluster.

In some embodiments of the present application, selecting, based on the health index and the first score value of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster includes:

Enabling, through data interaction among the master nodes of the sub-clusters, the master node of each sub-cluster to obtain the second score value of the master node of every sub-cluster, where the second score value is the sum of the health index and the first score value;

Based on the second score value of the master node of each sub-cluster, the master node of each sub-cluster elects the master node of the parent cluster.

It can be understood that the embodiments of the present application obtain the master node of the parent cluster through an election among the sub-cluster master nodes based on the second score value. Because the second score value depends not only on the node ID but also on the node's health index, the embodiments can reduce frequent replacement of the master node caused by nodes with larger ID scores frequently joining the cluster.

In some embodiments of the present application, selecting, based on the health index of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster includes:

Enabling, through data interaction among the master nodes of the sub-clusters, the master node of each sub-cluster to obtain a first message of each sub-cluster, where the first message represents a join message (Join Message) for the parent cluster;

When the master node of each sub-cluster has obtained the first message of each sub-cluster, selecting, based on the health index of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster.

It can be seen that the embodiments of the present application can start the selection of the parent cluster master node at an appropriate time, namely once the first messages of all sub-cluster master nodes are available, which helps to accurately select the master node of the parent cluster from the sub-cluster master nodes.

In some embodiments of the present application, enabling the master node of each sub-cluster to obtain the first message of each sub-cluster through data interaction among the sub-cluster master nodes includes:

Selecting a seed cluster (Seed Cluster) from among the sub-clusters;

After the master node of the seed cluster (Seed Cluster Master) receives the first messages sent by the master nodes of the other sub-clusters, exchanging data with the other sub-clusters through the seed cluster, so that the master node of each sub-cluster obtains the first message of each sub-cluster. Here, the other sub-clusters are the sub-clusters other than the seed cluster; each first message is sent by a master node of another sub-cluster to the address of the seed cluster master node, which that master node knows in advance.

It can be seen that, by selecting a seed cluster, the embodiments of the present application enable data interaction among the sub-cluster master nodes, which in turn helps to select the master node of the parent cluster from among them.

In some embodiments of the present application, the method further includes:

When a slave node of the parent cluster fails, deleting the failed slave node from the parent cluster, and sending, by the master node of the parent cluster, membership change information of the parent cluster to the slave nodes of the parent cluster;

After the master node of the parent cluster receives the first message sent by the master node of a first sub-cluster, adding the master node of the first sub-cluster to the parent cluster and sending membership change information of the parent cluster to each node of the parent cluster. The first sub-cluster is the sub-cluster to which the failed slave node belongs, the master node of the first sub-cluster is the master node re-elected from the nodes of the first sub-cluster, and the first message is a message for joining the parent cluster.

It can be understood that the embodiments of the present application can update the membership of the parent cluster in time when a slave node of the parent cluster fails, which helps the subsequent election of the parent cluster master node to proceed correctly.
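The membership handling for a failed parent-cluster slave node can be sketched as follows. This is a minimal in-memory Python illustration, not the application's implementation: `ParentCluster`, `on_slave_failure`, `on_join` and the `outbox` list (which stands in for sending membership change information over the network) are hypothetical, and the node names follow FIG. 2.

```python
from dataclasses import dataclass, field


@dataclass
class ParentCluster:
    master: str
    slaves: list[str]
    # Hypothetical outbox: (recipient, membership snapshot) pairs "sent" by the parent master.
    outbox: list[tuple[str, list[str]]] = field(default_factory=list)

    def members(self) -> list[str]:
        return [self.master, *self.slaves]

    def _notify(self, recipients: list[str]) -> None:
        # Stand-in for sending membership change information to each recipient.
        snapshot = self.members()
        for node in recipients:
            self.outbox.append((node, list(snapshot)))

    def on_slave_failure(self, failed: str) -> None:
        # Drop the failed slave, then tell the remaining parent-cluster slaves.
        self.slaves.remove(failed)
        self._notify(self.slaves)

    def on_join(self, new_sub_master: str) -> None:
        # First message (join) from the newly elected master of the affected sub-cluster.
        if new_sub_master not in self.members():
            self.slaves.append(new_sub_master)
        self._notify(self.members())


parent = ParentCluster(master="node 21", slaves=["node 01", "node 11"])
parent.on_slave_failure("node 11")   # node 11 (master of sub-cluster 1) fails
parent.on_join("node 12")            # sub-cluster 1 re-elects node 12, which joins
print(parent.members())              # ['node 21', 'node 01', 'node 12']
```

Keeping the notification as a snapshot of the full member list means each recipient can simply overwrite its local view rather than apply incremental diffs; that is a choice of this sketch only.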

In some embodiments of the present application, the method further includes:

When the master node of the parent cluster fails, selecting a node from the other nodes of the parent cluster as the new master node of the parent cluster;

After receiving the first message sent by the master node of a second sub-cluster, adding the master node of the second sub-cluster to the parent cluster, and sending membership change information of the parent cluster to each node of the parent cluster. The second sub-cluster is the sub-cluster to which the failed master node belongs, and the first message is a message for joining the parent cluster.

It can be seen that, when the master node of the parent cluster fails, the embodiments of the present application can promptly re-elect the master node of the parent cluster and, once the second sub-cluster has selected its new master node, receive that node's join message, which facilitates subsequent re-elections of the parent cluster master node.
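The application does not say how the new parent master is chosen from the surviving nodes. The Python sketch below assumes, purely for illustration, that the surviving node with the highest second score (health index plus first score) is promoted, and then admits the re-elected master of the failed node's sub-cluster through its join message. The function names and score values are hypothetical.

```python
def replace_failed_master(members: list[str], failed: str,
                          scores: dict[str, float]) -> tuple[str, list[str]]:
    """Return (new_master, slaves) after the parent-cluster master `failed` goes down.

    Assumption: the surviving member with the highest second score is promoted,
    ties broken by the larger node name; the source leaves the rule open.
    """
    survivors = [m for m in members if m != failed]
    if not survivors:
        raise ValueError("no surviving parent-cluster node to promote")
    new_master = max(survivors, key=lambda n: (scores[n], n))
    return new_master, [m for m in survivors if m != new_master]


def accept_join(master: str, slaves: list[str], joiner: str) -> list[str]:
    """Add the re-elected master of the failed node's sub-cluster as a parent slave."""
    if joiner == master or joiner in slaves:
        return slaves
    return [*slaves, joiner]


members = ["node 21", "node 01", "node 11"]        # node 21 is the parent master
scores = {"node 01": 410.0, "node 11": 920.0}      # hypothetical second scores
master, slaves = replace_failed_master(members, "node 21", scores)
slaves = accept_join(master, slaves, "node 22")    # sub-cluster 2 re-elects node 22
print(master, slaves)                               # node 11 ['node 01', 'node 22']
```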

In some embodiments of the present application, the method further includes:

Obtaining a data read instruction, where the data read instruction carries the data read address of any node;

Reading data from that node based on the data read address.

It can be seen that, because the data in every sub-cluster is consistent, the embodiments of the present application can read data from any node of any sub-cluster, which is easy to implement.

In some embodiments of the present application, writing data to each node of each sub-cluster based on the data storage architecture includes:

Obtaining a data write instruction, where the data write instruction carries the data to be written;

Sending the data write instruction to the master node of the parent cluster; the master node of the parent cluster sends the data write instruction to the master node of each sub-cluster, and the master node of each sub-cluster sends the data write instruction to each slave node of its sub-cluster, so that every node of every sub-cluster writes the data to be written.

It can be seen that the embodiments of the present application write the same data to every node of every sub-cluster based on the data write instruction, so that the data of all sub-clusters remains consistent.
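The write path (parent master, then sub-cluster masters, then their slaves) and the read-from-any-node property can be illustrated with an in-memory Python sketch. It assumes synchronous, failure-free delivery with no acknowledgements, which the application does not describe; `Node`, `SubCluster`, `write` and `read` are hypothetical names, and the key/value pair is made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    store: dict[str, str] = field(default_factory=dict)


@dataclass
class SubCluster:
    master: Node
    slaves: list[Node]


def write(parent_master: Node, subs: list[SubCluster],
          key: str, value: str) -> list[tuple[str, str]]:
    """Fan a write out parent master -> sub-cluster masters -> slaves; return the hops taken."""
    hops = []
    for sc in subs:
        if sc.master is not parent_master:        # parent master is itself one sub-cluster master
            hops.append((parent_master.name, sc.master.name))
        sc.master.store[key] = value              # sub-cluster master applies the write
        for slave in sc.slaves:                   # ... and forwards it to each of its slaves
            hops.append((sc.master.name, slave.name))
            slave.store[key] = value
    return hops


def read(node: Node, key: str) -> Optional[str]:
    # Every node holds the same data, so any node can serve the read.
    return node.store.get(key)


subs = [SubCluster(Node(f"node {i}1"), [Node(f"node {i}2"), Node(f"node {i}3")])
        for i in range(3)]
parent_master = subs[2].master                    # node 21, as in FIG. 2
hops = write(parent_master, subs, "acct:42", "balance=100")
print(len(hops))                                  # 2 master hops + 6 slave hops = 8
print(read(subs[0].slaves[1], "acct:42"))         # balance=100
```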

In some embodiments of the present application, the data to be written includes at least two levels of tag data;

Writing the data to be written to each node of each sub-cluster includes:

Establishing a hash table for each level of tag data, and constructing a multi-way tree data structure based on the hash tables corresponding to the levels of tag data;

Writing the at least two levels of tag data to each node of each sub-cluster based on the multi-way tree data structure.

It can be seen that, based on the multi-way tree data structure, the embodiments of the present application can store multi-level tag data hierarchically, which facilitates subsequent management of the multi-level tag data.

In some embodiments of the present application, when the tag data of a given level is not the lowest level, the hash table of that level includes the hash addresses of the corresponding next-level tag data.

It can be seen that, with the multi-way tree data structure of the embodiments of the present application, the next-level tag data can be quickly located from the tag data of the current level.

In some embodiments of the present application, the method further includes:

Performing, based on the multi-way tree data structure, at least one of the following operations on the at least two levels of tag data: adding, deleting, modifying and querying.

It can be seen that the characteristics of the multi-way tree data structure help to quickly add, delete, modify or query the at least two levels of tag data.
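A minimal Python sketch of the multi-level tag structure: each level is a hash table (a `dict`), a non-leaf entry holds a reference to the next level's table in place of the "hash address", and the lowest level holds the payload. `TagTree`, its method names and the sample tags are hypothetical; pruning next-level tables that become empty after a delete is a design choice of this sketch, not something the application specifies.

```python
from typing import Any, Optional


class TagTree:
    """Multi-level tag data stored as a multi-way tree of hash tables (dicts)."""

    def __init__(self) -> None:
        self.root: dict[str, Any] = {}

    def _tables(self, path: list[str], create: bool = False) -> Optional[list[dict]]:
        """Return the hash tables visited from the root down to the leaf's table."""
        tables = [self.root]
        for tag in path[:-1]:
            nxt = tables[-1].get(tag)
            if nxt is None and create:
                nxt = tables[-1][tag] = {}          # create the next-level table
            if not isinstance(nxt, dict):
                return None                         # missing level, or path hits a leaf
            tables.append(nxt)
        return tables

    def add(self, path: list[str], value: Any) -> None:
        tables = self._tables(path, create=True)
        if tables is None:
            raise ValueError("tag path passes through an existing leaf")
        tables[-1][path[-1]] = value

    def query(self, path: list[str]) -> Optional[Any]:
        tables = self._tables(path)
        return None if tables is None else tables[-1].get(path[-1])

    def modify(self, path: list[str], value: Any) -> bool:
        tables = self._tables(path)
        if tables is None or path[-1] not in tables[-1]:
            return False
        tables[-1][path[-1]] = value
        return True

    def delete(self, path: list[str]) -> bool:
        tables = self._tables(path)
        if tables is None or path[-1] not in tables[-1]:
            return False
        del tables[-1][path[-1]]
        # Prune next-level tables that became empty, bottom-up.
        for depth in range(len(tables) - 1, 0, -1):
            if tables[depth]:
                break
            del tables[depth - 1][path[depth - 1]]
        return True


tree = TagTree()
tree.add(["finance", "loan", "mortgage"], "tag-001")
tree.add(["finance", "loan", "auto"], "tag-002")
tree.modify(["finance", "loan", "auto"], "tag-003")
tree.delete(["finance", "loan", "mortgage"])
print(tree.root)   # {'finance': {'loan': {'auto': 'tag-003'}}}
```

In this sketch each operation follows one hash-table entry per level, so its cost grows with the number of tag levels rather than with the total amount of stored tag data.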

An embodiment of the present application provides a data management device, and the device includes:

The establishment module is configured to establish a data storage architecture, where the data storage architecture includes a parent cluster and a plurality of sub-clusters; each sub-cluster includes one master node and at least one slave node; the nodes of the parent cluster are the master nodes of the sub-clusters, and include one master node of the parent cluster and at least one slave node of the parent cluster;

The processing module is configured to write data to each node of each sub-cluster based on the data storage architecture.

An embodiment of the present application provides an electronic device, and the electronic device includes:

a memory configured to store executable instructions;

a processor configured to implement any one of the above data management methods when executing the executable instructions stored in the memory.

Embodiments of the present application provide a computer-readable storage medium storing executable instructions for implementing any of the foregoing data management methods when executed by a processor.

Embodiments of the present application further provide a computer program product, where the computer program product includes computer-executable instructions, where the computer-executable instructions are used to implement any one of the data management methods provided by the embodiments of the present application.

In the embodiments of the present application, a data storage architecture is established, which includes a parent cluster and a plurality of sub-clusters; each sub-cluster includes one master node and at least one slave node; the nodes of the parent cluster are the master nodes of the sub-clusters and include one master node of the parent cluster and at least one slave node of the parent cluster; and data is written to each node of each sub-cluster based on the data storage architecture. It can be seen that the data storage architecture of the embodiments of the present application is a two-layer distributed architecture in which data interaction takes place among the sub-cluster master nodes. This reduces the volume of negotiation messages, limits the range over which they propagate, and helps to alleviate the impairment of normal packet transmission caused by message storms.

Description of drawings

FIG. 1 is an optional flowchart of a data management method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a data storage architecture according to an embodiment of the present application;

FIG. 3 is a schematic diagram of cluster establishment in an embodiment of the present application;

FIG. 4 is a schematic diagram of electing the parent cluster master node in an embodiment of the present application;

FIG. 5 is a schematic diagram of a multi-way tree data structure according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of a data adding operation on at least two levels of tag data in an embodiment of the present application;

FIG. 7 is a schematic flowchart of a data query operation on at least two levels of tag data in an embodiment of the present application;

FIG. 8 is a schematic diagram of an optional composition structure of a data management apparatus according to an embodiment of the present application;

FIG. 9 is a schematic diagram of an optional composition structure of an electronic device provided by an embodiment of the present application.

Detailed Description

In the related art, data management can be implemented based on a single-layer distributed architecture, which has the following defects:

1) A single-layer distributed architecture is not suitable for super-large distributed clusters. Such clusters contain many nodes, and data management in them presupposes that a master node is determined through cluster election. During the election, every node must send election messages to all other nodes in the cluster, which easily causes a message storm: most of the traffic is occupied by election packets, impairing normal packet transmission.

2) When the master node is determined through cluster election, existing election algorithms do not balance election speed and stability. For example, the bully algorithm decides the election simply by comparing node IDs, which leads to frequent re-election of the master node; the Raft algorithm uses a majority voting mechanism, which avoids frequent re-election but lengthens the election, because more than half of the nodes must vote before a master node is elected.

3) When a single-layer distributed architecture is used for data storage, every node stores data in a single, unified hash table. When the stored data grows large and capacity must be expanded, the hash values of all keys on every affected node must be recalculated, which makes expansion time-consuming.

In view of the above technical problems, the technical solutions of the embodiments of the present application are proposed.

In order to make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. All other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.

The embodiments of the present application provide a data management method, apparatus, electronic device and computer storage medium, and the data management method can be applied to an electronic device. Exemplary applications of the electronic device are described below. The electronic device may be implemented as a server, which may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.

The data management method according to the embodiment of the present application will be exemplarily described below.

FIG. 1 is an optional flowchart of a data management method provided by an embodiment of the present application. As shown in FIG. 1, the flow may include the following steps:

Step 101: Establish a data storage architecture. The data storage architecture includes a parent cluster and a plurality of sub-clusters; each sub-cluster includes one master node and at least one slave node; the nodes of the parent cluster are the master nodes of the sub-clusters, and include one master node of the parent cluster and at least one slave node of the parent cluster.

In the embodiments of the present application, a sub-cluster is a smaller cluster obtained by dividing all the nodes of the overall cluster. Each sub-cluster can independently elect a sub-cluster master node (Child Cluster Master), and the remaining nodes in the sub-cluster are sub-cluster slave nodes (Child Cluster Slave).

The parent cluster is a cluster composed of the master nodes of the sub-clusters. One node of the parent cluster can be selected as the parent cluster master node (Parent Cluster Master), and the remaining nodes of the parent cluster are parent cluster slave nodes (Parent Cluster Slave).

FIG. 2 is a schematic diagram of a data storage architecture according to an embodiment of the present application. As shown in FIG. 2, sub-cluster 0, sub-cluster 1 and sub-cluster 2 are three different sub-clusters. Master node 01 is the master node of sub-cluster 0, and slave nodes 02 and 03 are slave nodes of sub-cluster 0; master node 11 is the master node of sub-cluster 1, and slave nodes 12 and 13 are slave nodes of sub-cluster 1; master node 21 is the master node of sub-cluster 2, and slave nodes 22 and 23 are slave nodes of sub-cluster 2. Master nodes 01, 11 and 21 form the parent cluster, in which master node 21 is the master node of the parent cluster and master nodes 11 and 01 are slave nodes of the parent cluster.

As can be seen from FIG. 2, the data storage architecture of the embodiments of the present application has two layers. The first layer consists of multiple sub-clusters, namely sub-cluster 0, sub-cluster 1 and sub-cluster 2; the second layer is the parent cluster, which consists of the master nodes of the sub-clusters, namely master nodes 01, 11 and 21. After each sub-cluster is established, it independently elects its own master node; the sub-cluster master nodes then elect the master node of the parent cluster. In this way, one super-large cluster is realized through a two-layer structure.
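To make the FIG. 2 layout concrete, the following Python sketch models the two layers as plain data classes and derives the parent cluster's members from the sub-cluster masters. The class and function names (`SubCluster`, `ParentCluster`, `build_parent`) are hypothetical illustrations, not part of the application; networking and election are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SubCluster:
    """First layer: one master node plus at least one slave node."""
    name: str
    master: str
    slaves: list[str]

    def __post_init__(self) -> None:
        if not self.slaves:
            raise ValueError(f"{self.name}: a sub-cluster needs at least one slave node")


@dataclass
class ParentCluster:
    """Second layer: its members are exactly the sub-cluster master nodes."""
    master: str
    slaves: list[str]


def build_parent(sub_clusters: list[SubCluster], parent_master: str) -> ParentCluster:
    members = [sc.master for sc in sub_clusters]
    if parent_master not in members:
        raise ValueError("the parent master must be one of the sub-cluster masters")
    return ParentCluster(master=parent_master,
                         slaves=[m for m in members if m != parent_master])


# Layout of FIG. 2
subs = [
    SubCluster("sub-cluster 0", "node 01", ["node 02", "node 03"]),
    SubCluster("sub-cluster 1", "node 11", ["node 12", "node 13"]),
    SubCluster("sub-cluster 2", "node 21", ["node 22", "node 23"]),
]
parent = build_parent(subs, "node 21")
print(parent)  # ParentCluster(master='node 21', slaves=['node 01', 'node 11'])
```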

Step 102: Write data to each node of each sub-cluster based on the data storage architecture.

In the embodiments of the present application, based on the above data storage architecture, after a data write instruction is received, the same data can be written to every node of every sub-cluster according to the instruction, so that the data of all sub-clusters remains consistent. Because the data on every node is consistent, data can be read from any node.

In practical applications, steps 101 and 102 may be implemented by a processor of an electronic device. The processor may be at least one of an application specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), a CPU, a controller, a microcontroller and a microprocessor. It can be understood that other electronic components may also implement the functions of the processor, which is not limited in the embodiments of the present application.

It can be seen that the data storage architecture of the embodiments of the present application is a two-layer distributed architecture in which data interaction takes place among the sub-cluster master nodes. This reduces the volume of negotiation messages, limits the range over which they propagate, and helps to alleviate the impairment of normal packet transmission caused by message storms.

In some embodiments of the present application, establishing the data storage architecture may include: determining the master node and the health index of each sub-cluster; selecting, based on the health index of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster; and establishing the data storage architecture based on the master nodes of the sub-clusters and the master node of the parent cluster.

Here, the health index reflects the health of the master node of a sub-cluster. Once the master node of each sub-cluster is determined, the slave nodes of each sub-cluster are determined as well; once the master node of the parent cluster is determined, the slave nodes of the parent cluster are determined as well. In this way, the above data storage architecture is obtained.

It can be understood that, unlike related-art schemes that select the master node solely by node ID, the embodiments of the present application select the master node of the parent cluster based on the health index of the master node of each sub-cluster. Compared with the bully algorithm, frequent joining of nodes with large IDs does not cause frequent replacement of the parent cluster master node, so stability is stronger; and compared with the Raft algorithm, which must elect the master node through a majority voting mechanism, the master node is determined faster. To a certain extent, this balances election speed and stability.
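A rough way to see why the two-layer layout shrinks election traffic is to count messages under a simplifying assumption that is not stated in the application: in one election round, every participant sends one election message to every other participant. With N nodes split into k sub-clusters of m nodes each (N = k·m), a single-layer election costs N(N-1) messages, while the two-layer scheme costs k·m(m-1) for the sub-cluster elections plus k(k-1) for the election among sub-cluster masters. The node counts below are hypothetical.

```python
def single_layer_messages(n: int) -> int:
    # Every node sends one election message to every other node.
    return n * (n - 1)


def two_layer_messages(k: int, m: int) -> int:
    # k independent elections inside sub-clusters of m nodes,
    # plus one election among the k sub-cluster masters.
    return k * m * (m - 1) + k * (k - 1)


n, k, m = 1000, 20, 50                 # hypothetical: 1000 nodes in 20 sub-clusters of 50
print(single_layer_messages(n))        # 999000
print(two_layer_messages(k, m))        # 20*50*49 + 20*19 = 49380
```

Under this model the per-election traffic drops by roughly a factor of k; the seed-cluster relay described in the application would change the second term, which this count does not attempt to capture.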

In some embodiments of the present application, the health index of the master node of each sub-cluster can be obtained according to at least one of the following: the number of failures, the duration of at least one failure, and the duration from the last failure to the current time.

Here, the health index is negatively correlated with the number of failures and with the duration of the at least one failure, and positively correlated with the time elapsed from the last failure to the current time. That is, the shorter the failure duration, the larger the health index; the longer the time since the last failure, the larger the health index; and the more failures, the smaller the health index.

In some embodiments, the duration of the at least one failure may include the duration of the last failure.

In practical applications, each time a node fails, the failure start time and failure recovery time can be recorded to determine the failure duration.

In some embodiments, the health index of the master node of each sub-cluster can be obtained according to the following formula (1):

$$S_0 = t_1 - \mathrm{fault\_count} \times t_2 \tag{1}$$

where $S_0$ is the health index of the master node of the sub-cluster, $t_1$ is the time elapsed from the last failure to the current time, $\mathrm{fault\_count}$ is the number of failures, and $t_2$ is the duration of the last failure.

It can be understood that the embodiments of the present application can accurately obtain the health index of the master node of each sub-cluster from fault information, which helps to select a healthier node among the sub-cluster master nodes as the master node of the parent cluster.
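Formula (1) can be computed directly from the recorded failure start and recovery times. The sketch below is a minimal Python illustration under two assumptions not fixed by the application: t1 is measured from the recovery time of the most recent failure, and a node with no recorded failures scores its uptime since a hypothetical `boot_time`. Times are plain numbers in any consistent unit.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    start: float      # time the failure began
    recovered: float  # time the node recovered


def health_index(faults: list[Fault], now: float, boot_time: float) -> float:
    """S0 = t1 - fault_count * t2, following formula (1)."""
    if not faults:
        # Assumption: with no recorded failure, t1 is the uptime since boot_time.
        return now - boot_time
    last = max(faults, key=lambda f: f.recovered)   # most recent failure
    t1 = now - last.recovered          # time since the last failure (assumed: since recovery)
    t2 = last.recovered - last.start   # duration of the last failure
    return t1 - len(faults) * t2


# Two failures; the last one lasted 30 time units and ended 470 units ago.
print(health_index([Fault(100, 110), Fault(500, 530)], now=1000, boot_time=0))  # 410
```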

In some embodiments of the present application, one master node may be selected from the master nodes of the sub-clusters as the master node of the parent cluster based on the health index and the first score value of the master node of each sub-cluster, where the first score value is a preconfigured value.

In some embodiments, the first score value may be a score value corresponding to the ID of the node; in practical applications, the first score value may be predetermined, that is, the first score value of each node is a fixed value.

In some embodiments of the present application, selecting, based on the health index and the first score value of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster may include:

Based on the data interaction of the master nodes of each sub-cluster, the master node of each sub-cluster obtains the second score value of the master node of each sub-cluster, where the second score value is the sum of the health index and the first score value;

In practical applications, the master node of each sub-cluster can compute its own second score value and obtain the second score values of the other sub-cluster master nodes through data interaction among the sub-cluster master nodes. It can then compare its own second score value with those of the other sub-cluster master nodes to determine whether it can serve as the master node of the parent cluster.

In some embodiments, if the second score value of a node is greater than or equal to the second score values of all other sub-cluster master nodes, the node can serve as the master node of the parent cluster; if its second score value is less than that of any other sub-cluster master node, it cannot serve as the master node of the parent cluster.
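The comparison rule above can be sketched in Python as follows. The health and first-score values are hypothetical, and the tie-break by node name is an assumption added only so that `elect_parent_master` always returns a single node; the application states only the greater-than-or-equal condition.

```python
def second_score(health_index: float, first_score: float) -> float:
    # Second score = health index + preconfigured first score (e.g. derived from the node ID).
    return health_index + first_score


def can_be_parent_master(me: str, scores: dict[str, float]) -> bool:
    # A master qualifies if its second score is >= that of every other sub-cluster master.
    return all(scores[me] >= s for node, s in scores.items() if node != me)


def elect_parent_master(scores: dict[str, float]) -> str:
    # Assumption: ties are broken by the larger node name; the source does not say.
    return max(scores, key=lambda node: (scores[node], node))


health = {"node 01": 400.0, "node 11": 900.0, "node 21": 950.0}   # hypothetical S0 values
first = {"node 01": 10.0, "node 11": 20.0, "node 21": 30.0}       # hypothetical ID scores
scores = {n: second_score(health[n], first[n]) for n in health}  # 410, 920, 980
print(elect_parent_master(scores))                                # node 21

# A newcomer with the largest ID score but little failure-free history does not win.
scores["node 31"] = second_score(5.0, 40.0)                       # 45
print(elect_parent_master(scores))                                # node 21
```

The second call illustrates the stability argument made earlier: because the health index is added to the ID-based score, a freshly joined node with the largest ID score does not automatically displace the current parent master, as it would under the bully algorithm.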

In some embodiments of the present application, selecting, based on the health index of each sub-cluster, one master node from the master nodes of the sub-clusters as the master node of the parent cluster may include:

Based on the data interaction between the master nodes of each sub-cluster, the master node of each sub-cluster obtains the first message of each sub-cluster, where the first message represents the message of joining the parent cluster;

Once the master node of each sub-cluster has acquired the first message of every sub-cluster, one master node is selected from among the master nodes of the sub-clusters as the master node of the parent cluster based on the health index of each sub-cluster.

In practical applications, after generating the first message (the message for joining the parent cluster), the master node of each sub-cluster can send it to the master nodes of the other sub-clusters, and can likewise receive the first messages sent by the master nodes of the other sub-clusters. After the master node of each sub-cluster has obtained the first messages of all sub-clusters, it can initiate the process of electing the master node of the parent cluster.

In some embodiments of the present application, enabling the master node of each sub-cluster to obtain the first message of every sub-cluster through data exchange among the master nodes of the sub-clusters includes:

Selecting a seed cluster from among the sub-clusters;

After the master node of the seed cluster receives the first messages sent by the master nodes of the other sub-clusters, the seed cluster exchanges data with the other sub-clusters so that the master node of each sub-cluster obtains the first message of every sub-cluster. Here, the other sub-clusters are the sub-clusters among the multiple sub-clusters other than the seed cluster; each first message is sent by the master node of one of the other sub-clusters to the address of the master node of the seed cluster, and that address is information predetermined in the master nodes of the other sub-clusters.

In some embodiments, any one of the sub-clusters may be chosen as the seed cluster, or the seed cluster may be chosen from the sub-clusters according to a preset selection method.

In some embodiments, the address of the master node of the seed cluster may be pre-configured on the master nodes of the other sub-clusters so that they can send their first messages to the seed cluster. On receiving a first message, the seed cluster can record its sending address, that is, the address of the master node of the sending sub-cluster, and can then forward the received first messages to the master node of each sub-cluster.

FIG. 3 is a schematic diagram of cluster establishment in an embodiment of the application. As shown in FIG. 3, a seed cluster is selected first; the seed cluster is one of the sub-clusters. The seed cluster, sub-cluster 1 and sub-cluster 2 represent three different sub-clusters. Master node 01 is the master node of the seed cluster, and slave nodes 02 and 03 are the slave nodes of the seed cluster (Seed Cluster Slave). Master node 11, slave node 12 and slave node 13, as well as master node 21, slave node 22 and slave node 23, have the same meanings as in FIG. 2 and are not described again here.

The following describes the process of cluster establishment with reference to FIG. 3.

After the seed cluster is selected, its master node can be elected, and the IDs of all sub-clusters at the initial moment (including sub-cluster 1 and sub-cluster 2) can be written into the master node of the seed cluster so that the seed cluster knows the ID of every sub-cluster.

Each sub-cluster other than the seed cluster independently elects its own master node, and the ID and address of the seed cluster can be pre-configured in these other sub-clusters.

After each sub-cluster has elected its master node, that master node can send a first message to the master node of the seed cluster (master node 01 in FIG. 3); the first message includes information such as the ID of the sub-cluster and the address of its master node.

After receiving a first message from another sub-cluster, the master node of the seed cluster can store that sub-cluster's ID as sub-cluster information and record the address of the corresponding sub-cluster's master node.

After the seed cluster has received the first messages of all other sub-clusters, it can broadcast the sub-cluster information, which may include all the received first messages, to the master node of each other sub-cluster. Each sub-cluster thus obtains the master-node information of every sub-cluster, and the election of the master node of the parent cluster can then be initiated. In this election the seed cluster has no special status and competes on equal terms with the other sub-clusters.

Here, the seed cluster acts as the coordinator only at the initialization moment of cluster establishment; once every sub-cluster has obtained the master-node information of all sub-clusters, the seed cluster reverts to an ordinary sub-cluster.
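The seed cluster's bookkeeping during bootstrap can be sketched as follows. This is a minimal, single-process sketch, assuming the seed master knows the IDs of all other sub-clusters in advance and that the broadcast is represented by a returned list; `FirstMessage`, `SeedMaster` and the addresses are hypothetical placeholders, and networking is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FirstMessage:
    """'Join the parent cluster' message: sub-cluster ID plus the address of its master."""
    subcluster_id: str
    master_addr: str


@dataclass
class SeedMaster:
    """Hypothetical state kept by the seed cluster's master during bootstrap."""
    expected_ids: set[str]  # IDs of the other sub-clusters, written in at the initial moment
    registry: dict[str, str] = field(default_factory=dict)  # sub-cluster ID -> master address

    def on_first_message(self, msg: FirstMessage) -> Optional[list[FirstMessage]]:
        """Record the sender; once every expected sub-cluster has reported,
        return the sub-cluster information to broadcast, otherwise None."""
        if msg.subcluster_id not in self.expected_ids:
            return None  # unknown sender: ignored in this sketch
        self.registry[msg.subcluster_id] = msg.master_addr
        if set(self.registry) == self.expected_ids:
            return [FirstMessage(sid, addr) for sid, addr in sorted(self.registry.items())]
        return None


seed = SeedMaster(expected_ids={"sub-1", "sub-2"})
print(seed.on_first_message(FirstMessage("sub-1", "addr-11")))  # None: sub-2 has not reported yet
broadcast = seed.on_first_message(FirstMessage("sub-2", "addr-21"))
print([m.subcluster_id for m in broadcast])  # ['sub-1', 'sub-2']
```

After the broadcast, the seed master holds nothing the other masters lack, which matches the text's point that it then reverts to an ordinary sub-cluster.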

FIG. 4 is a schematic diagram of electing the master node of the parent cluster in an embodiment of the application. In FIG. 4, sub-cluster 0, sub-cluster 1 and sub-cluster 2 represent three different sub-clusters; master node 01, slave node 02, slave node 03, master node 11, slave node 12, slave node 13, master node 21, slave node 22 and slave node 23 have the same meanings as in FIG. 2 and are not described again here.

Referring to FIG. 4, the master node of each sub-cluster first determines its own second score value and can send a victory message to the master nodes of the other sub-clusters; for example, master node 21 can send a victory message to master node 11 and master node 01. A victory message announces that the sending node is the master node and can carry the sender's second score value.

The master node of each sub-cluster can receive the victory messages sent by the master nodes of the other sub-clusters. On receiving a victory message, it compares the second score value carried in the message with its own second score value. If the second score value in the message is greater than or equal to its own, it can send an election message to the source node of the victory message; if the second score value in the message is less than its own, it can send a response (alive) message to the source node of the victory message. An alive message sent in reply to a victory message indicates that the master node of the parent cluster is to be re-elected.

For example, in FIG. 4, after master node 21 sends the victory message to master node 11 and master node 01, master node 11 can send an election message to master node 21 if the second score value in the victory message is greater than its own, and master node 01 can likewise send an election message to master node 21 if the second score value in the victory message is greater than its own.

After receiving an election message, the master node of a sub-cluster can reply with an alive message; here, an alive message sent in reply to an election message only acknowledges that the election message has been received. For example, referring to FIG. 4, after receiving the election messages sent by master node 11 and master node 01, master node 21 may send an alive message to each of them.

After receiving an alive message in reply to its victory message, the master node of a sub-cluster may leave that message unprocessed and wait for messages from the master nodes of the other sub-clusters.

After the master node of any sub-cluster sends its victory message to the master nodes of the other sub-clusters, if it receives no alive message in reply to that victory message within a given time, it considers itself the master node of the parent cluster. It can then send a victory message to the master nodes of the other sub-clusters so that they can determine the master node of the parent cluster, and the election of the master node of the parent cluster ends.

In some embodiments, the given time may be set according to an actual application scenario, for example, the given time may be determined according to the maximum communication delay between each sub-cluster.
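The victory/election/alive exchange can be simulated synchronously to show which master ends up as the parent master. This is a sketch under stated assumptions: messages are delivered instantly, the timeout is modelled as "all replies have arrived", and ties on the second score are broken by node ID, which the text does not specify. `Master` and `elect_parent_master` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Master:
    node_id: str
    second_score: float

    def rank(self) -> tuple[float, str]:
        # Assumption: equal second scores are broken by node ID; the text gives no tie-break.
        return (self.second_score, self.node_id)


def elect_parent_master(masters: list[Master]) -> Master:
    """Synchronous simulation of the victory / election / alive exchange.

    Every master sends a victory message carrying its second score to all others.
    A receiver that outranks the sender replies 'alive' (re-elect); otherwise it
    replies 'election', which the sender only acknowledges with 'alive'.
    A master whose victory message draws no 'alive' reply before the timeout wins.
    """
    rejected: set[str] = set()
    for sender in masters:
        for receiver in masters:
            if receiver.node_id == sender.node_id:
                continue
            if sender.rank() < receiver.rank():
                rejected.add(sender.node_id)  # receiver answered the victory with 'alive'
            # else: 'election' then acknowledging 'alive'; this does not change the outcome
    winners = [m for m in masters if m.node_id not in rejected]
    if len(winners) != 1:
        raise RuntimeError("expected one winner: need a non-empty list with unique node IDs")
    return winners[0]


masters = [Master("master-01", 3.5), Master("master-11", 4.0), Master("master-21", 4.0)]
print(elect_parent_master(masters).node_id)  # master-21 (tie on 4.0 broken by node ID)
```

Only a master that no other master outranks avoids an alive reply to its victory message, so under these assumptions the simulation always elects the highest-ranked master.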

In some embodiments of the present application, when a node of the parent cluster fails, it must be handled accordingly. The two failure cases, failure of a slave node of the parent cluster and failure of the master node of the parent cluster, are described below.

Exemplarily, when a slave node of the parent cluster fails, the failed slave node is deleted from the parent cluster, and the master node of the parent cluster sends the member change information of the parent cluster to the slave nodes of the parent cluster;

After the master node of the parent cluster receives the first message sent by the master node of the first sub-cluster, the master node of the first sub-cluster is added to the parent cluster, and the member change information of the parent cluster is sent to each node of the parent cluster. Here, the first sub-cluster is the sub-cluster to which the failed slave node belongs, and the master node of the first sub-cluster is the master node re-elected from among the nodes of the first sub-cluster.

Here, the member change information of the parent cluster may include information such as IDs of each node of the parent cluster.

In some embodiments, the master node of the parent cluster can delete the failed slave node from the parent cluster, mark the ID of the corresponding sub-cluster as unavailable, update the member information of the parent cluster, and then broadcast the member change information of the parent cluster to each slave node of the parent cluster.

After the master node of the first sub-cluster (that is, the failed slave node of the parent cluster) fails, a new master node can be elected within the first sub-cluster, and that new master node can send the first message to the master node of the parent cluster. After receiving the first message, the master node of the parent cluster can clear the unavailable mark, update the member information of the parent cluster, and broadcast the member change information of the parent cluster to each slave node of the parent cluster.

In some embodiments, the master node of the parent cluster may also update the failure information related to the first sub-cluster for use in electing the master node of the parent cluster.

It can be understood that the embodiment of the present application updates the member information of the parent cluster promptly when a slave node of the parent cluster fails, which helps ensure that the master node of the parent cluster is elected accurately in subsequent elections.
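The membership bookkeeping performed by the parent master when a parent-cluster slave fails can be sketched as follows. This is a minimal sketch, assuming membership is keyed by sub-cluster ID and that the returned dictionary stands in for the broadcast member change information; `ParentMembership` and its fields are hypothetical, and the failure counter only illustrates the failure information kept for later elections.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ParentMembership:
    """Hypothetical membership view held by the parent cluster's master."""
    members: dict[str, str] = field(default_factory=dict)  # sub-cluster ID -> master address
    unavailable: set[str] = field(default_factory=set)
    failures: dict[str, int] = field(default_factory=dict)  # input for later health indexes

    def on_slave_failure(self, subcluster_id: str) -> dict[str, str]:
        # Drop the failed parent slave (the old sub-cluster master) and mark its sub-cluster unavailable.
        self.members.pop(subcluster_id, None)
        self.unavailable.add(subcluster_id)
        self.failures[subcluster_id] = self.failures.get(subcluster_id, 0) + 1
        return self.change_info()

    def on_first_message(self, subcluster_id: str, master_addr: str) -> dict[str, str]:
        # The re-elected sub-cluster master asks to join: clear the mark and add it back.
        self.unavailable.discard(subcluster_id)
        self.members[subcluster_id] = master_addr
        return self.change_info()

    def change_info(self) -> dict[str, str]:
        # Member change information to broadcast to every parent-cluster node.
        return dict(self.members)


view = ParentMembership({"sub-0": "addr-01", "sub-1": "addr-11", "sub-2": "addr-21"})
print(view.on_slave_failure("sub-1"))             # {'sub-0': 'addr-01', 'sub-2': 'addr-21'}
print(view.on_first_message("sub-1", "addr-12"))  # sub-1 is back with its new master's address
```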

Exemplarily, when the master node of the parent cluster fails, a node is selected from among the other nodes of the parent cluster as the new master node of the parent cluster;

After the master node of the parent cluster receives the first message sent by the master node of the second sub-cluster, the master node of the second sub-cluster is added to the parent cluster, and the member change information of the parent cluster is sent to each node of the parent cluster; the second sub-cluster is the sub-cluster to which the failed master node belongs.

In some embodiments, when the master node of the parent cluster fails, one of the slave nodes of the parent cluster must be elected as the new master node of the parent cluster; the election process has been described above and is not repeated here.

Since the master node of the second sub-cluster has failed, it can be regarded as having left the parent cluster. In this case, after a new master node is elected in the second sub-cluster, that node needs to join the parent cluster, as follows:

When the master node of the parent cluster fails, the other nodes in the second sub-cluster can delete the information of the failed master node and elect one of their own nodes as the new master node of the second sub-cluster. Once elected, the new master node of the second sub-cluster may send the first message to any node of the parent cluster. In some embodiments, a hash ring may be built in advance over all nodes of the parent cluster, and the master node of the second sub-cluster may select an adjacent node on the hash ring in the clockwise or counterclockwise direction and send the first message to the selected node.

If the selected node is the current master node of the parent cluster, it can directly record the ID of the second sub-cluster and the address of the current master node of the second sub-cluster. If the selected node is not the current master node of the parent cluster, it can return the information of the parent cluster's master node to the current master node of the second sub-cluster, which can then send the first message to the master node of the parent cluster based on that information.
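The rejoin path through the hash ring can be sketched as follows. This is a sketch under stated assumptions: ring positions come from MD5 over node IDs, the new sub-cluster master contacts the clockwise neighbour of the failed node, and the redirect is modelled as returning the sequence of contacted nodes; `ring_position`, `clockwise_neighbour` and `join_path` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib


def ring_position(node_id: str) -> int:
    # Stable position on the ring; MD5 is an assumption, any stable hash would do.
    return int.from_bytes(hashlib.md5(node_id.encode()).digest()[:8], "big")


def clockwise_neighbour(parent_nodes: list[str], failed_node: str) -> str:
    """Parent-cluster node that follows `failed_node` clockwise on the hash ring."""
    ring = sorted(parent_nodes, key=ring_position)
    i = ring.index(failed_node)
    return ring[(i + 1) % len(ring)]


def join_path(parent_nodes: list[str], parent_master: str, failed_node: str) -> list[str]:
    """Nodes the new sub-cluster master sends its first message to, in order."""
    contact = clockwise_neighbour(parent_nodes, failed_node)
    if contact == parent_master:
        return [contact]               # the parent master records the sub-cluster directly
    return [contact, parent_master]    # contact replied with parent-master info; resend there


nodes = ["master-01", "master-11", "master-21"]
print(join_path(nodes, parent_master="master-11", failed_node="master-21"))
```

Which node is contacted first depends on the hash values, but the first message always reaches the current parent master after at most one redirect.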

In some embodiments of the present application, writing data to each node of each sub-cluster based on the data storage architecture may include:

Obtaining a data write instruction that carries the data to be written;

Sending the data write instruction to the master node of the parent cluster, which sends it to the master node of each sub-cluster; the master node of each sub-cluster then sends it to each slave node of that sub-cluster, so that every node of every sub-cluster writes the data to be written.

In some embodiments, receiving a data write instruction from a client indicates that data needs to be written to every node in the data storage architecture. In this case, the data write instruction can be sent to any node of the parent cluster. If the receiving node is not the master node of the parent cluster, it replies to the client with the information of the parent cluster's master node; the client then sends the data write instruction to the master node of the parent cluster, which forwards it to the master node of each sub-cluster. If the receiving node is the master node of the parent cluster, it forwards the data write instruction to the master node of each sub-cluster directly.

In some embodiments, when the master node of a sub-cluster determines that all nodes in its sub-cluster have written the data successfully, it returns write-success information to the master node of the parent cluster; otherwise, it returns write-failure information. When the master node of the parent cluster has received write-success information from the master nodes of all sub-clusters, it can return write-success information to the client. When it receives write-failure information from the master node of at least one sub-cluster, it can determine that a node of that sub-cluster has failed, or that the data has not been written to that sub-cluster successfully, and it can resend the data write instruction to the master node of that sub-cluster until it receives write-success information from it.
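The write fan-out and retry behaviour can be sketched as follows. This is a minimal sketch, assuming each node is modelled as a callable that applies the write and reports success, that writes are idempotent (a retried sub-cluster rewrites all of its nodes), and that retries are capped by a hypothetical `max_retries`, whereas the text retries until success.

```python
from __future__ import annotations

from typing import Callable

Node = Callable[[str], bool]  # hypothetical: a node applies a write and reports success


def subcluster_write(nodes: list[Node], data: str) -> bool:
    """Sub-cluster master: forward to every node; success only if all of them succeed."""
    results = [node(data) for node in nodes]  # every node is attempted, no short-circuit
    return all(results)


def parent_write(subclusters: dict[str, list[Node]], data: str, max_retries: int = 3) -> bool:
    """Parent master: fan out, then resend only to sub-clusters that reported failure."""
    pending = set(subclusters)
    for _ in range(1 + max_retries):
        pending = {sid for sid in pending if not subcluster_write(subclusters[sid], data)}
        if not pending:
            return True   # all sub-cluster masters reported success: reply success to the client
    return False          # still failing after the retry budget (the bound is an assumption)


def flaky_node() -> Node:
    attempts = 0

    def node(data: str) -> bool:
        nonlocal attempts
        attempts += 1
        return attempts > 1  # fails on the first attempt only

    return node


def ok(data: str) -> bool:
    return True


print(parent_write({"sub-0": [ok, ok], "sub-1": [ok, flaky_node()]}, "v1"))  # True
```

Here sub-1 fails the first round and succeeds on the resend, so only sub-1 is retried, mirroring the text's targeted resend to the failing sub-cluster.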

In some embodiments of the present application, a data read instruction carrying the data read address of any node can also be obtained; data can then be read from that node based on the data read address.

In some embodiments of the present application, the data to be written includes at least two levels of label data;

Correspondingly, having each node of each sub-cluster write the data to be written includes: establishing a hash table for each level of label data, building a multi-fork tree data structure from the hash tables corresponding to the levels of label data, and writing the at least two levels of label data into each node of each sub-cluster based on the multi-fork tree data structure.

In the related art, microservice technology has been widely adopted. Microservices are highly dynamic, so the traditional approach of managing data (for example, application instance data) by Internet Protocol (IP) address cannot meet their needs. The embodiments of the present application therefore propose a data management solution based on label data instead of one based on IP addresses.

In some embodiments, the data to be written is application instance data, which may include the cluster and system to which the application belongs and can be represented as /cluster/system/app, where the symbol / is a fixed separator and cluster, system and app denote the cluster, the system and the application respectively; cluster is the first-level label, system the second-level label and app the third-level label.
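Splitting an application instance path into its label levels can be sketched as follows; the sketch also enforces the fixed number of levels that keeps the tree height constant. The function name and the default of three levels (cluster, system, app) are assumptions for illustration.

```python
from __future__ import annotations


def split_labels(path: str, sep: str = "/", levels: int = 3) -> list[str]:
    """Split '/cluster/system/app' into its labels, first level first.

    `levels` (assumed 3 here: cluster, system, app) enforces the fixed tree height.
    """
    parts = path.strip(sep).split(sep)
    if len(parts) != levels or any(not p for p in parts):
        raise ValueError(f"expected {levels} non-empty labels in {path!r}")
    return parts


print(split_labels("/cluster0/system1/app1"))  # ['cluster0', 'system1', 'app1']
```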

FIG. 5 is a schematic diagram of a multi-fork tree data structure according to an embodiment of the present application. Referring to FIG. 5, at the cluster level, hashtable1-0 represents the hash address of the cluster data, and bucket0 and bucket1 in hashtable1-0 hold different stored data: in bucket0 the key is cluster0 and the value is hashtable1-0; in bucket1 the key is cluster1 and the value is hashtable1-1.

Referring to FIG. 5, at the system level, hashtable1-0 and hashtable1-1 represent hash addresses of different system data. bucket0 and bucket1 in hashtable1-0 hold different stored data: in bucket0 the key is system0 and the value is hashtable2-0; in bucket1 the key is system1 and the value is hashtable2-1. bucket0 in hashtable1-1 holds stored data whose key is system0 and whose value is hashtable2-2.

Referring to FIG. 5, hashtable2-0, hashtable2-1 and hashtable2-2 represent hash addresses of different application data. bucket0 in hashtable2-0 holds stored data whose key is app0 and whose value is ip1. bucket0 and bucket1 in hashtable2-1 hold different stored data: in bucket0 the key is app1 and the value is ip2; in bucket1 the key is app2 and the value is ip2. In bucket0 in hashtable2-2, the key is app0 and the value is ip3. Here, app0, app1 and app2 are different applications, and ip1, ip2 and ip3 are different IP addresses.
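The FIG. 5 example can be written out with Python dicts standing in for the hash-table nodes. This is an illustrative reconstruction only, assuming nested dicts are an acceptable model of the hash tables; the comments map each inner dict to the table named in the text, and ip1 to ip3 are the figure's placeholders, not real addresses.

```python
# Each dict is one hash-table node of FIG. 5; inner values are the next-level
# table, and at the application level the value is the IP placeholder from the figure.
fig5 = {
    "cluster0": {                                      # hashtable1-0
        "system0": {"app0": "ip1"},                    # hashtable2-0
        "system1": {"app1": "ip2", "app2": "ip2"},     # hashtable2-1
    },
    "cluster1": {                                      # hashtable1-1
        "system0": {"app0": "ip3"},                    # hashtable2-2
    },
}

# One hash lookup per level: /cluster1/system0/app0 resolves to ip3.
print(fig5["cluster1"]["system0"]["app0"])  # ip3

# app1 and app2 share the prefix cluster0/system1, so they live in one table.
shared = fig5["cluster0"]["system1"]
print(sorted(shared))  # ['app1', 'app2']
```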

In some embodiments, to remain compatible with the traditional IP-based data management architecture, labels must be converted to IP addresses: when network-level communication actually takes place, the logical concept (the label) is converted into the physical concept (the IP address) to complete the physical communication. For example, referring to the third-level label data in FIG. 5, a correspondence between labels and IP addresses can be established, which makes storing and looking up these correspondences efficient.

In some embodiments, a hash table is established for each tree node in the multi-fork tree data structure, and the hash table stores the label data of the corresponding level.

In this embodiment of the present application, multi-level label data may be split at the fixed separators, with each level of label data corresponding to a tree node of the multi-fork tree data structure. For a tree node that is not a leaf node (a leaf node has no child nodes), each key in the tree node is an element of the corresponding hash table, and its value is the hash address of the next-level hash table, that is, the hash address corresponding to the next level of label data.

The data storage scheme based on the multi-tree data structure has the following characteristics:

1) Label data with the same prefix string can share the same hash table node, which reduces storage usage. For example, referring to FIG. 5, among the application-level label data, the entries in bucket0 and bucket1 of hashtable2-1 both have the prefix string cluster0/system1, so they share the same hash table node, hashtable2-1.

2) Because multi-level label data is split by a fixed separator into a fixed number of strings, the height of the multi-fork tree is also fixed. Each hash table lookup takes constant time on average, so the overall hash-table-based query time complexity is also constant.

3) For suffix wildcard matching, such as /cluster0/system0/*, it is only necessary to find the node that exactly matches the prefix; the hash table node it points to contains all entries matched by the suffix wildcard. Compared with a solution that stores all data in a single hash table, the embodiment of the present application does not need to traverse the hash table holding the entire data set, which further reduces the time complexity of suffix wildcard matching.
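Suffix wildcard matching on the nested-table layout can be sketched as follows. This is a minimal sketch, assuming nested Python dicts as the hash tables and supporting only a single trailing `*`; `wildcard_match` is a hypothetical name.

```python
from __future__ import annotations


def wildcard_match(tree: dict, pattern: str, sep: str = "/") -> dict:
    """Resolve a suffix wildcard such as '/cluster0/system1/*'.

    Only the exact prefix is walked; the table it points to already holds every
    match, so no full-table scan is needed.
    """
    labels = pattern.strip(sep).split(sep)
    if labels[-1] != "*":
        raise ValueError("this sketch only supports a single trailing '*'")
    node = tree
    for label in labels[:-1]:
        node = node.get(label)
        if not isinstance(node, dict):
            return {}  # prefix not present (or it ends at a leaf value)
    return dict(node)  # shallow copy of the table the prefix points to


tree = {"cluster0": {"system0": {"app0": "ip1"}, "system1": {"app1": "ip2", "app2": "ip2"}}}
print(wildcard_match(tree, "/cluster0/system1/*"))  # {'app1': 'ip2', 'app2': 'ip2'}
```

The cost is one lookup per prefix level plus the size of the matched table, independent of how much unrelated data the tree holds.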

4) In the related art, all data is stored in a single hash table, so the more data the table holds, the more keys must be rehashed when it is expanded and the longer expansion takes. In the embodiment of the present application, when data expansion is required, only the hash table of the node into which the new element is placed needs to be expanded, which shortens expansion time to a certain extent.

In some embodiments of the present application, at least one of the following operations may be performed on the at least two levels of label data based on the multi-fork tree data structure: adding, deleting, modifying and querying.

FIG. 6 is a schematic flowchart of adding data to at least two levels of label data in an embodiment of the present application. As shown in FIG. 6, the process may include:

Step 601: Acquire label data at all levels.

Here, the multi-level label data can be split at the fixed separator described above to obtain the label data of each level.

Step 602: Take out the first-level label.

For example, for the data /cluster/system/app, cluster is the first-level label, system the second-level label and app the third-level label.

Step 603: Determine whether the hash table node corresponding to the currently fetched label exists; if not, go to step 604; if so, go to step 605.

Step 604: Create a corresponding hash table node, and perform step 605.

Here, a corresponding hash table node can be created for the currently fetched tag.

Step 605: Determine whether the corresponding label exists in the hash table node; if not, go to step 606; if so, go to step 607.

Step 606: Insert the corresponding label, then go to step 607.

Here, the currently fetched tag can be inserted into the hash table node, and then step 607 is performed.

Step 607: Determine whether the label data of all levels has been traversed; if so, end the process; if not, go to step 608.

Here, if the label data of all levels has been taken out, the traversal is complete; otherwise, it is not.

Step 608: Take out the next-level label data, then return to step 603.
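The add flow of FIG. 6 can be sketched over nested Python dicts as follows. This is a minimal sketch, assuming dicts as the hash-table nodes and a leaf value (such as an IP address) stored under the last label; steps 603 to 606 are collapsed into one get-or-create per level, and `add` is a hypothetical name.

```python
from __future__ import annotations


def add(tree: dict, path: str, value: str, sep: str = "/") -> None:
    """Insert `value` under a multi-level label path, creating tables as needed."""
    labels = path.strip(sep).split(sep)            # step 601: all label levels
    node = tree
    for label in labels[:-1]:                      # steps 602 / 608: one level at a time
        child = node.get(label)
        if child is None:                          # steps 603-606 collapsed: label missing,
            child = {}                             # so create its next-level table
            node[label] = child                    # and insert label -> table
        elif not isinstance(child, dict):
            raise ValueError(f"{label!r} already holds a leaf value")
        node = child
    node[labels[-1]] = value                       # last level: label -> value (e.g. app -> IP)


tree: dict = {}
add(tree, "/cluster0/system1/app1", "ip2")
add(tree, "/cluster0/system1/app2", "ip2")
print(tree)  # {'cluster0': {'system1': {'app1': 'ip2', 'app2': 'ip2'}}}
```

The second call reuses the tables created by the first, which is the prefix sharing described in item 1) above.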

FIG. 7 is a schematic flowchart of querying at least two levels of label data in an embodiment of the present application. As shown in FIG. 7, the process may include:

Step 701: Acquire label data at all levels.

Step 702: Take out the first-level label.

Step 703: Determine whether the hash table node corresponding to the currently fetched label exists; if so, go to step 704; if not, go to step 708.

Step 704: Determine whether the corresponding label exists in the hash table node; if so, go to step 705; if not, go to step 708.

Step 705: Determine whether the label data of all levels has been traversed; if so, go to step 706; if not, go to step 707.

Step 706: The corresponding label is found, and the process ends.

Here, once the corresponding label has been found, the corresponding label data can be read; in practical applications, the data can be read from the hash table node in which the label data is located.

Step 707: Take out the next-level label, then return to step 703.

Step 708: The corresponding label is not found, and the process ends.
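The query flow of FIG. 7 can be sketched in the same way. This is a minimal sketch over nested Python dicts; returning `None` for "not found" is an assumption, and `query` is a hypothetical name.

```python
from __future__ import annotations


def query(tree: dict, path: str, sep: str = "/"):
    """Return the data stored under a label path, or None if it is not found."""
    labels = path.strip(sep).split(sep)                # step 701
    node = tree
    for label in labels:                               # steps 702 / 707
        if not isinstance(node, dict) or label not in node:
            return None                                # steps 703 / 704 fail -> step 708
        node = node[label]
    return node                                        # steps 705 / 706: found, read it


tree = {"cluster1": {"system0": {"app0": "ip3"}}}
print(query(tree, "/cluster1/system0/app0"))  # ip3
print(query(tree, "/cluster1/system9/app0"))  # None
```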

In some embodiments, to delete data from the at least two levels of label data, the corresponding data may first be queried and then deleted from the corresponding hash table node.

In some embodiments, to modify data in the at least two levels of label data, the corresponding data may first be queried and then modified in the corresponding hash table node.
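Deletion and modification as "query first, then act on the owning hash node" can be sketched as follows. This is a minimal sketch over nested Python dicts; `delete`, `modify`, `_owning_table` and the value ip4 are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def _owning_table(tree: dict, labels: list[str]) -> Optional[dict]:
    """Query step: walk all but the last label and return the table holding the last one."""
    node = tree
    for label in labels[:-1]:
        node = node.get(label) if isinstance(node, dict) else None
        if node is None:
            return None
    return node if isinstance(node, dict) and labels[-1] in node else None


def delete(tree: dict, path: str, sep: str = "/") -> bool:
    labels = path.strip(sep).split(sep)
    table = _owning_table(tree, labels)
    if table is None:
        return False
    del table[labels[-1]]
    return True


def modify(tree: dict, path: str, new_value: str, sep: str = "/") -> bool:
    labels = path.strip(sep).split(sep)
    table = _owning_table(tree, labels)
    if table is None:
        return False
    table[labels[-1]] = new_value
    return True


tree = {"cluster0": {"system1": {"app1": "ip2", "app2": "ip2"}}}
print(modify(tree, "/cluster0/system1/app2", "ip4"))  # True
print(delete(tree, "/cluster0/system1/app1"))         # True
print(tree)  # {'cluster0': {'system1': {'app2': 'ip4'}}}
```

Tables emptied by deletions are left in place here; the text does not say whether empty hash nodes are pruned.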

On the basis of the data management method proposed in the foregoing embodiments, an embodiment of the present application further provides a data management apparatus. FIG. 8 is a schematic diagram of an optional composition structure of the data management apparatus according to an embodiment of the present application. As shown in FIG. 8, the data management apparatus 800 may include:

The establishment module 801 is configured to establish a data storage architecture, where the data storage architecture includes a parent cluster and multiple sub-clusters, each of the multiple sub-clusters includes one master node and at least one slave node, the nodes of the parent cluster are the master nodes of the multiple sub-clusters, and the nodes of the parent cluster include one master node of the parent cluster and at least one slave node of the parent cluster;

The processing module 802 is configured to write data to each node of each sub-cluster based on the data storage architecture.

In some embodiments of the present application, the establishment module 801 being configured to establish a data storage architecture includes:

Determine the master node and health indicators of each sub-cluster;

In some embodiments of the present application, the establishment module 801 is further configured to obtain the health index of the master node of each sub-cluster according to at least one of the following: the number of failures, the duration of at least one failure, and the time elapsed from the last failure to the current time.

In some embodiments of the present application, the establishment module 801 being configured to select, based on the health index of each sub-cluster, one master node from among the master nodes of the sub-clusters as the master node of the parent cluster includes:

In some embodiments of the present application, the establishment module 801 being configured to select, based on the health index and the first score value of each sub-cluster, one master node from among the master nodes of the sub-clusters as the master node of the parent cluster includes:

Through data exchange among the master nodes of the sub-clusters, the master node of each sub-cluster acquires the first message of every sub-cluster, where the first message is the message for joining the parent cluster;

In some embodiments of the present application, the establishment module 801 being configured to enable the master node of each sub-cluster to obtain the first message of every sub-cluster through data exchange among the master nodes of the sub-clusters includes:

Selecting a seed cluster from among the sub-clusters;

After the master node of the seed cluster receives the first messages sent by the master nodes of the other sub-clusters, the seed cluster exchanges data with the other sub-clusters so that the master node of each sub-cluster obtains the first message of every sub-cluster; here, the other sub-clusters are the sub-clusters among the multiple sub-clusters other than the seed cluster; each first message is sent by the master node of one of the other sub-clusters to the address of the master node of the seed cluster, and that address is information predetermined in the master nodes of the other sub-clusters.

In some embodiments of the present application, the establishment module 801 is further configured to:

After the master node of the parent cluster receives the first message sent by the master node of the first sub-cluster, add the master node of the first sub-cluster to the parent cluster and send the member change information of the parent cluster to each node of the parent cluster; the first sub-cluster is the sub-cluster to which the failed slave node belongs, the master node of the first sub-cluster is the master node re-elected from among the nodes of the first sub-cluster, and the first message is the message for joining the parent cluster.

In some embodiments of the present application, the processing module 802 is further configured to:

Reading data from the corresponding node based on the data read address.

In some embodiments of the present application, the processing module 802 being configured to write data to each node of each sub-cluster based on the data storage architecture includes:

Sending the data write instruction to the master node of the parent cluster, which sends it to the master node of each sub-cluster; the master node of each sub-cluster then sends it to each slave node of its sub-cluster, so that every node of every sub-cluster writes the data to be written.

Having each node of each sub-cluster write the data to be written includes:

In some embodiments of the present application, the processing module 802 is further configured to perform at least one of the following operations on the at least two levels of label data based on the multi-fork tree data structure: adding, deleting, modifying and querying.

In practical applications, both the establishment module 801 and the processing module 802 can be implemented by a processor, which can be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller and a microprocessor. It can be understood that the electronic device implementing the functions of the processor may also be another device, which is not limited in the embodiments of the present application.

It should be noted that the descriptions of the above apparatus embodiments are similar to the descriptions of the above method embodiments, and have similar beneficial effects to the method embodiments. For technical details not disclosed in the device embodiments of the present application, please refer to the descriptions of the method embodiments of the present application for understanding.

It should be noted that, in the embodiments of the present application, if the above data management method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a terminal, a server, etc.) to execute all or part of the methods described in the various embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, or an optical disk. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.

Correspondingly, the embodiments of the present application further provide a computer program product, where the computer program product includes computer-executable instructions, and the computer-executable instructions are used to implement any one of the data management methods provided by the embodiments of the present application.

Correspondingly, an embodiment of the present application further provides a computer storage medium, where computer-executable instructions are stored on the computer storage medium, and the computer-executable instructions are used to implement any one of the data management methods provided in the foregoing embodiments.

An embodiment of the present application further provides an electronic device. FIG. 9 is an optional schematic structural diagram of the electronic device provided by the embodiment of the present application. As shown in FIG. 9, the electronic device 900 includes:

a memory 901, configured to store executable instructions; and

a processor 902, configured to implement any one of the above data management methods when executing the executable instructions stored in the memory 901.

The above-mentioned processor 902 may be at least one of ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor.

The above-mentioned computer-readable storage medium/memory may be a read-only memory (Read Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a ferromagnetic random access memory (Ferromagnetic Random Access Memory, FRAM), a flash memory (Flash Memory), a magnetic surface memory, an optical disk, or a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM); it may also be any of various terminals that include one or any combination of the above memories, such as mobile phones, computers, tablet devices, and personal digital assistants.

It should be pointed out here that the descriptions of the above storage medium and device embodiments are similar to the descriptions of the above method embodiments, and have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the storage medium and device embodiments of the present application, please refer to the descriptions of the method embodiments of the present application for understanding.

It is to be understood that references throughout the specification to "some embodiments" mean that a particular feature, structure, or characteristic associated with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of "in some embodiments" in various places throughout this specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The above serial numbers of the embodiments of the present application are for description only and do not indicate that any embodiment is better or worse than another.

It should be noted that, herein, the terms "include", "comprise", or any other variation thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division of the units is merely a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.

In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may exist separately as a single unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

Alternatively, if the above integrated units of the present application are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence or the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause an automatic test line of a device to perform all or part of the methods described in the various embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.

The methods disclosed in the several method embodiments provided in this application can be arbitrarily combined under the condition of no conflict to obtain new method embodiments.

The features disclosed in several method or device embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.

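The above are only embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.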

Claims

A data management method, the method comprising:

Establish a data storage architecture, the data storage architecture including a parent cluster and multiple sub-clusters, wherein each of the multiple sub-clusters includes one master node and at least one slave node, the nodes of the parent cluster are the master nodes of the multiple sub-clusters, and the nodes of the parent cluster include one master node of the parent cluster and at least one slave node of the parent cluster;

Based on the data storage architecture, data is written to each node of each sub-cluster.
The method of claim 1, wherein said establishing a data storage architecture comprises:

Determine the master node and health indicators of each sub-cluster;

Based on the health indicator of each sub-cluster, select one of the master nodes of the sub-clusters as the master node of the parent cluster;

The data storage architecture is established based on the master node of each sub-cluster and the master node of the parent cluster.
The method of claim 2, wherein the method further comprises:

The health indicator of the master node of each sub-cluster is derived based on at least one of the following: the number of failures, the duration of at least one failure, and the time elapsed since the last failure.
The method according to claim 2, wherein selecting, based on the health indicators of the respective sub-clusters, one of the master nodes of the respective sub-clusters as the master node of the parent cluster comprises:

Based on the health indicator of the master node of each sub-cluster and a first score value, select one of the master nodes of the sub-clusters as the master node of the parent cluster, wherein the first score value is a preconfigured value.
The method according to claim 4, wherein selecting, based on the health indicator of each sub-cluster and the first score value, one of the master nodes of the sub-clusters as the master node of the parent cluster comprises:

Based on data interaction between the master nodes of the sub-clusters, the master node of each sub-cluster obtains the second score value of the master node of each sub-cluster, where the second score value is the sum of the health indicator and the first score value;

Based on the second score value of the master node of each sub-cluster, the master node of each sub-cluster elects the master node of the parent cluster.
The method according to claim 2, wherein selecting, based on the health indicators of the respective sub-clusters, one of the master nodes of the respective sub-clusters as the master node of the parent cluster comprises:

Based on the data interaction between the master nodes of each sub-cluster, the master node of each sub-cluster acquires the first message of each sub-cluster, where the first message represents the message of joining the parent cluster;

In the case where the master node of each sub-cluster has acquired the first message of each sub-cluster, select, based on the health indicator of each sub-cluster, one of the master nodes of the sub-clusters as the master node of the parent cluster.
The method according to claim 6, wherein enabling, based on data interaction between the master nodes of the sub-clusters, the master node of each sub-cluster to obtain the first message of each sub-cluster comprises:

Selecting one of the sub-clusters as a seed cluster;

After the master node of the seed cluster receives the first messages sent by the master nodes of the other sub-clusters, the master node of each sub-cluster performs data interaction with the other sub-clusters, so that the master node of each sub-cluster obtains the first message of each sub-cluster; wherein the other sub-clusters are the sub-clusters among the multiple sub-clusters other than the seed cluster; the first message is sent by the master node of each of the other sub-clusters based on the address of the master node of the seed cluster, and the address of the master node of the seed cluster is information predetermined by the master nodes of the other sub-clusters.
The method of claim 1, wherein the method further comprises:

When the slave node of the parent cluster fails, delete the failed slave node from the parent cluster, and use the master node of the parent cluster to send the member change information of the parent cluster to the slave node of the parent cluster;

After the master node of the parent cluster receives the first message sent by the master node of the first sub-cluster, add the master node of the first sub-cluster to the parent cluster, and send the member change information of the parent cluster to each node of the parent cluster; the first sub-cluster represents the sub-cluster to which the failed slave node belongs, the master node of the first sub-cluster represents the master node re-elected from the nodes of the first sub-cluster, and the first message represents a message for joining the parent cluster.
The method of claim 1, wherein the method further comprises:

In the case that the master node of the parent cluster fails, select a node from other nodes of the parent cluster as the master node of the parent cluster;

After receiving the first message sent by the master node of the second sub-cluster, add the master node of the second sub-cluster to the parent cluster, and send the member change information of the parent cluster to each node of the parent cluster; The second sub-cluster represents the sub-cluster to which the faulty master node belongs, and the first message represents the message of joining the parent cluster.
The method of claim 1, wherein the method further comprises:

Obtain a data read command, where the data read command carries the data read address of any node;

Based on the data read address, read data from that node.
The method according to any one of claims 1-10, wherein the writing data to each node of each sub-cluster based on the data storage architecture comprises:

obtaining a data writing instruction, where the data writing instruction carries the data to be written;

Send the data writing instruction to the master node of the parent cluster, wherein the master node of the parent cluster sends the data writing instruction to the master node of each sub-cluster, and the master node of each sub-cluster sends the data writing instruction to each slave node of its sub-cluster, so that each node of each sub-cluster writes the data to be written.
The method of claim 11, wherein the data to be written includes at least two levels of tag data;

Writing the data to be written by each node of each sub-cluster includes:

Establish a hash table for each level of tag data, and construct a multi-tree data structure based on the hash table corresponding to each level of tag data;

Write the at least two levels of tag data into each node of each sub-cluster based on the multi-tree data structure.
The method according to claim 12, wherein, in the case that the tag data of any level is not the lowest-level tag data, the hash table of the tag data of that level includes the hash addresses corresponding to the tag data of the next level.
The method of claim 12, wherein the method further comprises:

Based on the multi-tree data structure, perform at least one of the following operations on the at least two levels of tag data: adding, deleting, modifying, and querying.
A data management device, the device includes:

The establishment module is configured to establish a data storage architecture, the data storage architecture including a parent cluster and multiple sub-clusters, wherein each of the multiple sub-clusters includes one master node and at least one slave node, the nodes of the parent cluster are the master nodes of the multiple sub-clusters, and the nodes of the parent cluster include one master node of the parent cluster and at least one slave node of the parent cluster;

The processing module is configured to write data to each node of each sub-cluster based on the data storage architecture.
An electronic device comprising:

memory for storing executable instructions;

The processor is configured to implement the data management method according to any one of claims 1 to 14 when executing the executable instructions stored in the memory.
A computer-readable storage medium storing executable instructions for implementing the data management method according to any one of claims 1 to 14 when executed by a processor.
A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the data management method according to any one of claims 1 to 14.
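The claims fix the ingredients of the parent-master election (a health indicator derived from failure count, failure duration, and time since the last failure; a preconfigured first score; a second score equal to their sum; first messages gathered through a seed cluster whose master address is known in advance) but not the exact formula or protocol. The sketch below fills those gaps with loudly labeled assumptions: the weights in `health_indicator`, the tie-break rule, and the names `FailureStats`, `JoinMessage`, `SeedMaster`, and `elect_parent_master` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FailureStats:
    failures: int                # number of failures
    total_downtime_s: float      # summed duration of past failures
    since_last_failure_s: float  # time elapsed since the last failure


def health_indicator(s: FailureStats) -> float:
    # Hypothetical weighting: penalise failures and downtime, reward a long
    # failure-free period (capped at one day, worth at most 10 points).
    return (-10.0 * s.failures
            - 0.01 * s.total_downtime_s
            + min(s.since_last_failure_s, 86_400.0) / 8_640.0)


@dataclass(frozen=True)
class JoinMessage:
    """The "first message": a sub-cluster master asking to join the parent."""
    master: str
    health: float
    first_score: float  # preconfigured value

    @property
    def second_score(self) -> float:
        return self.health + self.first_score


class SeedMaster:
    """Master of the seed cluster; other masters know its address in advance."""

    def __init__(self, own: JoinMessage) -> None:
        self.received = {own.master: own}

    def receive(self, msg: JoinMessage) -> None:
        self.received[msg.master] = msg

    def membership(self) -> list[JoinMessage]:
        # Shared back so that every master ends up with every first message.
        return list(self.received.values())


def elect_parent_master(messages: Iterable[JoinMessage],
                        expected: Iterable[str]) -> Optional[str]:
    msgs = list(messages)
    if {m.master for m in msgs} != set(expected):
        return None  # not every first message has been acquired yet
    # Highest second score wins; ties go to the lexicographically larger
    # name so that every master reaches the same answer independently.
    return max(msgs, key=lambda m: (m.second_score, m.master)).master
```

Running the same deterministic rule on the same set of first messages lets each sub-cluster master compute the parent master locally, which matches the claim language that the master nodes of the sub-clusters elect the parent master from the exchanged second score values.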