
CN115269556A - Database fault processing method, device, equipment and storage medium - Google Patents

Database fault processing method, device, equipment and storage medium

Info

Publication number
CN115269556A
CN115269556A
Authority
CN
China
Prior art keywords
node
database
slave
master
backup
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210969279.9A
Other languages
Chinese (zh)
Inventor
张谦
吴扬扬
王增亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd filed Critical Sangfor Technologies Co Ltd
Priority to CN202210969279.9A
Publication of CN115269556A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1446 Point-in-time backing up or restoration of persistent data
    • G06F 11/1448 Management of the data involved in backup or backup restore
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database fault processing method, apparatus, device and storage medium. In the scheme, the present node monitors whether a target node fails, where a master-slave replication relationship exists between the database of the target node and the database of the present node. If the target node is found to have failed and the database of the present node holds the latest data, a database is created on a backup node, the data in the database of the present node is copied to the database of the backup node, and a master-slave replication relationship is established between the database of the present node and the database of the backup node. In this way, when a database fails, the failed database can be switched out quickly, the master-slave replication relationship between master and slave nodes is restored automatically, and recovery is accelerated. In addition, the failed database is switched only when the database of the present node holds the latest data, so the master and slave databases remain highly available without losing data, and data consistency between the master node and the slave node is guaranteed.

Description

Database fault processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of database technologies, and in particular, to a method, an apparatus, a device, and a storage medium for database fault handling.
Background
Databases such as MySQL, Oracle and PostgreSQL (all relational database management systems) are widely used by internet companies. Growing concern for database security and controllability places higher requirements on the database architecture, and guaranteeing data consistency is the foremost issue. Current master-slave replication techniques for databases include asynchronous replication, semi-synchronous replication and fully synchronous replication. If the replication link becomes abnormal, existing fault-recovery approaches can lose data, and data consistency between the master node and the slave node cannot be guaranteed.
Disclosure of Invention
The invention aims to provide a database fault processing method, apparatus, device and storage medium, so that no data is lost when a database recovers from a fault and data consistency between the master node and the slave node is guaranteed.
In order to achieve the above object, the present invention provides a database fault processing method, where the database fault processing method includes:
determining a target node corresponding to the node; the database of the node and the database of the target node have a master-slave copy relationship;
if the target node is monitored to have a fault and the latest data is stored in the database of the local node, a database is created in the backup node, the data in the database of the local node is copied to the database of the backup node, and a master-slave replication relationship between the database of the local node and the database of the backup node is established.
Wherein before creating the database in the backup node, the method further comprises:
canceling the master-slave copy relationship between the database of the node and the database of the target node;
if the present node is a first master node and the target node is a first slave node, providing database read-write service through the first master node;
and if the present node is a second slave node and the target node is a second master node, upgrading the second slave node to a third master node, and providing database read-write service through the third master node.
Wherein the creating a database in the backup node comprises:
and selecting a third slave node for creating the database from the plurality of backup nodes according to the health state of each backup node.
After providing database read-write service through the first master node/the third master node, the method further includes:
detecting whether the first master node/the third master node has a non-failed slave node;
if not, continuing to execute the step of creating the database in the backup node;
and if so, providing database read-write service through the first master node/the third master node and the corresponding non-failed slave node.
The step of determining whether the latest data is stored in the database of the present node includes:
judging whether the node is in a data consistency state or not;
if yes, judging that the latest data is stored in the database of the node;
if not, judging whether the present node is a master node;
if the present node is a master node, judging that the latest data is stored in the database of the present node; and if the present node is a slave node, judging that the latest data is not stored in the database of the present node.
After the establishing of the master-slave copy relationship between the database of the local node and the database of the backup node, the method further includes:
and storing the position information of the local node and the position information of the backup node in a storage unit.
Wherein, this method still includes:
if the fault of the target node is recovered, the target node judges whether a master-slave copy relationship exists between the database of the target node and the database of the local node or not according to the position information in the storage unit;
if not, deleting the database of the target node;
if yes, database reading and writing services are continuously provided through the local node and the target node.
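The recovery branch above can be sketched as follows (a minimal illustration; the return labels are illustrative and not part of the original disclosure):

```python
def on_target_recovery(link_recorded_in_storage_unit):
    """When the failed target comes back, consult the storage unit:
    if its replication link with the present node is no longer
    recorded, its stale database is deleted; otherwise both nodes
    resume database read-write service."""
    if link_recorded_in_storage_unit:
        return "resume service"
    return "delete stale database"
```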
In order to achieve the above object, the present invention further provides a database fault processing apparatus, including:
the determining module is used for determining a target node corresponding to the node; the database of the node and the database of the target node have a master-slave copy relationship;
the creating module is used for creating a database in the backup node when monitoring that the target node fails and the latest data is stored in the database of the node;
the copying module is used for copying the data in the database of the node to the database of the backup node;
and the setting module is used for establishing a master-slave copy relationship between the database of the node and the database of the backup node.
To achieve the above object, the present invention further provides an electronic device comprising:
a memory for storing a computer program;
and a processor configured to implement the steps of the above database fault processing method when executing the computer program.
To achieve the above object, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above database fault handling method.
According to the above, an embodiment of the invention provides a database fault processing method, apparatus, device and storage medium. In the scheme, the present node monitors whether a target node fails, where a master-slave replication relationship exists between the database of the target node and the database of the present node. If the target node is found to have failed and the database of the present node holds the latest data, a database is created on a backup node, the data in the database of the present node is copied to the database of the backup node, and a master-slave replication relationship is established between the database of the present node and the database of the backup node. In this way, when a database fails, the failed database can be switched out quickly, the master-slave replication relationship is restored automatically, recovery is accelerated, and the high availability of the database is improved. In addition, the failed database is switched only when the database of the present node holds the latest data, so the master and slave databases remain highly available without losing data, and data consistency between the master node and the slave node is guaranteed.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1 is a schematic structural diagram of a database fault handling system according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a database fault handling method according to an embodiment of the present invention;
FIG. 3 is a flow chart of slave node failure handling disclosed in the embodiments of the present invention;
FIG. 4 is a flowchart illustrating a master node failure handling process according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a database fault handling apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For ease of understanding, the system architecture to which the technical solution of the present application applies is introduced first. Referring to fig. 1, a schematic structural diagram of a database fault handling system disclosed in the embodiment of the present invention, the system mainly includes a present node 11 and a target node 12. The present node 11 is configured to execute the database fault processing scheme and may be either a master node or a slave node, which is not specifically limited here. The target node 12 is the node monitored for failure in the present application, and may likewise be a master node or a slave node. Moreover, a master-slave replication relationship exists between the database of the present node 11 and the database of the target node 12: if the present node 11 is a master node, the target node 12 is a slave node having a master-slave replication relationship with it; if the present node 11 is a slave node, the target node 12 is the corresponding master node. In this embodiment, the service that implements database fault processing is referred to as the database management service; it monitors whether the node having a master-slave replication relationship with the present node has failed and performs automated operation and maintenance on the failed node.
That is to say: in the application, the database fault processing scheme can be executed no matter whether the node is a master node or a slave node, and as long as the node detects that a target node has a fault, a database can be created in a backup node, so that the database in the fault node can be quickly switched, the master-slave replication relationship can be automatically recovered, and the recovery speed is improved; in addition, according to the scheme, the fault database is switched only when the latest data is stored in the database of the node, so that high availability of the master database and the slave database on the premise of not losing the data is realized, and the data consistency between the master node and the slave node is realized.
Referring to fig. 2, a schematic flow chart of a database fault processing method disclosed in the embodiment of the present invention is shown, and as can be seen from fig. 2, the database fault processing method includes:
s101, determining a target node corresponding to the node; the database of the node and the database of the target node have a master-slave copy relationship;
specifically, each master-slave database cluster comprises a master node and a plurality of slave nodes, the master node monitors the working state of each slave node, and the slave nodes can also monitor the working state of the master node. And when the master-slave database cluster is in a normal state, all nodes do not have faults, and at the moment, a master-slave replication relationship needs to be established between the database of the master node and the database of the slave node. The master-slave replication relationship can be a master-slave asynchronous replication relationship established by a master-slave asynchronous replication technology for a database of a master-slave node, a master-slave semi-synchronous replication relationship established by the master-slave semi-synchronous replication technology for the database of the master-slave node, or a master-slave fully-synchronous replication relationship established by the master-slave fully-synchronous replication technology for the database of the master-slave node, but when the master-slave asynchronous replication technology is used, a result is returned to a client immediately after a transaction submitted by the client is executed by a master database, and the result received from the database is not concerned, so that the contents of the databases of the master-slave node and the master-slave node are inconsistent because the master database is delayed due to certain reasons and the slave database is not synchronized to the master database; when the master-slave full-synchronous replication technology is used, all transactions must be received by slave nodes and applied to slave databases, the commit can be completed by the threads of the master database and returned to the client, and the time for completing one transaction in the mode is prolonged, so that the performance is sharply reduced; when the master-slave semi-synchronous replication technology is used, the master database does 
not return to the client immediately after the transaction submitted by the client is executed, but returns to the client after at least one slave database receives and writes the transaction into the relay log. Therefore, in order to take performance and data consistency into consideration, the master-slave database cluster preferably uses a master-slave semi-synchronous replication technology, and therefore a master-slave semi-synchronous replication relationship exists between the database of the node and the database of the target node.
S102, if the target node is monitored to have a fault and the latest data is stored in the database of the node, a database is created in the backup node, the data in the database of the node is copied to the database of the backup node, and a master-slave replication relationship between the database of the node and the database of the backup node is established.
In this embodiment, the database management service in the node needs to monitor whether the target node fails, where the failure of the target node may be a failure of a database in the target node, or a failure of communication between the node and the target node, and this is not limited specifically, and the failure of the target node may be determined as long as the node detects that the working state of the target node is abnormal.
It should be noted that, when monitoring whether the target node fails, the present node may probe periodically at a preset time interval. For example, with an interval of 1 minute, the present node checks the target node every minute; if no fault is found, it waits 1 minute and checks again. In this way a failed node is detected in time. Of course, the 1-minute interval is used here only for illustration; in practical applications, the interval may be set according to actual requirements.
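The periodic detection described above can be sketched in Python as follows (a minimal illustration; the probe and handler callbacks and the default interval are assumptions, not part of the original disclosure):

```python
import time

def monitor_target(target_is_alive, on_failure, interval_s=60):
    """Poll the target node at a fixed interval and invoke the
    failure handler as soon as a probe reports it down."""
    while True:
        if not target_is_alive():
            on_failure()  # e.g. begin switching the failed database
            return
        time.sleep(interval_s)
```

With `interval_s=60` this matches the 1-minute example above; in practice the interval is tuned to actual requirements.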
In this embodiment, if the node monitors that the target node fails, before switching the failed database, it is first determined which node database in the master-slave database cluster stores the latest data, and if the node that does not fail stores the latest data, when switching the database, the latest data stored in the node is copied to the database of the backup node, which does not result in data loss; if the target node with the fault stores the latest data, only the non-latest data in the node is copied to the database of the backup node if the database is continuously switched at the moment, and the latest data in the target node is lost. Therefore, in the method, the database is created in the backup node only when the latest data is stored in the database of the node, otherwise, the database is not created in the backup node, and the target node continues to wait for the fault repair of the target node. If the target node has not repaired the fault after waiting for the predetermined time, a warning message may be sent to a manager, or a subsequent step of creating a database in the backup node may be automatically performed, and a specific execution operation may be set in advance, which is not specifically limited herein.
Further, if the latest data is stored in the database of the node, the node selects a normal node without failure to replace the target node, and the node replacing the target node is called a backup node in the scheme. After the backup node is selected, a database can be created in the backup node, the data in the database of the node is copied to the database of the backup node, the master-slave copy relationship between the database of the node and the database of the target node is cancelled, the master-slave copy relationship between the database of the node and the database of the backup node is established, after the operations are executed, the node and the backup node can restore the master-slave database cluster and normally provide database read-write service for the outside.
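The switching sequence just described — cancel the old replication link, create a database on the backup, copy the data over, establish the new link — can be sketched as follows (all node methods here are hypothetical, for illustration only):

```python
def fail_over(present, target, backup):
    """Switch the failed target out for the backup node,
    in the order described above."""
    present.cancel_replication(target)     # drop the old master-slave link
    backup.create_database()               # create a database on the backup
    backup.load(present.dump())            # copy the present node's data over
    present.establish_replication(backup)  # establish the new master-slave link
```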
It should be noted that, if the present node is the master node, then after it detects that a slave node has failed and cancels the master-slave replication relationship between its own database and the database of the target node, it can immediately continue to provide database read-write service to the outside, then proceed to select a backup node and restore the master-slave database cluster with that backup node. If the present node is a slave node and it detects that the master node has failed, it could cancel the replication relationship, select a new master node (backup node), restore the cluster, and only then resume serving reads and writes through the new master node; however, selecting a new master and copying data would delay service. Therefore, to restore database access quickly, the slave node is directly promoted to the new master node, database read-write service is resumed through the new master node first, and the selection of a backup node and the restoration of the master-slave cluster with the new master node are carried out afterwards.
In conclusion, when a database node fails, the method and the system can quickly restore the master-slave database cluster in a backup node creating mode, automatically restore the master-slave replication relationship among the databases, improve the restoration speed, realize the quick switching of the failed database, and improve the high availability of the database. In addition, before the database is created in the backup node, whether the data in the node providing the service is the latest after the fault database is automatically switched can be judged, and the automatic switching can be carried out only when the data in the node providing the service after the switching is the latest, so that the high availability of the master-slave database on the premise of not losing the data is realized, and the data consistency between the master-slave nodes is realized.
Based on the database fault processing method described in the foregoing embodiment, in this embodiment, in order to enable a target node to quickly provide a service to the outside after a fault occurs, and to avoid the influence of a node fault on a service as much as possible, the following operations need to be performed before a database is created in a backup node in the present solution: canceling the master-slave copy relationship between the database of the node and the database of the target node; if the node is a first main node and the target node is a first slave node, providing database read-write service through the first main node; and if the node is a second slave node, the target node is a second master node, the second slave node is upgraded to a third master node, and database reading and writing services are provided through the third master node.
In addition, the process of creating a database in the backup node in this embodiment specifically includes: and selecting a third slave node for creating the database from the plurality of backup nodes according to the health state of each backup node.
When the database is created in the backup node, the node used for creating the database needs to be selected from the multiple backup nodes according to a preset node selection rule. The node selection rule may specifically be: the nodes creating the database are selected according to the health status of each backup node, and the health status may be embodied by different parameters, specifically, at least one of parameters such as a disk status, a remaining memory, a Central Processing Unit (CPU) busy level, and the like, for example: if the node selection rule is that the nodes for creating the database are selected according to the CPU busy degree of each backup node, when the database is created, the CPU busy degree of each backup node is firstly determined, and the backup node with the lowest busy degree is selected as the node for creating the database.
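One possible selection rule following the paragraph above — filter out candidates with an unhealthy disk, then take the least CPU-busy one — can be sketched as follows (the exact rule and the field names are assumptions; the patent only lists disk state, remaining memory and CPU busy level as examples):

```python
def pick_backup(backups):
    """Pick the node for creating the database from the candidate
    backup nodes according to their health state: skip nodes whose
    disk is unhealthy, then choose the lowest CPU busy ratio."""
    healthy = [b for b in backups if b.get("disk_ok", True)]
    return min(healthy, key=lambda b: b["cpu_busy"])
```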
Moreover, as can be seen from the foregoing embodiments, if the node is the first master node and the target node with the failure is the first slave node, in order to recover the master-slave copy relationship, the backup node that creates the database needs to be used as the slave node, so that the backup node and the first master node recover the master-slave copy relationship; if the node is a second slave node, the target node with the fault is a second master node, and the second slave node needs to be upgraded to a third master node at this time, then in order to restore the master-slave copy relationship, the backup node creating the database also needs to be used as a slave node, so that the master-slave copy relationship between the backup node and the third master node can be restored. Thus, in this embodiment, the slave node that created the database will be selected from the plurality of backup nodes as the third slave node.
Referring to fig. 3, a flowchart of slave node fault processing disclosed in the embodiment of the present invention: if the present node is the first master node, the target node is the first slave node. When the database management service of the first master node detects that the first slave node has failed, the master-slave replication relationship between the database of the first master node and the database of the first slave node is cancelled, database read-write service is provided by the first master node, and a third slave node is selected from the backup nodes. It should be noted that, to distinguish the third slave node selected when a slave node fails from the one selected when the master node fails, the third slave node in fig. 3 is labelled third slave node (1) and the one in fig. 4 is labelled third slave node (2). After the third slave node (1) is selected, a database is created in it, the data in the database of the first master node is copied to it, and a master-slave replication relationship between the two databases is established. Referring to fig. 4, a flowchart of master node fault processing disclosed in the embodiment of the present invention: if the present node is the second slave node, the target node is the second master node. When the database management service of the second slave node detects that the second master node has failed, the master-slave replication relationship between the database of the second master node and the database of the second slave node is cancelled, the second slave node is promoted to the third master node, which provides database read-write service, and a third slave node (2) is selected from the backup nodes; a database is created in the third slave node (2), the data in the database of the third master node is copied to it, and a master-slave replication relationship between the database of the third master node and the database of the third slave node (2) is established.
In conclusion, when a database node fails, the master-slave database cluster can be quickly recovered by creating a backup node; and if the slave node fails, the master node can provide database read-write service to the outside in time, and if the master node fails, the slave node can be quickly upgraded to a new master node and then provides database read-write service to the outside.
It can be seen from the foregoing embodiments that, to enable service to resume quickly after the target node fails, database read-write service is provided through the non-failed first master node/third master node, so that the node failure affects the business as little as possible. In this embodiment, after providing database read-write service through the non-failed first master node/third master node, and before creating a database in the backup node, it is further necessary to detect whether the first master node/third master node has a non-failed slave node; if not, the step of creating a database in the backup node is carried out; if so, database read-write service is provided through the first master node/third master node and the corresponding non-failed slave node.
Specifically, a master-slave database cluster may contain one master node and one slave node, or one master node and several slave nodes. Therefore, in this embodiment, after database read-write service is provided through a non-failed master node, it is also detected whether that master node still has a slave node. If it does, the current cluster can serve normally through the surviving master-slave pair, and even if one of them fails again, service can continue through the remaining non-failed node. Hence, to save system resources, when the master node still has a corresponding slave node, the operation of creating a database in a backup node is deferred, and database read-write service is provided through the non-failed master node and its slave node. If the master node has no corresponding slave node, service can still be provided through the master node alone, but a further master failure would interrupt it; in that case a third slave node is selected, a database is created in it, the data in the master node's database is copied over, and a master-slave replication relationship between the master node's database and the third slave node's database is established so that service can continue normally.
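The check above reduces to: create a database on a backup only when the serving master has no surviving slave. A sketch (the health flag is an assumed representation, not from the original):

```python
def needs_new_slave(slaves_of_master):
    """True only when no non-failed slave remains, i.e. the step of
    creating a database in a backup node must proceed; otherwise the
    surviving master-slave pair keeps serving."""
    return not any(s["healthy"] for s in slaves_of_master)
```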
It should be noted that the non-failed master node in this embodiment may be the non-failed first master node in the foregoing embodiment, or may be the third master node obtained by upgrading the non-failed second slave node in the foregoing embodiment, which is not specifically limited herein.
In summary, in this embodiment, a slave node is selected from the backup nodes to create a database only when the non-failed master node has no corresponding slave node; if the non-failed master node does have a corresponding slave node, the current master and slave nodes can continue to provide services to the outside.
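As a minimal sketch of the decision above (data shapes assumed for illustration): a database is created in a backup node only when the surviving master has no healthy slave left.

```python
def choose_serving_plan(master_name: str, slave_health: dict) -> str:
    """Decide how to keep serving after a failover, per the rule above."""
    healthy_slaves = [name for name, ok in slave_health.items() if ok]
    if healthy_slaves:
        # master still has a non-failed slave: defer creating a backup database
        return f"serve via {master_name} and {healthy_slaves[0]}"
    # no surviving slave: a backup node must become the new (third) slave
    return f"serve via {master_name}; create database in backup node"

print(choose_serving_plan("master-1", {"slave-1": False, "slave-2": True}))
# serve via master-1 and slave-2
```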
Based on the database fault processing method in the foregoing embodiment, in this embodiment the process of determining whether the latest data is stored in the database of the local node specifically includes: judging whether the local node is in a data consistency state; if so, determining that the latest data is stored in the database of the local node; if not, judging whether the local node is a master node; if the local node is a master node, determining that the latest data is stored in the database of the local node; and if the local node is a slave node, determining that the latest data is not stored in the database of the local node.
Specifically, the present embodiment is further provided with a distributed consistency storage unit, which records node state information indicating where the latest data in the master-slave database cluster is stored. If the node state information of a node shows a data consistency state, the data between the corresponding master and slave nodes is fully consistent, so that node necessarily stores the latest data. For example: master node 1 and slave node 1 have a master-slave replication relationship; when master node 1 fails, it is judged whether the node state information of slave node 1 is in a data consistency state, and if so, slave node 1 stores the latest data. Moreover, when data is replicated between master and slave nodes, the master node necessarily writes the data successfully before it is copied to the slave node; therefore, even when the slave node is not consistent with the master node, the data stored by the master node is still the latest. Hence, when this embodiment determines that the local node is not in the data consistency state, whether the local node stores the latest data can still be judged by its node type, namely: judge whether the local node is a master node; if so, the latest data is stored in its database; if it is a slave node, the latest data is not stored in its database.
In summary, in this embodiment, whether the local node stores the latest data can be determined according to the node state information and the node type; in this way, it can be ensured that no data is lost during automatic switching after a database failure.
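The two-step rule above (check the consistency state first, then fall back to the node's role) can be condensed to a small sketch; the state flag and role strings are illustrative.

```python
def holds_latest_data(in_consistent_state: bool, role: str) -> bool:
    """Whether this node's database holds the latest data."""
    if in_consistent_state:
        return True          # master and slave are fully in sync
    return role == "master"  # out of sync: only the master is guaranteed current

# the three cases walked through in the text:
print(holds_latest_data(True, "slave"))    # True: consistent slave
print(holds_latest_data(False, "master"))  # True: out-of-sync master
print(holds_latest_data(False, "slave"))   # False: out-of-sync slave
```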
Based on the database fault processing method described in the foregoing embodiment, in this embodiment the storage unit further stores the location information of each node in the master-slave database cluster. Therefore, in this embodiment, after the master-slave replication relationship between the database of the local node and the database of the backup node is established, the location information of the local node and the location information of the backup node need to be stored in the storage unit.
It should be noted that the storage unit stores only the location information of the master and slave nodes in the current master-slave database cluster. If the target node is replaced by a backup node after it fails, storing the location information of the local node and of the backup node in the storage unit is equivalent to updating the cluster's location information in the storage unit, and the location information of the failed target node is no longer stored there.
Therefore, in this embodiment, if the target node recovers from its failure, the target node may determine, according to the location information in the storage unit, whether its database still has a master-slave replication relationship with the database of the local node; if not, the database of the target node is deleted; if so, database read-write services continue to be provided through the local node and the target node. As shown in fig. 3, after the first slave node fails, if it later recovers, the database management service in the first slave node reads the location information of the database nodes in the storage unit; if the location information of the first slave node is not found, the first slave node is no longer a slave of the first master node, so the database instance on the first slave node is closed, and the first slave node may then serve as a backup node. As shown in fig. 4, after the second master node fails, if it later recovers, the database management service in the second master node reads the location information of the database nodes in the storage unit; if the location information of the second master node is not found, the second master node is no longer a master node, so the database instance on the second master node is closed, and the second master node may likewise serve as a backup node. After a recovered node becomes a backup node, it can replace the failed target node when any node in the master-slave database cluster subsequently fails.
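A sketch of the recovery check described above, with the storage unit modeled as a simple mapping from node name to location (the mapping shape and return strings are hypothetical):

```python
def on_node_recovery(node_name: str, cluster_locations: dict) -> str:
    """When a previously failed node comes back, consult the storage unit:
    if its location record is gone, it was replaced while down, so its stale
    database instance is closed and the node becomes a backup node."""
    if node_name not in cluster_locations:
        return "close database instance; act as backup node"
    return "resume master-slave service"

# after the first slave node failed and was replaced by backup-2:
locations = {"master-1": "10.0.0.1", "backup-2": "10.0.0.3"}
print(on_node_recovery("slave-1", locations))   # replaced node
print(on_node_recovery("master-1", locations))  # still in the cluster
```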
It can be understood that, when databases are switched in the present application, the database management services on the multiple database nodes exclude one another by acquiring a distributed lock, ensuring that only one node completes the automatic switch. For example: after the master node fails, if there are multiple slave nodes, each slave node detects the master node failure; to avoid every slave node selecting a backup node and creating a database, distributed locking ensures that only one slave node selects the backup node and creates the database, thereby completing the switch of the failed database.
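The mutual exclusion just described can be illustrated with a non-blocking lock acquisition. A real deployment would use a distributed lock (for example, one backed by the consistency storage); a local `threading.Lock` merely demonstrates the winner-takes-the-switch behavior.

```python
import threading

switch_lock = threading.Lock()   # stand-in for the distributed lock
switched_by = []

def try_switch(slave_name: str) -> bool:
    """Each slave that detects the master failure races for the lock;
    only the winner selects a backup node and creates the database."""
    if switch_lock.acquire(blocking=False):
        switched_by.append(slave_name)  # lock deliberately held until the switch completes
        return True
    return False

results = [try_switch(s) for s in ["slave-1", "slave-2", "slave-3"]]
print(results, switched_by)  # [True, False, False] ['slave-1']
```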
In conclusion, to achieve high availability of the master-slave databases, an automatic operation and maintenance scheme for master-slave databases without data loss is provided. In this scheme, after the location information of the master and slave nodes hosting the databases is selected, a database management service runs on both the master node and the slave node and is responsible for maintaining the life cycle of the master-slave database cluster. For example: the database management service starts the database instances on the master and slave nodes and monitors the running state of the master and slave databases; if an anomaly is detected, it performs automatic operation and maintenance handling, quickly restores service access to the databases, and returns the master-slave databases to normal. In addition, during the life cycle of the master-slave database cluster, the database management service records in the consistency storage which node's database stores the latest data, and a database switch is executed only if no data would be lost after the switch, thereby achieving high availability on the premise that the master-slave database data is not lost. Further, during automatic database operation and maintenance, the information in the consistency storage is updated under a synchronization lock, which guarantees the accuracy of the automatic operation and maintenance.
The processing apparatus, processing device, and storage medium according to the embodiments of the present invention are described below; the processing apparatus, processing device, and storage medium described below and the processing method described above may be referred to in correspondence with each other.
Referring to fig. 5, a schematic structural diagram of a database fault processing apparatus provided in an embodiment of the present invention includes:
a determining module 21, configured to determine a target node corresponding to the node; the database of the node and the database of the target node have a master-slave copy relationship;
a creating module 22, configured to create a database in a backup node when it is monitored that the target node fails and the latest data is stored in the database of the local node;
the copying module 23 is configured to copy the data in the database of the local node to the database of the backup node;
and the setting module 24 is configured to establish a master-slave copy relationship between the database of the local node and the database of the backup node.
Based on the foregoing embodiment, in this embodiment, the database fault processing apparatus further includes:
the processing module is used for canceling the master-slave copy relationship between the database of the node and the database of the target node before creating the database in the backup node;
an execution module, configured to provide database read-write services through the first master node when the local node is the first master node and the target node is the first slave node; and to upgrade the second slave node to a third master node and provide database read-write services through the third master node when the local node is the second slave node and the target node is the second master node.
Based on the foregoing embodiment, in this embodiment, the creating module 22 is specifically configured to: and selecting a third slave node for creating the database from the plurality of backup nodes according to the health state of each backup node.
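A minimal sketch of selecting the third slave node according to the health state of each backup node; the `healthy` flag and `lag` field are assumed attributes chosen for illustration, not part of the described apparatus.

```python
def pick_third_slave(backups: list) -> dict:
    """Among candidate backup nodes, keep only healthy ones and prefer the
    one with the smallest lag (illustrative selection criterion)."""
    healthy = [b for b in backups if b["healthy"]]
    return min(healthy, key=lambda b: b["lag"])

candidates = [
    {"name": "backup-1", "healthy": True, "lag": 5},
    {"name": "backup-2", "healthy": False, "lag": 0},  # unhealthy: excluded
    {"name": "backup-3", "healthy": True, "lag": 2},
]
print(pick_third_slave(candidates)["name"])  # backup-3
```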
Based on the foregoing embodiment, in this embodiment, the database fault processing apparatus further includes:
a detection module, configured to detect whether the first master node/the third master node has a non-failed slave node; if the first master node/the third master node has no non-failed slave node, trigger the creating module 22 to create a database in the backup node; and if the first master node/the third master node has a non-failed slave node, provide database read-write services through the first master node/the third master node and the corresponding non-failed slave node.
Based on the foregoing embodiment, in this embodiment, the database fault processing apparatus further includes:
the first judging module is used for judging whether the node is in a data consistency state or not; if the data are in a consistent state, judging that the latest data are stored in the database of the node; if the data is not in a consistent state, triggering a second judgment module;
the second judgment module is used for judging whether the local node is a master node; if the local node is a master node, determining that the latest data is stored in the database of the local node; and if the local node is a slave node, determining that the latest data is not stored in the database of the local node.
Based on the foregoing embodiment, in this embodiment, the database fault processing apparatus further includes:
and the storage module is used for storing the position information of the node and the position information of the backup node into a storage unit after establishing a master-slave copy relationship between the database of the node and the database of the backup node.
Based on the foregoing embodiment, in this embodiment, the database fault processing apparatus further includes:
a third judging module, configured to, when the failure of the target node is recovered, judge, by the target node, whether a master-slave copy relationship exists between the database of the target node and the database of the local node according to the location information in the storage unit; if the master-slave copy relationship does not exist, deleting the database of the target node; and if the master-slave copy relationship exists, continuously providing database read-write service through the local node and the target node.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention; the apparatus comprises:
a memory 31 for storing a computer program;
a processor 32 for implementing the steps of the database fault handling method according to any of the above-described method embodiments when executing the computer program.
In this embodiment, the device may be a PC (Personal Computer), and may also be a terminal device such as a smart phone, a tablet Computer, a palmtop Computer, and a portable Computer.
The device may include a memory 31, a processor 32, and a bus 33.
The memory 31 includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for the running of the operating system and the computer-readable instructions in the nonvolatile storage medium. The processor 32 may in some embodiments be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip; it provides computing and control capability for the device, and when it executes the computer program stored in the memory 31, the steps of the database fault processing method disclosed in any of the foregoing embodiments may be implemented.
The bus 33 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
Further, the device may further include a network interface 34, and the network interface 34 may optionally include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which are generally used to establish a communication connection between the device and other electronic devices.
Fig. 6 only shows the device with the components 31-34, and it will be understood by those skilled in the art that the structure shown in fig. 6 does not constitute a limitation of the device, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the steps of the database fault processing method according to any method embodiment.
Wherein the storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A database fault processing method is characterized by comprising the following steps:
determining a target node corresponding to the node; the database of the node and the database of the target node have a master-slave copy relationship;
if the target node is monitored to have a fault and the latest data is stored in the database of the local node, a database is created in the backup node, the data in the database of the local node is copied to the database of the backup node, and a master-slave replication relationship between the database of the local node and the database of the backup node is established.
2. The database fault handling method of claim 1, wherein before the creating a database in the backup node, the method further comprises:
canceling the master-slave copy relationship between the database of the node and the database of the target node;
if the local node is a first master node and the target node is a first slave node, providing database read-write services through the first master node;
and if the local node is a second slave node and the target node is a second master node, upgrading the second slave node to a third master node, and providing database read-write services through the third master node.
3. The method of claim 2, wherein the creating a database in a backup node comprises:
a third slave node for creating the database is selected from the plurality of backup nodes based on the health status of each backup node.
4. The database fault handling method according to claim 2, wherein after providing the database read-write services through the first master node/the third master node, the method further comprises:
detecting whether the first master node/the third master node has a non-failed slave node;
if not, continuing to execute the step of creating the database in the backup node;
and if so, providing database read-write service through the first main node/the third main node and the corresponding slave nodes without faults.
5. The database fault handling method according to claim 1, wherein determining that the latest data is stored in the database of the local node comprises:
judging whether the node is in a data consistency state or not;
if yes, judging that the latest data is stored in the database of the node;
if not, judging whether the local node is a master node;
if the local node is a master node, determining that the latest data is stored in the database of the local node; and if the local node is a slave node, determining that the latest data is not stored in the database of the local node.
6. The method according to any one of claims 1 to 5, wherein after the establishing of the master-slave copy relationship between the database of the local node and the database of the backup node, the method further comprises:
and storing the position information of the local node and the position information of the backup node in a storage unit.
7. The database fault handling method of claim 6, further comprising:
if the fault of the target node is recovered, the target node judges whether a master-slave copy relationship exists between the database of the target node and the database of the local node or not according to the position information in the storage unit;
if not, deleting the database of the target node;
if yes, database reading and writing services are continuously provided through the local node and the target node.
8. A database fault handling apparatus, comprising:
the determining module is used for determining a target node corresponding to the node; the database of the node and the database of the target node have a master-slave copy relationship;
the creating module is used for creating a database in the backup node when monitoring that the target node fails and the latest data is stored in the database of the node;
the copying module is used for copying the data in the database of the node to the database of the backup node;
and the setting module is used for establishing a master-slave replication relationship between the database of the node and the database of the backup node.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the database fault handling method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the database fault handling method according to any one of claims 1 to 7.
CN202210969279.9A 2022-08-12 2022-08-12 Database fault processing method, device, equipment and storage medium Pending CN115269556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210969279.9A CN115269556A (en) 2022-08-12 2022-08-12 Database fault processing method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115269556A true CN115269556A (en) 2022-11-01



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination