[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN107818027B - Method and device for switching main name node and standby name node and distributed system - Google Patents

Method and device for switching main name node and standby name node and distributed system Download PDF

Info

Publication number
CN107818027B
CN107818027B CN201710963019.XA CN201710963019A CN107818027B CN 107818027 B CN107818027 B CN 107818027B CN 201710963019 A CN201710963019 A CN 201710963019A CN 107818027 B CN107818027 B CN 107818027B
Authority
CN
China
Prior art keywords
version number
switching
name node
node
main
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710963019.XA
Other languages
Chinese (zh)
Other versions
CN107818027A (en
Inventor
彭安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201710963019.XA priority Critical patent/CN107818027B/en
Publication of CN107818027A publication Critical patent/CN107818027A/en
Application granted granted Critical
Publication of CN107818027B publication Critical patent/CN107818027B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/142Reconfiguring to eliminate the error
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • H04L41/0654Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663Performing the actions predefined by failover planning, e.g. switching to standby network elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method and a device for switching main and standby name byte points and a distributed system, and relates to the field of computers. One embodiment of the method comprises: receiving a switching notification sent by a distributed coordination server, and acquiring a switching version number stored by the distributed coordination server; wherein, the switching notice is used for informing the standby name node to be switched into the main name node; generating an updating version number according to the switching version number, and sending a deleting instruction to the main name node; wherein the deletion instruction carries the update version number; and determining a switching version number stored by the primary name node, and executing the deleting instruction to the primary name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the primary name node. The embodiment can effectively realize the automatic switching of the main name node and the standby name node and ensure the stability of the cluster.

Description

Method and device for switching main name node and standby name node and distributed system
Technical Field
The invention relates to the field of computers, in particular to a method and a device for switching a main backup of a name byte point and a distributed system.
Background
With the advent of the big data age, distributed systems represented by Hadoop (Hadoop is a distributed system infrastructure) are increasingly widely used. In a Hadoop cluster, name nodes (namenodes) are used to manage the namespace for the entire distributed cluster, and data nodes (datanodes) are used to store and retrieve data blocks under the schedule of the name nodes.
In practical application, in order to prevent the problem that the whole cluster cannot be accessed due to the paralysis of name nodes, at least two name nodes are required to be arranged in the cluster to form mutual backup. The name node in an active (active) state is a main name node, and at least one name node in a standby (standby) state is a standby name node.
In the prior art, the active/standby switching of name nodes is generally performed through the following steps:
1. a Zookeeper (Zookeeper is a distributed application program coordination service) server in a distributed cluster monitors a main name node and a standby name node through a Zookeeper failure switching controller (Zkeeper failure switching controller) process;
if the Zookeeper server side monitors that the main name node is disconnected, a switching notice is sent to any standby name node, and the state of the standby name node is changed from 'standby' to 'active';
3. after receiving the switching notification, the standby name node sends a deleting instruction to the main name node; after the deletion is completed, the standby name node becomes the main name node, and the main-standby switching is completed.
In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
in step 3 in the prior art, if the backup name node is disconnected after the deletion instruction is sent to the primary name node and before the deletion instruction is executed, the Zookeeper server modifies the state of the backup name node to "standby" and modifies the state of the primary name node to "active". At this time, the deleting instruction is sent out, and the main name node is about to be deleted; although the standby name node is connected, the state of the standby name node is 'standby', and two name nodes in the cluster cannot provide services, so that a server is down and the cluster is unavailable.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a distributed system for switching between main and standby name nodes, which can effectively implement automatic switching between main and standby name nodes, and ensure stability of a cluster.
To achieve the above object, according to an aspect of the present invention, a method for switching between main and standby namebyte points is provided.
The method for switching the main name node and the standby name node comprises the following steps: receiving a switching notification sent by a distributed coordination server, and acquiring a switching version number stored by the distributed coordination server; wherein, the switching notice is used for informing the standby name node to be switched into the main name node; generating an updating version number according to the switching version number, and sending a deleting instruction to the main name node; wherein the deletion instruction carries the update version number; and determining a switching version number stored by the primary name node, and executing the deleting instruction to the primary name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the primary name node.
Optionally, the method further comprises: and when the switching version number and the updating version number do not accord with the preset switching rule, the deleting instruction is not executed to the main name node.
Optionally, the obtained switching version number stored by the distributed coordination server is stored in a permanent node of the distributed coordination server.
Optionally, the method further comprises: and when the standby name node is switched to be the main name node, storing the updated version number in the permanent node of the distributed coordination server.
Optionally, the update version number is: the sum of the acquired switching version number stored by the distributed coordination server and a preset value is obtained, and the preset switching rule comprises the following steps: the update version number is greater than the switch version number stored by the primary name node.
Optionally, prior to determining the switch version number stored by the primary name node: if the main name node does not receive the switching notification sent by the distributed coordination server, the switching version number stored by the main name node is the same as the switching version number stored by the distributed coordination server; and if the main name node receives a switching notification sent by the distributed coordination server, the switching version number stored by the main name node is the sum of the switching version number stored by the distributed coordination server and the preset value.
Optionally, the method further comprises: before sending a deleting instruction to a main name node, remotely logging in the main name node; and the distributed coordination server is a Zookeeper server.
To achieve the above object, according to another aspect of the present invention, an apparatus for switching between main and standby namebyte points is provided.
The device for switching the main name node and the standby name node in the embodiment of the invention can comprise: the version number acquisition module can be used for receiving a switching notification sent by the distributed coordination server and acquiring a switching version number stored by the distributed coordination server; wherein, the switching notice is used for informing the standby name node to be switched into the main name node; the version number updating module can be used for generating an updated version number according to the switching version number and sending a deleting instruction to the main name node; wherein the deletion instruction carries the update version number; and the execution module can be used for determining the switching version number stored by the primary name node, and executing the deleting instruction on the primary name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the primary name node.
Optionally, the execution module may be further configured to: when the switching version number and the updating version number do not accord with the preset switching rule, the deleting instruction is not executed to the main name node; the acquired switching version number stored by the distributed coordination server is stored in a permanent node of the distributed coordination server; and the apparatus may further comprise: and the version number returning module can be used for storing the updated version number in the permanent node of the distributed coordination server when the standby name node is switched to be the main name node.
Optionally, the update version number is: the sum of the acquired switching version number stored by the distributed coordination server and a preset value is obtained, and the preset switching rule comprises the following steps: the updating version number is larger than the switching version number stored by the main name node, and the distributed coordination server is a Zookeeper server; prior to determining the switch version number stored by the primary name node: if the main name node does not receive the switching notification sent by the distributed coordination server, the switching version number stored by the main name node is the same as the switching version number stored by the distributed coordination server; if the primary name node receives the switching notification sent by the distributed coordination server, the switching version number stored by the primary name node is as follows: the sum of the switching version number stored by the distributed coordination server and the preset value; and the version number update module may be further operable to: before sending a delete instruction to a primary name node, remotely logging in at the primary name node.
To achieve the above object, according to still another aspect of the present invention, a distributed system is provided.
The distributed system of the embodiment of the invention can comprise: the Zookeeper client comprises a Zookeeper server, a main name node serving as the Zookeeper client and at least one standby name node; wherein: the Zookeeper server can be used to: storing the switching version number; sending a switching notice; wherein, the switching notice is used for informing the standby name node to be switched into the main name node; any alternate name node may be used to: receiving a switching notification, and acquiring a switching version number stored by a Zookeeper server; generating an update version number according to the switching version number, and sending a deletion instruction carrying the update version number to the primary name node; and determining the switching version number stored by the primary name node, and executing the deleting instruction to the primary name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the primary name node.
Optionally, the alternate name node may be further operable to: when the switching version number and the updating version number do not accord with the preset switching rule, the deleting instruction is not executed to the main name node; the switching version number stored by the Zookeeper server acquired by the standby name node is stored in a permanent node of the Zookeeper server; and the alternate name node is further operable to: and when the standby name node is switched to be the main name node, storing the updated version number in the permanent node of the Zookeeper server.
Optionally, the update version number is: the sum of the switching version number stored by the obtained Zookeeper server and a preset value, where the preset switching rule may include: the updating version number is larger than the switching version number stored by the main name node; prior to determining the switch version number stored by the primary name node: if the main name node does not receive the switching notification sent by the Zookeeper server, the switching version number stored by the main name node is the same as the switching version number stored by the Zookeeper server; if the main name node receives the switching notification sent by the Zookeeper server, the switching version number stored by the main name node is as follows: the switching version number stored by the Zookeeper server is added to the preset value; and the alternate name node is further operable to: before sending a delete instruction to a primary name node, remotely logging in at the primary name node.
To achieve the above object, according to still another aspect of the present invention, there is provided an electronic apparatus.
An electronic device of the present invention includes: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors realize the method for switching the main and standby name nodes provided by the invention.
To achieve the above object, according to still another aspect of the present invention, there is provided a computer-readable storage medium.
The invention relates to a computer readable storage medium, on which a computer program is stored, wherein the computer program realizes the method for switching between main and standby name nodes provided by the invention when being executed by a processor.
According to the technical scheme of the invention, one embodiment of the invention has the following advantages or beneficial effects: by storing the switching version number in the permanent node of the Zookeeper server, the standby name node can generate an updating version number based on the switching version number stored by the Zookeeper server after receiving the switching notification, and then executes the deleting instruction when the updating version number is larger than the switching version number stored by the main name node, so that the automatic switching of the main and standby nodes is realized, the defect that the main and standby name nodes are easy to lose efficacy simultaneously in the main and standby switching process in the prior art is overcome, and the stability and the high availability of the cluster are ensured.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a schematic diagram illustrating main steps of a method for switching between main and standby name nodes according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a main portion of a device for switching between main and standby name nodes according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main components of a distributed system according to an embodiment of the invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 5 is a schematic structural diagram of an electronic device for implementing the method for switching between main and standby name nodes according to the embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
According to the technical scheme of the embodiment of the invention, the switching version number is stored in the permanent node of the Zookeeper server, so that the standby name node can generate the updating version number based on the switching version number stored in the Zookeeper server after receiving the switching notification, and the deleting instruction is executed when the updating version number is larger than the switching version number stored in the main name node, so that the automatic switching of the main and standby nodes is realized, the defect that the main and standby name nodes are easy to fail simultaneously in the main and standby switching process in the prior art is avoided, and the stability and the high availability of the cluster are ensured.
Fig. 1 is a schematic diagram of the main steps of the method according to the present embodiment.
As shown in fig. 1, the method for switching between master and standby name nodes according to the embodiment of the present invention may be executed according to the following steps:
step S101: and the standby name node receives the switching notification sent by the distributed coordination server to acquire the switching version number stored by the distributed coordination server.
In this step, the switching notification is information sent by the distributed coordination server and used for notifying the standby name node to switch to the primary name node, that is, the switching notification changes the state of the standby name node from "standby" to "active". In specific application, the distributed coordination server is connected with the main name node and the standby name node through a plurality of Zkfc processes, and the Zkfc processes are used for sending heartbeat signals to monitor the name nodes. When the distributed coordination server side finds that the main name node is disconnected, the distributed coordination server side sends a switching notice to the standby name node. In particular, the switching version number refers to a specific version number for active-standby switching of name nodes. In the embodiment of the invention, the automatic switching of the main node and the standby node is realized by judging the switching version numbers stored by the nodes with different names. Preferably, the distributed coordination server is a Zookeeper server.
It can be understood that, the Zookeeper server sends the switching notification to the standby name node and also sends the notification to the main name node, so that the state of the Zookeeper server is changed from "active" to "standby".
In the embodiment of the present invention, before step S101, an initialization process is first performed by:
1. and creating a data node Znode at the Zookeeper server side for storing the initial switching version number.
Wherein the data node is preferably a permanent node for storing the switching version number for a longer time. For the permanent node, the life cycle is independent of the session, and the permanent node can be deleted only when the client executes the deletion operation. The initial handoff version number may be set according to an application environment, for example, the initial handoff version number is set to 0.
2. When any standby name node receives a switching notification sent by the Zookeeper server, the standby name node reads the switching version number from the data node of the Zookeeper server, generates an update version number according to the read switching version number, and stores the update version number in the standby name node. And after the switching is completed, further writing the updated version number into the data node of the Zookeeper server. In specific application, the standby name node reads and writes switching version number data at a Zookeeper server through a Zkfc process.
Preferably, the updated version number is obtained by adding a preset value, such as 1, to the read switching version number. For example: if the initial switching version number stored by the data node of the Zookeeper server is 0, after a switching notification is sent to a standby name node for the first time, the standby name node reads the switching version number 0 from the data node of the Zookeeper server, increases the switching version number 0 by a preset value 1 to obtain an updated version number 1, stores the updated version number 1, and writes the updated version number 1 into the data node of the Zookeeper server after switching is completed. At this time, the switching version number stored by the data node of the Zookeeper server is 1.
In practical applications, after receiving the switching notification, the backup name node also needs to send a deletion instruction to the primary name node, and after deleting the primary name node, the backup name node can become a new primary name node.
Step S102: and the standby name node generates an updated version number according to the switching version number and sends a deleting instruction to the main name node.
In this step, after reading the switching version number stored by the data node of the Zookeeper server, the standby name node generates an update version number according to the switching version number by using a preset calculation rule, stores the update version number, generates a deletion instruction carrying the update version number, and sends the deletion instruction to the main name node. It will be appreciated that the update version number is a switching version number of an update. In a specific application, the calculation rule may be to add a preset value to the switching version number.
As a preferred approach, the backup name node may first log in remotely at the primary name node before sending the delete instruction to the primary name node. Specifically, the backup name node may remotely log in to the primary name node by using a GRPC (Google Remote Procedure Call Protocol) or an SSH (Secure Shell Protocol), and then send a delete instruction to the primary name node. Preferably, the delete command may be a kill command under Linux (Linux is a set of Unix-like operating systems that are free to use and propagate).
Step S103: and the standby name node determines the switching version number stored by the main name node, and executes a deleting instruction to the main name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the main name node.
In this step, before the standby name node executes the delete instruction, it first needs to determine the switching version number stored in the primary name node. Generally, there are two cases when the switch version number stored by the primary name node exists:
the Zookeeper server does not send a switching notification to the main name node. Specifically, at this time, the standby name node is not disconnected from the Zookeeper server, and the switching version number stored by the primary name node is the switching version number updated when the primary name node previously obtained the active state. Since the switching version number is stored in the data node of the Zookeeper server after the previous switching is successful, the switching version number stored by the primary name node is the same as the switching version number stored by the Zookeeper server acquired by the standby name node.
The Zookeeper server has sent a switch notification to the primary name node. Specifically, at this time, the standby name node is disconnected from the Zookeeper server, and the Zookeeper server sends a switching notification to the primary name node, so that the state of the primary name node is changed from "standby" to "active". And then, adding preset data on the stored switching version number by the primary name node to obtain and store a new switching version number. Thus, the switching version number currently stored by the primary name node is the new switching version number. It can be understood that the new switching version number is the sum of the switching version number stored by the Zookeeper server and a preset value.
In the prior art, after a deletion instruction is sent, the deletion instruction is necessarily executed, so that the primary name node is necessarily deleted. At this time, if the spare name byte point is disconnected from the Zookeeper server, both name nodes in the cluster cannot provide services, and thus faults such as server downtime, cluster unavailability and the like are caused.
In order to solve the above problems in the prior art, in step S103, before executing the delete instruction, it needs to be determined whether the switching version number stored in the primary name node and the update version number carried in the delete instruction conform to the preset switching rule, and the delete instruction can be executed on the primary name node only when the switching rule is satisfied; if the two are not in accordance with the switching rule, the deleting instruction is not executed to the main name node, the standby name node exits from the active state, and the main name node is recovered to the active state.
In practical application, the switching rule may be set according to the service requirement. In the embodiment of the present invention, the switching rule may be: and the updating version number carried by the deleting instruction is larger than the switching version number stored by the main name node.
For example: if the switching version number stored by the Zookeeper server side is 1 before the standby name node receives the switching notification, the standby name node reads the switching version number 1, adds a preset value 1 to the switching version number to obtain an updated version number 2, and sends a deletion instruction carrying the updated version number 2 to the main name node.
If the standby name node is not disconnected with the Zookeeper server before the deletion instruction is executed, the main name node does not receive the switching notification, the stored switching version number is the same as the version number stored by the Zookeeper server and is 1, the updating version number 2 carried by the deletion instruction is larger than the switching version number 1 stored by the main name node, then the deletion instruction is executed to delete the main name node, and the standby name node becomes a new main name node.
If the standby name node is disconnected from the Zookeeper server before the delete instruction is executed, the primary name node receives the switch notification. And then, the main name node adds a preset value 1 to the stored switching version number 1 to obtain a new switching version number 2, the updating version number 2 carried by the deleting instruction is not greater than the new switching version number 2 stored by the main name node, the deleting instruction is not executed on the main name node, the standby name node exits from the active state, and the main name node is restored to the active state.
Therefore, whether the deleting instruction is executed or not is determined by comparing the updated version number with the version number stored by the main name node, so that no matter whether the Zookeeper server sends a switching notification to the main name node or not in the process of switching the main name and the standby name, the main name node capable of providing service inevitably exists in the distributed cluster, the stability of the cluster is improved, and the high availability of the cluster is ensured.
It can be understood that if the backup name node becomes the new primary name node, the updated version number is written into the data node of the Zookeeper server through the Zkfc process. And if the primary name node is recovered to be in an active state, keeping the switching version number stored in the data node of the Zookeeper server unchanged.
According to the method provided by the embodiment of the invention, the switching version number is stored in the permanent node of the Zookeeper server, so that the standby name node can generate the updated version number based on the switching version number stored in the Zookeeper server after receiving the switching notification, and the technical means of deleting the instruction is executed when the updated version number is greater than the switching version number stored in the main name node is adopted, thereby realizing the automatic switching of the main and standby nodes, avoiding the defect that the main and standby name nodes are easy to fail simultaneously in the main and standby switching process in the prior art, and ensuring the stability and high availability of the cluster.
Fig. 2 is a schematic diagram of a main part of a device for primary/standby switching of name nodes according to an embodiment of the present invention.
As shown in fig. 2, a device 200 for switching between active and standby name nodes according to an embodiment of the present invention may include: a version number obtaining module 201, a version number updating module 202 and an executing module 203.
Wherein: the version number obtaining module 201 may be configured to receive a switching notification sent by the distributed coordination server, and obtain a switching version number stored by the distributed coordination server; wherein, the switching notice is used for informing the standby name node to be switched into the main name node;
the version number updating module 202 may be configured to generate an updated version number according to the switching version number, and send a deletion instruction to the primary name node; wherein the deletion instruction carries the update version number;
the executing module 203 may be configured to determine a switching version number stored in the primary name node, and execute the deleting instruction on the primary name node when the switching version number and the update version number meet a preset switching rule, so that the standby name node is switched to the primary name node.
Generally, the execution module 203 may be further configured to: and when the switching version number and the updating version number do not accord with the preset switching rule, the deleting instruction is not executed to the main name node.
Preferably, the acquired switching version number stored by the distributed coordination server is stored in a permanent node of the distributed coordination server.
As a preferred scheme, the device 200 further comprises: and the version number returning module can be used for storing the updated version number in the permanent node of the distributed coordination server when the standby name node is switched to be the main name node.
In practical applications, the update version number may be: the sum of the acquired switching version number stored by the distributed coordination server and a preset value, where the preset switching rule may include: the updating version number is larger than the switching version number stored by the main name node, and the distributed coordination server is a Zookeeper server.
In this embodiment of the present invention, before determining the switching version number stored by the primary name node: if the main name node does not receive the switching notification sent by the distributed coordination server, the switching version number stored by the main name node is the same as the switching version number stored by the distributed coordination server; if the primary name node receives the switching notification sent by the distributed coordination server, the switching version number stored by the primary name node is as follows: and the sum of the switching version number stored by the distributed coordination server and the preset value.
Furthermore, preferably, the version number updating module 202 may be further configured to: before sending a delete instruction to a primary name node, remotely logging in at the primary name node.
According to the technical scheme of the embodiment of the invention, the switching version number is stored in the permanent node of the Zookeeper server, so that the standby name node can generate the updating version number based on the switching version number stored in the Zookeeper server after receiving the switching notification, and the deleting instruction is executed when the updating version number is larger than the switching version number stored in the main name node, so that the automatic switching of the main and standby nodes is realized, the defect that the main and standby name nodes are easy to lose efficacy simultaneously in the main and standby switching process in the prior art is avoided, and the stability and the high availability of the cluster are ensured.
FIG. 3 is a schematic diagram of the main components of a distributed system according to an embodiment of the invention;
as shown in fig. 3, the distributed system of the embodiment of the present invention may include: a Zookeeper server 301 and a primary name node 302 as a Zookeeper client and at least one backup name node 303.
Wherein: the Zookeeper server 301 is configured to: storing the switching version number; sending a switching notice; wherein, the switching notice is used for informing the standby name node to be switched into the main name node;
any alternate name node 303 is used to: receiving a switching notification, and acquiring a switching version number stored by the Zookeeper server 301; generating an update version number according to the switching version number, and sending a deletion instruction carrying the update version number to the primary name node 302; and determining the switching version number stored by the primary name node 302, and executing the deleting instruction to the primary name node 302 when the switching version number and the updated version number accord with a preset switching rule, so that the standby name node 303 is switched to be the primary name node.
In this embodiment of the present invention, the alternate name node 303 is further configured to: and when the switching version number and the updating version number do not accord with the preset switching rule, the deleting instruction is not executed to the main name node.
Preferably, the switching version number stored by the Zookeeper server 301 and acquired by the standby name node 303 is stored in a permanent node of the Zookeeper server 301.
In practice, the backup name node 303 is further configured to: when the standby name node 303 is switched to the primary name node, the updated version number is stored in the permanent node of the Zookeeper server 301.
As a preferred scheme, the update version number is: the sum of the switching version number stored in the Zookeeper server 301 and a preset value is obtained, and the preset switching rule includes: the update version number is greater than the switch version number stored by the primary name node 302.
Preferably, prior to determining the switch version number stored by the primary name node 302: if the primary name node 302 does not receive the switching notification sent by the Zookeeper server 301, the switching version number stored by the primary name node 302 is the same as the switching version number stored by the Zookeeper server 301; if the primary name node 302 receives the switching notification sent by the Zookeeper server 301, the switching version number stored by the primary name node 302 is the sum of the switching version number stored by the Zookeeper server 301 and the preset value.
In addition, in this embodiment of the present invention, the backup name node 303 is further configured to: before sending a delete instruction to the primary name node 302, a remote login is performed at the primary name node 302. It is understood that the Zookeeper server 301 is connected to the primary name node 302 and the backup name node 303 through the Zkfc process.
According to the technical scheme of the embodiment of the invention, the switching version number is stored in the permanent node of the Zookeeper server, so that the standby name node can generate the updating version number based on the switching version number stored in the Zookeeper server after receiving the switching notification, and the deleting instruction is executed when the updating version number is larger than the switching version number stored in the main name node, so that the automatic switching of the main and standby nodes is realized, the defect that the main and standby name nodes are easy to lose efficacy simultaneously in the main and standby switching process in the prior art is avoided, and the stability and the high availability of the cluster are ensured.
Fig. 4 shows an exemplary system architecture 400 of a method for primary/secondary name node switching or a device for primary/secondary name node switching, to which an embodiment of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405 (this architecture is merely an example, and the components included in a particular architecture may be adapted according to application specific circumstances). The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. Network 404 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 401, 402, 403. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the method for switching between main and standby name nodes provided in the embodiment of the present invention is generally executed by the server 405, and accordingly, the device for switching between main and standby name nodes is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The invention also provides the electronic equipment.
The electronic device of the embodiment of the invention comprises: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors realize the method for switching the main and standby name nodes provided by the invention.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use in implementing an electronic device of an embodiment of the present invention. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM503, various programs and data necessary for the operation of the computer system 500 are also stored. The CPU501, ROM 502, and RAM503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The driver 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is mounted into the storage section 508 as necessary.
In particular, the processes described in the main step diagrams above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the main step diagram. In the above-described embodiment, the computer program can be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the central processing unit 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor comprises a version number acquisition module, a version number updating module and an execution module. The names of these units do not in some cases form a limitation on the units themselves, and for example, the version number obtaining module may also be described as a "module that sends a switching version number to the version number updating module".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to perform steps comprising: receiving a switching notification sent by a distributed coordination server, and acquiring a switching version number stored by the distributed coordination server; wherein, the switching notice is used for informing the standby name node to be switched into the main name node; generating an updating version number according to the switching version number, and sending a deleting instruction to the main name node; wherein the deletion instruction carries the update version number; and determining a switching version number stored by the primary name node, and executing the deleting instruction to the primary name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the primary name node.
According to the technical scheme of the embodiment of the invention, the switching version number is stored in the permanent node of the Zookeeper server, so that the standby name node can generate the updating version number based on the switching version number stored in the Zookeeper server after receiving the switching notification, and the deleting instruction is executed when the updating version number is larger than the switching version number stored in the main name node, so that the automatic switching of the main and standby nodes is realized, the defect that the main and standby name nodes are easy to lose efficacy simultaneously in the main and standby switching process in the prior art is avoided, and the stability and the high availability of the cluster are ensured.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A method for switching between main and standby name byte points is characterized by comprising the following steps:
receiving a switching notification sent by a distributed coordination server, and acquiring a switching version number stored by the distributed coordination server; wherein, the switching notice is used for informing the standby name node to be switched into the main name node;
generating an updating version number according to the switching version number, and sending a deleting instruction to the main name node; wherein the deletion instruction carries the update version number;
determining a switching version number stored by the primary name node, and executing the deleting instruction on the primary name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the primary name node;
prior to determining the switch version number stored by the primary name node:
if the main name node does not receive the switching notification sent by the distributed coordination server, the switching version number stored by the main name node is the same as the switching version number stored by the distributed coordination server;
if the primary name node receives the switching notification sent by the distributed coordination server, the switching version number stored by the primary name node is as follows: and the sum of the switching version number stored by the distributed coordination server and a preset value.
2. The method of claim 1, further comprising:
and when the switching version number and the updating version number do not accord with the preset switching rule, the deleting instruction is not executed to the main name node.
3. The method of claim 1, wherein the obtained switching version number stored by the distributed coordination server is stored in a permanent node of the distributed coordination server.
4. The method of claim 3, further comprising:
and when the standby name node is switched to be the main name node, storing the updated version number in the permanent node of the distributed coordination server.
5. The method of claim 1, wherein the update version number is: the sum of the acquired switching version number stored by the distributed coordination server and a preset value is obtained, and the preset switching rule comprises the following steps: the update version number is greater than the switch version number stored by the primary name node.
6. The method according to any one of claims 1-5, wherein the method further comprises:
before sending a deleting instruction to a main name node, remotely logging in the main name node; and
the distributed coordination server is a Zookeeper server.
7. A device for switching between main and standby name byte points is characterized by comprising:
the version number acquisition module is used for receiving a switching notification sent by the distributed coordination server and acquiring a switching version number stored by the distributed coordination server; wherein, the switching notice is used for informing the standby name node to be switched into the main name node;
the version number updating module is used for generating an updated version number according to the switching version number and sending a deleting instruction to the main name node; wherein the deletion instruction carries the update version number;
the execution module is used for determining the switching version number stored by the primary name node, and executing the deleting instruction on the primary name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the primary name node;
prior to determining the switch version number stored by the primary name node: if the main name node does not receive the switching notification sent by the distributed coordination server, the switching version number stored by the main name node is the same as the switching version number stored by the distributed coordination server; if the primary name node receives the switching notification sent by the distributed coordination server, the switching version number stored by the primary name node is as follows: and the sum of the switching version number stored by the distributed coordination server and a preset value.
8. The apparatus of claim 7, wherein the execution module is further configured to:
when the switching version number and the updating version number do not accord with the preset switching rule, the deleting instruction is not executed to the main name node;
the acquired switching version number stored by the distributed coordination server is stored in a permanent node of the distributed coordination server; and
the apparatus further comprises:
and the version number returning module is used for storing the updated version number in the permanent node of the distributed coordination server when the standby name node is switched to the main name node.
9. The apparatus according to claim 7 or 8, wherein the update version number is: the sum of the acquired switching version number stored by the distributed coordination server and a preset value is obtained, and the preset switching rule comprises the following steps: the updating version number is larger than the switching version number stored by the main name node, and the distributed coordination server is a Zookeeper server;
the version number update module is further configured to: before sending a delete instruction to a primary name node, remotely logging in at the primary name node.
10. A distributed system, comprising: the Zookeeper client comprises a Zookeeper server, a main name node serving as the Zookeeper client and at least one standby name node; wherein:
the Zookeeper server is used for: storing the switching version number; sending a switching notice; wherein, the switching notice is used for informing the standby name node to be switched into the main name node;
any alternate name node is used to: receiving a switching notification, and acquiring a switching version number stored by a Zookeeper server; generating an update version number according to the switching version number, and sending a deletion instruction carrying the update version number to the primary name node; determining a switching version number stored by the primary name node, and executing the deleting instruction to the primary name node when the switching version number and the updating version number accord with a preset switching rule so as to switch the standby name node into the primary name node;
prior to determining the switch version number stored by the primary name node: if the main name node does not receive the switching notification sent by the Zookeeper server, the switching version number stored by the main name node is the same as the switching version number stored by the Zookeeper server; if the main name node receives the switching notification sent by the Zookeeper server, the switching version number stored by the main name node is as follows: and the switching version number stored by the Zookeeper server is added to the preset value.
11. The system of claim 10, wherein the alternate name node is further configured to:
when the switching version number and the updating version number do not accord with the preset switching rule, the deleting instruction is not executed to the main name node;
the switching version number stored by the Zookeeper server acquired by the standby name node is stored in a permanent node of the Zookeeper server; and
the alternate name node is further for: and when the standby name node is switched to be the main name node, storing the updated version number in the permanent node of the Zookeeper server.
12. The system according to claim 10 or 11, wherein the update version number is: the sum of the switching version number stored by the obtained Zookeeper server and a preset value, wherein the preset switching rule comprises the following steps: the updating version number is larger than the switching version number stored by the main name node;
the alternate name node is further for: before sending a delete instruction to a primary name node, remotely logging in at the primary name node.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN201710963019.XA 2017-10-17 2017-10-17 Method and device for switching main name node and standby name node and distributed system Active CN107818027B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710963019.XA CN107818027B (en) 2017-10-17 2017-10-17 Method and device for switching main name node and standby name node and distributed system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710963019.XA CN107818027B (en) 2017-10-17 2017-10-17 Method and device for switching main name node and standby name node and distributed system

Publications (2)

Publication Number Publication Date
CN107818027A CN107818027A (en) 2018-03-20
CN107818027B true CN107818027B (en) 2021-07-30

Family

ID=61607430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710963019.XA Active CN107818027B (en) 2017-10-17 2017-10-17 Method and device for switching main name node and standby name node and distributed system

Country Status (1)

Country Link
CN (1) CN107818027B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109510867B (en) * 2018-10-31 2021-11-12 恒生电子股份有限公司 Data request processing method and device, storage medium and electronic equipment
CN111865632B (en) * 2019-04-28 2024-08-02 阿里巴巴集团控股有限公司 Switching method of distributed data storage cluster and switching instruction sending method and device
CN111930706B (en) * 2020-07-08 2024-04-09 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Remote call-based distributed network file storage system and method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013082787A1 (en) * 2011-12-08 2013-06-13 华为技术有限公司 Method, device, and system for deleting data in distributed storage system
CN104679604A (en) * 2015-02-12 2015-06-03 大唐移动通信设备有限公司 Method and device for switching between master node and standby node
CN105554130A (en) * 2015-12-18 2016-05-04 深圳中兴网信科技有限公司 Distributed storage system-based NameNode switching method and switching device
CN106789246A (en) * 2016-12-22 2017-05-31 广西防城港核电有限公司 The changing method and device of a kind of active/standby server
CN106817239A (en) * 2015-11-30 2017-06-09 华为软件技术有限公司 A kind of method of website switching, relevant apparatus and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9690671B2 (en) * 2013-11-01 2017-06-27 Cloudera, Inc. Manifest-based snapshots in distributed computing environments
JP2015106377A (en) * 2013-12-02 2015-06-08 富士通株式会社 Information processor, information processing method, and information processing program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013082787A1 (en) * 2011-12-08 2013-06-13 华为技术有限公司 Method, device, and system for deleting data in distributed storage system
CN104679604A (en) * 2015-02-12 2015-06-03 大唐移动通信设备有限公司 Method and device for switching between master node and standby node
CN106817239A (en) * 2015-11-30 2017-06-09 华为软件技术有限公司 A kind of method of website switching, relevant apparatus and system
CN105554130A (en) * 2015-12-18 2016-05-04 深圳中兴网信科技有限公司 Distributed storage system-based NameNode switching method and switching device
CN106789246A (en) * 2016-12-22 2017-05-31 广西防城港核电有限公司 The changing method and device of a kind of active/standby server

Also Published As

Publication number Publication date
CN107818027A (en) 2018-03-20

Similar Documents

Publication Publication Date Title
CN110177028B (en) Distributed health examination method and device
CN108023953B (en) High-availability implementation method and device for FTP service
CN109245908B (en) Method and device for switching master cluster and slave cluster
CN107729176B (en) Disaster recovery method and disaster recovery system for configuration file management system
CN111181765A (en) Task processing method and device
CN107818027B (en) Method and device for switching main name node and standby name node and distributed system
CN111338834A (en) Data storage method and device
CN110534136B (en) Recording method and device
CN112084254A (en) Data synchronization method and system
CN116932505A (en) Data query method, data writing method, related device and system
CN113742376A (en) Data synchronization method, first server and data synchronization system
CN110851192B (en) Method and device for responding to degraded switch configuration
CN114756173A (en) Method, system, device and computer readable medium for file merging
CN112883103A (en) Method and device for data transfer between clusters
CN115190125A (en) Monitoring method and device for cache cluster
CN113760487A (en) Service processing method and device
CN111277632B (en) Method and device for managing applications in system cluster
CN113742617A (en) Cache updating method and device
CN113761075B (en) Method, apparatus, device and computer readable medium for switching databases
CN114979187B (en) Data processing method and device
CN113364615B (en) Method, device, equipment and computer readable medium for rolling upgrade
CN108600025B (en) Method and device for automatic disaster recovery of system
CN111314457B (en) Method and device for setting virtual private cloud
CN112749042B (en) Application running method and device
CN113766437B (en) Short message sending method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant