CN107678918A - OSD heartbeat mechanism setting method and device for a distributed file system - Google Patents
- Publication number: CN107678918A
- Application number: CN201710881603.0A
- Authority: CN (China)
- Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Classifications
- G06F11/3096: Monitoring arrangements determined by the means or processing involved in sensing the monitored data, wherein the means or processing minimize the use of computing system or of computing system component resources, e.g. non-intrusive monitoring which minimizes the probe effect: sniffing, intercepting, indirectly deriving the monitored data from other directly available data
- G06F11/3006: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
- G06F11/3093: Configuration details of monitoring arrangements, e.g. installation, enabling, spatial arrangement of the probes
- G06F16/1734: Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
- G06F16/182: Distributed file systems
- H04L43/10: Active monitoring, e.g. heartbeat, ping or trace-route
- G06F2201/875: Monitoring of systems including the internet
Abstract
The invention discloses a method, device, and computer-readable storage medium for setting the OSD heartbeat mechanism of a distributed file system. The method includes: at a preset time interval, a first node assembles a heartbeat message from the state information of all of its own OSDs and sends the heartbeat message to the corresponding receiving nodes; each receiving node, according to the received heartbeat message, calls the heartbeat processing functions of all of its own OSDs, which update, in the PG group information each OSD stores, the heartbeat-related information of the OSDs that the heartbeat message describes. Because heartbeat messages are exchanged between nodes and each one carries the state information of all OSDs on the sending node, the invention can, in single-process mode, reduce the number of OSD heartbeat messages, lower system resource consumption, and improve system stability.
Description
Technical field
The invention relates to the field of distributed file systems, and in particular to a method, device, and computer-readable storage medium for setting the OSD heartbeat mechanism of a distributed file system.
Background
As technology advances, distributed file systems are attracting growing attention. A running distributed file system must respond to failures promptly, so it relies on an OSD heartbeat mechanism to monitor the health of the system.
In the prior art, the OSD (Object-based Storage Device) heartbeat mechanism has a group of OSDs exchange messages with one another to detect each other's status, with heartbeats sent and received as shown in Figure 1. The OSDs that host a PG (placement group, a collection of stored data objects) form a heartbeat detection group, and every member of the group sends heartbeat messages to every other member. As the cluster grows, each OSD belongs to more PGs and the number of heartbeats sent grows rapidly. This consumes system resources and, when resources run short, easily causes heartbeat timeouts, which in turn make the cluster state abnormal. How to reduce the number of OSD heartbeat messages, lower system resource consumption, and improve system stability is therefore an urgent problem.
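To make the scaling argument concrete, the following Python sketch counts heartbeat messages per interval under two simplified models. The prior-art model follows the description above literally (inside each PG, every OSD sends to every other OSD, counted separately per PG); the node-level model counts one message per ordered pair of distinct nodes that share at least one PG. The cluster layout, the function names, and leaving out a node's messages to itself are illustrative assumptions, not figures from the patent.

```python
from itertools import permutations
from typing import Dict, List

def per_pg_heartbeats(pg_map: Dict[int, List[int]]) -> int:
    """Prior-art model: inside each PG, every OSD sends to every other OSD."""
    return sum(len(osds) * (len(osds) - 1) for osds in pg_map.values())

def per_node_heartbeats(pg_map: Dict[int, List[int]], osd_node: Dict[int, str]) -> int:
    """Node-level model: one message per ordered (sender, receiver) node pair sharing a PG."""
    pairs = set()
    for osds in pg_map.values():
        nodes = {osd_node[o] for o in osds}
        pairs.update(permutations(nodes, 2))  # distinct nodes only; self-sends excluded
    return len(pairs)

# Hypothetical layout: 3 nodes x 2 OSDs, 64 three-replica PGs with one replica per node.
osd_node = {0: "n1", 1: "n1", 2: "n2", 3: "n2", 4: "n3", 5: "n3"}
pg_map = {pg: [pg % 2, 2 + (pg // 2) % 2, 4 + (pg // 4) % 2] for pg in range(64)}
print(per_pg_heartbeats(pg_map), per_node_heartbeats(pg_map, osd_node))  # 384 6
```

Under these assumptions the per-PG count grows linearly with the number of PGs, while the node-level count is capped at N(N-1) for N nodes however many PGs exist.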
Summary of the invention
The object of the present invention is to provide a method, device, and computer-readable storage medium for setting the OSD heartbeat mechanism of a distributed file system in which heartbeat messages are sent between nodes, so that they are merged before being sent and received. This greatly reduces the number of heartbeat messages sent, thereby reducing system resource consumption and improving system stability.
To solve the above technical problem, the present invention provides a method for setting the OSD heartbeat mechanism of a distributed file system, comprising:
at a preset time interval, a first node assembles a heartbeat message from the state information of all of its own OSDs and sends the heartbeat message to the corresponding receiving nodes;
each receiving node, according to the received heartbeat message, calls the heartbeat processing functions of all of its own OSDs and updates, in the PG group information stored by each of its OSDs, the heartbeat-related information of the OSDs corresponding to the heartbeat message.
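A minimal Python sketch of these two steps, assuming an in-memory model in which each OSD keeps its PG group information as a dictionary from a PG id to the last heartbeat time of each peer OSD. `OsdState`, `NodeHeartbeat`, `Osd`, `Node`, and their fields are hypothetical names; the patent does not specify a message format, and refreshing only peers that report themselves up is an assumption made here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class OsdState:
    osd_id: int
    up: bool

@dataclass
class NodeHeartbeat:
    src_node: str
    stamp: float
    states: List[OsdState]  # state of EVERY OSD on the sending node

@dataclass
class Osd:
    osd_id: int
    up: bool = True
    # Hypothetical PG group info: pg_id -> {peer_osd_id: last heartbeat stamp}
    pg_info: Dict[str, Dict[int, float]] = field(default_factory=dict)

    def handle_heartbeat(self, hb: NodeHeartbeat) -> None:
        for st in hb.states:
            if not st.up:
                continue  # assumption: a down OSD's stamp is left to age out
            for peers in self.pg_info.values():
                if st.osd_id in peers:
                    peers[st.osd_id] = hb.stamp

@dataclass
class Node:
    name: str
    osds: List[Osd]

    def assemble_heartbeat(self, stamp: float) -> NodeHeartbeat:
        # Step 1: one message carrying the state of all local OSDs.
        return NodeHeartbeat(self.name, stamp,
                             [OsdState(o.osd_id, o.up) for o in self.osds])

    def on_heartbeat(self, hb: NodeHeartbeat) -> None:
        # Step 2: hand the same message to every local OSD's handler.
        for osd in self.osds:
            osd.handle_heartbeat(hb)

n1 = Node("n1", [Osd(0), Osd(1, up=False)])
osd2 = Osd(2, pg_info={"1.a": {0: 0.0}})
osd3 = Osd(3, pg_info={"1.b": {1: 0.0}})
n2 = Node("n2", [osd2, osd3])
n2.on_heartbeat(n1.assemble_heartbeat(10.0))
print(osd2.pg_info, osd3.pg_info)  # {'1.a': {0: 10.0}} {'1.b': {1: 0.0}}
```

One message from node n1 reaches both OSDs on node n2, instead of each OSD pair exchanging its own heartbeat.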
Optionally, sending the heartbeat message to the corresponding receiving nodes includes:
the first node, according to the information about other OSDs in the PG group information stored by each of its own OSDs, sends the heartbeat message to the receiving nodes where those other OSDs are located.
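Selecting receiving nodes from PG group information can be sketched as follows. The data layout (`pg_members`: each local OSD's PGs and their member OSDs; `osd_location`: which node hosts each OSD) and the function name are assumptions. As in the Figure 3 example, the result may include the sending node itself when two OSDs on the same node share a PG.

```python
from typing import Dict, List, Set

def receiving_nodes(local_osds: List[int],
                    pg_members: Dict[int, Dict[str, List[int]]],
                    osd_location: Dict[int, str]) -> Set[str]:
    """Nodes hosting any OSD that shares a PG with one of the local OSDs."""
    targets: Set[str] = set()
    for osd in local_osds:
        # pg_members[osd]: pg_id -> member OSD ids, as stored by that local OSD
        for members in pg_members.get(osd, {}).values():
            for other in members:
                if other != osd:
                    targets.add(osd_location[other])
    return targets

osd_location = {0: "n1", 1: "n1", 2: "n2", 3: "n3"}
pg_members = {0: {"1.0": [0, 2]}, 1: {"1.1": [1, 0]}}
print(sorted(receiving_nodes([0, 1], pg_members, osd_location)))  # ['n1', 'n2']
```

Node n3 is not a target because none of n1's OSDs shares a PG with OSD 3.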
Optionally, calling the heartbeat processing functions of all of its own OSDs and updating, in the PG group information stored by each of its OSDs, the heartbeat-related information of the OSDs corresponding to the heartbeat message includes:
the heartbeat processing function of each OSD extracts the OSD state information from the heartbeat message;
it traverses the PG group information it stores and determines whether an OSD corresponding to an OSD in the heartbeat message exists;
if so, it updates, in its stored PG group information, the heartbeat-related information of that corresponding OSD.
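The three sub-steps of a single OSD's heartbeat processing function (extract, traverse, update) might look like this in Python. The payload shape, a list of `(osd_id, up)` pairs, and the per-peer record `{"last_rx": ..., "up": ...}` are hypothetical; the function returns the PG ids it touched so that the "does a corresponding OSD exist" check is visible.

```python
from typing import Dict, List, Tuple

def handle_heartbeat(pg_info: Dict[str, Dict[int, dict]],
                     stamp: float,
                     states: List[Tuple[int, bool]]) -> List[str]:
    """One OSD's heartbeat handler; pg_info is that OSD's own PG group info."""
    updated: List[str] = []
    # 1. Extract the per-OSD state entries carried in the node-level heartbeat.
    for osd_id, up in states:
        # 2. Traverse the stored PG groups looking for a corresponding OSD.
        for pg_id, peers in pg_info.items():
            if osd_id in peers:
                # 3. Found: refresh that peer's heartbeat-related info.
                peers[osd_id] = {"last_rx": stamp, "up": up}
                updated.append(pg_id)
    return updated

pg_info = {"1.0": {2: {}}, "1.1": {3: {}}, "2.0": {2: {}, 5: {}}}
print(handle_heartbeat(pg_info, 7.0, [(2, True), (4, True)]))  # ['1.0', '2.0']
```

Entries for OSDs that this OSD does not peer with (OSD 4 here) are simply ignored.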
Optionally, after the heartbeat processing functions of all OSDs on each receiving node have finished, the method further includes:
each receiving node assembles a heartbeat reply message and sends the heartbeat reply message to the first node;
the first node, according to the received heartbeat reply message, calls the heartbeat reply processing functions of all of its own OSDs to update the heartbeat state of its own OSDs corresponding to the heartbeat reply message.
Optionally, calling the heartbeat reply processing functions of all of its own OSDs to update the heartbeat state of its own OSDs corresponding to the heartbeat reply message includes:
the heartbeat reply processing function of each OSD extracts the reply information it needs from the heartbeat reply message and updates its heartbeat state.
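The reply round trip can be sketched in the same style. The patent leaves the reply content open, so this example assumes a reply that echoes the heartbeat's timestamp and lists the replying node's OSD ids; each local OSD's reply handler then extracts only the entries for its own peers. All class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class HeartbeatReply:
    src_node: str
    echo_stamp: float   # assumed: timestamp of the heartbeat being answered
    osd_ids: List[int]  # assumed: OSDs hosted on the replying node

@dataclass
class LocalOsd:
    osd_id: int
    # peer OSD id -> stamp of the last heartbeat that peer acknowledged
    peer_ack: Dict[int, float] = field(default_factory=dict)

    def handle_reply(self, reply: HeartbeatReply) -> None:
        # Extract only the entries relevant to this OSD's own peers.
        for osd_id in reply.osd_ids:
            if osd_id in self.peer_ack:
                self.peer_ack[osd_id] = reply.echo_stamp

def build_reply(node: str, local_osd_ids: List[int], hb_stamp: float) -> HeartbeatReply:
    # Receiving side: called after all local heartbeat handlers have finished.
    return HeartbeatReply(node, hb_stamp, list(local_osd_ids))

def on_reply(osds: List[LocalOsd], reply: HeartbeatReply) -> None:
    # First node: pass the single reply to every local OSD's reply handler.
    for osd in osds:
        osd.handle_reply(reply)

a, b = LocalOsd(0, {2: 0.0}), LocalOsd(1, {3: 0.0})
on_reply([a, b], build_reply("n2", [2], 5.0))
print(a.peer_ack, b.peer_ack)  # {2: 5.0} {3: 0.0}
```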
The present invention also provides a device for setting the OSD heartbeat mechanism of a distributed file system, comprising:
a sending module, configured to assemble, at a preset time interval, a first heartbeat message from the state information of all of its own OSDs, and to send the first heartbeat message to the corresponding nodes;
a receiving module, configured to call, according to a received second heartbeat message, the heartbeat processing functions of all of its own OSDs, and to update, in the PG group information stored by each of its OSDs, the heartbeat-related information of the OSDs corresponding to the second heartbeat message.
Optionally, the sending module includes:
a sending submodule, configured to send the first heartbeat message, according to the information about other OSDs in the PG group information stored by each of its own OSDs, to the nodes where those other OSDs are located.
Optionally, the receiving module includes:
a heartbeat processing submodule, configured to use the heartbeat processing function of each OSD to extract the OSD state information from the second heartbeat message; traverse its stored PG group information and determine whether an OSD corresponding to an OSD in the second heartbeat message exists; and if so, update, in its stored PG group information, the heartbeat-related information of that corresponding OSD.
Optionally, the device further includes:
a second sending module, configured to assemble a first heartbeat reply message after the heartbeat processing functions of all OSDs have finished, and to send the first heartbeat reply message to the node that sent the heartbeat message;
a second receiving module, configured to call, according to a received second heartbeat reply message, the heartbeat reply processing functions of all of its own OSDs to update the heartbeat state of its own OSDs corresponding to the second heartbeat reply message, where the second heartbeat reply message is a heartbeat reply message sent by a node where the other OSDs are located.
In addition, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for setting the OSD heartbeat mechanism of a distributed file system described in any of the above.
The method for setting the OSD heartbeat mechanism of a distributed file system provided by the present invention includes: at a preset time interval, a first node assembles a heartbeat message from the state information of all of its own OSDs and sends the heartbeat message to the corresponding receiving nodes; each receiving node, according to the received heartbeat message, calls the heartbeat processing functions of all of its own OSDs and updates, in the PG group information stored by each of its OSDs, the heartbeat-related information of the OSDs corresponding to the heartbeat message.
It can be seen that, because the first node assembles a heartbeat message from the state information of all of its own OSDs at a preset time interval and sends it to the corresponding receiving nodes, the present invention can, in single-process mode, exchange heartbeat messages between nodes, each carrying the state information of all OSDs on the sending node. This greatly reduces the number of OSD heartbeat messages, reduces system resource consumption, and improves system stability. In addition, the present invention also provides a device for setting the OSD heartbeat mechanism of a distributed file system and a computer-readable storage medium, which have the same beneficial effects.
Description of drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe them are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a schematic diagram of the OSD heartbeat mechanism in the prior art;
Figure 2 is a flowchart of a method for setting the OSD heartbeat mechanism of a distributed file system provided by an embodiment of the present invention;
Figure 3 is a schematic diagram of the OSD heartbeat mechanism used by the method for setting the OSD heartbeat mechanism of a distributed file system provided by an embodiment of the present invention;
Figure 4 is a structural diagram of a device for setting the OSD heartbeat mechanism of a distributed file system provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Refer to Figure 2, which is a flowchart of a method for setting the OSD heartbeat mechanism of a distributed file system provided by an embodiment of the present invention. The method may include:
Step 101: At a preset time interval, the first node assembles a heartbeat message from the state information of all of its own OSDs and sends the heartbeat message to the corresponding receiving nodes.
The first node may be any node in the distributed file system. Which nodes act as the first node and perform this step can be chosen by the designer according to the application scenario and user requirements; for example, every node in the distributed file system can be configured as a first node.
Specifically, in this step, when the system starts, the OSDManager layer of the first node can start a timer thread that traverses the state information of all OSDs on the first node, assembles it into a heartbeat message, and sends the heartbeat message to the corresponding receiving nodes.
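A timer thread of this kind could be sketched with Python's `threading` module. `NodeHeartbeatTimer`, its callbacks, and the dictionary message format are hypothetical stand-ins for the OSDManager layer; the sketch only shows the control flow: wait one interval, collect every local OSD's state into a single message, send it to each receiving node, and exit promptly when stopped.

```python
import threading
from typing import Callable, Dict, List

class NodeHeartbeatTimer:
    """Hypothetical timer thread for a node's OSD manager layer."""

    def __init__(self, node: str,
                 collect_states: Callable[[], Dict[int, bool]],
                 receiving_nodes: Callable[[], List[str]],
                 send: Callable[[str, dict], None],
                 interval: float) -> None:
        self.node = node
        self.collect_states = collect_states
        self.receiving_nodes = receiving_nodes
        self.send = send
        self.interval = interval
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stopped.set()
        self._thread.join()

    def _run(self) -> None:
        seq = 0
        # wait() returns False on timeout (time to send) and True once stop() is called.
        while not self._stopped.wait(self.interval):
            seq += 1
            # Traverse all local OSD states and pack them into ONE node-level message.
            msg = {"src": self.node, "seq": seq, "osds": self.collect_states()}
            for dst in self.receiving_nodes():
                self.send(dst, msg)
```

Using `Event.wait` as the sleep lets `stop()` interrupt the wait immediately instead of waiting out a full interval.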
It can be understood that the receiving nodes in this step can be chosen by the designer according to the application scenario and user requirements: for example, they can be all nodes in the distributed file system, including the first node itself, or only the nodes hosting the other OSDs listed in the PG group information stored by each of the first node's OSDs. As long as the receiving nodes include the nodes hosting those other OSDs, their exact selection is left to the designer, and this embodiment imposes no restriction on it.
Correspondingly, sending the heartbeat message to the corresponding receiving nodes in this step may proceed as follows: according to the information about other OSDs in the PG group information stored by each of its own OSDs, the first node sends the heartbeat message to the receiving nodes where those other OSDs are located.
Note that the receiving nodes in this step may include the first node itself. For example, for the PG group in the OSD heartbeat mechanism of Figure 1, under the OSD heartbeat mechanism of Figure 3 the first node (node 1) sends the heartbeat message to the receiving nodes (node 1 and node 2).
Step 102: According to the received heartbeat message, each receiving node calls the heartbeat processing functions of all of its own OSDs and updates, in the PG group information stored by each of its OSDs, the heartbeat-related information of the OSDs corresponding to the heartbeat message.
Each receiving node may call the heartbeat processing functions of its OSDs one at a time, or call them all concurrently so that the PG group information stored by every OSD is updated at the same time. This embodiment imposes no restriction on this.
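Both invocation strategies can be expressed in one small dispatcher. This is a sketch under the assumption that each OSD's handler touches only that OSD's own PG group information, so handlers can run in parallel without extra locking; `dispatch` and the handler signature are illustrative, and `concurrent.futures.ThreadPoolExecutor` is just one way to run them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Handler = Callable[[dict], None]

def dispatch(handlers: List[Handler], hb: dict, concurrent: bool = False) -> None:
    """Call every local OSD's heartbeat handler with the same node-level message."""
    if not concurrent:
        # One at a time, in OSD order.
        for handle in handlers:
            handle(hb)
        return
    # All at once; leaving the with-block waits for every handler to finish,
    # and list() re-raises the first handler exception, if any.
    with ThreadPoolExecutor(max_workers=len(handlers) or 1) as pool:
        list(pool.map(lambda handle: handle(hb), handlers))
```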
Specifically, in this step each OSD's heartbeat processing function can extract the OSD state information from the heartbeat message, then traverse the PG group information it stores and update the heartbeat-related information within those PG groups. That is, each OSD's heartbeat processing function extracts the OSD state information from the heartbeat message; traverses its stored PG group information to determine whether an OSD corresponding to an OSD in the heartbeat message exists; and if so, updates the heartbeat-related information of that corresponding OSD in its stored PG group information.
It can be understood that, mirroring the existing OSD heartbeat mechanism, this step may be followed by a step in which each receiving node, after the heartbeat processing functions of all of its OSDs have finished, assembles a heartbeat reply message and sends it to the first node.
Correspondingly, according to the received heartbeat reply message, the first node can call the heartbeat reply processing functions of all of its own OSDs to update the heartbeat state of its own OSDs corresponding to the heartbeat reply message.
Note that the heartbeat reply message can be the reply to the heartbeat message sent by the first node. This embodiment places no restriction on the specific content of the heartbeat reply message, as long as the first node can use it to update the heartbeat state of its own OSDs corresponding to the reply.
It can be understood that when the receiving nodes are the nodes hosting the other OSDs in the PG group information stored by each of the first node's OSDs, every receiving node can send a heartbeat reply message to the first node. When the receiving nodes are all nodes, the receiving nodes that host none of those other OSDs can skip sending a heartbeat reply message to the first node, which reduces the workload of the first node's heartbeat reply processing functions. This embodiment imposes no restriction on this.
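When heartbeats are broadcast to all nodes, a receiving node needs a cheap test for whether it should reply at all. A sketch, assuming each local OSD exposes the set of peer OSD ids from its PG group information (a hypothetical layout): the node replies only if some local OSD lists one of the sender's OSDs as a peer.

```python
from typing import Dict, Iterable, Set

def should_reply(sender_osds: Iterable[int],
                 local_peers: Dict[int, Set[int]]) -> bool:
    """local_peers: local OSD id -> peer OSD ids taken from its PG group info."""
    senders = set(sender_osds)
    # Reply only if at least one local OSD shares a PG with a sender OSD.
    return any(peers & senders for peers in local_peers.values())

print(should_reply([0, 1], {2: {0, 5}}), should_reply([0, 1], {4: {5}}))  # True False
```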
In this embodiment, the first node assembles, at a preset time interval, a heartbeat message from the state information of all of its own OSDs and sends it to the corresponding receiving nodes. In single-process mode, heartbeat messages are thus exchanged between nodes, each carrying the state information of all OSDs on the node, which greatly reduces the number of OSD heartbeat messages, reduces system resource consumption, and improves system stability.
Refer to Figure 4, which is a structural diagram of a device for setting the OSD heartbeat mechanism of a distributed file system provided by an embodiment of the present invention. The device may include:
a sending module 100, configured to assemble, at a preset time interval, a first heartbeat message from the state information of all of its own OSDs, and to send the first heartbeat message to the corresponding nodes;
a receiving module 200, configured to call, according to a received second heartbeat message, the heartbeat processing functions of all of its own OSDs, and to update, in the PG group information stored by each of its OSDs, the heartbeat-related information of the OSDs corresponding to the second heartbeat message.
Optionally, the sending module 100 may include:
a sending submodule, configured to send the first heartbeat message, according to the information about other OSDs in the PG group information stored by each of its own OSDs, to the nodes where those other OSDs are located.
Optionally, the receiving module 200 may include:
a heartbeat processing submodule, configured to use the heartbeat processing function of each OSD to extract the OSD state information from the second heartbeat message; traverse its stored PG group information and determine whether an OSD corresponding to an OSD in the second heartbeat message exists; and if so, update, in its stored PG group information, the heartbeat-related information of that corresponding OSD.
Optionally, the device may further include:
a second sending module, configured to assemble a first heartbeat reply message after the heartbeat processing functions of all OSDs have finished, and to send the first heartbeat reply message to the node that sent the heartbeat message;
a second receiving module, configured to call, according to a received second heartbeat reply message, the heartbeat reply processing functions of all of its own OSDs to update the heartbeat state of its own OSDs corresponding to the second heartbeat reply message, where the second heartbeat reply message is a heartbeat reply message sent by a node where the other OSDs are located.
Optionally, the second receiving module may include:
a heartbeat reply processing submodule, configured to use the heartbeat reply processing function of each OSD to extract the reply information it needs from the second heartbeat reply message and update its heartbeat state.
It can be understood that this embodiment takes one node in the distributed file system as an example. If the sending module 100 of the node sends the first heartbeat message to the node itself, that is, if one of the node's own OSDs appears among the other OSDs in the PG group information stored by the node's OSDs, then the receiving module 200 of the node can call, according to the received first heartbeat message, the heartbeat processing functions of all of its own OSDs and update, in the PG group information stored by each of its OSDs, the heartbeat-related information of the OSDs corresponding to the first heartbeat message. In other words, the second heartbeat message in this embodiment may include the first heartbeat message.
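The self-addressed case can be handled by routing the node's own heartbeat straight to its receiving side. The sketch below is one possible arrangement, not something the patent prescribes; `deliver`, `local_handler`, and `net_send` are hypothetical names, and skipping the network for the local copy is an assumed optimization.

```python
from typing import Callable, List

def deliver(local_node: str, targets: List[str], msg: dict,
            local_handler: Callable[[dict], None],
            net_send: Callable[[str, str, dict], None]) -> None:
    """Send one node-level heartbeat to every target node."""
    for dst in targets:
        if dst == local_node:
            # The node is its own peer: its receiving side processes the
            # first heartbeat message directly, with no network hop.
            local_handler(msg)
        else:
            net_send(local_node, dst, msg)

local_rx: List[dict] = []
remote_tx: List[str] = []
deliver("n1", ["n1", "n2"], {"seq": 1}, local_rx.append,
        lambda src, dst, m: remote_tx.append(dst))
print(local_rx, remote_tx)  # [{'seq': 1}] ['n2']
```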
本实施例中,本发明实施例通过发送模块100按预设时间间隔根据自身所有OSD的状态信息,组装第一心跳消息,并将第一心跳消息发送到对应的节点上,可以在单进程模式下,利用节点与节点之间发送心跳消息,心跳消息包含节点上所有OSD的状态信息,极大的减少OSD的心跳消息数量,减少了系统资源消耗,提升了系统稳定性。In this embodiment, the embodiment of the present invention uses the sending module 100 to assemble the first heartbeat message according to the state information of all its own OSDs according to the preset time interval, and sends the first heartbeat message to the corresponding node. Next, heartbeat messages are sent between nodes. The heartbeat messages contain the status information of all OSDs on the node, which greatly reduces the number of OSD heartbeat messages, reduces system resource consumption, and improves system stability.
本发明实施例还提供了一种计算机可读存储介质,其上存有计算机程序,该计算机程序被执行时可以实现上述实施例所提供的分布式文件系统的OSD心跳机制设置方法的步骤。该存储介质可以包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。The embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed, the steps of the method for setting the OSD heartbeat mechanism of the distributed file system provided in the above-mentioned embodiments can be realized. The storage medium may include various media capable of storing program codes such as a U disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk.
说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置及计算机可读存储介质而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。Each embodiment in the description is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same and similar parts of each embodiment can be referred to each other. As for the device and the computer-readable storage medium disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively simple, and for the relevant details, please refer to the description of the method part.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate this interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method, device, and computer-readable storage medium for setting the OSD heartbeat mechanism of a distributed file system provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and implementation of the invention; the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. It should be noted that those of ordinary skill in the art may make a number of improvements and modifications to the invention without departing from its principle, and such improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710881603.0A CN107678918B (en) | 2017-09-26 | 2017-09-26 | Method and device for setting OSD heartbeat mechanism of distributed file system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107678918A (en) | 2018-02-09 |
CN107678918B (en) | 2021-06-29 |
Family ID: 61137254
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710881603.0A Active CN107678918B (en) | 2017-09-26 | 2017-09-26 | Method and device for setting OSD heartbeat mechanism of distributed file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107678918B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050071867A1 (en) * | 2003-09-29 | 2005-03-31 | Lipsky Scott E. | Method and system for distributing images to client systems |
CN105553760A (en) * | 2015-12-11 | 2016-05-04 | 中国科学院信息工程研究所 | Heartbeat-based software module fault processing method and system |
CN106062717A (en) * | 2014-11-06 | 2016-10-26 | 华为技术有限公司 | Distributed storage replication system and method |
CN106936662A (en) * | 2015-12-31 | 2017-07-07 | 杭州华为数字技术有限公司 | A kind of method for realizing heartbeat mechanism, apparatus and system |
CN107181637A (en) * | 2016-03-11 | 2017-09-19 | 华为技术有限公司 | A kind of heartbeat message sending method, device and heartbeat sending node |
Non-Patent Citations (1)
Title |
---|
SKDKJZZ: "Ceph storage: self-detection of OSD failures in a Ceph cluster", CSDN Blog, public URL: HTTPS://BLOG.CSDN.NET/SKDKJZZ/ARTICLE/DETAILS/41980885?DEPTH_1-UTM_SOURCE=DISTRIBUTE.PC_RELEVANT.NONE-TASK&UTM_SOURCE=DISTRIBUTE.PC_RELEVANT.NONE-TASK * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108509157A (en) * | 2018-04-13 | 2018-09-07 | 郑州云海信息技术有限公司 | A kind of data balancing method and device applied to distributed file system |
CN109669822A (en) * | 2018-11-28 | 2019-04-23 | 平安科技(深圳)有限公司 | The creation method and computer readable storage medium of electronic device, spare memory pool |
CN109669822B (en) * | 2018-11-28 | 2023-06-06 | 平安科技(深圳)有限公司 | Electronic device, method for creating backup storage pool, and computer-readable storage medium |
CN109857344A (en) * | 2019-01-30 | 2019-06-07 | 平安科技(深圳)有限公司 | Heart beat status judgment method, device and computer equipment based on shared drive |
CN109857344B (en) * | 2019-01-30 | 2022-05-20 | 平安科技(深圳)有限公司 | Heartbeat state judgment method and device based on shared memory and computer equipment |
CN110457176A (en) * | 2019-07-12 | 2019-11-15 | 平安普惠企业管理有限公司 | For the monitoring method of distributed system, device, storage medium and electronic equipment |
CN110457176B (en) * | 2019-07-12 | 2022-09-27 | 平安普惠企业管理有限公司 | Monitoring method and device for distributed system, storage medium and electronic equipment |
CN111064613A (en) * | 2019-12-13 | 2020-04-24 | 新华三大数据技术有限公司 | Network fault detection method and device |
CN111064613B (en) * | 2019-12-13 | 2022-03-22 | 新华三大数据技术有限公司 | Network fault detection method and device |
CN111506263A (en) * | 2020-03-31 | 2020-08-07 | 新华三技术有限公司成都分公司 | Heartbeat connection establishment method and device |
CN111506263B (en) * | 2020-03-31 | 2022-07-12 | 新华三技术有限公司成都分公司 | Heartbeat connection establishment method and device |
CN113079065A (en) * | 2021-03-26 | 2021-07-06 | 山东英信计算机技术有限公司 | Heartbeat detection method, device, equipment and medium based on Ambari |
Also Published As
Publication number | Publication date |
---|---|
CN107678918B (en) | 2021-06-29 |
Similar Documents
Publication | Title |
---|---|
CN107678918A (en) | The OSD heartbeat mechanisms method to set up and device of a kind of distributed file system |
CN108023808B (en) | Method and device for message distribution in application program |
CN105468450A (en) | Task scheduling method and system |
CN107666493B (en) | Database configuration method and equipment thereof |
CN108008950B (en) | Method and device for realizing user interface updating |
CN104954444B (en) | A kind of method and apparatus that migration is data cached |
WO2022033586A1 (en) | Message sending method and device |
CN107203437B (en) | Method, device and system for preventing memory data from being lost |
CN114900449B (en) | Resource information management method, system and device |
WO2021184177A1 (en) | Master node selection method and apparatus, electronic device, and storage medium |
CN105162879A (en) | Method, device and system for realizing data consistency among plurality of machine rooms |
CN110569135A (en) | interprocess communication method and system based on publish-subscribe mode |
TWI716822B (en) | Method and device for correcting transaction causality, and electronic equipment |
CN111064626A (en) | Configuration updating method, device, server and readable storage medium |
CN111200606A (en) | Deep learning model task processing method, system, server and storage medium |
CN114531373A (en) | Node state detection method, node state detection device, equipment and medium |
CN107395443A (en) | A kind of distributed type assemblies management method, apparatus and system |
CN113867915A (en) | Task scheduling method, electronic device and storage medium |
CN106331081A (en) | A method and device for information synchronization |
CN106131241A (en) | A kind of method for connecting network, device and mobile terminal |
CN111274047A (en) | Information processing method, terminal, system, computer device and storage medium |
CN108111630B (en) | Zookeeper cluster system and connection method and system thereof |
CN111953569B (en) | State information reporting method, device, equipment and medium |
CN112583879B (en) | Request processing method, device and system, storage medium and electronic equipment |
CN106559439B (en) | A kind of method for processing business and equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |