CN106487599B - Method and system for distributed monitoring of running state of cloud access controller

Info

Publication number: CN106487599B
Application number: CN201611092334.1A
Authority: CN
Inventors: 陈昊
Original assignee: Shanghai Feixun Data Communication Technology Co Ltd
Current assignee: Shenzhen Shuwang Internet Technology Co.,Ltd.
Priority date: 2016-11-30
Filing date: 2016-11-30
Publication date: 2020-02-04
Anticipated expiration: 2036-11-30
Also published as: CN106487599A

Abstract

The invention provides a method and a system for distributed monitoring of the operation state of a cloud access controller. The method comprises the following steps: the cloud access controller sets a plurality of monitoring processes, wherein one monitoring process is the current monitoring process and the remaining monitoring processes are backup monitoring processes, and the current monitoring process monitors server performance data of a plurality of servers and/or service processing information of a plurality of service modules in the cloud access controller; the distributed processing framework creates a permanent node and a temporary node under the permanent node, wherein the permanent node is a root node for monitoring the current monitoring process and the temporary node is a child node of the root node; according to a preset rule, the distributed processing framework allocates monitoring authority to the child nodes under the root node, and the child node holding the monitoring authority monitors the current monitoring process. Through distributed monitoring, the invention can monitor the running state of the cloud access controller in real time.

Description

Method and system for distributed monitoring of running state of cloud access controller

Technical Field

The embodiment of the invention relates to the technical field of communication, in particular to a method and a system for distributed monitoring of an operation state of a cloud access controller.

Background

A cloud Access Controller (AC) system often presents a functional interface of the system through HyperText Markup Language (HTML), and a user may connect to the AC system through a browser to perform various operations.

Because the cloud access controller system needs to process a large amount of terminal device connection information at the same time, a single server cannot meet the performance requirement of service data processing; to handle such big-data processing scenarios, service processing needs to be distributed across a plurality of servers. The distributed system of the cloud access controller is composed of a plurality of servers, and each server runs a plurality of service modules of the system. When the processing load reaches a threshold value, the cloud access controller system can resolve the performance problem by dynamically adding servers or by adding system service modules to the existing servers.

In a distributed environment, a cloud access controller is composed of a plurality of servers and a plurality of service modules. During operation, various abnormal conditions may arise, such as a server hardware failure, an abnormal power supply, CPU or memory usage of a server exceeding a set threshold, or a processing process terminating because a service module encounters an exception at run time. When such abnormalities occur, they need to be reported to operation and maintenance personnel in real time so that they can be handled promptly. Operation and maintenance personnel also need to know, in real time, the operation state of each server in the cloud access controller system and the process state of each service module.

In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:

In the prior art, when a cloud access controller processes a large amount of service data in a distributed manner, no reliable mechanism is provided for monitoring the operation state of each component of the distributed system in real time, even though system operation is complex and various faults may occur; that is, the system provides no operation monitoring scheme for such a complex environment.

It should be noted that the above background description is only for the sake of clarity and complete description of the technical solutions of the present invention and for the understanding of those skilled in the art. Such solutions are not considered to be known to the person skilled in the art merely because they have been set forth in the background section of the invention.

Disclosure of Invention

In view of the foregoing problems, an object of embodiments of the present invention is to provide a method and a system for distributed monitoring of an operating state of a cloud access controller, which place system monitoring in a distributed environment, ensure that the monitoring system detects an operating fault of the system in real time when the fault occurs, and ensure correct operation of the monitoring mechanism even if one or more monitoring processes fail.

In order to achieve the above object, an embodiment of the present invention provides a method for distributed monitoring of an operating state of a cloud access controller, where the cloud access controller is a distributed system composed of a plurality of servers and a plurality of service modules, and the method includes: the cloud access controller sets a plurality of monitoring processes, wherein one monitoring process is a current monitoring process, and the rest monitoring processes are backup monitoring processes; the distributed processing framework establishes a root node for monitoring the current monitoring process, establishes child nodes under the root node, distributes monitoring authority to the child nodes under the root node according to a preset rule, and the child nodes with the monitoring authority monitor the current monitoring process; the current monitoring process monitors server performance data of a plurality of servers and/or service processing information of a plurality of service modules in the cloud access controller; if the current monitoring process is abnormal, the distributed processing framework deletes the child node monitoring the current monitoring process, determines a new current monitoring process from the backup monitoring process, and determines a new child node from all child nodes under the root node to monitor the new current monitoring process.

Further, the child nodes adopt child node names and sequence value identifications, wherein the sequence value of a child node in the root node is the sequence value of a previous child node plus 1; according to the preset rule, the distributed processing framework distributes monitoring authority for the child nodes under the root node, and the method comprises the following steps: acquiring the sequence values of all child nodes under the root node, and judging whether the sequence value of the current child node is minimum in the sequence values of all child nodes; if so, the distributed processing framework allocates monitoring authority to the current child node with the minimum sequence value; if not, setting a state monitoring callback in the previous child node by the current child node, and judging whether the sequence value of the previous child node is the minimum in the sequence values of all the child nodes; until determining a child node with the minimum sequence value under the root node; and the distributed processing framework allocates monitoring authority to the child node with the minimum sequence value.

Further, the monitoring of the server performance data of the plurality of servers and/or the service processing information of the plurality of service modules by the current monitoring process includes: creating a service node under the root node and creating sub-nodes under the service node, wherein the sub-nodes comprise a first sub-node and a second sub-node, the first sub-node is used for monitoring server performance data of a server, and the second sub-node is used for monitoring service processing information of a service module; after the current monitoring process is started, all sub-nodes under the service node are added into the cache and the states of all the sub-nodes are monitored, when the states of the sub-nodes are changed, all the sub-nodes under the service node are obtained, and all the obtained sub-nodes are compared with all the sub-nodes in the cache; if the sub-node in the cache does not exist in the acquired sub-node, the current monitoring process judges that the non-existing sub-node is abnormal and sends abnormal information to the cloud access controller; and if the subnode in the cache exists in the acquired subnode, the current monitoring process acquires the server performance data and/or the service processing information monitored by the existing subnode and sends the server performance data and/or the service processing information to the cloud access controller for processing.

Further, the current monitoring process is abnormal, including: the current monitoring process periodically sends server performance data and/or service processing information according to a preset time interval; and if the server performance data and/or the service processing information sent by the current monitoring process are not received in the time interval, the current monitoring process is abnormal.

Further, the step of deleting the child node monitoring the current monitoring process by the distributed processing framework, determining a new current monitoring process from the backup monitoring process, and determining a new child node from all child nodes under the root node to monitor the new current monitoring process includes: the distributed processing framework deletes the child node monitoring the current monitoring process and sends the node deletion information to the child node of the next sequence value; when the child node of the next sequence value receives the node deletion information, the abnormal information of the current monitoring process and the child node monitoring the current monitoring process is sent to the root node, and then the distributed processing framework determines a new current monitoring process from the backup monitoring process and determines a new child node from all child nodes under the root node to monitor the new current monitoring process according to the preset rule.

In order to achieve the above object, an embodiment of the present invention further provides a system for distributed monitoring of an operating state of a cloud access controller, including: the cloud access controller is a distributed system consisting of a plurality of servers and a plurality of service modules and is used for setting a plurality of monitoring processes, wherein one monitoring process is a current monitoring process, and the rest monitoring processes are backup monitoring processes; the distributed processing framework is used for creating a permanent node and creating a temporary node under the permanent node, wherein the permanent node is a root node for monitoring the current monitoring process, and the temporary node is a child node of the root node; distributing monitoring authority to child nodes under a root node according to a preset rule, and monitoring the current monitoring process by the child nodes with the monitoring authority; the current monitoring process monitors server performance data of a plurality of servers and/or service processing information of a plurality of service modules in the cloud access controller, if the current monitoring process is abnormal, child nodes monitoring the current monitoring process are deleted, a new current monitoring process is determined from the backup monitoring process, and new child nodes are determined from all child nodes under the root node to monitor the new current monitoring process.

Therefore, according to the method and the system for distributed monitoring of the operation state of the cloud access controller provided by the embodiment of the invention, the monitoring is placed in a distributed environment by using a distributed monitoring mode, so that the monitoring system can find the operation fault of the system in real time when the fault occurs, and the correct operation of a monitoring mechanism can be ensured even if a single or multiple monitoring processes have the fault.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a schematic flowchart of a method for distributed monitoring of an operating state of a cloud access controller according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a system for distributed monitoring of an operation state of a cloud access controller according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

The embodiment of the invention provides a distributed monitoring method for the running state of a cloud access controller. Referring to fig. 1, the method includes the following steps:

Step S1: the cloud access controller sets a plurality of monitoring processes, wherein one monitoring process is the current monitoring process, and the remaining monitoring processes are backup monitoring processes.

In the embodiment of the invention, the cloud access controller is a distributed system consisting of a plurality of servers and a plurality of service modules.

The cloud access controller is provided with a plurality of monitoring processes; preferably, there are three monitoring processes, each deployed on a different server. At any given time, only one monitoring process runs as the current monitoring process, and the other monitoring processes serve as backups.

Step S2: the distributed processing framework establishes a root node for monitoring the current monitoring process and establishes child nodes under the root node.

In this embodiment, the distributed processing framework is based on ZooKeeper, an open-source distributed application coordination service that provides consistency services for distributed applications and is mainly used to solve data management problems in distributed applications. The functions it provides include unified naming services, state synchronization services, cluster management, and management of distributed application configuration items. In addition, ZooKeeper provides data storage organized as a directory node tree, similar to a file system, and is mainly used for maintaining the stored data and monitoring changes in its state. By monitoring these data state changes, data-based cluster management can be achieved.

A permanent node, /monitors, is created in ZooKeeper and serves as the root node for monitoring the current monitoring process.

After a monitoring process is started, a temporary node /monitors/monitor-<process ID>- is created under the permanent node /monitors, that is, a child node of the root node. The child node is created as a sequence value auto-increment node, so ZooKeeper automatically appends a sequence value to the end of the child node's name; the sequence value of each child node under the root node is the sequence value of the previous child node plus 1.
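The naming behaviour described above can be illustrated with a small in-memory stand-in. This is only a sketch and not the patent's implementation: the class `InMemoryParent`, the helper `sequence_of`, and the process IDs 101 and 202 are hypothetical. The zero-padded 10-digit suffix mirrors how ZooKeeper names sequential nodes.

```python
class InMemoryParent:
    """Hypothetical in-memory stand-in for a permanent parent node such as /monitors."""

    def __init__(self, path):
        self.path = path
        self._next_seq = 0   # per-parent counter, incremented on every sequential create
        self.children = {}   # child name -> stored data

    def create_sequential(self, prefix, data=b""):
        # Append a zero-padded 10-digit counter to the requested prefix.
        name = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self.children[name] = data
        return f"{self.path}/{name}"


def sequence_of(child_path):
    """Extract the numeric sequence value from the last 10 characters of a node name."""
    return int(child_path[-10:])


root = InMemoryParent("/monitors")
first = root.create_sequential("monitor-101-")   # 101 is an assumed process ID
second = root.create_sequential("monitor-202-")  # 202 is an assumed process ID
```

Here `second` receives the sequence value of `first` plus 1, which is the property the allocation rule in step S3 relies on.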

Step S3: according to a preset rule, the distributed processing framework distributes monitoring authority to child nodes under a root node, and the child nodes with the monitoring authority monitor the current monitoring process.

In the embodiment of the present invention, the preset rule for allocating the monitoring right is:

acquiring sequence values of all child nodes under a root node, and judging whether the sequence value of the current child node is minimum in the sequence values of all the child nodes;

if so, the distributed processing framework allocates monitoring authority to the current child node with the minimum sequence value, and the current child node starts monitoring the current monitoring process after acquiring the monitoring authority;

if not, the current child node sets a state monitoring callback on the previous child node; for example, if the sequence value of the current child node is 2, the state monitoring callback is set on the child node whose sequence value is 1. It is then judged whether the sequence value of the previous child node is the minimum among the sequence values of all the child nodes, and so on, until the child node with the minimum sequence value under the root node is determined. The distributed processing framework allocates monitoring authority to the child node with the minimum sequence value, and that child node starts monitoring the current monitoring process after acquiring the monitoring authority.
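The preset rule can be sketched as a pure function over the child node names. This is a minimal illustration, assuming names that end in a 10-digit sequence value; the function name `assign_monitoring_authority` and the example node names are hypothetical, and no real ZooKeeper callbacks are registered here.

```python
def assign_monitoring_authority(children):
    """Return (holder, watches) for child names ending in a 10-digit sequence value.

    holder  -- the child with the minimum sequence value, which gets the authority
    watches -- maps every other child to the previous child it sets a callback on
    """
    if not children:
        raise ValueError("at least one child node is required")
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    holder = ordered[0]
    # Each non-minimum child watches the child immediately before it in sequence order.
    watches = {ordered[i]: ordered[i - 1] for i in range(1, len(ordered))}
    return holder, watches


children = [
    "monitor-303-0000000002",
    "monitor-101-0000000000",
    "monitor-202-0000000001",
]
holder, watches = assign_monitoring_authority(children)
```

Having each child watch only its immediate predecessor means that the deletion of one child notifies a single successor rather than every remaining child.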

Step S4: the current monitoring process monitors server performance data of a plurality of servers and/or service processing information of a plurality of service modules in the cloud access controller.

In an embodiment of the present invention, a service (Server) node is created under the root node, and a sub-node is created under the service node, where the sub-node includes a first sub-node and a second sub-node.

Specifically, after the server performance acquisition program is started, a temporary node, that is, a first sub-node of the service node, is created under the service node, and the cloud access controller writes server information into the first sub-node, so that the first sub-node monitors server performance data of the server, where the server information may include, but is not limited to: the host name and performance indexes such as CPU, memory, and disk. After the service module is started, a temporary node, that is, a second sub-node of the service node, is created under the service node, and the cloud access controller writes service module information into the second sub-node, so that the second sub-node monitors service processing information of the service module, where the service module information may include, but is not limited to: the host name, the module name, and the service processing within a time period.
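The text does not specify how the information is encoded in the sub-nodes. As one possible sketch, the two kinds of records could be serialized as JSON bytes; the field names and sample values below are assumptions chosen for illustration only.

```python
import json


def encode_server_info(host_name, cpu_percent, memory_percent, disk_percent):
    """Serialize server information for a first sub-node (field names are assumed)."""
    record = {
        "host_name": host_name,
        "cpu_percent": cpu_percent,
        "memory_percent": memory_percent,
        "disk_percent": disk_percent,
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


def encode_service_module_info(host_name, module_name, processed_in_period):
    """Serialize service module information for a second sub-node (field names are assumed)."""
    record = {
        "host_name": host_name,
        "module_name": module_name,
        "processed_in_period": processed_in_period,
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode_node_data(data):
    """Turn the bytes stored in a sub-node back into a dictionary."""
    return json.loads(data.decode("utf-8"))


server_bytes = encode_server_info("ac-server-1", 42.5, 63.0, 71.2)
module_bytes = encode_service_module_info("ac-server-1", "device-auth", 1200)
```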

The current monitoring process acquires the server information and the service module information for monitoring.

Specifically, after the monitoring process is started, all the sub-nodes under the service node are added into the cache and the states of all the sub-nodes are monitored. If the state of a sub-node changes, all the sub-nodes under the service node are acquired and compared with all the sub-nodes in the cache. If a sub-node in the cache does not exist among the acquired sub-nodes, the monitoring process considers the missing sub-node abnormal and sends an abnormality message to the cloud access controller; if a sub-node in the cache exists among the acquired sub-nodes, the monitoring process acquires the server performance data and/or the service processing information in that sub-node and sends it to the cloud access controller for processing.
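The comparison between the cached sub-nodes and the freshly acquired sub-nodes amounts to a set difference and a set intersection. The following sketch assumes sub-nodes are identified by name; `compare_with_cache`, `handle_change`, the `read_data` callable, and the node names are hypothetical. Sub-nodes that appear only in the acquired list are ignored here because the text does not say how they are handled.

```python
def compare_with_cache(cached_names, current_names):
    """Split the cached sub-nodes into those now missing and those still present."""
    cached = set(cached_names)
    current = set(current_names)
    missing = sorted(cached - current)   # in the cache, no longer under the service node
    present = sorted(cached & current)   # in the cache and still under the service node
    return missing, present


def handle_change(cached_names, current_names, read_data):
    """Build the reports the monitoring process would send to the cloud access controller."""
    missing, present = compare_with_cache(cached_names, current_names)
    reports = [("abnormal", name, None) for name in missing]
    reports += [("data", name, read_data(name)) for name in present]
    return reports


cached = ["server-a", "server-b", "module-x"]
current = ["server-a", "module-x", "module-y"]
reports = handle_change(cached, current, lambda name: {"node": name})
```

In this example `server-b` is reported as abnormal, while the data of `module-x` and `server-a` is read and forwarded.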

Step S5: if the current monitoring process is abnormal, the distributed processing framework deletes the child node monitoring the current monitoring process, determines a new current monitoring process from the backup monitoring process, and determines a new child node from all child nodes under the root node to monitor the new current monitoring process.

In the embodiment of the present invention, the current monitoring process exchanges information, such as server performance data and/or service processing information, with the root node through the child node that monitors the current monitoring process. In addition, the distributed processing framework presets a time interval, for example, 5 minutes, according to which the current monitoring process periodically sends server performance data and/or service processing information to the root node.

If the root node does not receive the information sent by the current monitoring process within the time interval, namely the monitoring process terminates the information interaction with the root node, the distributed processing framework deletes the child node monitoring the current monitoring process and sends the node deletion information to the child node of the next sequence value.
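The timeout check can be sketched with explicit timestamps so that the behaviour is deterministic. This is an illustration under assumptions: the class `HeartbeatTracker` and its method names are hypothetical, and the 300-second interval corresponds to the 5-minute example above.

```python
class HeartbeatTracker:
    """Tracks when the current monitoring process last sent data (hypothetical helper)."""

    def __init__(self, interval_seconds, started_at):
        self.interval_seconds = interval_seconds
        self.last_seen = started_at   # treat start-up as the first sign of life

    def record(self, now):
        """Call whenever server performance data or service processing information arrives."""
        self.last_seen = now

    def is_abnormal(self, now):
        """True if nothing has been received for longer than the preset interval."""
        return now - self.last_seen > self.interval_seconds


tracker = HeartbeatTracker(interval_seconds=300, started_at=0)
tracker.record(now=100)                      # data received at t = 100 s
on_time = tracker.is_abnormal(now=400)       # exactly one interval later
timed_out = tracker.is_abnormal(now=401)     # more than one interval later
```

When `is_abnormal` returns true, the framework would delete the child node monitoring the current monitoring process, as described above.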

Because the child node sets a state monitoring callback in the previous child node, when the child node of the next sequence value receives the node deletion information of the previous child node, the current monitoring process is considered to be abnormal, and a new monitoring process is selected from other backup monitoring processes through an algorithm to take over the operation. The specific algorithm for selecting a new monitoring process from the backup monitoring processes to take over the operation is not limited in the embodiment of the present invention.

In addition, the distributed processing framework acquires the sequence values of all the child nodes under the root node again, and determines the child node with the minimum sequence value as a new child node to monitor a new current monitoring process according to the preset rule.
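The failover sequence can be sketched as follows: the failed child node is deleted, the child with the next sequence value is notified, and the minimum-sequence rule is applied again to the remaining children. The function `fail_over` and the node names are hypothetical, and selecting the replacement monitoring process itself is left out because the embodiment does not limit that algorithm.

```python
def sequence_of(name):
    """Numeric sequence value taken from the last 10 characters of a node name."""
    return int(name[-10:])


def fail_over(children, failed):
    """Delete the failed child and re-apply the minimum-sequence rule.

    Returns (notified, new_holder, remaining), where notified is the child whose
    callback on the deleted child fires, or None if the deleted child was the last one.
    """
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(failed)   # raises ValueError if the child is unknown
    remaining = ordered[:index] + ordered[index + 1:]
    if not remaining:
        raise RuntimeError("no backup child node is left to take over")
    notified = ordered[index + 1] if index + 1 < len(ordered) else None
    new_holder = remaining[0]       # minimum sequence value among the survivors
    return notified, new_holder, remaining


children = [
    "monitor-101-0000000000",
    "monitor-202-0000000001",
    "monitor-303-0000000002",
]
notified, new_holder, remaining = fail_over(children, "monitor-101-0000000000")
```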

As shown in fig. 2, an embodiment of the present invention provides a system for distributed monitoring of an operation state of a cloud access controller, including:

the cloud access controller is a distributed system consisting of a plurality of servers and a plurality of service modules and is used for setting a plurality of monitoring processes, wherein one monitoring process is a current monitoring process, and the rest monitoring processes are backup monitoring processes;

the distributed processing framework is used for creating a permanent node and creating a temporary node under the permanent node, wherein the permanent node is a root node for monitoring the current monitoring process, and the temporary node is a child node of the root node; distributing monitoring authority to child nodes under a root node according to a preset rule, and monitoring the current monitoring process by the child nodes with the monitoring authority; the current monitoring process monitors server performance data of a plurality of servers and/or service processing information of a plurality of service modules in the cloud access controller, if the current monitoring process is abnormal, child nodes monitoring the current monitoring process are deleted, a new current monitoring process is determined from the backup monitoring process, and new child nodes are determined from all child nodes under the root node to monitor the new current monitoring process.

Wherein,

the distributed processing framework allocates monitoring authority for child nodes under the root node, and specifically comprises the following steps:

acquiring the sequence values of all child nodes under the root node, and judging whether the sequence value of the current child node is the minimum among the sequence values of all child nodes; if so, allocating monitoring authority to the current child node with the minimum sequence value; if not, the current child node sets a state monitoring callback on the previous child node, and it is judged whether the sequence value of the previous child node is the minimum among the sequence values of all the child nodes, and so on, until the child node with the minimum sequence value under the root node is determined and the monitoring authority is allocated to it.

The distributed processing framework is further specifically configured to:

creating a service node under the root node and creating sub-nodes under the service node, wherein the sub-nodes comprise a first sub-node and a second sub-node, the first sub-node is used for monitoring server performance data of a server, and the second sub-node is used for monitoring service processing information of a service module; after the current monitoring process is started, all sub-nodes under the service node are added into the cache and the states of all the sub-nodes are monitored, when the states of the sub-nodes are changed, all the sub-nodes under the service node are obtained, and all the obtained sub-nodes are compared with all the sub-nodes in the cache; if the sub-node in the cache does not exist in the acquired sub-node, the current monitoring process judges that the non-existing sub-node is abnormal and sends abnormal information to the cloud access controller; and if the subnode in the cache exists in the acquired subnode, the current monitoring process acquires the server performance data and/or the service processing information monitored by the existing subnode and sends the server performance data and/or the service processing information to the cloud access controller for processing.

The current monitoring process is abnormal, and specifically comprises the following steps:

the current monitoring process periodically sends server performance data and/or service processing information according to a preset time interval; and if the server performance data and/or the service processing information sent by the current monitoring process are not received in the time interval, the current monitoring process is abnormal.

The distributed processing framework is further specifically configured to:

if the current monitoring process is abnormal, the distributed processing framework deletes the child node monitoring the current monitoring process and sends node deletion information to the child node of the next sequence value; when the child node of the next sequence value receives the node deletion information, the abnormal information of the current monitoring process and the child node monitoring the current monitoring process is sent to the root node, and then the distributed processing framework determines a new current monitoring process from the backup monitoring process and determines a new child node from all child nodes under the root node to monitor the new current monitoring process according to the preset rule.

The specific technical details of the system for the distributed monitoring of the operating state of the cloud access controller are similar to those of the method for the distributed monitoring of the operating state of the cloud access controller, and therefore detailed description is omitted.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments.

Finally, it should be noted that: the foregoing description of various embodiments of the invention is provided to those skilled in the art for the purpose of illustration. It is not intended to be exhaustive or to limit the invention to a single disclosed embodiment. Various alternatives and modifications of the invention, as described above, will be apparent to those skilled in the art. Thus, while some alternative embodiments have been discussed in detail, other embodiments will be apparent or relatively easy to derive by those of ordinary skill in the art. The present invention is intended to embrace all such alternatives, modifications, and variances which have been discussed herein, and other embodiments which fall within the spirit and scope of the above application.

Claims

1. A method for distributed monitoring of an operation state of a cloud access controller is characterized in that the cloud access controller is a distributed system composed of a plurality of servers and a plurality of service modules, and the method comprises the following steps:

the cloud access controller sets a plurality of monitoring processes, wherein one monitoring process is a current monitoring process, and the rest monitoring processes are backup monitoring processes;

the distributed processing framework establishes a root node for monitoring the current monitoring process, establishes child nodes under the root node, distributes monitoring authority to the child nodes under the root node according to a preset rule, and the child nodes with the monitoring authority monitor the current monitoring process;

the current monitoring process monitors server performance data of a plurality of servers and/or service processing information of a plurality of service modules in the cloud access controller;

if the current monitoring process is abnormal, the distributed processing framework deletes the child node monitoring the current monitoring process, determines a new current monitoring process from the backup monitoring process, and determines a new child node from all child nodes under the root node to monitor the new current monitoring process.

2. The method according to claim 1, wherein the child nodes are identified by child node names and sequence values, wherein the sequence value of a child node in the root node is the sequence value of its previous child node plus 1;

according to the preset rule, the distributed processing framework distributes monitoring authority for the child nodes under the root node, and the method comprises the following steps:

acquiring the sequence values of all child nodes under the root node, and judging whether the sequence value of the current child node is minimum in the sequence values of all child nodes;

if so, the distributed processing framework allocates monitoring authority to the current child node with the minimum sequence value;

if not, setting a state monitoring callback in the previous child node by the current child node, and judging whether the sequence value of the previous child node is the minimum in the sequence values of all the child nodes; until determining a child node with the minimum sequence value under the root node; and the distributed processing framework allocates monitoring authority to the child node with the minimum sequence value.

3. The method according to claim 1, wherein the current monitoring process monitoring server performance data of a plurality of servers and/or service processing information of a plurality of service modules in the cloud access controller comprises:

creating a service node under the root node and creating sub-nodes under the service node, wherein the sub-nodes comprise a first sub-node and a second sub-node, the first sub-node is used for monitoring server performance data of the plurality of servers, and the second sub-node is used for monitoring service processing information of the plurality of service modules;

after the current monitoring process is started, all sub-nodes under the service node are added to a cache and the states of all the sub-nodes are monitored; when the state of a sub-node changes, all sub-nodes under the service node are acquired, and the acquired sub-nodes are compared with the sub-nodes in the cache;

if a sub-node in the cache does not exist among the acquired sub-nodes, the current monitoring process judges that the missing sub-node is abnormal and sends abnormality information to the cloud access controller;

and if a sub-node in the cache exists among the acquired sub-nodes, the current monitoring process acquires the server performance data and/or the service processing information monitored by the existing sub-node and sends the server performance data and/or the service processing information to the cloud access controller for processing.
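By way of illustration only, the cache comparison of claim 3 can be sketched as a set comparison. The sketch assumes that sub-nodes are identified by name and that a caller-supplied function `read_data` (hypothetical) returns the data monitored by a sub-node; neither assumption comes from the claims.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


def compare_with_cache(cached: Set[str], acquired: Set[str]) -> Tuple[List[str], List[str]]:
    """Split the cached sub-nodes into (abnormal, existing), each sorted by name."""
    abnormal = sorted(cached - acquired)  # in the cache but no longer acquired
    existing = sorted(cached & acquired)  # in the cache and still acquired
    return abnormal, existing


def on_subnode_change(
    cached: Set[str],
    acquired: Set[str],
    read_data: Callable[[str], Any],
) -> Dict[str, Any]:
    """Build the report the current monitoring process would send onward."""
    abnormal, existing = compare_with_cache(cached, acquired)
    return {
        # Sub-nodes judged abnormal because they disappeared.
        "abnormal": abnormal,
        # Performance data / service processing information of surviving sub-nodes.
        "data": {name: read_data(name) for name in existing},
    }
```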

4. The method for distributed monitoring of the operating state of the cloud access controller according to claim 3, wherein the current monitoring process being abnormal comprises:

the current monitoring process periodically sends server performance data and/or service processing information according to a preset time interval;

and if the server performance data and/or the service processing information sent by the current monitoring process is not received within the time interval, the current monitoring process is judged to be abnormal.
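By way of illustration only, the timeout check of claim 4 can be sketched as follows. The sketch assumes timestamps in seconds and treats a report that arrives exactly at the end of the interval as on time; both choices, and the function name, are assumptions rather than part of the claims.

```python
def is_monitor_abnormal(last_report_time: float, now: float, interval: float) -> bool:
    """Return True if no report arrived within the preset time interval."""
    # Abnormal only when strictly more than one interval has elapsed
    # since the last report was received.
    return (now - last_report_time) > interval
```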

5. The method of distributed monitoring of operating states of a cloud access controller according to claim 4, wherein the distributed processing framework deletes the child node monitoring the current monitoring process, determines a new current monitoring process from the backup monitoring processes, and determines a new child node from all child nodes under the root node to monitor the new current monitoring process, comprising:

the distributed processing framework deletes the child node monitoring the current monitoring process and sends the node deletion information to the child node of the next sequence value;

when the child node of the next sequence value receives the node deletion information, it sends abnormality information about the current monitoring process and about the child node that was monitoring the current monitoring process to the root node; the distributed processing framework then determines a new current monitoring process from the backup monitoring processes and, according to the preset rule, determines a new child node from all child nodes under the root node to monitor the new current monitoring process.
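By way of illustration only, the failover of claim 5 can be sketched as follows. The claims do not state how the new current monitoring process is chosen from the backup monitoring processes; the sketch assumes the first backup in a list is taken. All names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class FailoverResult(NamedTuple):
    remaining_children: Dict[str, int]  # child nodes left under the root node
    notified_child: str                 # child node of the next sequence value
    new_current_process: str            # promoted backup monitoring process
    new_monitoring_child: str           # child node now holding monitoring authority
    remaining_backups: List[str]


def fail_over(children: Dict[str, int], failed_child: str, backups: List[str]) -> FailoverResult:
    """Delete the failed child node and pick replacements."""
    failed_seq = children[failed_child]
    # Delete the child node that was monitoring the abnormal process.
    remaining = {n: s for n, s in children.items() if n != failed_child}
    # The child node of the next sequence value receives the deletion information.
    later = {n: s for n, s in remaining.items() if s > failed_seq}
    if not later or not backups:
        raise RuntimeError("no successor child node or backup monitoring process left")
    notified = min(later, key=later.get)
    # Assumption: promote the first backup monitoring process in the list.
    new_current = backups[0]
    # Preset rule: the minimum sequence value holds the monitoring authority.
    new_child = min(remaining, key=remaining.get)
    return FailoverResult(remaining, notified, new_current, new_child, backups[1:])
```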

6. A system for distributed monitoring of the operating state of a cloud access controller, characterized by comprising:

7. The system according to claim 6, wherein the child nodes are identified by child node names and sequence values, wherein the sequence value of a child node in the root node is the sequence value of its previous child node plus 1;

if so, allocating monitoring authority to the current child node with the minimum sequence value;

if not, setting, by the current child node, a state monitoring callback on the previous child node, and judging whether the sequence value of the previous child node is the minimum among the sequence values of all the child nodes, until the child node with the minimum sequence value under the root node is determined; and allocating monitoring authority to the child node with the minimum sequence value.

8. The system for distributed monitoring of the operating state of a cloud access controller according to claim 7, wherein the distributed processing framework is further specifically configured to:

creating a service node under the root node and creating sub-nodes under the service node, wherein the sub-nodes comprise a first sub-node and a second sub-node, the first sub-node is used for monitoring server performance data of a server, and the second sub-node is used for monitoring service processing information of a service module;

9. The system for distributed monitoring of the operating state of the cloud access controller according to claim 8, wherein the occurrence of an abnormality in the current monitoring process specifically includes:

10. The system for distributed monitoring of the operating state of a cloud access controller according to claim 9, wherein the distributed processing framework is further specifically configured to: