
WO2021169064A1 - Edge network-based anomaly processing method and apparatus - Google Patents

Edge network-based anomaly processing method and apparatus

Info

Publication number
WO2021169064A1
WO2021169064A1 PCT/CN2020/091867 CN2020091867W WO2021169064A1 WO 2021169064 A1 WO2021169064 A1 WO 2021169064A1 CN 2020091867 W CN2020091867 W CN 2020091867W WO 2021169064 A1 WO2021169064 A1 WO 2021169064A1
Authority
WO
WIPO (PCT)
Prior art keywords
service
abnormal
edge node
node
monitoring event
Prior art date
Application number
PCT/CN2020/091867
Other languages
French (fr)
Chinese (zh)
Inventor
朱少武
Original Assignee
网宿科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 网宿科技股份有限公司
Publication of WO2021169064A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers

Definitions

  • The present invention relates to the technical field of network security, and in particular to an edge network-based anomaly processing method and apparatus.
  • Each edge node collects its own service data and reports it to the central node, and the central node then analyzes whether each edge node is abnormal based on this service data. If there is an abnormality, operation and maintenance personnel are notified to repair the abnormal edge nodes.
  • The problem with this method is that each edge node holds a large amount of service data, and uploading all of it to the central node for centralized anomaly analysis usually costs the central node considerable time and resources. This places heavy pressure on the central node and also reduces the real-time performance of exception handling.
  • the present invention provides an abnormality processing method and device based on an edge network, which is used to solve the technical problems of high pressure on the central node and untimely processing of abnormalities caused by the centralized analysis of the abnormality of each edge node by the central node in the prior art.
  • the present invention provides an abnormality processing method based on an edge network, the edge network including a central node and at least one edge node; the method includes:
  • Any edge node analyzes service data using anomaly analysis rules to determine whether a first service in the edge node is abnormal, where the service data includes service data corresponding to the first service. After determining that the first service is abnormal, the edge node uses the exception handling rule for the first service to repair the first service if such a rule exists in the edge node; if no exception handling rule for the first service exists in the edge node, the edge node reports the abnormality to the central node.
  • The central node determines the exception handling rule of the first service and sends it to the edge node; accordingly, the edge node receives the exception handling rule of the first service sent by the central node and uses it to repair the first service.
  • When the edge node cannot handle the exception, it reports the exception to the central node, and the central node issues exception handling rules, so that the edge node can handle the exception according to the rules set by the central node, which improves the accuracy and comprehensiveness of exception handling.
  • Before analyzing service data using anomaly analysis rules, the edge node also sends a registration request to the central node; the registration request is used to establish a communication connection between the central node and the edge node. After establishing the communication connection with the central node, the edge node obtains the self-closed loop strategies corresponding to various services from the central node. The various services include the first service, and the self-closed loop strategy corresponding to any service includes the exception analysis rules of that service and may also include its exception handling rules.
  • The self-closed loop strategy can be configured on the central node side instead of being configured separately in each edge node, which improves the flexibility and convenience of self-closed-loop strategy configuration. Moreover, configuring self-closed-loop strategies per service makes anomaly identification more targeted, better reflects the true service capability of each service, and improves the accuracy of anomaly identification and handling.
  • The self-closed loop strategies corresponding to the various services are obtained as follows: after detecting that the user has entered abnormal monitoring configuration information in the abnormal monitoring configuration interface, the central node obtains and parses that configuration information, derives the self-closed loop strategy corresponding to each service, and stores it in the local database of the central node.
  • The self-closed loop strategies corresponding to various services can be set by the user on the abnormal monitoring configuration interface of the central node, so the self-closed loop strategy of a service can be decoupled from the business, and different services can be configured with different self-closed-loop strategies, which improves the flexibility of exception handling. Moreover, configuring each self-closed-loop strategy through the configuration interface can also simplify operations, reduce manual operation and maintenance costs and events, and improve the efficiency of exception handling.
  • The anomaly analysis rule of any service includes an anomaly analysis rule corresponding to each monitoring event in the service. Any edge node analyzing the service data using the anomaly analysis rule to determine whether the first service in the edge node is abnormal includes: for any monitoring event in the first service, the edge node parses the service data of the monitoring event out of the service data of the first service, and calls the anomaly analysis algorithm that matches the type of that service data to analyze it; if the analysis result meets the first abnormal condition corresponding to the monitoring event, the edge node determines that the monitoring event is abnormal and determines whether the first service is abnormal at least according to the monitoring event; if the analysis result does not meet the first abnormal condition corresponding to the monitoring event, it determines that the first service is not abnormal.
  • The edge node determining whether the first service is abnormal at least according to the monitoring event includes: if the edge node determines that the abnormal condition corresponding to the monitoring event includes only the first abnormal condition, it determines that the first service is abnormal; if it determines that the abnormal condition corresponding to the monitoring event also includes a second abnormal condition, and the second abnormal condition is an impact time, it determines that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and that the first service is abnormal when the abnormal duration is greater than or equal to the impact time.
  • The method further includes: if the edge node determines that the second abnormal condition is that associated monitoring events are abnormal at the same time, it determines whether the other monitoring events associated with the monitoring event are abnormal; when the other monitoring events are also abnormal, it determines that the first service is abnormal, and when at least one other monitoring event is normal, it determines that the first service is not abnormal.
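  • The decision logic of the preceding three paragraphs can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: the names `MonitoringEvent`, `impact_time_s`, `associated` and `service_is_abnormal` are hypothetical, the first abnormal condition is assumed to have been evaluated already, and the two kinds of second abnormal condition are treated as alternatives.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MonitoringEvent:
    """Hypothetical record for one monitoring event of a service."""
    name: str
    first_condition_met: bool              # result of the first abnormal condition
    abnormal_duration_s: float = 0.0       # how long the event has been abnormal
    impact_time_s: Optional[float] = None  # second condition: impact time, if configured
    associated: List[str] = field(default_factory=list)  # second condition: associated events


def service_is_abnormal(event: MonitoringEvent,
                        events: Dict[str, MonitoringEvent]) -> bool:
    """Decide whether the service is abnormal based on one monitoring event."""
    if not event.first_condition_met:
        return False  # first abnormal condition not met: service is not abnormal
    if event.impact_time_s is not None:
        # Abnormal only once the event has been abnormal for at least the impact time.
        return event.abnormal_duration_s >= event.impact_time_s
    if event.associated:
        # Abnormal only if every associated monitoring event is abnormal as well.
        return all(events[name].first_condition_met for name in event.associated)
    return True  # only the first abnormal condition is configured
```

  • For instance, an event whose first condition is met but whose abnormal duration is still below its impact time leaves the service marked as not abnormal, which filters out short spikes.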
  • the present invention provides an abnormality processing device based on an edge network.
  • the edge network includes a central node and at least one edge node; the device includes:
  • An anomaly analysis module configured to analyze service data using anomaly analysis rules to determine whether the first service in the edge node is abnormal; the service data includes service data corresponding to the first service;
  • the exception processing module is configured to, after determining that the first service is abnormal, if there is an exception handling rule for the first service in the edge node, use the exception handling rule to repair the first service; If there is no exception handling rule for the first service in the edge node, it is reported to the central node.
  • The central node determines the exception handling rule of the first service and sends it to the edge node. The device further includes a transceiver module configured to receive the exception handling rule of the first service sent by the central node; accordingly, the exception handling module is also configured to use the exception handling rule of the first service to repair the first service.
  • The device further includes a transceiver module. Before the abnormality analysis module analyzes the service data using anomaly analysis rules, the transceiver module is configured to: send a registration request to the central node, where the registration request is used for the central node to establish a communication connection with the edge node; and, after the communication connection is established with the central node, obtain the self-closed loop strategies corresponding to various services from the central node. The various services include the first service; the self-closed loop strategy corresponding to any service includes the exception analysis rules of the service and may also include the exception handling rules of the service.
  • The self-closed loop strategies corresponding to the various services are obtained as follows: after detecting that the user has entered abnormal monitoring configuration information in the abnormal monitoring configuration interface, the central node obtains and parses that configuration information, derives the self-closed loop strategy corresponding to each service, and stores it in the local database of the central node.
  • The anomaly analysis rule of any service includes an anomaly analysis rule corresponding to each monitoring event in the service. The anomaly analysis module is specifically configured to: for any monitoring event in the first service, parse the service data of the monitoring event out of the service data of the first service, and invoke an anomaly analysis algorithm that matches the type of that service data to analyze it; if the analysis result meets the first abnormal condition corresponding to the monitoring event, determine that the monitoring event is abnormal and determine whether the first service is abnormal at least according to the monitoring event; if the analysis result does not meet the first abnormal condition corresponding to the monitoring event, determine that the first service is not abnormal.
  • The abnormality analysis module is specifically configured to: if it is determined that the abnormal condition corresponding to the monitoring event includes only the first abnormal condition, determine that the first service is abnormal; if it is determined that the abnormal condition corresponding to the monitoring event also includes a second abnormal condition, and the second abnormal condition is an impact time, determine that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and determine that the first service is abnormal when the abnormal duration of the monitoring event is greater than or equal to the impact time.
  • The abnormality analysis module is further configured to: if it is determined that the second abnormal condition is that associated monitoring events are abnormal at the same time, determine whether the other monitoring events associated with the monitoring event are abnormal; when the other monitoring events are also abnormal, determine that the first service is abnormal, and when at least one other monitoring event is normal, determine that the first service is not abnormal.
  • A computing device includes at least one processor and at least one memory, wherein the memory stores a computer program, and when the program is executed by the processor, the processor performs any of the methods described in the first aspect above.
  • The present invention provides a computer-readable storage medium that stores a computer program executable by a computing device. When the program runs on the computing device, the computing device executes any of the methods described in the first aspect above.
  • FIG. 1 is a schematic diagram of a system architecture of an edge network provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a corresponding process flow of an edge network-based exception handling method provided by an embodiment of the present invention
  • FIG. 3 is a schematic diagram of the overall interaction flow corresponding to an exception handling method provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a monitoring device provided by an embodiment of the present invention.
  • Fig. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of a system architecture of an edge network provided by an embodiment of the present invention.
  • the edge network includes a central node 110 and at least one edge node, such as an edge node 121, an edge node 122, and an edge node 123.
  • the central node 110 may be connected to any edge node, for example, it may be connected in a wired manner, or may be connected in a wireless manner, which is not specifically limited.
  • the central node 110 is a remote device, and each edge node is a near-end device, and any edge node can also be connected to a client (not shown in FIG. 1) to provide a near-end service to the client.
  • The edge node 121 can be connected to the client 131 and the client 132 and provide near-end services to the client 131 and the client 132; the edge node 122 can be connected to the client 133 and provide near-end services to the client 133; the edge node 123 can be connected to the client 134 and the client 135 and provide near-end services to the client 134 and the client 135.
  • The client can be any terminal device, such as a notebook computer, an iPad, a mobile phone, a router, or another hardware device with communication and interaction functions, which is not limited.
  • the central node 110 can pre-deliver business data to each edge node.
  • When the client has a data access request, the client can send the data access request to the central node 110, and the data access request first arrives at the edge node adjacent to the client. The edge node checks, according to the data access request, whether the service data corresponding to the request is stored locally. If so, it can respond to the client directly with the service data; if not, it can forward the data access request to the central node 110.
  • The architecture in FIG. 1 is only an exemplary description and does not constitute a limitation on the solution; in a specific implementation, multiple layers (i.e., two or more layers) of edge nodes can also be deployed in the edge network.
  • The client's data access request first reaches the lowest-layer edge node. If the lowest-layer edge node stores the corresponding business data locally, it responds to the client with that business data. If the lowest-layer edge node does not store the corresponding business data locally, it forwards the data access request to the next-level edge node, and the next-level edge node performs the data response operation, until the corresponding business data is returned to the client.
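  • The layered lookup described above can be sketched as follows. This is a minimal illustration, assuming each edge layer is modelled as an in-memory dictionary and that the central node is the final fallback; `resolve_request` and `fetch_from_center` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


def resolve_request(key: str,
                    layers: List[Dict[str, bytes]],
                    fetch_from_center: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Walk the edge layers from the one nearest the client, then fall back to the central node."""
    for depth, cache in enumerate(layers):
        if key in cache:
            # This layer stores the business data locally and answers the client.
            return cache[key], f"edge-layer-{depth}"
    # No edge layer holds the data: forward the request to the central node.
    return fetch_from_center(key), "central-node"
```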
  • The edge node in the embodiment of the present invention may be an edge device, an edge device cluster deployed as a cluster, or a process in an edge device, which is not limited.
  • Fig. 2 is a schematic diagram of the process corresponding to an edge network-based exception handling method provided by an embodiment of the present invention.
  • the method is applicable to any edge node in the edge network, and the method includes:
  • Step 201: The edge node analyzes the service data using an abnormality analysis rule to determine whether the first service in the edge node is abnormal; the service data includes service data corresponding to the first service.
  • Step 202: After determining that the first service is abnormal, the edge node determines whether there is an exception handling rule for the first service in the edge node. If so, it uses the exception handling rule to repair the first service; if not, it reports the first service exception to the central node.
  • By placing service anomaly identification and repair on the edge node side instead of uniformly reporting to the central node for execution, the work pressure of the central node can be effectively reduced, and network overhead and time cost can be saved. The edge node performs self-closed-loop processing of its own abnormalities and can discover and handle them in time, which not only improves the efficiency of anomaly identification and processing, but also restores service availability in time.
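  • Steps 201 and 202 can be sketched as a single pass over one service. This is an illustrative sketch only: rules are modelled as plain Python callables, and `process_service`, `analysis_rules`, `handling_rules` and `report_to_center` are hypothetical names.

```python
from typing import Callable, Dict

AnalysisRule = Callable[[dict], bool]  # returns True when the service is abnormal
HandlingRule = Callable[[], None]      # performs the repair action


def process_service(service: str,
                    service_data: dict,
                    analysis_rules: Dict[str, AnalysisRule],
                    handling_rules: Dict[str, HandlingRule],
                    report_to_center: Callable[[str], None]) -> str:
    """Run one pass of steps 201 and 202 for a single service."""
    # Step 201: analyze the service data with the anomaly analysis rule.
    if not analysis_rules[service](service_data):
        return "normal"
    # Step 202: repair locally if a handling rule exists, otherwise report.
    rule = handling_rules.get(service)
    if rule is not None:
        rule()
        return "repaired"
    report_to_center(service)
    return "reported"
```

  • Only the "reported" branch involves the central node, which is where the reduction in central-node load comes from.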
  • the anomaly analysis rule can be configured in the edge node based on the anomaly monitoring configuration information.
  • The anomaly monitoring configuration information can be pre-configured on the edge node side by operation and maintenance personnel, or it can be configured by business personnel on the central node and then synchronized to the edge node, or it can be obtained by the edge node from a third-party interface device; the specifics are not limited.
  • the abnormal monitoring configuration information can be configured in the edge node through the following steps:
  • Step a: The central node receives the abnormal monitoring configuration information input by the user.
  • The central node can provide users with an abnormal monitoring configuration interface. After detecting that the user has entered abnormal monitoring configuration information in this interface, it can obtain and parse the abnormal monitoring configuration information and, taking the service as a unit, extract from it the abnormal monitoring configuration information belonging to the same service, so as to obtain the abnormal monitoring configuration information corresponding to each service. Further, the central node can parse the abnormal monitoring configuration information corresponding to any service, obtain the self-closed loop strategy corresponding to that service, and store it in the local database of the central node. The self-closed loop strategy corresponding to any service may include the exception analysis rule of the service, and may also include the exception handling rule of the service and/or the acquisition rule of service data, which is not limited.
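  • The per-service grouping performed by the central node might look like the sketch below. The flat entry format (`service`, `event`, `condition`, optional `action`) and the function name `group_by_service` are assumptions made for illustration; the source does not specify the configuration format.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_service(config_entries: List[dict]) -> Dict[str, dict]:
    """Group flat configuration entries into one self-closed-loop strategy per service."""
    strategies: Dict[str, dict] = defaultdict(
        lambda: {"analysis_rules": [], "handling_rules": []}
    )
    for entry in config_entries:
        strategy = strategies[entry["service"]]
        strategy["analysis_rules"].append(
            {"event": entry["event"], "condition": entry["condition"]}
        )
        if "action" in entry:  # the exception handling rule is optional
            strategy["handling_rules"].append(
                {"event": entry["event"], "action": entry["action"]}
            )
    return dict(strategies)
```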
  • the self-closed-loop strategy refers to a strategy for self-closed-loop processing of abnormal conditions of the service, including various rules related to self-closed-loop processing, such as exception analysis rules, exception handling rules, data acquisition rules, abnormal conditions, and so on.
  • The self-closed-loop strategy is obtained by extracting various rules from the abnormal monitoring configuration information of the service; it is the collective name for the various rules used for self-closed-loop processing of the same service, not the processing method itself.
  • the exception analysis rule corresponding to any service may include the exception analysis rule for each monitoring event in the service, and the exception handling rule corresponding to any service may include the exception processing rule for each monitoring event in the service.
  • Table 1 illustrates a schematic table of a self-closed loop strategy corresponding to each service.
  • any service can correspond to one monitoring event or multiple monitoring events, and each monitoring event can be set with corresponding abnormal conditions and abnormal handling rules.
  • The concurrent service corresponds to two monitoring events, namely the concurrent volume event and the concurrent error rate event. When the concurrent volume is greater than or equal to 10,000, the concurrent volume event is determined to be abnormal, so a concurrent service process can be added to restore the concurrent service in the edge node. When the concurrent error rate is greater than 45%, the concurrent error rate event is determined to be abnormal, so the concurrent service can be restarted to restore the accuracy of the concurrent service in the edge node.
  • A resource service corresponds to one monitoring event, namely the resource occupancy event. When the resource occupancy is greater than or equal to 95% for more than 5 minutes, the resource service is determined to be abnormal, so the cache of the resource service can be cleaned to restore the availability of the resource service in the edge node.
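  • The example strategies described in the text can be written down as data and evaluated as in the sketch below. The thresholds and actions come from the description above; the field names, the `STRATEGIES` layout and the `actions_for` helper are hypothetical, and the sketch does not reproduce the actual layout of Table 1.

```python
import operator
from typing import Dict, List, Tuple

# Thresholds and actions follow the examples in the text; field names are illustrative.
STRATEGIES = {
    "concurrent service": [
        {"event": "concurrent volume", "op": operator.ge, "threshold": 10_000,
         "min_duration_s": None, "action": "add a concurrent service process"},
        {"event": "concurrent error rate", "op": operator.gt, "threshold": 0.45,
         "min_duration_s": None, "action": "restart the concurrent service"},
    ],
    "resource service": [
        # The text says "for more than 5 minutes", hence the strict comparison below.
        {"event": "resource occupancy", "op": operator.ge, "threshold": 0.95,
         "min_duration_s": 300, "action": "clean the resource service cache"},
    ],
}


def actions_for(service: str, readings: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return repair actions for events whose abnormal conditions are met.

    `readings` maps an event name to (value, seconds the value has held).
    """
    actions = []
    for rule in STRATEGIES[service]:
        value, held_s = readings[rule["event"]]
        condition_met = rule["op"](value, rule["threshold"])
        duration_ok = rule["min_duration_s"] is None or held_s > rule["min_duration_s"]
        if condition_met and duration_ok:
            actions.append(rule["action"])
    return actions
```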
  • The central node can also support update operations by the user, such as creating new self-closed-loop strategies, clearing existing self-closed-loop strategies, modifying existing self-closed-loop strategies, or querying existing self-closed-loop strategies. After detecting that a self-closed-loop strategy has been updated, the central node can also automatically load the updated self-closed-loop strategy to improve the accuracy of exception handling. Taking the clearing of an existing self-closed-loop strategy as an example, the central node can display the existing self-closed-loop strategies to the user, and the user can either directly select the strategy to be cleared and delete it, or change the state of the strategy to be cleared from valid to invalid.
  • The self-closed-loop strategy of a service can be decoupled from the business, and users can configure different self-closed-loop strategies for different services according to their respective business needs, which improves the flexibility of exception handling. Moreover, configuring each self-closed-loop strategy through the configuration interface can also simplify operations, reduce manual operation and maintenance costs and events, and improve the efficiency of exception handling.
  • Step b: The edge node sends a registration request to the central node when it is started.
  • Step c: The central node verifies the registration request of the edge node. If the verification succeeds, it establishes a communication connection with the edge node (used to allow the edge node to obtain the self-closed loop strategies corresponding to various services) and sends a registration success response message to the edge node; if the verification fails, it refuses to establish a communication connection with the edge node and sends a registration failure response message to the edge node.
  • Step d: If the edge node receives the registration success response message, it can obtain the self-closed loop strategies corresponding to various services from the central node and store them in the local database. Correspondingly, if the edge node does not receive a response message, or receives the registration failure response message, it can periodically resend the registration request to the central node; if registration has still not succeeded after the set number of repeated transmissions, it gives up registering and generates a warning message.
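  • The retry behaviour in step d can be sketched as follows. This is a minimal illustration: `send_request` stands in for the real network call, `max_attempts` is assumed to count the total number of sends, and the warning is simply printed; all of these are assumptions rather than details given in the source.

```python
from typing import Callable, Optional


def register_with_retry(send_request: Callable[[], Optional[str]],
                        max_attempts: int,
                        wait: Callable[[], None] = lambda: None) -> bool:
    """Send the registration request up to max_attempts times.

    send_request returns "success", "failure", or None when no response arrives.
    """
    for attempt in range(max_attempts):
        if send_request() == "success":
            return True
        if attempt < max_attempts - 1:
            wait()  # e.g. sleep for the retry period in a real deployment
    # Give up registering and generate a warning message.
    print("warning: registration abandoned after", max_attempts, "attempts")
    return False
```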
  • various services can be services deployed on edge nodes, or all services stored in central nodes, without limitation.
  • The edge node can obtain from the central node the self-closed loop strategy corresponding to the concurrent service and the self-closed loop strategy corresponding to the port service, and store them in the local database of the edge node.
  • The edge node can send an acquisition request to the central node, where the acquisition request carries the identifier of the concurrent service and the identifier of the port service, so that the central node, according to the acquisition request, returns the self-closed loop strategy corresponding to the concurrent service and the self-closed loop strategy corresponding to the port service to the edge node.
  • The central node can upload the self-closed loop strategies corresponding to all services to a set location and grant the edge node access rights to that location, so that the edge node can go to the set location on its own to obtain the self-closed loop strategy corresponding to the concurrent service, the self-closed loop strategy corresponding to the port service, and so on.
  • The edge node can also periodically obtain the self-closed loop strategies corresponding to various services from the central node, to ensure that the self-closed loop strategy corresponding to any service is consistent between the configuration side (that is, the central node) and the execution side (that is, the edge node), which improves the accuracy of exception handling.
  • The central node can also monitor the local database in real time. Once it detects that the user has updated the self-closed loop strategy corresponding to a certain service, it can issue an update instruction to the edge node corresponding to that service, so that the edge node can obtain the updated self-closed-loop strategy in real time. This ensures the consistency of the self-closed-loop strategy corresponding to the service between the configuration side and the execution side, and improves the accuracy of exception handling for the service.
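  • One simple way to keep the two sides consistent is to compare a per-strategy version marker, as sketched below. The `version` field and the `sync_strategies` function are assumptions introduced for illustration; the source does not say how an update is detected on the edge node side.

```python
from typing import Dict, List


def sync_strategies(local: Dict[str, dict], central: Dict[str, dict]) -> List[str]:
    """Copy every strategy whose version differs from the central node's copy.

    Returns the names of the services whose strategies were updated locally.
    """
    updated = []
    for service, strategy in central.items():
        if local.get(service, {}).get("version") != strategy["version"]:
            local[service] = dict(strategy)  # store a copy in the local database
            updated.append(service)
    return updated
```

  • The same function could serve both the periodic pull and the handling of an update instruction pushed by the central node.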
  • The self-closed loop strategy can be configured on the central node side instead of being configured separately in each edge node, which improves the flexibility and convenience of self-closed-loop strategy configuration. Moreover, configuring self-closed-loop strategies per service makes anomaly identification more targeted, better reflects the true service capability of each service, and improves the accuracy of anomaly identification and handling.
  • a service process of any service (such as the first service) is set in the edge node, and the edge node provides the first service to the client or other devices through the service process of the first service.
  • After the edge node stores the self-closed loop strategy corresponding to the first service in the local database, the edge node can also obtain the service data of the first service by invoking the service process of the first service.
  • An acquisition request can be sent to the service process of the first service, where the acquisition request carries the identifier of the monitoring event, so that the service process of the first service returns the service data corresponding to the monitoring event in real time; alternatively, the acquisition request can be sent to the service process of the first service according to a set period, so that the service process returns the service data corresponding to the monitoring event according to the set period. These options are not limiting.
  • the edge node can obtain the service data of the first service in the following way: the self-closed-loop strategy also includes the data source interface corresponding to each monitoring event in the first service. The data source interface is a function pre-encapsulated inside the edge node, and it can record the service data corresponding to the monitoring event while the service process provides the first service. In this way, for any monitoring event in the first service, the edge node can first determine the data source interface corresponding to the monitoring event from the self-closed-loop strategy, and then obtain the service data of the monitoring event by calling that data source interface.
  • a first service process is set in the edge node, and the first service process is used to provide the port service at the Internet Protocol (IP) address 127.0.0.1.
  • the edge node may call the data source interface corresponding to the request count event to send, at the set period, an acquisition request to the port at IP address 127.0.0.1 served by the first service process, so as to obtain the request count data (i.e., the service data).
  • the self-closed loop strategy may also include other configuration information required to call the data source interface, such as environment variables and communication protocol conventions, which are not limited.
  • the acquisition operation may be performed by the monitoring process set in the edge node, and socket communication is adopted between the monitoring process and the service process to improve the efficiency and accuracy of communication.
  • the edge node can directly call the data source interface corresponding to the monitoring event to obtain the corresponding service data without manual configuration.
  • the operation is simple, easy to implement, and can also improve the efficiency of service data acquisition.
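  • The lookup described above (strategy, then data source interface, then service data) can be sketched as follows. The registry `DATA_SOURCES`, the decorator `data_source`, the strategy layout, and the sample value are all assumptions made for illustration; a real interface would query the service process, for example over a socket.

```python
from typing import Callable, Dict

# Hypothetical registry of data source interfaces pre-encapsulated in the edge node.
DATA_SOURCES: Dict[str, Callable[[], dict]] = {}


def data_source(name: str):
    """Register a function under the interface name used in the strategy."""
    def register(fn: Callable[[], dict]) -> Callable[[], dict]:
        DATA_SOURCES[name] = fn
        return fn
    return register


@data_source("get_request_count")
def get_request_count() -> dict:
    # A real interface would ask the service process for its counters;
    # a fixed sample is returned here so the sketch stays self-contained.
    return {"requests": 120}


# Assumed strategy layout: each monitoring event names its data source interface.
strategy = {
    "monitoring_events": {
        "request_count": {"data_source": "get_request_count"},
    }
}


def fetch_service_data(strategy: dict, event_id: str) -> dict:
    """Resolve the interface for a monitoring event and call it."""
    interface_name = strategy["monitoring_events"][event_id]["data_source"]
    return DATA_SOURCES[interface_name]()


sample = fetch_service_data(strategy, "request_count")
```

  • Because the interface name comes from the strategy, changing which interface serves a monitoring event only requires a strategy update, not a code change on the edge node.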
  • the abnormality analysis rule corresponding to the monitoring event may include one or more abnormal conditions. Each monitoring event may correspond to its own first abnormal condition, which is used to indicate whether the monitoring event is abnormal. If the monitoring event corresponds only to the first abnormal condition, the first abnormal condition indicates not only the abnormality of the monitoring event but also the abnormality of the service corresponding to the monitoring event. If the monitoring event corresponds to the first abnormal condition and at least one second abnormal condition at the same time, the first abnormal condition indicates the abnormality of the monitoring event, and the first abnormal condition and the at least one second abnormal condition together indicate the abnormality of the service corresponding to the monitoring event.
  • the at least one second abnormal condition can be set by those skilled in the art based on experience, or can also be set according to actual needs, which is not specifically limited.
  • if the abnormality analysis rule corresponding to the monitoring event includes only the first abnormal condition, then when the service data corresponding to the monitoring event meets the first abnormal condition, it can be determined that the monitoring event is abnormal in the edge node, and the exception handling rule corresponding to the monitoring event can be directly invoked to process the edge node, so as to restore the service corresponding to the monitoring event in the central node. If the service data corresponding to the monitoring event does not meet the first abnormal condition, it can be determined that the monitoring event is in a normal state in the edge node, and therefore no processing is required.
  • the concurrent volume event and the concurrent error rate event in the concurrent service each correspond only to a first abnormal condition, and each has its own exception handling rule. Therefore, when either the concurrent volume event or the concurrent error rate event is abnormal, the concurrent service can be determined to be abnormal, and the exception handling rule corresponding to the abnormal monitoring event can be used to process the concurrent service in the edge node.
  • if the abnormality analysis rule corresponding to the monitoring event also includes at least one second abnormal condition, then when the service data corresponding to the monitoring event meets both the first abnormal condition and the at least one second abnormal condition, the service corresponding to the monitoring event is in an abnormal state in the edge node, so the exception handling rule corresponding to the monitoring event can be called to process the edge node, so as to restore the service corresponding to the monitoring event in the central node.
  • if the service data corresponding to the monitoring event meets only the first abnormal condition and does not meet the at least one second abnormal condition, the monitoring event is abnormal in the edge node but the service corresponding to the monitoring event is not, so no processing is required.
  • the second abnormal condition may include an associated monitoring event and/or an impact time, and may be determined based on actual failure scenarios of the service. Specifically, for any service, the historical service data corresponding to each monitoring event at the time the service failed can first be obtained; the characteristic factors that caused the service failure can then be analyzed from that historical data, and the second abnormal condition set according to those factors. For example, if the characteristic factor is that the service is truly abnormal when a certain monitoring event and other monitoring events are abnormal together, the second abnormal condition of that monitoring event can be set to the associated other monitoring events, and the monitoring event and its associated monitoring events can correspond to the same exception handling rule. If the characteristic factor is that the service is truly abnormal when the abnormality of a certain monitoring event lasts longer than the impact time, the second abnormal condition of that monitoring event can be set to the impact time.
  • the second abnormal condition corresponding to the abnormal status code event in the port service is the associated request count event. After the first abnormal condition corresponding to the abnormal status code event is used to determine that the abnormal status code event is abnormal, it can then be determined whether the request count event associated with the abnormal status code event is abnormal. If the request count event is also abnormal, the port service can be determined to be abnormal, and the exception handling rule corresponding to the abnormal status code event can be used to repair the port service. If the request count event is not abnormal, the port service can be determined not to be abnormal, so no processing is required.
  • the second abnormal condition corresponding to the resource occupancy event in the resource service is the impact time (≥5 minutes).
  • each monitoring event can also correspond to three or more abnormal conditions. For example, it can also correspond to a third abnormal condition, which is used to indicate the abnormal level of the service: only when the abnormal level of the service exceeds the abnormal level indicated by the third abnormal condition is the exception handling rule corresponding to the monitoring event used for repair. A fourth abnormal condition can also be set, which is used to indicate a combined abnormal situation of services: only when the services indicated by the fourth abnormal condition are all abnormal is the exception handling rule corresponding to the monitoring event used for repair. The specifics are not limited.
  • setting the second abnormal condition for the monitoring event based on real failure scenarios can reduce the probability of detecting falsely abnormal services and improve the accuracy of detection; and, by setting the second abnormal condition to the impact time and/or to the associated monitoring events being abnormal at the same time, the abnormality of the service can be judged comprehensively from the abnormal duration and/or the number of abnormal events, which improves the accuracy of abnormality judgment.
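  • The two-level judgment above can be sketched as a single function: the first abnormal condition decides whether the monitoring event is abnormal, and any second abnormal conditions (impact time, associated events) decide whether the service is abnormal. `EventRule` and `service_is_abnormal` are hypothetical names, and expressing the impact time in seconds is an assumption; the duration comparison treats a duration equal to the impact time as abnormal, as the device embodiment states.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventRule:
    """Hypothetical second abnormal conditions for one monitoring event."""
    associated_events: List[str] = field(default_factory=list)
    impact_seconds: Optional[float] = None  # None means no impact time is set


def service_is_abnormal(
    event_id: str,
    rules: Dict[str, EventRule],
    event_abnormal: Dict[str, bool],
    abnormal_duration: Dict[str, float],
) -> bool:
    """Return True only when the service, not just the event, is abnormal."""
    # First abnormal condition: the monitoring event itself must be abnormal.
    if not event_abnormal.get(event_id, False):
        return False
    rule = rules[event_id]
    # Second abnormal condition (impact time): the abnormality must have
    # lasted at least as long as the impact time.
    if rule.impact_seconds is not None:
        if abnormal_duration.get(event_id, 0.0) < rule.impact_seconds:
            return False
    # Second abnormal condition (association): every associated event must
    # be abnormal too. With no associated events this is trivially true.
    return all(event_abnormal.get(e, False) for e in rule.associated_events)


rules = {
    "concurrency": EventRule(),                                   # first condition only
    "status_code": EventRule(associated_events=["request_count"]),
    "resource_occupancy": EventRule(impact_seconds=300),          # 5 minutes
}
```

  • With these rules, an abnormal status code event alone does not mark the port service abnormal; the request count event has to be abnormal as well. Likewise, a resource occupancy spike that has lasted two minutes is ignored, while one that has lasted five minutes triggers repair.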
  • the abnormality analysis rule corresponding to the monitoring event can also include the abnormality analysis algorithm corresponding to the monitoring event.
  • monitoring events of the same type can correspond to the same type of abnormality analysis algorithm. Because the abnormality analysis rule corresponding to a monitoring event includes both the abnormality analysis algorithm and the abnormal conditions, the abnormality analysis rule corresponding to each monitoring event can still be unique.
  • the corresponding abnormality analysis algorithm can be called according to the type of the service data to process the service data and extract the abnormality judgment data from it, and it is then judged whether the abnormality judgment data meets the abnormal condition corresponding to the monitoring event. If it does, the monitoring event is determined to be abnormal; if not, the monitoring event is determined not to be abnormal.
  • the abnormality analysis algorithm may include any one or more of log keyword analysis method, service health value analysis method, threshold value analysis method, and service self-defined analysis method. The following are respectively analyzed:
  • the log keyword analysis method is used for abnormal analysis of the service data of the log data type.
  • the service data of the log data type includes batch processing time, batch processing success amount, etc.
  • the service data can be segmented based on preset log fields to obtain each monitoring log field, and then a multi-pattern matching algorithm (such as the Aho-Corasick algorithm or the Wu-Manber algorithm) can be used to match each monitoring log field.
  • the successfully matched monitoring log field is used as the abnormality judgment data and compared with the preset log field in the abnormal condition to determine whether the monitoring event is abnormal.
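  • A minimal sketch of the log keyword analysis follows. It deliberately swaps in a compiled regular-expression alternation from the Python standard library in place of Aho-Corasick or Wu-Manber, which is adequate for a small keyword set but is not the algorithm named in the text. The `|` field separator, the sample log line, and the keywords are assumptions.

```python
import re
from typing import Dict, Iterable, List, Pattern


def split_log_fields(line: str, sep: str = "|") -> List[str]:
    """Segment one log line into monitoring log fields (separator is assumed)."""
    return [part.strip() for part in line.split(sep) if part.strip()]


def build_matcher(keywords: Iterable[str]) -> Pattern[str]:
    """Compile all keywords into one pattern so each field is scanned once."""
    # Longer keywords first, so the alternation prefers the longest keyword
    # when two keywords share a prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered))


def match_fields(fields: List[str], matcher: Pattern[str]) -> Dict[str, List[str]]:
    """Return only the fields that contain at least one keyword."""
    hits: Dict[str, List[str]] = {}
    for log_field in fields:
        found = matcher.findall(log_field)
        if found:
            hits[log_field] = found
    return hits


line = "2020-05-22 10:00:01 | batch=17 | status=timeout | upstream connection refused"
matcher = build_matcher(["timeout", "connection refused"])
judgment_data = match_fields(split_log_fields(line), matcher)
```

  • The matched fields in `judgment_data` play the role of the abnormality judgment data, which would then be compared with the preset log field in the abnormal condition.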
  • the service health value analysis method is used to perform abnormal analysis on the service data of the operation data type.
  • the service data of the operation data type includes status code, bandwidth, number of requests, resource occupancy rate, etc.
  • Threshold analysis method is used for abnormal analysis of service data of indicator data type.
  • Service data of indicator data type includes the number of requests, the number of alarms, and so on.
  • the monitoring value of the monitoring event under each specific indicator can be extracted from the service data according to the specific indicators of the service; the monitoring value under a specific indicator is used as the abnormality judgment data and compared with the threshold for that indicator in the abnormal condition to determine whether the monitoring event is abnormal.
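  • The threshold analysis can be sketched in a few lines. The text does not say in which direction a value is compared with its threshold, so the sketch assumes that a value strictly greater than the threshold is abnormal; the function name and the sample indicators are hypothetical.

```python
from typing import Dict


def threshold_analysis(
    service_data: Dict[str, float],
    thresholds: Dict[str, float],
) -> Dict[str, float]:
    """Return the indicators whose monitoring value exceeds its threshold.

    Assumption: "abnormal" means strictly greater than the threshold.
    Indicators missing from the service data are skipped.
    """
    exceeded: Dict[str, float] = {}
    for indicator, limit in thresholds.items():
        value = service_data.get(indicator)
        if value is not None and value > limit:
            exceeded[indicator] = value
    return exceeded


service_data = {"requests": 1500, "alarms": 2}
thresholds = {"requests": 1000, "alarms": 5}
exceeded = threshold_analysis(service_data, thresholds)
event_is_abnormal = bool(exceeded)
```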
  • the service custom analysis method is used to perform anomaly analysis on service data of unknown data types, or for users who require a custom anomaly analysis algorithm.
  • the edge node can provide the user with a general interface so that the user can upload the custom anomaly analysis algorithm through the general interface.
  • the edge node can also load the anomaly analysis algorithm, and use the loaded anomaly analysis algorithm to calculate the service data corresponding to the monitoring event to obtain the abnormality judgment data.
  • the user can also customize the abnormal conditions at the same time. After the abnormality judgment data is calculated, the edge node can also compare the abnormality judgment data with the user-defined abnormal conditions to determine whether the monitoring event is abnormal.
  • if the type of the service data is determined to be the log data type, the log keyword analysis method can be called to analyze the service data for abnormality. If the type is determined to be the operation data type, the service health value analysis method can be called. If the type is determined to be the indicator data type, the threshold analysis method can be called. If the type is determined to be another data type, or the user needs a custom anomaly analysis algorithm, the service custom analysis method can be called to perform anomaly analysis on the service data.
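  • The selection logic above amounts to a dispatch table with a user-supplied override. In this sketch the three built-in analyzers are stubs, and `register_custom` stands in for the general upload interface; all names, and the choice to raise an error for an unknown type with no custom algorithm, are assumptions.

```python
from typing import Any, Callable, Dict

Analyzer = Callable[[Any], Any]


# Stub analyzers: each only tags the data with the method that handled it.
def analyze_log(data: Any) -> Any:
    return ("log_keyword", data)


def analyze_health(data: Any) -> Any:
    return ("service_health_value", data)


def analyze_threshold(data: Any) -> Any:
    return ("threshold", data)


BUILT_IN: Dict[str, Analyzer] = {
    "log": analyze_log,
    "operation": analyze_health,
    "indicator": analyze_threshold,
}

# Custom algorithms uploaded by users, keyed by monitoring event.
CUSTOM: Dict[str, Analyzer] = {}


def register_custom(event_id: str, analyzer: Analyzer) -> None:
    CUSTOM[event_id] = analyzer


def select_analyzer(event_id: str, data_type: str) -> Analyzer:
    """Pick the algorithm for an event: custom first, then by data type."""
    if event_id in CUSTOM:
        return CUSTOM[event_id]
    if data_type in BUILT_IN:
        return BUILT_IN[data_type]
    raise LookupError(f"no analysis algorithm for type {data_type!r}")


def count_items(data: Any) -> Any:
    return ("custom", len(data))


register_custom("batch_job", count_items)
```

  • Keeping the algorithms behind one lookup is what decouples the analysis from the business logic: adding a data type or a user algorithm touches only the tables, not the callers.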
  • the abnormality analysis method can be decoupled from the actual business, and the flexibility of abnormality analysis can be improved.
  • the corresponding abnormal analysis algorithm is set for each monitoring event, which reduces the difficulty of development and further improves the flexibility of abnormal analysis.
  • the above method also supports user-defined anomaly analysis algorithms, which can not only continuously supplement new anomaly analysis algorithms according to user settings, improve the applicable scenarios of anomaly analysis, but also meet the needs of different users and improve the versatility of anomaly analysis.
  • the edge node can query the local database to determine whether an exception handling rule for the first service exists. If it exists, the edge node can directly call the exception handling rule of the first service to repair the first service; if it does not exist, the edge node can generate an exception message and report it to the central node 110.
  • the abnormal message carries related abnormal data of the first service, such as the identifier of the abnormal monitoring event in the first service, the abnormal field, the abnormal time, and the abnormal level in the service data corresponding to the abnormal monitoring event.
  • the central node 110 can first parse the abnormal message to obtain the abnormal field in the service data corresponding to the abnormal monitoring event, then calculate the matching degree between the abnormal field and each preset abnormal event in the operation and maintenance knowledge base, and use the preset abnormal event whose matching degree is greater than the preset matching degree as the preset abnormal event corresponding to the monitoring event. If such a preset abnormal event exists, the central node 110 may analyze the matched preset abnormal event to generate a corresponding exception handling rule and send it to the edge node. If no preset abnormal event has a matching degree greater than the preset matching degree, the central node 110 may push the exception message to the user, who sets the corresponding exception handling rule, and the set exception handling rule is then sent to the edge node.
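  • The text does not define how the matching degree is computed. Purely as an illustration, the sketch below uses the similarity ratio from Python's standard `difflib.SequenceMatcher` as the matching degree and returns the best-scoring preset abnormal event above the preset matching degree. The knowledge base contents, the 0.6 default, and the choice to keep only the single best match are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def best_preset_event(
    abnormal_field: str,
    knowledge_base: Dict[str, str],
    min_score: float = 0.6,
) -> Optional[str]:
    """Return the preset abnormal event that best matches the abnormal field.

    knowledge_base maps a preset event name to a textual signature.
    None means no event exceeded the preset matching degree, in which case
    the message would be pushed to the user instead.
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for name, signature in knowledge_base.items():
        score = SequenceMatcher(None, abnormal_field, signature).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > min_score else None


knowledge_base = {
    "upstream_timeout": "upstream request timeout",
    "disk_full": "no space left on device",
}
matched = best_preset_event("upstream request timeout", knowledge_base)
unmatched = best_preset_event("zzzz", knowledge_base)
```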
  • the edge node can not only use the exception handling rule to repair the first service, but also use the abnormal monitoring event in the first service and the exception handling rule of the first service to update the local database.
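  • The edge-side part of this flow can be sketched as: look up the rule locally, report to the central node only when it is missing, then cache the returned rule so the next occurrence is handled without the central node. `EdgeNode`, `fake_center`, the message keys, and the `"pending"` result are hypothetical; the message fields mirror the abnormal data listed above.

```python
from typing import Callable, Dict, List, Optional

RepairRule = Callable[[str], str]  # assumed: takes a service name, returns a result


class EdgeNode:
    """Hypothetical edge node with a local store of exception handling rules."""

    def __init__(self, report_to_center: Callable[[dict], Optional[RepairRule]]):
        self.local_rules: Dict[str, RepairRule] = {}
        self.report_to_center = report_to_center

    def handle_abnormal_service(self, service: str, message: dict) -> str:
        rule = self.local_rules.get(service)
        if rule is None:
            # No local rule: report the abnormal message to the central node.
            rule = self.report_to_center(message)
            if rule is None:
                return "pending"  # the central node has no rule yet
            # Update the local database so the next occurrence is self-handled.
            self.local_rules[service] = rule
        return rule(service)


reports: List[dict] = []


def fake_center(message: dict) -> RepairRule:
    """Stand-in for the central node: records the report, returns a rule."""
    reports.append(message)
    return lambda service: f"restarted {service}"


node = EdgeNode(fake_center)
message = {
    "event_id": "status_code",
    "abnormal_field": "502",
    "abnormal_time": "2020-05-22T10:00:00",
    "abnormal_level": "major",
}
first = node.handle_abnormal_service("port_service", message)
second = node.handle_abnormal_service("port_service", message)
```

  • Only the first call reaches the central node; the second is repaired from the local rule, which is the load reduction the method aims for.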
  • the central node 110 may also display the service status of each edge node to the user, so that the user can check the abnormal status and distribution status of various services in a timely manner.
  • the displayed information can include any one or more of: the abnormal situation of any service in each edge node, the abnormal situation of each monitoring event in any service, the processing result of abnormal monitoring events, the distribution of abnormal monitoring events, and the correlation between monitoring events.
  • the central node 110 may display this information to the user in the form of a holographic view or in the form of a table, which is not limited.
  • by placing service abnormality identification and abnormality repair on the edge node side for execution, instead of uniformly reporting to the central node, the work pressure of the central node can be effectively reduced, and network overhead and time cost can be saved; moreover, in this solution the edge node performs self-closed-loop processing on its own anomalies and can discover and handle them in time, which not only improves the efficiency of anomaly identification and processing but also restores service availability in time.
  • Fig. 3 is a schematic diagram of the overall interaction flow corresponding to an exception handling method provided by an embodiment of the present invention. As shown in Fig. 3, the method includes:
  • Step 301: After detecting that the user has entered abnormality monitoring configuration information in the abnormality monitoring configuration interface, the central node acquires and stores the abnormality monitoring configuration information.
  • the abnormality monitoring configuration information may include the self-closed-loop strategy corresponding to each service. The self-closed-loop strategy corresponding to any service may include the abnormality analysis rules of the service, and may also include the exception handling rules of the service and/or the rules for acquiring service data.
  • Step 302: The edge node sends a registration request to the central node when it is started.
  • Step 303: The central node verifies the registration request. If the verification succeeds, step 304 is executed; if the verification fails, step 315 is executed.
  • Step 304: The central node sends a response message of successful registration to the edge node.
  • Step 305: The edge node obtains the self-closed-loop strategies corresponding to various services from the central node and stores them in the local database of the edge node; the various services include the first service.
  • Step 306: The edge node invokes the data source interface corresponding to the first service to obtain the service data of the first service from the service process of the first service.
  • Step 307: The edge node uses the abnormality analysis rule of the first service to analyze the service data of the first service to determine whether the first service is abnormal.
  • Step 308: The edge node queries the local database to determine whether an exception handling rule for the first service exists; if not, step 309 is executed, and if so, step 312 is executed.
  • Step 309: The edge node sends an abnormal message to the central node, and the abnormal message carries related abnormal data of the first service.
  • Step 310: The central node sets an exception handling rule for the first service based on the parsed related abnormal data of the first service.
  • Step 311: The central node sends the exception handling rule of the first service to the edge node.
  • Step 312: The edge node uses the exception handling rule of the first service to repair the first service.
  • Step 313: If the central node determines that the exception handling rule of the first service is not stored in the local database, it updates the local database using the exception handling rule of the first service.
  • Step 314: The edge node repeatedly sends the registration request to the central node; if registration has still not succeeded after a set number of attempts, an alarm message is generated.
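  • The registration part of the flow (steps 302 to 304, with the retry in step 314) is a bounded retry loop. A minimal sketch, assuming a hypothetical `send_request` callable that returns True when the central node accepts the registration; the wording of the alarm message is also an assumption.

```python
from typing import Callable, Optional, Tuple


def register_with_retry(
    send_request: Callable[[], bool],
    max_attempts: int = 3,
) -> Tuple[bool, Optional[str]]:
    """Try to register up to max_attempts times.

    Returns (True, None) on success, or (False, alarm_message) once the
    set number of attempts is exhausted.
    """
    for _ in range(max_attempts):
        if send_request():
            return True, None
    return False, f"registration failed after {max_attempts} attempts"


# Simulated central node: rejects twice, then accepts.
responses = iter([False, False, True])
ok, alarm = register_with_retry(lambda: next(responses), max_attempts=3)

# Simulated central node that never accepts.
failed_ok, failed_alarm = register_with_retry(lambda: False, max_attempts=2)
```

  • A production version would normally wait between attempts; the delay is omitted here because the text does not specify one.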
  • any edge node analyzes service data using an abnormality analysis rule to determine whether the first service in the edge node is abnormal, where the service data includes service data corresponding to the first service. After the edge node determines that the first service is abnormal, if an exception handling rule of the first service exists in the edge node, the edge node uses the exception handling rule to repair the first service; if no exception handling rule of the first service exists in the edge node, the edge node reports to the central node.
  • by placing service abnormality identification and abnormality repair on the edge node side for execution instead of uniformly reporting to the central node for execution, the work pressure of the central node can be effectively reduced, and network overhead and time cost can be saved; moreover, the edge node performs self-closed-loop processing of its own abnormalities and can discover and handle them in time, which not only improves the efficiency of abnormality identification and processing but also restores service availability in time.
  • an embodiment of the present invention also provides an edge network-based exception handling device, and the specific content of the device can be implemented with reference to the foregoing method.
  • Fig. 4 is a schematic structural diagram of an abnormality processing device based on an edge network provided by an embodiment of the present invention.
  • the edge network includes a central node and at least one edge node; the device includes:
  • the abnormality analysis module 401, configured to analyze service data using abnormality analysis rules to determine whether the first service in the edge node is abnormal; the service data includes service data corresponding to the first service;
  • the exception handling module 402, configured to, after it is determined that the first service is abnormal, use the exception handling rule of the first service to repair the first service if such a rule exists in the edge node, and report to the central node if no exception handling rule of the first service exists in the edge node.
  • the central node determines the exception handling rule of the first service and issues it to the edge node;
  • the device also includes a transceiver module 403, which is configured to: receive the exception handling rule of the first service sent by the central node;
  • the exception handling module 402 is further configured to: use the exception handling rule of the first service to repair the first service.
  • the device further includes a transceiver module 403; before the abnormality analysis module 401 analyzes the service data using abnormality analysis rules, the transceiver module 403 is configured to:
  • obtain the self-closed-loop strategies corresponding to various services from the central node and store them in the local database of the edge node, where the various services include the first service; the self-closed-loop strategy corresponding to any service includes the abnormality analysis rules of the service, or also includes the exception handling rules of the service.
  • the self-closed loop strategy corresponding to the various services is obtained in the following manner:
  • when the central node detects that the user has entered abnormality monitoring configuration information in the abnormality monitoring configuration interface, it obtains and parses the abnormality monitoring configuration information, obtains the self-closed-loop strategies corresponding to various services, and stores them in the local database of the central node.
  • the anomaly analysis rule of any service includes an anomaly analysis rule corresponding to each monitoring event in the service;
  • the abnormality analysis module 401 is specifically used for:
  • for any monitoring event in the first service, the service data of the monitoring event is parsed from the service data of the first service, and an abnormality analysis algorithm that matches the type of the service data of the monitoring event is invoked to analyze the service data of the monitoring event; if the analysis result meets the first abnormal condition corresponding to the monitoring event, it is determined that the monitoring event is abnormal, and whether the first service is abnormal is determined at least according to the monitoring event; if the analysis result does not meet the first abnormal condition corresponding to the monitoring event, it is determined that the first service is not abnormal.
  • the abnormality analysis module 401 is specifically configured to:
  • if it is determined that the abnormal condition corresponding to the monitoring event includes only the first abnormal condition, it is determined that the first service is abnormal; if it is determined that the abnormal condition corresponding to the monitoring event also includes a second abnormal condition, and the second abnormal condition is the impact time, then when the abnormal duration of the monitoring event is less than the impact time, it is determined that the first service is not abnormal, and when the abnormal duration of the monitoring event is greater than or equal to the impact time, it is determined that the first service is abnormal.
  • the abnormality analysis module 401 is further configured to:
  • if the second abnormal condition is that the associated monitoring events are abnormal at the same time, it is determined whether the other monitoring events associated with the monitoring event are abnormal; when the other monitoring events are also abnormal, it is determined that the first service is abnormal, and when at least one other monitoring event is normal, it is determined that the first service is not abnormal.
  • any edge node analyzes service data using an abnormality analysis rule to determine whether the first service in the edge node is abnormal, where the service data includes service data corresponding to the first service. After the edge node determines that the first service is abnormal, if an exception handling rule of the first service exists in the edge node, the edge node uses the exception handling rule to repair the first service; if no exception handling rule of the first service exists in the edge node, the edge node reports to the central node.
  • by placing service abnormality identification and abnormality repair on the edge node side for execution instead of uniformly reporting to the central node for execution, the work pressure of the central node can be effectively reduced, and network overhead and time cost can be saved; moreover, the edge node performs self-closed-loop processing of its own abnormalities and can discover and handle them in time, which not only improves the efficiency of abnormality identification and processing but also restores service availability in time.
  • an embodiment of the present invention also provides a computing device, as shown in FIG. 5, including at least one processor 501 and a memory 502 connected to the at least one processor.
  • the embodiment of the present invention does not limit the specific connection medium between the processor 501 and the memory 502; in FIG. 5, connection of the processor 501 and the memory 502 through a bus is taken as an example.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the memory 502 stores instructions that can be executed by at least one processor 501.
  • by executing the instructions stored in the memory 502, the at least one processor 501 can execute the steps included in the aforementioned edge network-based exception handling method.
  • the processor 501 is the control center of the computing device; it can use various interfaces and lines to connect the parts of the computing device, and it realizes data processing by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502.
  • the processor 501 may include one or more processing units, and the processor 501 may integrate an application processor and a modem processor.
  • the application processor mainly processes the operating system, user interface, and application programs.
  • the modem processor mainly handles issuing instructions. It can be understood that the foregoing modem processor may also not be integrated into the processor 501.
  • the processor 501 and the memory 502 may be implemented on the same chip, and in some embodiments, they may also be implemented on separate chips.
  • the processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiment of the exception handling based on the edge network can be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules.
  • the memory 502 may include at least one type of storage medium, such as flash memory, hard disk, multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, magnetic disk, optical disc, etc.
  • the memory 502 is any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 502 in the embodiment of the present invention may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • embodiments of the present invention also provide a computer-readable storage medium that stores a computer program executable by a computing device; when the program runs on the computing device, the computing device executes the edge network-based exception handling method described in either FIG. 2 or FIG. 3.
  • the embodiments of the present invention can be provided as a method or a computer program product. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention discloses an edge network-based anomaly processing method and apparatus, which solve the prior-art technical problems of heavy pressure on the central node and untimely anomaly processing caused by centralized anomaly processing at the central node. The method comprises: an edge node analyzes service data by using an anomaly analysis rule; after determining that a first service in the edge node is abnormal, if the edge node has an anomaly processing rule for the first service, the edge node repairs the first service by using that rule; if the edge node has no anomaly processing rule for the first service, the edge node reports to the central node. By performing anomaly identification and anomaly repair of a service at the edge node side instead of uniformly reporting to the central node, the present invention can effectively reduce the workload of the central node and reduce network overhead and time cost. Moreover, because each edge node handles its own anomalies in a self-closed loop, anomalies can be found and processed in a timely fashion, thereby improving anomaly processing efficiency.

Description

An edge network-based anomaly processing method and apparatus
Technical field
The present invention relates to the technical field of network security, and in particular to an edge network-based anomaly processing method and apparatus.
Background
At present, when services are provided to users, the status of each service usually needs to be monitored. Once an abnormal service status is detected, the service needs to be repaired promptly to improve its availability and service capability.
In one existing self-closed-loop strategy, each edge node collects its own service data and reports it to the central node. The central node then centrally analyzes, based on this service data, whether each edge node is abnormal and, if an anomaly exists, notifies operation and maintenance personnel to repair the abnormal edge node. The problem with this approach is that each edge node holds a massive amount of service data, and uploading all of it to the central node for centralized anomaly analysis usually costs the central node a great deal of time and resources. This puts heavy pressure on the central node and also makes anomaly processing less timely.
In summary, there is an urgent need for an edge network-based anomaly processing method that solves the prior-art technical problems of heavy pressure on the central node and untimely anomaly processing, both of which are caused by the central node centrally analyzing the anomalies of every edge node.
Summary of the invention
The present invention provides an edge network-based anomaly processing method and apparatus to solve the prior-art technical problems of heavy pressure on the central node and untimely anomaly processing, both of which are caused by the central node centrally analyzing the anomalies of every edge node.
In a first aspect, the present invention provides an edge network-based anomaly processing method, where the edge network includes a central node and at least one edge node. The method includes:
Any edge node analyzes service data by using an anomaly analysis rule and determines whether a first service in the edge node is abnormal, where the service data includes service data corresponding to the first service. Further, after the edge node determines that the first service is abnormal, if an anomaly processing rule for the first service exists in the edge node, the edge node repairs the first service by using that rule; if no anomaly processing rule for the first service exists in the edge node, the edge node reports to the central node.
In the present invention, anomaly identification and anomaly repair for a service are performed at the edge node side rather than being uniformly reported to the central node, which effectively reduces the workload of the central node and saves network overhead and time cost. In addition, because each edge node handles its own anomalies in a self-closed loop, anomalies can be discovered and handled promptly, which not only improves the efficiency of anomaly identification and processing but also restores service availability in time.
In a possible implementation, after the report is sent to the central node, the central node determines the anomaly processing rule for the first service and delivers it to the edge node. Correspondingly, the edge node receives the anomaly processing rule for the first service sent by the central node and uses it to repair the first service.
In the above implementation, when an edge node cannot handle an anomaly, it reports the anomaly to the central node and the central node delivers an anomaly processing rule, so that the edge node handles the anomaly according to the rule set by the central node. This improves the accuracy and comprehensiveness of anomaly processing.
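The report-then-repair exchange in this implementation can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the class names `CentralNode` and `EdgeNode`, the method `on_report`, and the idea that a rule is a named action looked up in a local `actions` table are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


class CentralNode:
    """Hypothetical central node holding one rule name per service."""

    def __init__(self, rules: Dict[str, str]) -> None:
        self.rules = rules

    def on_report(self, service: str) -> Optional[str]:
        # Determine the processing rule for the reported service, if any.
        return self.rules.get(service)


class EdgeNode:
    """Hypothetical edge node that can execute named repair actions."""

    def __init__(self, actions: Dict[str, Callable[[str], None]]) -> None:
        self.actions = actions

    def report_and_repair(self, service: str, central: CentralNode) -> bool:
        rule = central.on_report(service)  # report, then receive the rule
        if rule is None:
            return False                   # the central node issued nothing
        self.actions[rule](service)        # repair with the issued rule
        return True


log: List[str] = []
edge = EdgeNode({"restart_service": lambda s: log.append(f"restart {s}")})
central = CentralNode({"concurrency": "restart_service"})
repaired = edge.report_and_repair("concurrency", central)
```

In a real deployment the two calls would cross a network boundary; here they are direct method calls to keep the sketch self-contained.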
In a possible implementation, before analyzing service data by using the anomaly analysis rule, the edge node also sends a registration request to the central node, where the registration request is used to establish a communication connection between the central node and the edge node. After establishing the communication connection with the central node, the edge node obtains from the central node the self-closed-loop strategies corresponding to various services, where the various services include the first service. The self-closed-loop strategy corresponding to any service includes the anomaly analysis rule for that service, and may additionally include the anomaly processing rule for that service.
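A minimal sketch of this registration step is shown below, assuming an in-memory central node. The names `Strategy`, `register`, `fetch_strategies`, and `bootstrap` are hypothetical, and the rule contents are placeholder strings; the only points carried over from the text are that registration precedes the fetch and that a strategy always has an analysis rule but only optionally a processing rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Strategy:
    analysis_rule: str                     # always present
    processing_rule: Optional[str] = None  # may be absent


@dataclass
class CentralNode:
    strategies: Dict[str, Strategy] = field(default_factory=dict)
    registered: Set[str] = field(default_factory=set)

    def register(self, edge_id: str) -> bool:
        # Stand-in for establishing the communication connection.
        self.registered.add(edge_id)
        return True

    def fetch_strategies(self, edge_id: str) -> Dict[str, Strategy]:
        if edge_id not in self.registered:
            raise PermissionError(f"{edge_id} is not registered")
        return dict(self.strategies)


@dataclass
class EdgeNode:
    edge_id: str
    strategies: Dict[str, Strategy] = field(default_factory=dict)

    def bootstrap(self, central: CentralNode) -> None:
        # Register first, then pull the per-service strategies.
        if central.register(self.edge_id):
            self.strategies = central.fetch_strategies(self.edge_id)


central = CentralNode({
    "concurrency": Strategy("volume >= 10000", "add_process"),
    "resource": Strategy("usage >= 95%"),  # analysis rule only
})
edge = EdgeNode("edge-121")
edge.bootstrap(central)
```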
In the above implementation, the self-closed-loop strategies are managed uniformly by the central node, and each edge node obtains the strategies for the various services from the central node. The strategies can therefore be configured centrally at the central node side instead of separately in each edge node, which makes strategy configuration more flexible and convenient. Moreover, configuring self-closed-loop strategies per service makes anomaly identification more targeted and better reflects the real service capability of each service, improving the accuracy of anomaly identification and anomaly processing.
In a possible implementation, the self-closed-loop strategies corresponding to the various services are obtained as follows: upon detecting that a user has entered anomaly monitoring configuration information in an anomaly monitoring configuration interface, the central node obtains and parses the anomaly monitoring configuration information to obtain the self-closed-loop strategies corresponding to the various services, and stores them in a local database of the central node.
In the above implementation, the user sets the self-closed-loop strategies for the various services in the anomaly monitoring configuration interface of the central node. This decouples the self-closed-loop strategy of a service from the business and lets users configure different strategies for different services according to their own business needs, making anomaly processing more flexible. Configuring each strategy through a configuration interface also simplifies operation, reduces the cost and incidents of manual operation and maintenance, and improves anomaly processing efficiency.
In a possible implementation, the anomaly analysis rule for any service includes an anomaly analysis rule corresponding to each monitoring event in the service. The edge node analyzing service data by using the anomaly analysis rule and determining whether the first service in the edge node is abnormal includes: for any monitoring event in the first service, the edge node parses the service data of the monitoring event out of the service data of the first service and invokes an anomaly analysis algorithm that matches the type of that service data to analyze it. If the analysis result satisfies a first anomaly condition corresponding to the monitoring event, the edge node determines that the monitoring event is abnormal and determines, at least according to the monitoring event, whether the first service is abnormal; if the analysis result does not satisfy the first anomaly condition corresponding to the monitoring event, the edge node determines that the first service is not abnormal.
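The per-event check can be sketched as a lookup of a generic algorithm by data type followed by a comparison against the event's first anomaly condition. The data type names (`gauge`, `ratio`), the two algorithms, and the `(operator, threshold)` encoding of a condition are assumptions chosen for illustration; the text does not specify them.

```python
import operator
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical: one generic analysis algorithm per service-data type.
ALGORITHMS: Dict[str, Callable[[List[float]], float]] = {
    "gauge": lambda samples: samples[-1],  # use the latest sample
    "ratio": mean,                         # use the average of the samples
}

COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
}


def event_is_abnormal(
    data_type: str,
    samples: List[float],
    first_condition: Tuple[str, float],
) -> bool:
    """Run the type-matched algorithm, then test the first anomaly condition."""
    op, threshold = first_condition
    result = ALGORITHMS[data_type](samples)
    return COMPARATORS[op](result, threshold)


volume_abnormal = event_is_abnormal("gauge", [8000, 12000], (">=", 10000))
error_rate_abnormal = event_is_abnormal("ratio", [0.2, 0.4], (">", 0.45))
```

Because the algorithm is selected by data type and the threshold travels with the event, adding a new monitoring event of an existing type needs only a new condition, not new code.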
In the above implementation, a common anomaly analysis algorithm is provided for monitoring events of the same type, and individual monitoring events are distinguished by their anomaly conditions. A dedicated algorithm therefore no longer needs to be written for each monitoring event, which reduces development difficulty and makes anomaly analysis more flexible.
In a possible implementation, the edge node determining, at least according to the monitoring event, whether the first service is abnormal includes: if the edge node determines that the anomaly conditions corresponding to the monitoring event include only the first anomaly condition, the edge node determines that the first service is abnormal; if the edge node determines that the anomaly conditions corresponding to the monitoring event further include a second anomaly condition and the second anomaly condition is an impact time, the edge node determines that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and determines that the first service is abnormal when the abnormal duration of the monitoring event is greater than or equal to the impact time.
In a possible implementation, the method further includes: if the edge node determines that the second anomaly condition is that associated monitoring events are abnormal at the same time, the edge node determines whether the other monitoring events associated with the monitoring event are abnormal. When the other monitoring events are also abnormal, the edge node determines that the first service is abnormal; when at least one of the other monitoring events is normal, the edge node determines that the first service is not abnormal.
In the above implementation, setting associated monitoring events or an impact time makes it possible to accurately identify services that are truly abnormal, reducing the probability of misjudgment and correspondingly improving the accuracy of anomaly identification and anomaly processing.
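The service-level decision described in the two implementations above can be sketched as one function. The `EventState` record and its field names are hypothetical. The text treats the second anomaly condition as either an impact time or a set of associated events; checking the impact time first when both happen to be set is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventState:
    abnormal: bool                          # first anomaly condition met?
    abnormal_seconds: float = 0.0           # how long it has been abnormal
    impact_seconds: Optional[float] = None  # second condition: impact time
    associated: List[str] = field(default_factory=list)  # second condition: linked events


def service_is_abnormal(event_name: str, events: Dict[str, EventState]) -> bool:
    event = events[event_name]
    if not event.abnormal:
        return False
    if event.impact_seconds is not None:
        # Abnormal only once the anomaly has lasted at least the impact time.
        return event.abnormal_seconds >= event.impact_seconds
    if event.associated:
        # Abnormal only if every associated event is abnormal too.
        return all(events[name].abnormal for name in event.associated)
    return True  # only the first anomaly condition applies


events = {
    "usage": EventState(abnormal=True, abnormal_seconds=200, impact_seconds=300),
    "volume": EventState(abnormal=True, associated=["error_rate"]),
    "error_rate": EventState(abnormal=False),
    "latency": EventState(abnormal=True),
}
```

With this data, `usage` has not yet been abnormal for its full impact time and `volume` has a healthy associated event, so neither marks the service abnormal; `latency` has no second condition and does.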
In a second aspect, the present invention provides an edge network-based anomaly processing apparatus, where the edge network includes a central node and at least one edge node. The apparatus includes:
an anomaly analysis module, configured to analyze service data by using an anomaly analysis rule and determine whether a first service in the edge node is abnormal, where the service data includes service data corresponding to the first service; and
an anomaly processing module, configured to, after it is determined that the first service is abnormal, repair the first service by using an anomaly processing rule for the first service if that rule exists in the edge node, and report to the central node if no anomaly processing rule for the first service exists in the edge node.
In a possible implementation, after the anomaly processing module reports to the central node, the central node determines the anomaly processing rule for the first service and delivers it to the edge node. The apparatus further includes a transceiver module, configured to receive the anomaly processing rule for the first service sent by the central node. Correspondingly, the anomaly processing module is further configured to repair the first service by using the anomaly processing rule for the first service.
In a possible implementation, the apparatus further includes a transceiver module. Before the anomaly analysis module analyzes service data by using the anomaly analysis rule, the transceiver module is configured to: send a registration request to the central node, where the registration request is used to establish a communication connection between the central node and the edge node; and, after the communication connection with the central node is established, obtain from the central node the self-closed-loop strategies corresponding to various services, where the various services include the first service. The self-closed-loop strategy corresponding to any service includes the anomaly analysis rule for that service, and may additionally include the anomaly processing rule for that service.
In a possible implementation, the self-closed-loop strategies corresponding to the various services are obtained as follows: upon detecting that a user has entered anomaly monitoring configuration information in an anomaly monitoring configuration interface, the central node obtains and parses the anomaly monitoring configuration information to obtain the self-closed-loop strategies corresponding to the various services, and stores them in a local database of the central node.
In a possible implementation, the anomaly analysis rule for any service includes an anomaly analysis rule corresponding to each monitoring event in the service. The anomaly analysis module is specifically configured to: for any monitoring event in the first service, parse the service data of the monitoring event out of the service data of the first service and invoke an anomaly analysis algorithm that matches the type of that service data to analyze it; if the analysis result satisfies a first anomaly condition corresponding to the monitoring event, determine that the monitoring event is abnormal and determine, at least according to the monitoring event, whether the first service is abnormal; and if the analysis result does not satisfy the first anomaly condition corresponding to the monitoring event, determine that the first service is not abnormal.
In a possible implementation, the anomaly analysis module is specifically configured to: if it is determined that the anomaly conditions corresponding to the monitoring event include only the first anomaly condition, determine that the first service is abnormal; and if it is determined that the anomaly conditions corresponding to the monitoring event further include a second anomaly condition and the second anomaly condition is an impact time, determine that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and determine that the first service is abnormal when the abnormal duration of the monitoring event is greater than or equal to the impact time.
In a possible implementation, the anomaly analysis module is further configured to: if it is determined that the second anomaly condition is that associated monitoring events are abnormal at the same time, determine whether the other monitoring events associated with the monitoring event are abnormal; when the other monitoring events are also abnormal, determine that the first service is abnormal; and when at least one of the other monitoring events is normal, determine that the first service is not abnormal.
In a third aspect, the present invention provides a computing device including at least one processor and at least one memory, where the memory stores a computer program that, when executed by the processor, causes the processor to perform any method described in the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program executable by a computing device, where the program, when run on the computing device, causes the computing device to perform any method described in the first aspect.
These and other aspects of the present invention will be clearer and easier to understand from the description of the following embodiments.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the system architecture of an edge network provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart corresponding to an edge network-based anomaly processing method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the overall interaction flow corresponding to an anomaly processing method provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a monitoring apparatus provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of the system architecture of an edge network provided by an embodiment of the present invention. As shown in FIG. 1, the edge network includes a central node 110 and at least one edge node, such as edge node 121, edge node 122, and edge node 123. The central node 110 may be connected to any edge node, for example in a wired manner or in a wireless manner, which is not specifically limited.
In this embodiment of the present invention, the central node 110 is a remote device and each edge node is a near-end device. Any edge node may also be connected to clients (not illustrated in FIG. 1) to provide near-end services to them. For example, as shown in FIG. 1, edge node 121 may be connected to client 131 and client 132 and provide near-end services to them; edge node 122 may be connected to client 133 and provide near-end services to it; and edge node 123 may be connected to client 134 and client 135 and provide near-end services to them. A client may be any terminal device, such as a notebook computer, an iPad, a mobile phone, a router, or another hardware device with communication and interaction functions, which is not limited.
In a specific implementation, the central node 110 may deliver business data to each edge node in advance. When a client needs to access data, the client sends a data access request toward the central node 110, and the request first reaches the edge node adjacent to the client. The edge node then checks, according to the data access request, whether the business data corresponding to the request is stored locally. If so, the edge node returns the business data directly to the client; if not, the edge node forwards the data access request to the central node 110.
It should be noted that the architecture in FIG. 1 is only an example and does not limit this solution. In a specific implementation, multiple layers (that is, two or more layers) of edge nodes may also be deployed in the edge network. A client's data access request first reaches the lowest-layer edge node. If the lowest-layer edge node stores the corresponding business data locally, it returns that data to the client. If it does not, it forwards the data access request to the next-level edge node, which performs the same data response operation, and so on until the corresponding business data is returned to the client.
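The layered lookup described above can be sketched as a walk through the edge tiers from the lowest layer upward. The function name `resolve`, the dictionary-based stores, and the final fallback to a `central` callable are assumptions; the fallback mirrors the single-layer case, in which an edge node that lacks the data forwards the request to the central node.

```python
from typing import Callable, Dict, List, Tuple


def resolve(
    key: str,
    tiers: List[Dict[str, bytes]],
    central: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return the data for `key` and a label for the node that served it."""
    for level, store in enumerate(tiers):  # tiers[0] is the lowest layer
        if key in store:
            return store[key], f"edge-tier-{level}"
    # No edge tier holds the data: forward the request to the central node.
    return central(key), "central"


tiers = [{"a.css": b"A"}, {"b.js": b"B"}]


def origin(key: str) -> bytes:
    return b"from-central:" + key.encode()


first = resolve("a.css", tiers, origin)
second = resolve("b.js", tiers, origin)
third = resolve("c.png", tiers, origin)
```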
It should be noted that an edge node in the embodiments of the present invention may be an edge device, a cluster of edge devices deployed as a cluster, or a process in an edge device, which is not limited.
Based on the edge network illustrated in FIG. 1, FIG. 2 is a schematic flowchart corresponding to an edge network-based anomaly processing method provided by an embodiment of the present invention. The method is applicable to any edge node in the edge network and includes:
Step 201: The edge node analyzes service data by using an anomaly analysis rule and determines whether a first service in the edge node is abnormal, where the service data includes service data corresponding to the first service.
Step 202: After determining that the first service is abnormal, the edge node determines whether an anomaly processing rule for the first service exists in the edge node. If so, the edge node repairs the first service by using the anomaly processing rule; if not, the edge node reports the anomaly of the first service to the central node.
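Steps 201 and 202 amount to a three-way branch, which can be sketched as below. The function name `process_service`, the representation of a processing rule as a zero-argument callable, and the returned status strings are hypothetical; the result of step 201 is passed in as the boolean `abnormal`.

```python
from typing import Callable, Dict, List


def process_service(
    service: str,
    abnormal: bool,
    processing_rules: Dict[str, Callable[[], None]],
    report_to_central: Callable[[str], None],
) -> str:
    if not abnormal:                 # step 201 found no anomaly
        return "normal"
    rule = processing_rules.get(service)
    if rule is not None:             # step 202: a local rule exists
        rule()
        return "repaired"
    report_to_central(service)       # step 202: no local rule, escalate
    return "reported"


repairs: List[str] = []
reports: List[str] = []
rules = {"concurrency": lambda: repairs.append("restart concurrency")}

outcomes = [
    process_service("concurrency", True, rules, reports.append),
    process_service("resource", True, rules, reports.append),
    process_service("concurrency", False, rules, reports.append),
]
```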
In this embodiment of the present invention, anomaly identification and anomaly repair for a service are performed at the edge node side rather than being uniformly reported to the central node for execution, which effectively reduces the workload of the central node and saves network overhead and time cost. In addition, because each edge node handles its own anomalies in a self-closed loop, anomalies can be discovered and handled promptly, which not only improves the efficiency of anomaly identification and processing but also restores service availability in time.
In step 201, the anomaly analysis rule may be configured in the edge node based on anomaly monitoring configuration information. The anomaly monitoring configuration information may be configured in advance at the edge node side by operation and maintenance personnel, may be configured by business personnel at the central node side and then synchronized to the edge node, or may be obtained by the edge node from a third-party interface device, which is not specifically limited.
As a possible implementation, the anomaly monitoring configuration information may be configured in the edge node through the following steps:
Step a: The central node receives anomaly monitoring configuration information entered by a user.
In a specific implementation, the central node may provide the user with an anomaly monitoring configuration interface. Upon detecting that the user has entered anomaly monitoring configuration information in this interface, the central node may obtain and parse the information and, taking the service as the unit, extract the anomaly monitoring configuration information that belongs to the same service, thereby obtaining the anomaly monitoring configuration information corresponding to each service. Further, the central node may parse the anomaly monitoring configuration information corresponding to any service to obtain the self-closed-loop strategy corresponding to that service, and store the strategy in a local database of the central node. The self-closed-loop strategy corresponding to any service may include the anomaly analysis rule for the service, and may also include the anomaly processing rule for the service and/or a rule for acquiring service data, which is not limited.
It should be noted that a self-closed-loop strategy is a strategy for handling the abnormal conditions of a service in a self-closed loop, and includes the rules related to self-closed-loop processing, such as anomaly analysis rules, anomaly processing rules, data acquisition rules, and anomaly conditions. In other words, a self-closed-loop strategy is obtained by extracting the individual rules from the anomaly monitoring configuration information of a service; it is a collective name for the rules used to handle the same service in a self-closed loop, not a processing method.
In one example, the anomaly analysis rule corresponding to any service may include an anomaly analysis rule for each monitoring event in the service, and the anomaly processing rule corresponding to any service may include an anomaly processing rule for each monitoring event in the service.
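Grouping the entered configuration by service, with rules keyed by monitoring event, can be sketched as follows. The flat entry format (`service`, `event`, `condition`, `action`) and the function name `build_strategies` are assumptions; the text does not specify how the configuration information is encoded. The second entry deliberately has no action, to show a monitoring event that has an analysis rule but no processing rule.

```python
from collections import defaultdict
from typing import Dict, List


def build_strategies(entries: List[dict]) -> Dict[str, dict]:
    """Group flat configuration entries into one strategy per service."""
    strategies: Dict[str, dict] = defaultdict(
        lambda: {"analysis_rules": {}, "processing_rules": {}}
    )
    for entry in entries:
        strategy = strategies[entry["service"]]
        strategy["analysis_rules"][entry["event"]] = entry["condition"]
        if "action" in entry:  # a processing rule is optional
            strategy["processing_rules"][entry["event"]] = entry["action"]
    return dict(strategies)


config = [
    {"service": "concurrency", "event": "volume",
     "condition": ">= 10000", "action": "add_process"},
    {"service": "concurrency", "event": "error_rate",
     "condition": "> 45%"},
    {"service": "resource", "event": "usage",
     "condition": ">= 95%", "action": "clear_cache"},
]
strategies = build_strategies(config)
```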
Table 1 shows an example of the self-closed-loop strategies corresponding to the services.
[Table 1 is provided as image PCTCN2020091867-appb-000001 in the original publication.]
Table 1
As shown in Table 1, any service may correspond to one monitoring event or to multiple monitoring events, and each monitoring event may be given a corresponding anomaly condition and anomaly processing rule. For example, the concurrency service corresponds to two monitoring events, a concurrency volume event and a concurrency error rate event. When the concurrency volume is greater than or equal to 10,000, the concurrency volume event is determined to be abnormal, so a concurrency service process can be added to restore the availability of the concurrency service in the edge node; when the concurrency error rate is greater than 45%, the concurrency error rate event is determined to be abnormal, so the concurrency service can be restarted to restore the accuracy of the concurrency service in the edge node. As another example, the resource service corresponds to one monitoring event, a resource occupancy event. When the resource occupancy has been greater than or equal to 95% for more than 5 minutes, the resource service is determined to be abnormal, so the cache of the resource service can be cleared to restore the availability of the resource service in the edge node.
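The three examples just described can be written as data and evaluated with a few lines of code. Only the thresholds and the repair actions come from the text; the field names, the action identifiers, and the `actions_for` helper are hypothetical, and the table in the original publication may contain rows that are not described in the text and therefore do not appear here.

```python
import operator
from typing import Dict, List

RULES = [
    {"service": "concurrency", "event": "concurrency_volume",
     "op": operator.ge, "threshold": 10000, "action": "add_service_process"},
    {"service": "concurrency", "event": "concurrency_error_rate",
     "op": operator.gt, "threshold": 0.45, "action": "restart_service"},
    {"service": "resource", "event": "resource_occupancy",
     "op": operator.ge, "threshold": 0.95, "min_duration_s": 300,
     "action": "clear_cache"},
]


def actions_for(metrics: Dict[str, float], durations: Dict[str, float]) -> List[str]:
    """Return the repair actions triggered by the current metric values."""
    triggered = []
    for rule in RULES:
        value = metrics.get(rule["event"])
        if value is None or not rule["op"](value, rule["threshold"]):
            continue
        min_duration = rule.get("min_duration_s")
        # "For more than 5 minutes" is read as strictly longer than 300 s.
        if min_duration is not None and durations.get(rule["event"], 0.0) <= min_duration:
            continue
        triggered.append(rule["action"])
    return triggered


metrics = {"concurrency_volume": 12000, "concurrency_error_rate": 0.30,
           "resource_occupancy": 0.97}
short_spike = actions_for(metrics, {"resource_occupancy": 200})
sustained = actions_for(metrics, {"resource_occupancy": 301})
```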
In one example, the central node may also let users perform update operations such as creating a new self-closed-loop strategy, clearing an existing one, modifying an existing one, or querying existing ones. After detecting that a self-closed-loop strategy has been updated, the central node may automatically load the updated strategy to improve the accuracy of exception handling. Take clearing an existing self-closed-loop strategy as an example: after detecting that the user has triggered a modification instruction for the existing self-closed-loop strategies in the anomaly monitoring configuration interface, the central node may display the existing self-closed-loop strategies to the user. The user can either select the strategy to be cleared and delete it directly, or change its state from valid to invalid, which likewise removes it.
In the above example, having the user set the self-closed-loop strategies for the various services on the anomaly monitoring configuration page of the central node decouples the strategies from the business and lets users configure different strategies for different services according to their own business needs, which improves the flexibility of exception handling. In addition, configuring each strategy through the configuration interface simplifies operation, reduces manual operation and maintenance costs and events, and improves the efficiency of exception handling.
Step b: the edge node sends a registration request to the central node at startup.
Step c: the central node verifies the registration request of the edge node. If verification succeeds, the central node establishes a communication connection with the edge node (which allows the edge node to obtain the self-closed-loop strategies corresponding to the various services) and sends the edge node a registration-success response message. If verification fails, the central node refuses to establish a communication connection with the edge node and sends it a registration-failure response message.
Step d: if the edge node receives a registration-success response message, it can obtain the self-closed-loop strategies corresponding to the various services from the central node and store them in its local database. Conversely, if the edge node receives no response message, or receives a registration-failure response message, it can periodically resend the registration request to the central node. If registration still has not succeeded after a set number of retransmissions, the edge node gives up registering and generates an alarm message.
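The retry behaviour in step d can be sketched as follows. This is an illustration only: the function name, the `send_request` callback, and the default retry count and interval are assumptions, since the text specifies only "periodically" and "a set number" of retransmissions.

```python
import time
from typing import Callable, Optional


def register_with_retry(
    send_request: Callable[[], Optional[bool]],
    max_retries: int = 3,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send one registration request, then up to max_retries periodic repeats.

    send_request returns True on a success response, False on a failure
    response, and None when no response arrived.
    """
    for attempt in range(max_retries + 1):
        if send_request() is True:
            return True
        if attempt < max_retries:
            sleep(interval_s)  # wait one period before resending
    return False  # caller gives up registering and generates an alarm message
```

Passing `sleep` as a parameter is a convenience for testing the loop without real delays.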
Here, the various services may be the services deployed on the edge node, or all the services stored in the central node; this is not limited.
Take the case where the various services are those deployed on the edge node. Based on Table 1, after receiving the registration-success response message, if the edge node determines that a concurrency service and a port service are deployed locally, it can obtain the self-closed-loop strategies for the concurrency service and the port service from the central node and store them in its local database. The strategies can be obtained in several ways. For example, the edge node may send the central node an acquisition request carrying the identifiers of the concurrency service and the port service, so that the central node returns the corresponding self-closed-loop strategies to the edge node. Alternatively, the central node may upload the self-closed-loop strategies for all services to a set location and grant the edge node access to that location, so that the edge node fetches the strategies for the concurrency service and the port service from that location on its own, and so on.
As one example, after successfully registering with the central node, the edge node may also periodically obtain the self-closed-loop strategies for the various services from the central node. This keeps the strategy for each service consistent between the configuring party (the central node) and the executing party (the edge node) and improves the accuracy of exception handling. As another example, the central node may monitor its local database in real time. Once it detects that a user has updated the self-closed-loop strategy for a service, it can issue an update instruction to the edge node corresponding to that service so that the edge node obtains the updated strategy in real time. This likewise keeps the strategy consistent between the configuring and executing parties and improves the accuracy of exception handling for the service.
In the above implementation, the central node manages the self-closed-loop strategies centrally and each edge node obtains the strategies for the various services from the central node. Strategies can therefore be configured in one place on the central node side rather than separately on each edge node, which makes strategy configuration more flexible and convenient. In addition, configuring strategies per service makes anomaly identification more targeted and better reflects the real service capability of each service, improving the accuracy of anomaly identification and exception handling.
In the embodiment of the present invention, the edge node is provided with a service process for any given service (for example, a first service), and the edge node provides the first service to clients or other devices through that service process. After storing the self-closed-loop strategy corresponding to the first service in the local database, the edge node can obtain the service data of the first service by calling the service process of the first service. The data can be obtained in several ways. For example, after detecting that the service process of the first service has performed work related to any monitoring event of the first service, the edge node may send the service process an acquisition request carrying the identifier of that monitoring event, so that the service process returns the corresponding service data in real time. Alternatively, the edge node may send acquisition requests to the service process at a set period, so that the service process returns the service data for the monitoring event at that period, and so on; this is not limited.
In one possible implementation, the edge node obtains the service data of the first service as follows. The self-closed-loop strategy also includes a data source interface for each monitoring event in the first service. A data source interface is a function pre-packaged inside the edge node that can record the service data corresponding to a monitoring event while the service process provides the first service. Thus, for any monitoring event in the first service, the edge node first determines the corresponding data source interface from the self-closed-loop strategy and then calls that interface to obtain the service data for the monitoring event.
For example, suppose a first service process is set in the edge node and provides a port service for the Internet Protocol (IP) address 127.0.0.1. For the request-count event stored in the local database, the edge node can call the data source interface corresponding to the request-count event to obtain, while the first service process provides the port service, the number of requests that access the port at IP address 127.0.0.1 within a set period (that is, the service data).
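The lookup-then-call pattern for data source interfaces might look like the sketch below. All names here are hypothetical, the registry contents are stubs, and the returned count of 128 is a placeholder rather than data from the source.

```python
from typing import Callable, Dict

# Hypothetical registry of data source interfaces pre-packaged in the edge node.
DATA_SOURCES: Dict[str, Callable[[], int]] = {
    "port_request_count": lambda: 128,  # stub: requests seen in the set period
}

# Hypothetical fragment of a stored self-closed-loop strategy.
PORT_SERVICE_POLICY = {
    "request_count_event": {"data_source": "port_request_count"},
}


def fetch_service_data(policy: dict, event: str) -> int:
    """Look up the event's data source interface in the policy and call it."""
    interface_name = policy[event]["data_source"]
    return DATA_SOURCES[interface_name]()
```

Because the strategy stores only the interface name, a new monitoring event can be wired to an existing interface without changing code on the edge node.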
Note that the self-closed-loop strategy may also include other configuration information needed to call the data source interface, such as environment variables and communication protocol conventions; this is not limited.
In the embodiment of the present invention, the acquisition operation may be performed by a monitoring process set in the edge node. The monitoring process and the service process communicate over sockets to improve the efficiency and accuracy of communication.
In the above implementation, the data source interface for each monitoring event is set in the exception handling rule, so the edge node obtains the corresponding service data simply by calling that interface, with no further manual configuration. This is simple to operate and easy to implement, and it also makes service data acquisition more efficient.
In the embodiment of the present invention, the anomaly analysis rule for a monitoring event may include one or more abnormal conditions. Each monitoring event may have its own first abnormal condition, which indicates whether the monitoring event is abnormal. If a monitoring event has only a first abnormal condition, that condition indicates both whether the monitoring event is abnormal and whether the service corresponding to the monitoring event is abnormal. If a monitoring event has both a first abnormal condition and at least one second abnormal condition, the first abnormal condition indicates whether the monitoring event is abnormal, while the first abnormal condition and the at least one second abnormal condition together indicate whether the corresponding service is abnormal. The at least one second abnormal condition may be set by those skilled in the art based on experience or according to actual needs; this is not specifically limited.
In a specific implementation, if the anomaly analysis rule for a monitoring event includes only the first abnormal condition, then when the service data for the monitoring event meets the first abnormal condition, the corresponding service is in an abnormal state on the edge node. The exception handling rule for the monitoring event can then be called directly to process the edge node, so as to restore the service corresponding to the monitoring event in the central node. If the service data does not meet the first abnormal condition, the monitoring event is in a normal state on the edge node and no processing is needed. For example, as shown in Table 1, the concurrency-volume event and the concurrency-error-rate event of the concurrency service each have only a first abnormal condition, and each has its own exception handling rule. Therefore, when either of these events is abnormal, the concurrency service is determined to be abnormal, and the exception handling rule of the abnormal monitoring event is used to process the concurrency service on the edge node.
Correspondingly, if the anomaly analysis rule for a monitoring event also includes at least one second abnormal condition, the corresponding service is in an abnormal state on the edge node only when the service data for the monitoring event meets both the first abnormal condition and the at least one second abnormal condition. In that case the exception handling rule for the monitoring event can be called to process the edge node, so as to restore the service corresponding to the monitoring event in the central node. When the service data meets only the first abnormal condition and not the at least one second abnormal condition, the monitoring event is abnormal on the edge node but the corresponding service is not, so no processing is needed.
In one example, a second abnormal condition may include an associated monitoring event and/or an impact time, and may be determined from real failure scenarios of the service. Specifically, for any service, the historical service data of each monitoring event at the time of real failures is obtained first. The historical data of all monitoring events is then analyzed jointly to find the characteristic factors that caused the failure, and the second abnormal condition is set according to those factors. For example, if the characteristic factor is that the service is truly abnormal only when a certain monitoring event and other monitoring events are all abnormal, the second abnormal condition of that monitoring event can be set to the associated monitoring events, and the monitoring event and its associated events can share the same exception handling rule. If the characteristic factor is that the service is truly abnormal only when a monitoring event stays abnormal for longer than the impact time, the second abnormal condition of that monitoring event can be set to the impact time.
For example, based on Table 1, the second abnormal condition of the abnormal-status-code event in the port service is the associated request-count event. After the first abnormal condition of the abnormal-status-code event shows that the event is abnormal, the edge node also checks whether the associated request-count event is abnormal. If the request-count event is also abnormal, the port service is determined to be abnormal and is corrected using the exception handling rule of the abnormal-status-code event. If the request-count event is not abnormal, the port service is determined not to be abnormal and no processing is needed. As another example, the second abnormal condition of the resource-occupancy event in the resource service is an impact time (≥ 5 minutes). When resource occupancy exceeds 95% for less than 5 minutes, the resource-occupancy event is abnormal, but the resource service can quickly return to normal and is not truly abnormal, so no processing is needed. When resource occupancy exceeds 95% for 5 minutes or longer, the resource service cannot quickly return to normal and is truly abnormal, so it is corrected using the exception handling rule of the resource-occupancy event.
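The interplay of the first and second abnormal conditions can be sketched as below. This is a simplified illustration: the class and function names are hypothetical, samples are assumed to be `(timestamp_seconds, value)` pairs in ascending time order, and only the 95% / 5-minute figures come from the text. Table 1 itself is not reproduced here, so any other threshold passed in is the caller's assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Samples = List[Tuple[float, float]]  # (timestamp in seconds, value), ascending


@dataclass
class EventRule:
    first_condition: Callable[[float], bool]
    associated_event: Optional[str] = None   # second condition: this event must be abnormal too
    impact_seconds: Optional[float] = None   # second condition: minimum abnormal duration


def abnormal_duration(samples: Samples, cond: Callable[[float], bool]) -> float:
    """Length in seconds of the trailing run of samples that meet cond."""
    if not samples or not cond(samples[-1][1]):
        return 0.0
    start = samples[-1][0]
    for ts, value in reversed(samples):
        if not cond(value):
            break
        start = ts
    return samples[-1][0] - start


def service_abnormal(event: str, rules: Dict[str, EventRule], series: Dict[str, Samples]) -> bool:
    """True only if the first condition and every configured second condition hold."""
    rule, samples = rules[event], series.get(event, [])
    if not samples or not rule.first_condition(samples[-1][1]):
        return False
    if rule.impact_seconds is not None:
        if abnormal_duration(samples, rule.first_condition) < rule.impact_seconds:
            return False
    if rule.associated_event is not None:
        other = series.get(rule.associated_event, [])
        other_rule = rules[rule.associated_event]
        if not other or not other_rule.first_condition(other[-1][1]):
            return False
    return True
```

For the resource-occupancy example, a rule such as `EventRule(lambda v: v >= 0.95, impact_seconds=300)` reports the service as abnormal only once the trailing abnormal run spans at least five minutes.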
Note that Table 1 is only an illustrative example and does not limit this solution. In a specific implementation, each monitoring event may also correspond to three or more abnormal conditions. For example, a third abnormal condition may indicate an abnormal level of the service, so that the exception handling rule of the monitoring event is used for repair only when the abnormal level of the service exceeds the level indicated by the third abnormal condition. Or a fourth abnormal condition may indicate a joint abnormality across services, so that the exception handling rule of the monitoring event is used for repair only when every service indicated by the fourth abnormal condition is abnormal, and so on; this is not specifically limited.
In the above example, setting the second abnormal condition of a monitoring event from real failure scenarios lowers the probability of flagging a service that is only falsely abnormal and improves detection accuracy. In addition, setting the second abnormal condition to an impact time and/or to associated monitoring events being abnormal at the same time allows the abnormality of a service to be judged comprehensively from how long the abnormality lasts and/or how many events are abnormal, improving the accuracy of the judgment.
In one possible implementation, the anomaly analysis rule for a monitoring event may also include an anomaly analysis algorithm, and monitoring events of the same type may share the same algorithm. Because the rule comprises both an anomaly analysis algorithm and abnormal conditions, the rule for each monitoring event can be unique. Thus, after the service data for a monitoring event is obtained, the anomaly analysis algorithm matching the type of the service data can be called to process the data and extract the abnormality judgment data from it. It is then determined whether the abnormality judgment data satisfies the abnormal condition of the monitoring event: if so, the monitoring event is determined to be abnormal; if not, it is determined not to be abnormal.
In the embodiment of the present invention, the anomaly analysis algorithm may include any one or more of a log keyword analysis method, a service health value analysis method, a threshold analysis method, and a service custom analysis method. Each is described below:
The log keyword analysis method performs anomaly analysis on service data of the log data type, which includes batch processing time, batch processing success count, and the like. In a specific implementation, the service data is first split on preset log fields to obtain the individual monitoring log fields. A multi-pattern matching algorithm (such as the Aho-Corasick algorithm or the Wu-Manber algorithm) is then used to match the monitoring log fields, and the fields that match are taken as the abnormality judgment data and compared with the preset log fields in the abnormal condition to determine whether the monitoring event is abnormal.
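A minimal sketch of the split-then-match step follows. It substitutes a compiled regular-expression alternation for Aho-Corasick or Wu-Manber, which is a simplification chosen to keep the example dependency-free, not the algorithm named in the text. The function name, the space separator, and the sample log line are assumptions.

```python
import re
from typing import List


def match_log_fields(line: str, keywords: List[str], sep: str = " ") -> List[str]:
    """Split a log line into fields and return the fields containing any keyword."""
    if not keywords:
        return []
    # One alternation pattern checks all keywords in a single pass per field.
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    return [field for field in line.split(sep) if pattern.search(field)]
```

The returned fields play the role of the abnormality judgment data, which would then be compared with the preset log fields in the abnormal condition.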
The service health value analysis method performs anomaly analysis on service data of the operation data type, which includes status codes, bandwidth, request counts, resource occupancy rate, and the like. In a specific implementation, a monitoring model for each indicator is first trained from historical service data. The model for an indicator is then used to predict on the service data under that indicator, giving a predicted score for the service data under that indicator. The predicted score is compared with a user-defined indicator score, the health level is determined from the comparison result, and the health level is taken as the abnormality judgment data and compared with the preset health level in the abnormal condition to determine whether the monitoring event is abnormal.
The threshold analysis method performs anomaly analysis on service data of the indicator data type, which includes the number of requests, the number of alarms, and the like. In a specific implementation, the monitored value of the monitoring event under each specific indicator of the service is extracted from the service data, and the monitored value under a specific indicator is taken as the abnormality judgment data and compared with the threshold for that indicator in the abnormal condition to determine whether the monitoring event is abnormal.
The service custom analysis method performs anomaly analysis on service data of an unknown data type, or on service data for which the user needs a custom anomaly analysis algorithm. In a specific implementation, after detecting that the user needs a custom analysis method, the edge node can provide the user with a general interface through which the user uploads the custom anomaly analysis algorithm. After receiving the custom algorithm, the edge node can load it and use it to process the service data for the monitoring event, obtaining the abnormality judgment data. The user can also define custom abnormal conditions at the same time. Once the abnormality judgment data has been computed, the edge node can compare it with the user-defined abnormal conditions to determine whether the monitoring event is abnormal.
Based on the anomaly analysis algorithms above, in a specific implementation, after the service data for a monitoring event is obtained: if the service data is of the log data type, the log keyword analysis method can be called to analyze it; if it is of the operation data type, the service health value analysis method can be called; if it is of the indicator data type, the threshold analysis method can be called; and if it is of another data type, or the user needs a custom anomaly analysis algorithm, the service custom analysis method can be called.
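The type-based selection described above amounts to a dispatch table. The sketch below shows one way to express it; the type labels, the registry, and the stand-in `threshold_analysis` function are hypothetical, and only the indicator analyzer is filled in.

```python
from typing import Any, Callable, Dict, Optional

Analyzer = Callable[[Any], Any]


def threshold_analysis(data: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for threshold analysis: the monitored values are the judgment data."""
    return dict(data)


# Log and operation analyzers would be registered here in the same way.
ANALYZERS: Dict[str, Analyzer] = {
    "indicator": threshold_analysis,
}


def analyze(data_type: str, data: Any, custom: Optional[Analyzer] = None) -> Any:
    """Pick the analysis algorithm by data type; a user-defined one takes priority."""
    if custom is not None:
        return custom(data)
    analyzer = ANALYZERS.get(data_type)
    if analyzer is None:
        raise LookupError(f"no analyzer for data type {data_type!r}")
    return analyzer(data)
```

Registering analyzers by data type rather than by monitoring event reflects the text's point that events of the same type can share one algorithm.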
In the embodiment of the present invention, setting unified anomaly analysis algorithms while giving each monitoring event its own abnormal conditions decouples the anomaly analysis method from the actual business and makes anomaly analysis more flexible. It also removes the need to set a separate anomaly analysis algorithm for each monitoring event, which lowers development difficulty and further improves flexibility. Moreover, this approach supports user-defined anomaly analysis algorithms, so new algorithms can be added continually according to user settings, widening the scenarios to which anomaly analysis applies, meeting the needs of different users, and improving the generality of anomaly analysis.
In step 202, when the first service is determined to be abnormal, the edge node can query the local database to determine whether an exception handling rule for the first service exists. If so, it can call that rule directly to repair the first service; if not, it can generate an exception message and report it to the central node 110. The exception message carries the relevant abnormal data of the first service, such as the identifier of the abnormal monitoring event in the first service and the abnormal field, abnormal time, and abnormal level in the service data corresponding to that event.
In one possible implementation, after receiving the exception message, the central node 110 first parses it to obtain the abnormal field in the service data corresponding to the abnormal monitoring event. It then calculates the matching degree between the abnormal field and each preset abnormal event in the operation and maintenance knowledge base, and takes any preset abnormal event whose matching degree exceeds the preset matching degree as the preset abnormal event corresponding to the monitoring event. If such a preset abnormal event exists, the central node 110 can analyze the matched preset abnormal event to generate a corresponding exception handling rule and send the rule to the edge node. If no such preset abnormal event exists, the central node 110 can push the exception message to the user, who sets the corresponding exception handling rule, and the rule is then sent to the edge node.
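The source does not say how the matching degree is computed. Purely as an illustration, the sketch below assumes that abnormal fields and knowledge-base entries are sets of tokens and uses Jaccard similarity as the matching degree. The function names and that choice of metric are assumptions, and the sketch returns only the single best match above the preset degree.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens common to both sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_preset_event(
    abnormal_fields: Set[str],
    knowledge_base: Dict[str, Set[str]],
    min_degree: float,
) -> Optional[str]:
    """Return the preset event whose matching degree is highest and above min_degree."""
    best_name, best_degree = None, min_degree
    for name, fields in knowledge_base.items():
        degree = jaccard(abnormal_fields, fields)
        if degree > best_degree:  # strictly greater than the preset matching degree
            best_name, best_degree = name, degree
    return best_name
```

A return value of `None` corresponds to the branch in which the central node pushes the exception message to the user.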
Correspondingly, after receiving the exception handling rule, the edge node can use it to repair the first service and can also use the abnormal monitoring event in the first service and the exception handling rule of the first service to update the self-closed-loop strategy for the first service stored in the local database, continually enriching the local database. In this way, when the first service becomes abnormal again, the edge node can call the exception handling rule of the first service in the local database directly to repair the abnormality without sending it to the central node, which improves the exception handling capability of the edge node.
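The local-first lookup with fallback to the central node can be sketched as follows. The function name, the dictionary standing in for the local database, and the `report_to_center` callback are all hypothetical.

```python
from typing import Callable, Dict, Optional


def handle_abnormal_service(
    service: str,
    local_rules: Dict[str, str],
    report_to_center: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the handling rule to apply, consulting the central node only on a miss."""
    rule = local_rules.get(service)
    if rule is None:
        rule = report_to_center(service)  # may be None if no rule is available yet
        if rule is not None:
            local_rules[service] = rule   # enrich the local database for next time
    return rule
```

After the first miss, the rule is served from the local store, so a repeated abnormality of the same service does not reach the central node.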
In one example, the central node 110 may also present the service status of each edge node to the user, so that the user can promptly view the anomalies of the various services and their distribution. The information presented may include any one or more of the following: the abnormal status of any service on each edge node, the abnormal status of each monitoring event in any service, the handling results of abnormal monitoring events, the distribution of abnormal monitoring events, and the associations among the monitoring events. The central node 110 may present this information to the user as a holographic view or as a table; this is not limited.
In the embodiments of the present invention, anomaly identification and anomaly repair for a service are performed on the edge node side rather than being uniformly reported to the central node, which effectively reduces the workload of the central node and saves network overhead and time. Moreover, because each edge node handles its own anomalies in a self-closed loop, anomalies can be discovered and handled promptly, which both improves the efficiency of anomaly identification and handling and restores service availability in time.
FIG. 3 is a schematic diagram of the overall interaction flow of an exception handling method provided by an embodiment of the present invention. As shown in FIG. 3, the method includes:
Step 301: after detecting that a user has entered anomaly monitoring configuration information in the anomaly monitoring configuration interface, the central node obtains and stores the anomaly monitoring configuration information.
The anomaly monitoring configuration information may include the self-closed-loop policy of each service. The self-closed-loop policy of any service may include the anomaly analysis rule of that service, and may further include the exception handling rule of that service and/or a rule for acquiring its service data.
Step 302: on startup, the edge node sends a registration request to the central node.
Step 303: the central node verifies the registration request. If verification succeeds, step 304 is executed; if verification fails, step 314 is executed.
Step 304: the central node sends a registration-success response message to the edge node.
Step 305: the edge node obtains the self-closed-loop policies of the various services from the central node and stores them in the local database of the edge node; the various services include the first service.
Step 306: the edge node calls the data source interface corresponding to the first service to obtain the service data of the first service from the service process of the first service.
Step 307: the edge node uses the anomaly analysis rule of the first service to analyze the service data of the first service and determine whether the first service is abnormal. If it is abnormal, step 308 is executed; if it is not, step 306 is executed.
Step 308: the edge node queries the local database to determine whether an exception handling rule for the first service exists. If not, step 309 is executed; if so, step 312 is executed.
Step 309: the edge node sends an anomaly message to the central node; the anomaly message carries the relevant abnormal data of the first service.
Step 310: the central node sets the exception handling rule of the first service based on the relevant abnormal data of the first service obtained by parsing.
Step 311: the central node sends the exception handling rule of the first service to the edge node.
Step 312: the edge node uses the exception handling rule of the first service to repair the first service.
Step 313: if the edge node determines that the exception handling rule of the first service is not stored in its local database, it updates the local database with the exception handling rule of the first service.
Step 314: the edge node repeatedly sends the registration request to the central node; if registration has still not succeeded after a set number of repetitions, an alarm message is generated.
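The registration and retry behaviour of steps 302 to 304 and step 314 can be sketched as below. The function name `register_with_retries`, the boolean-returning `send_request` callable, and the alarm text are hypothetical. The sketch also assumes that the set number of repetitions counts every attempt including the first, which the description leaves open.

```python
from typing import Callable, Optional


def register_with_retries(send_request: Callable[[], bool],
                          max_attempts: int) -> Optional[str]:
    """Send the registration request up to max_attempts times.

    Returns None once the central node accepts the registration,
    or an alarm message if every attempt fails.
    """
    for _ in range(max_attempts):
        if send_request():
            # Verification succeeded; the edge node can go on to fetch policies.
            return None
    # All attempts were rejected: generate an alarm message.
    return f"ALARM: registration failed after {max_attempts} attempts"
```

A real edge node would normally wait between attempts; the delay is omitted here to keep the sketch short.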
In the above embodiments of the present invention, any edge node analyzes service data using an anomaly analysis rule to determine whether a first service on the edge node is abnormal, the service data including the service data corresponding to the first service. Further, after the edge node determines that the first service is abnormal, if an exception handling rule for the first service exists on the edge node, the edge node uses that rule to repair the first service; if no exception handling rule for the first service exists on the edge node, the anomaly is reported to the central node. In the embodiments of the present invention, anomaly identification and anomaly repair for a service are performed on the edge node side rather than being uniformly reported to the central node for execution, which effectively reduces the workload of the central node and saves network overhead and time. Moreover, because each edge node handles its own anomalies in a self-closed loop, anomalies can be discovered and handled promptly, which both improves the efficiency of anomaly identification and handling and restores service availability in time.
For the above method flow, an embodiment of the present invention further provides an edge network-based exception handling apparatus; for the specifics of the apparatus, refer to the implementation of the above method.
FIG. 4 is a schematic structural diagram of an edge network-based exception handling apparatus provided by an embodiment of the present invention. The edge network includes a central node and at least one edge node; the apparatus includes:
an anomaly analysis module 401, configured to analyze service data using an anomaly analysis rule and determine whether a first service on the edge node is abnormal, the service data including the service data corresponding to the first service;
an exception handling module 402, configured to, after the first service is determined to be abnormal, use the exception handling rule of the first service to repair the first service if such a rule exists on the edge node, and report to the central node if no such rule exists on the edge node.
Optionally, after the exception handling module 402 reports to the central node, the central node determines the exception handling rule of the first service and delivers it to the edge node;
the apparatus further includes a transceiver module 403, the transceiver module 403 being configured to receive the exception handling rule of the first service sent by the central node;
the exception handling module 402 is further configured to use the exception handling rule of the first service to repair the first service.
Optionally, the apparatus further includes a transceiver module 403; before the anomaly analysis module 401 analyzes the service data using the anomaly analysis rule, the transceiver module 403 is configured to:
send a registration request to the central node, the registration request being used to establish a communication connection between the central node and the edge node; and, after the communication connection with the central node is established, obtain from the central node the self-closed-loop policies of the various services; the various services include the first service; the self-closed-loop policy of any service includes the anomaly analysis rule of that service, or further includes the exception handling rule of that service.
Optionally, the self-closed-loop policies of the various services are obtained as follows:
after detecting that a user has entered anomaly monitoring configuration information in the anomaly monitoring configuration interface, the central node obtains and parses the anomaly monitoring configuration information to obtain the self-closed-loop policies of the various services, and stores them in the local database of the central node.
Optionally, the anomaly analysis rule of any service includes the anomaly analysis rule corresponding to each monitoring event in that service;
the anomaly analysis module 401 is specifically configured to:
for any monitoring event in the first service, parse the service data of the monitoring event out of the service data of the first service, and invoke an anomaly analysis algorithm matching the type of that service data to analyze it; if the analysis result satisfies the first abnormal condition corresponding to the monitoring event, determine that the monitoring event is abnormal and determine, at least according to the monitoring event, whether the first service is abnormal; if the analysis result does not satisfy the first abnormal condition corresponding to the monitoring event, determine that the first service is not abnormal.
Optionally, the anomaly analysis module 401 is specifically configured to:
if it is determined that the abnormal conditions corresponding to the monitoring event include only the first abnormal condition, determine that the first service is abnormal; if it is determined that the abnormal conditions corresponding to the monitoring event further include a second abnormal condition, and the second abnormal condition is an impact time, determine that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and determine that the first service is abnormal when the abnormal duration of the monitoring event is greater than or equal to the impact time.
Optionally, the anomaly analysis module 401 is further configured to:
if it is determined that the second abnormal condition is that associated monitoring events are abnormal at the same time, determine whether the other monitoring events associated with the monitoring event are abnormal; when the other monitoring events are also abnormal, determine that the first service is abnormal; when at least one other monitoring event is normal, determine that the first service is not abnormal.
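The two-level abnormal conditions handled by the anomaly analysis module can be sketched as a single decision function. The names `EventState`, `EventRule` and `service_is_abnormal` are hypothetical, and the first abnormal condition is reduced to a precomputed flag. The description treats the impact time and the associated-event check as alternative second conditions; applying both when both are configured is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EventState:
    first_condition_met: bool      # result of the type-specific analysis algorithm
    abnormal_seconds: float = 0.0  # how long the event has been abnormal


@dataclass
class EventRule:
    impact_time: Optional[float] = None                    # second condition: minimum duration
    associated_events: List[str] = field(default_factory=list)  # second condition: linked events


def service_is_abnormal(event: str,
                        rules: Dict[str, EventRule],
                        states: Dict[str, EventState]) -> bool:
    state, rule = states[event], rules[event]
    if not state.first_condition_met:
        return False
    # Impact time: a short-lived anomaly does not make the service abnormal.
    if rule.impact_time is not None and state.abnormal_seconds < rule.impact_time:
        return False
    # Associated events: every linked event must be abnormal as well.
    if not all(states[e].first_condition_met for e in rule.associated_events):
        return False
    return True
```

When a rule has no second condition, the function returns the first-condition result directly, which matches the case in which the abnormal conditions include only the first abnormal condition.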
As can be seen from the above, in the above embodiments of the present invention, any edge node analyzes service data using an anomaly analysis rule to determine whether a first service on the edge node is abnormal, the service data including the service data corresponding to the first service. Further, after the edge node determines that the first service is abnormal, if an exception handling rule for the first service exists on the edge node, the edge node uses that rule to repair the first service; if no exception handling rule for the first service exists on the edge node, the anomaly is reported to the central node. In the embodiments of the present invention, anomaly identification and anomaly repair for a service are performed on the edge node side rather than being uniformly reported to the central node for execution, which effectively reduces the workload of the central node and saves network overhead and time. Moreover, because each edge node handles its own anomalies in a self-closed loop, anomalies can be discovered and handled promptly, which both improves the efficiency of anomaly identification and handling and restores service availability in time.
Based on the same inventive concept, an embodiment of the present invention further provides a computing device which, as shown in FIG. 5, includes at least one processor 501 and a memory 502 connected to the at least one processor. The embodiments of the present invention do not limit the specific connection medium between the processor 501 and the memory 502; FIG. 5 takes a bus connection between the processor 501 and the memory 502 as an example. Buses can be classified into address buses, data buses, control buses, and so on.
In the embodiments of the present invention, the memory 502 stores instructions executable by the at least one processor 501, and by executing the instructions stored in the memory 502 the at least one processor 501 can perform the steps included in the foregoing edge network-based exception handling method.
The processor 501 is the control center of the computing device. It can connect the various parts of the computing device through various interfaces and lines, and performs data processing by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502. Optionally, the processor 501 may include one or more processing units, and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles the delivery of instructions. It can be understood that the modem processor need not be integrated into the processor 501. In some embodiments, the processor 501 and the memory 502 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the edge network-based exception handling embodiments may be embodied directly as being executed by a hardware processor, or as being executed by a combination of hardware and software modules in the processor.
As a non-volatile computer-readable storage medium, the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on. The memory 502 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 502 in the embodiments of the present invention may also be a circuit or any other apparatus capable of implementing a storage function, used to store program instructions and/or data.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program executable by a computing device; when the program runs on the computing device, it causes the computing device to perform the edge network-based exception handling method described with reference to either FIG. 2 or FIG. 3.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

  1. An edge network-based exception handling method, characterized in that the edge network includes a central node and at least one edge node; the method includes:
    any edge node analyzing service data using an anomaly analysis rule to determine whether a first service on the edge node is abnormal, the service data including the service data corresponding to the first service;
    after the edge node determines that the first service is abnormal, if an exception handling rule for the first service exists on the edge node, using the exception handling rule to repair the first service; if no exception handling rule for the first service exists on the edge node, reporting to the central node.
  2. The method according to claim 1, characterized in that, after the reporting to the central node, the central node determines the exception handling rule of the first service and delivers it to the edge node;
    the edge node receives the exception handling rule of the first service sent by the central node;
    the edge node uses the exception handling rule of the first service to repair the first service.
  3. The method according to claim 1, characterized in that, before any edge node analyzes the service data using the anomaly analysis rule, the method further includes:
    the edge node sending a registration request to the central node, the registration request being used to establish a communication connection between the central node and the edge node;
    after establishing the communication connection with the central node, the edge node obtaining from the central node the self-closed-loop policies of the various services; the various services include the first service; the self-closed-loop policy of any service includes the anomaly analysis rule of that service, or further includes the exception handling rule of that service.
  4. The method according to claim 3, characterized in that the self-closed-loop policies of the various services are obtained as follows:
    after detecting that a user has entered anomaly monitoring configuration information in the anomaly monitoring configuration interface, the central node obtains and parses the anomaly monitoring configuration information to obtain the self-closed-loop policies of the various services, and stores them in the local database of the central node.
  5. The method according to any one of claims 1 to 4, characterized in that the anomaly analysis rule of any service includes the anomaly analysis rule corresponding to each monitoring event in that service;
    the any edge node analyzing service data using an anomaly analysis rule to determine whether a first service on the edge node is abnormal includes:
    for any monitoring event in the first service, the edge node parsing the service data of the monitoring event out of the service data of the first service, and invoking an anomaly analysis algorithm matching the type of that service data to analyze it; if the analysis result satisfies the first abnormal condition corresponding to the monitoring event, determining that the monitoring event is abnormal and determining, at least according to the monitoring event, whether the first service is abnormal; if the analysis result does not satisfy the first abnormal condition corresponding to the monitoring event, determining that the first service is not abnormal.
  6. The method according to claim 5, characterized in that the edge node determining, at least according to the monitoring event, whether the first service is abnormal includes:
    if the edge node determines that the abnormal conditions corresponding to the monitoring event include only the first abnormal condition, determining that the first service is abnormal; if it determines that the abnormal conditions corresponding to the monitoring event further include a second abnormal condition, and the second abnormal condition is an impact time, determining that the first service is not abnormal when the abnormal duration of the monitoring event is less than the impact time, and determining that the first service is abnormal when the abnormal duration of the monitoring event is greater than or equal to the impact time.
  7. The method according to claim 6, characterized in that the method further includes:
    if the edge node determines that the second abnormal condition is that associated monitoring events are abnormal at the same time, determining whether the other monitoring events associated with the monitoring event are abnormal; when the other monitoring events are also abnormal, determining that the first service is abnormal; when at least one other monitoring event is normal, determining that the first service is not abnormal.
  8. An edge network-based exception handling apparatus, characterized in that the edge network includes a central node and at least one edge node; the apparatus includes:
    an anomaly analysis module, configured to analyze service data using an anomaly analysis rule and determine whether a first service on the edge node is abnormal, the service data including the service data corresponding to the first service;
    an exception handling module, configured to, after the first service is determined to be abnormal, use the exception handling rule of the first service to repair the first service if such a rule exists on the edge node, and report to the central node if no such rule exists on the edge node.
  9. A computing device, characterized by including at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that it stores a computer program executable by a computing device; when the program runs on the computing device, it causes the computing device to perform the method according to any one of claims 1 to 7.
PCT/CN2020/091867 2020-02-25 2020-05-22 Edge network-based anomaly processing method and apparatus WO2021169064A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010115008.8A CN111355610A (en) 2020-02-25 2020-02-25 Exception handling method and device based on edge network
CN202010115008.8 2020-02-25

Publications (1)

Publication Number Publication Date
WO2021169064A1 (en) 2021-09-02

Family

ID=71197132

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/091867 WO2021169064A1 (en) 2020-02-25 2020-05-22 Edge network-based anomaly processing method and apparatus

Country Status (2)

Country Link
CN (1) CN111355610A (en)
WO (1) WO2021169064A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111800299A (en) * 2020-07-08 2020-10-20 广州市品高软件股份有限公司 Operation maintenance system and method of edge cloud
CN112073231B (en) * 2020-08-31 2023-08-18 深圳市国电科技通信有限公司 Local area network linkage protection method, device, computer equipment and storage medium
CN112492632B (en) * 2020-11-09 2023-02-17 厦门亿联网络技术股份有限公司 Anomaly monitoring method and system based on roaming system
CN112583898B (en) 2020-11-30 2023-08-15 北京百度网讯科技有限公司 Business process arrangement method, device and readable medium
CN114666075B (en) * 2020-12-08 2023-04-07 上海交通大学 Distributed network anomaly detection method and system based on depth feature coarse coding
CN114765555A (en) * 2021-01-12 2022-07-19 华为技术有限公司 Network threat processing method and communication device
CN112988327A (en) * 2021-03-04 2021-06-18 杭州谐云科技有限公司 Container safety management method and system based on cloud edge cooperation
CN113013990A (en) * 2021-03-18 2021-06-22 华润电力技术研究院有限公司 Generator set fault early warning method, system and related equipment
CN113887749A (en) * 2021-08-23 2022-01-04 国网江苏省电力有限公司信息通信分公司 Cloud edge cooperation-based multi-dimensional monitoring and disposal method, device and platform for power internet of things
CN113806092A (en) * 2021-09-18 2021-12-17 济南浪潮数据技术有限公司 Storage device management method, system, device and medium
CN114640709B (en) * 2022-03-31 2023-07-25 苏州浪潮智能科技有限公司 Edge node processing method, device and medium
CN115297124B (en) * 2022-07-25 2023-08-04 天翼云科技有限公司 System operation and maintenance management method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015088851A1 (en) * 2013-12-09 2015-06-18 Cisco Technology, Inc. Repair of failed network routing arcs using data plane protocol
CN106375328A (en) * 2016-09-19 2017-02-01 中国人民解放军国防科学技术大学 Adaptive optimization operation method of large-scale data distribution system
CN109769023A (en) * 2019-01-16 2019-05-17 网宿科技股份有限公司 A kind of data transmission method, associated server and storage medium
CN109889569A (en) * 2019-01-03 2019-06-14 网宿科技股份有限公司 CDN service dispatching method and system
CN110430071A (en) * 2019-07-19 2019-11-08 云南电网有限责任公司信息中心 Service node fault self-recovery method, apparatus, computer equipment and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101883016B (en) * 2009-05-05 2014-11-05 中兴通讯股份有限公司 System and method for generating deep packet inspection equipment linkage strategy
CN101583024B (en) * 2009-06-04 2011-06-22 中兴通讯股份有限公司 Distributed network video monitoring system and registration control method thereof
CN101790156B (en) * 2009-11-19 2011-10-26 北京邮电大学 Strategy optimization based method and device for repairing fault of terminal software
CN103166778A (en) * 2011-12-13 2013-06-19 成都勤智数码科技有限公司 Method and device for automatically and intelligently processing malfunction
CN104299659B (en) * 2013-07-16 2017-08-04 中广核工程有限公司 Nuclear power station running state monitoring method, apparatus and system
CN103838637A (en) * 2014-03-03 2014-06-04 江苏智联天地科技有限公司 Terminal automatic fault diagnosis and restoration method on basis of data mining
CN107026865A (en) * 2017-04-14 2017-08-08 北京奇虎科技有限公司 Anomalous event processing method and system, client and service end
CN108595333B (en) * 2018-04-26 2021-08-03 Oppo广东移动通信有限公司 Health examination method and device for application process in PaaS platform
CN109639516B (en) * 2018-10-17 2022-05-17 平安科技(深圳)有限公司 Monitoring method, device, equipment and storage medium of distributed network system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUANJIE LIU, LIU ZHENG: "Research on the Application of New Smart City Based on Edge Intelligence Collaboration", SCIENCE AND TECHNOLOGY INNOVATION HERALD, SHIJIE ZHISHI CHUBANSHE, CN, vol. 17, no. 1, 1 January 2020 (2020-01-01), CN, pages 143 - 146,148, XP055840163, ISSN: 1674-098X, DOI: 10.16660/j.cnki.1674-098X.2020.01.143 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806225A (en) * 2021-09-24 2021-12-17 上海淇玥信息技术有限公司 Method and device for identifying service abnormal node and electronic equipment
CN113806225B (en) * 2021-09-24 2024-06-07 上海淇玥信息技术有限公司 Business abnormal node identification method and device and electronic equipment
CN118413562A (en) * 2024-07-04 2024-07-30 中钢集团武汉安全环保研究院有限公司 Edge computing method, device and system

Also Published As

Publication number Publication date
CN111355610A (en) 2020-06-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20920994

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20920994

Country of ref document: EP

Kind code of ref document: A1