
CN112511339A - Container monitoring alarm method, system, equipment and storage medium based on multiple clusters - Google Patents

Info

Publication number
CN112511339A
Authority
CN
China
Prior art keywords
cluster
alarm
monitoring
index
prometheus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011251413.9A
Other languages
Chinese (zh)
Other versions
CN112511339B (en
Inventor
叶奕珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baofu Network Technology Shanghai Co ltd
Original Assignee
Baofu Network Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baofu Network Technology Shanghai Co ltd filed Critical Baofu Network Technology Shanghai Co ltd
Priority to CN202011251413.9A priority Critical patent/CN112511339B/en
Publication of CN112511339A publication Critical patent/CN112511339A/en
Application granted granted Critical
Publication of CN112511339B publication Critical patent/CN112511339B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)
  • Alarm Systems (AREA)

Abstract

The application discloses a container monitoring and alarming method, system, device and storage medium based on multiple clusters. The method comprises: configuring, through a monitoring module, capture rules for the indexes of all set resources in prometheus.yml, and deploying monitoring components of at least one cluster to be monitored, the monitoring components periodically capturing instantaneous index data of each running resource in the cluster according to the preset capture rules; configuring, through an alarm module, alarm rules for all set resources in prometheus.yml, and configuring alarm information through the alarm management component Alertmanager so that it is sent to a message notification module; and, when the instantaneous index data of any running resource captured by the monitoring module triggers an alarm rule, sending the alarm information to the message notification module through the Alertmanager. The method and device can monitor the operation indexes of each node of multiple clusters and raise alarms for abnormal conditions in time.

Description

Container monitoring alarm method, system, equipment and storage medium based on multiple clusters
Technical Field
The present invention relates to a cluster technology, and in particular, to a container monitoring and warning method, system, device, and storage medium based on multiple clusters.
Background
With the popularization of container technology, more and more enterprises develop applications with a microservice framework, deliver code as images, and deploy and run services in containers, so operation and maintenance monitoring has shifted from traditional virtual machines to containers. Currently, the mainstream container monitoring scheme combines exporters (collection) + Prometheus (pulling and storing) + Grafana (chart display) + Alertmanager (threshold alarm).
This combination of exporters, Prometheus, Grafana and Alertmanager places high technical demands on operation and maintenance personnel and is complicated to configure: they need to know the technical details of Prometheus, PromQL query statements and the like, as well as the meanings of the running states and indexes of the various Kubernetes (K8s for short) resources. In addition, without a simplified set of indexes, excessive storage space is wasted, and monitoring and alarming in a multi-cluster environment requires maintaining multiple sets of configuration. The excessive configuration greatly increases the learning and usage cost for operation and maintenance personnel, and is especially unfriendly to developers who want to customize threshold alarms for their applications.
Disclosure of Invention
The present invention is directed to a container monitoring and alarming method, system, device and storage medium based on multiple clusters, so as to solve the problems set forth in the foregoing technical background.
In order to achieve the purpose, the invention adopts the following technical scheme:
the first aspect of the present application provides a container monitoring and alarming method based on multiple clusters, including:
maintaining the Prometheus configuration file prometheus.yml through a monitoring module, configuring capture rules for the indexes of all set resources in prometheus.yml, and deploying monitoring components of at least one cluster to be monitored, wherein the monitoring components periodically capture instantaneous index data of each running resource in the cluster according to the preset capture rules;
maintaining the Prometheus configuration file prometheus.yml through an alarm module, configuring alarm rules for all set resources in prometheus.yml, and configuring alarm information through the alarm management component Alertmanager so that the alarm information is sent to a message notification module;
configuring account passwords of message sending channels through the message notification module, and, by adding topics and the subscription terminals of each topic, managing the sending of different alarm information to the corresponding subscription terminals;
when the alarm rule is triggered by the instantaneous index data of any resource operation captured by the monitoring module, the alarm information is sent to the message notification module through the Alertmanager, and the message notification module sends the alarm information to the corresponding subscription terminal.
Preferably, the cluster is a K8s cluster.
Preferably, the resource comprises one or more of a cluster, a host, a namespace, an application, a container.
Preferably, the index includes one or more of a CPU, a memory, a storage disk, and a network.
Preferably, the capture rules include one or more of a capture address, a capture cycle, and index relabeling.
Preferably, deploying, by the monitoring module, the monitoring components of at least one cluster to be monitored includes: deploying the index capture storage component Prometheus and the alarm management component Alertmanager on a first cluster; deploying a host index collector node-exporter and a container index collector cAdvisor on each node of each cluster to be monitored; deploying a cluster state index collector kube-state-metrics on each cluster to be monitored; and
deploying a middleware collector corresponding to each specified middleware on each cluster to be monitored, wherein each middleware corresponds to an independent middleware collector.
More preferably, the instantaneous index data of each running node is collected by the host index collector node-exporter and the container index collector cAdvisor and gathered into the index capture storage component Prometheus, where it is matched against the alarm rules pre-configured in the Prometheus configuration file prometheus.yml; if an alarm rule is triggered, the alarm management component Alertmanager sends the alarm information to the message notification module.
More preferably, in the Prometheus configuration file prometheus.yml, the capture addresses of the indexes to be captured include:
the index access address of the host index collector node-exporter deployed on each node of each cluster;
the index access address of the container index collector cAdvisor deployed on each node of each cluster;
the index access address of the cluster state index collector kube-state-metrics deployed on each cluster; and
the index access address of each middleware collector deployed on each cluster.
More preferably, when at least one second cluster needs to join the monitoring, the first cluster records the capture address and the access token for capturing the indexes of the second cluster, the capture address and the access token are added to the cluster deployment file yaml, and after configuration is completed, the reload configuration interface of Prometheus is called to make the configuration take effect; wherein the first cluster and the second cluster are different clusters.
Preferably, the capture rules comprise: taking cluster/host/namespace/application/container instances as the resource dimensions, pulling and storing only the indexes users care about most, such as CPU, memory, network and storage disk, and filtering out the large number of indexes that are of no use to users.
Preferably, the method further comprises:
generating a first alarm strategy according to a strategy instruction input by a user;
updating the Prometheus configuration file prometheus.yml according to the first alarm policy, wherein the updated prometheus.yml comprises the first alarm policy; and calling the reload configuration interface of Prometheus to make the configuration take effect.
Preferably, after the alarm rule is triggered by the instantaneous index data of any captured resource operation, the method further includes: and the user views the alarm information through the UI visualization module.
Preferably, the message sending channel configured by the message notification module comprises one or more of a mailbox, a short message, an enterprise WeChat, a voice telephone notification and a QQ notification.
Preferably, the method further comprises: presetting a topic subscribed to by a user, wherein the topic comprises the alarm information the user is interested in; and when the captured instantaneous index data of any running resource triggers an alarm rule, sending the alarm information associated with the topic through a configured message sending channel.
Preferably, the alarm information includes: cluster dimension alarm items, node dimension alarm items and container group dimension alarm items.
More preferably, the cluster dimension alarm item includes at least one of: the utilization rate of the CPU exceeds 80%, the utilization rate of the memory exceeds 80%, the local storage of all nodes of the cluster exceeds 80%, the resource utilization of a namespace exceeds 80%, and the state of a cluster container group (pod) is abnormal.
More preferably, the node dimension alarm item includes at least one of: the CPU utilization rate of the node exceeds 80%, the memory utilization rate of the node exceeds 80%, and the local storage utilization rate of the node exceeds 80%.
More preferably, the container group dimension alarm item includes at least one of: the CPU utilization rate of the container group (pod) exceeds 80%, and the memory utilization rate of the container group (pod) exceeds 80%.
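The threshold-type alarm items listed above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the `Sample` record, the resource names and the `triggered_alarms` helper are hypothetical, and utilization is assumed to be expressed as a fraction between 0 and 1. Only the 80% level and the cluster/node/container group dimensions come from the text.

```python
from dataclasses import dataclass

THRESHOLD = 0.80  # the 80% level named in the alarm items


@dataclass(frozen=True)
class Sample:
    dimension: str      # "cluster", "node" or "pod" (container group)
    resource: str       # hypothetical resource name
    index: str          # e.g. "cpu", "memory", "local_storage"
    utilization: float  # assumed to be a fraction in the range 0.0..1.0


def triggered_alarms(samples, threshold=THRESHOLD):
    """Return one alarm item per sample whose utilization exceeds the threshold."""
    return [
        f"{s.dimension}/{s.resource}: {s.index} utilization "
        f"{s.utilization:.0%} exceeds {threshold:.0%}"
        for s in samples
        if s.utilization > threshold  # "exceeds" is read as strictly greater
    ]
```

A sample at exactly 80% does not raise an alarm under this reading, because the text says the utilization "exceeds" 80%.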
A second aspect of the present application provides a container monitoring and warning system based on multiple clusters, including: monitoring module, alarm module and message notice module, wherein:
the monitoring module comprises:
the index capture rule maintenance unit is used for configuring capture rules for the indexes of all set resources in the Prometheus configuration file prometheus.yml;
the monitoring component deployment unit is used for deploying the monitoring components of at least one cluster to be monitored through a cluster deployment file yaml, and the monitoring components are used for periodically capturing instantaneous index data of running of each resource in the cluster according to a preset capturing rule;
the alarm module comprises:
the alarm rule maintenance unit is used for configuring alarm rules for all set resources in the Prometheus configuration file prometheus.yml;
the receiving unit is used for receiving the alarm information sent by the monitoring module and pushing the alarm information to the alarm management component Alertmanager when the monitoring module determines that the instantaneous index data captured on the cluster to be monitored triggers an alarm rule;
the sending unit is used for sending the alarm information in the alarm management component Alertmanager to the message notification module;
and the message notification module is used for sending the alarm information to the corresponding subscription terminal according to the preset account password of the message sending channel, the preset topic and the preset subscription terminal of the topic.
Preferably, the alarm module further comprises: an alarm rule updating unit, used for recording a policy instruction input by a user, generating a first alarm policy, and updating the Prometheus configuration file prometheus.yml according to the first alarm policy, wherein the updated prometheus.yml comprises the first alarm policy.
Preferably, the multi-cluster-based container monitoring and warning system further includes: and the UI visualization module is used for inquiring and/or displaying the alarm information sent by the alarm module and/or the instantaneous index data monitored by the monitoring module.
More preferably, the UI visualization module may present the information as dashboard charts.
Preferably, the message sending channel configured by the message notification module comprises one or more of a mailbox, a short message, an enterprise WeChat, a voice telephone notification and a QQ notification.
Preferably, the cluster is a K8s cluster.
Preferably, the monitoring components comprise:
the index capture storage component Prometheus, deployed in the first cluster;
the alarm management component Alertmanager, deployed in the first cluster;
a host index collector node-exporter and a container index collector cAdvisor, deployed on each node of each cluster to be monitored;
the cluster state index collector kube-state-metrics, deployed in each cluster to be monitored; and
a middleware collector, deployed in each cluster to be monitored, wherein each middleware collector corresponds to an independent middleware.
More preferably, in the Prometheus configuration file prometheus.yml, the capture addresses of the indexes to be captured include:
the index access address of the host index collector node-exporter deployed on each node of each cluster;
the index access address of the container index collector cAdvisor deployed on each node of each cluster;
the index access address of the cluster state index collector kube-state-metrics deployed on each cluster; and
the index access address of each middleware collector deployed on each cluster.
Preferably, the capture rules comprise: taking cluster/host/namespace/application/container instances as the resource dimensions, pulling and storing only the indexes users care about most, such as CPU, memory, network and storage disk, and filtering out the large number of indexes that are of no use to users.
Preferably, the alarm information includes: cluster dimension alarm items, node dimension alarm items and container group dimension alarm items.
More preferably, the cluster dimension alarm item includes at least one of: the utilization rate of the CPU exceeds 80%, the utilization rate of the memory exceeds 80%, the local storage of all nodes of the cluster exceeds 80%, the resource utilization of a namespace exceeds 80%, and the state of a cluster container group (pod) is abnormal.
More preferably, the node dimension alarm item includes at least one of: the CPU utilization rate of the node exceeds 80%, the memory utilization rate of the node exceeds 80%, and the local storage utilization rate of the node exceeds 80%.
More preferably, the container group dimension alarm item includes at least one of: the CPU utilization rate of the container group (pod) exceeds 80%, and the memory utilization rate of the container group (pod) exceeds 80%.
The third aspect of the present application provides a container monitoring and warning device based on multiple clusters, including:
a memory having a computer program stored therein;
a processor for executing all computer programs in said memory for implementing the steps of said multi-cluster based container monitoring alarm method of the first aspect disclosed herein.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the multi-cluster based container monitoring alarm method of the first aspect disclosed herein.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
the application discloses a container monitoring and alarming method, a system, equipment and a storage medium based on multiple clusters, wherein the operation indexes of each node and container of the multiple clusters can be monitored through a monitoring module and an alarming module, and abnormal conditions are alarmed in time, so that reasonable adjustment and distribution of system resources are facilitated, and the overall performance of the clusters is improved;
the container monitoring and alarming system based on the Kubernetes cluster can be automatically deployed without complex configuration;
the method simplifies and optimizes a large amount of resource monitoring indexes based on Kubernetes;
the method and the device can customize the alarm rule and the push of the alarm information, so that operation and maintenance and developers can smoothly realize monitoring and alarm of the concerned application service on the premise of completely not knowing Prometheus and Kubernetes technologies.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a block diagram of a multi-cluster based container monitoring and warning system according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of cluster deployment in a preferred embodiment of the present invention;
FIG. 3 is a diagram of cluster deployment results in accordance with a preferred embodiment of the present invention;
FIG. 4 is a flow chart of a multi-cluster based container monitoring alarm method according to a preferred embodiment of the present invention;
FIG. 5 is a flowchart of a user creating alert rules in accordance with a preferred embodiment of the present invention;
FIG. 6 is a functional block diagram of a multi-cluster based container monitoring alarm system in accordance with a preferred embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a container monitoring and warning device based on multiple clusters according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, it being understood that the data so used may be interchanged under appropriate circumstances. Furthermore, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
A Kubernetes cluster (hereinafter referred to as a cluster) is composed of a plurality of host nodes. All the applications are managed by the cluster in a container form and distributed and deployed on the nodes through the cluster container orchestration function. The container monitoring and warning system can be deployed on a main cluster and supports monitoring of a plurality of clusters.
Fig. 1 is a block diagram of a container monitoring and warning system based on multiple clusters according to a preferred embodiment of the present invention. As shown in fig. 1, a multi-cluster-based container monitoring and alarming system includes: monitoring module 1, alarm module 2, message notification module 3 and UI visualization module 4, wherein:
the monitoring module 1 includes:
the index capture rule maintenance unit is used for configuring capture rules for the indexes of all set resources in the Prometheus configuration file prometheus.yml;
the monitoring component deployment unit is used for deploying the monitoring components of at least one cluster to be monitored through a cluster deployment file yaml, and the monitoring components are used for periodically capturing instantaneous index data of running of each resource in the cluster according to a preset capturing rule;
the alarm module 2 includes:
the alarm rule maintenance unit is used for configuring alarm rules for all set resources in the Prometheus configuration file prometheus.yml;
the receiving unit is used for receiving the alarm information sent by the monitoring module 1, which captures the instantaneous index data of the cluster to be monitored, and for pushing the alarm information to the alarm management component Alertmanager when the monitoring module 1 determines that the instantaneous index data captured on the cluster to be monitored triggers an alarm rule;
the sending unit is used for sending the alarm information in the alarm management component Alertmanager to the message notification module 3;
the message notification module 3 is used for sending the alarm information to the corresponding subscription terminal according to the preset account password of the message sending channel, the preset topic and the preset subscription terminals of the topic;
and the UI visualization module 4 is used for inquiring and/or displaying the alarm information sent by the alarm module 2 and/or the instantaneous index data monitored by the monitoring module 1.
The monitoring component in the above includes:
1) the index capture storage component Prometheus, deployed in the main cluster;
2) the alarm management component Alertmanager, deployed in the main cluster;
3) core collectors:
the host index collector node-exporter, deployed on each node of each cluster to be monitored;
the container index collector cAdvisor, deployed on each node of each cluster to be monitored;
the cluster state index collector kube-state-metrics, deployed in each cluster to be monitored;
4) other collectors:
collectors for various middleware, such as MySQL, MongoDB and Redis, can be customized; only the corresponding cluster deployment file yaml needs to be provided under the path specified by the monitoring module. Each middleware instance gets its own independent middleware collector: for example, if the cluster runs three MySQL instances, three middleware collectors need to be deployed, each responsible for one MySQL instance.
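The one-collector-per-middleware-instance rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the `plan_middleware_collectors` helper and the `<kind>-exporter-<instance>` naming scheme are hypothetical and do not appear in the application.

```python
def plan_middleware_collectors(instances):
    """Map every middleware instance to its own collector deployment name.

    `instances` is a list of (kind, instance_name) pairs, e.g. ("mysql", "orders").
    The "<kind>-exporter-<instance>" naming is a hypothetical convention.
    """
    plan = {}
    for kind, name in instances:
        # One independent collector per instance, never one shared per kind.
        plan[f"{kind}/{name}"] = f"{kind}-exporter-{name}"
    return plan
```

For a cluster with three MySQL instances, the plan contains three distinct collectors, which matches the example given in the text.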
When a plurality of clusters need to be added into monitoring, the main cluster needs to add information such as access addresses and access tokens of other clusters so as to normally access each cluster and deploy monitoring components.
Fig. 2 is a flow chart of cluster deployment in the present application, and a deployment result chart is shown with reference to fig. 3.
As shown in fig. 2, the deployment process of the cluster is:
step S01: judging whether the basic component (namely the monitoring component) is deployed, if so, executing the step S11, otherwise, executing the step S02;
step S02: generating a main cluster deployment file yaml;
step S11: judging whether to deploy a new cluster at the same time, if so, executing the step S12, otherwise, executing the step S21;
step S12: inputting an access address (grab address) and an access token of the new cluster, and executing step S13;
step S13: judging whether the networks are connected, if so, executing the step S14, otherwise, executing the step S12;
step S14: judging whether a collector of a new cluster is deployed, if so, executing the step S15, otherwise, executing the step S21;
step S15: generating a new cluster deployment file yaml;
step S21: judging whether a new deployment file is generated, if so, executing the step S31, otherwise, ending the deployment process;
step S31: starting to run the deployment file and ending the deployment process.
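The decision flow of steps S01 to S31 can be sketched in Python as shown below. Several points are assumptions rather than statements of the application: the flow is assumed to continue from step S02 to step S11 and from step S15 to step S21, step S14 is read as asking whether collectors are to be deployed for the new cluster, and an unreachable cluster is reported back to the caller instead of prompting again for its address as in step S12. The function name, dictionary keys and file names are hypothetical.

```python
def plan_deployment(base_deployed, new_clusters, is_reachable):
    """Return which deployment files to run, following steps S01 to S31.

    new_clusters: list of dicts with hypothetical keys "name", "address",
    "token" and "deploy_collectors"; is_reachable: callable(address) -> bool.
    """
    files, retry = [], []
    if not base_deployed:                        # S01: basic components missing
        files.append("main-cluster.yaml")        # S02: main cluster deployment file
    for cluster in new_clusters:                 # S11/S12: each new cluster to add
        if not is_reachable(cluster["address"]): # S13: network check failed
            retry.append(cluster["name"])        # caller must re-enter the address
            continue
        if cluster.get("deploy_collectors", True):            # S14
            files.append(f"{cluster['name']}-collectors.yaml")  # S15
    # S21: run only if at least one new deployment file was generated (S31)
    return {"run": bool(files), "files": files, "retry": retry}
```

If the basic components are already deployed and no new cluster is added, no file is generated and the flow ends without running anything, as in step S21.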
In the above, the access addresses of the indexes to be captured for all resources, as recorded in prometheus.yml, specifically include:
1) the index access address of the host index collector node-exporter deployed on each node of each cluster;
2) the index access address of the container index collector cAdvisor deployed on each node of each cluster;
3) the index access address of the cluster state index collector kube-state-metrics deployed on each cluster;
4) the index access address of each middleware collector deployed on each cluster.
The capture rules in the above are as follows: the indexes of the various collectors are filtered and recalculated, and, taking cluster/host/namespace/application/container instances as the resource dimensions, only the indexes users care about most, such as CPU, memory, network and disk, are pulled and stored. A large number of indexes that are of no use to users are thereby eliminated, which reduces storage pressure and greatly improves query performance for users.
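One way such filtering could be expressed is a Prometheus `metric_relabel_configs` entry with a `keep` action on the `__name__` label. The Python sketch below builds such an entry as a plain dictionary and applies the same regular expression locally to show its effect. The prefix list and both helper functions are illustrative assumptions; the application does not state which metric names are kept.

```python
import re

# Hypothetical allowlist of metric-name prefixes for CPU, memory, disk and network.
KEPT_PREFIXES = (
    "node_cpu_", "node_memory_", "node_filesystem_", "node_network_",
    "container_cpu_", "container_memory_", "container_fs_", "container_network_",
)


def keep_rule(prefixes=KEPT_PREFIXES):
    """Build a metric_relabel_configs entry that keeps only allowlisted metrics."""
    regex = "(" + "|".join(prefixes) + ").*"
    return {"source_labels": ["__name__"], "regex": regex, "action": "keep"}


def filter_names(names, rule):
    """Apply the rule locally; Prometheus anchors relabel regexes at both ends."""
    pattern = re.compile(rule["regex"])
    return [name for name in names if pattern.fullmatch(name)]
```

Metrics such as Go runtime statistics exposed by a collector would be dropped by this rule before storage, which is the storage-saving effect the paragraph describes.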
In the above, when a new cluster is added, after the main cluster records the access address and the access token of the new cluster, the monitoring module adds the index access address and the access token for accessing the new cluster collector in the configuration file, and after the configuration is completed, calls the reloading configuration interface of Prometheus to enable the configuration to take effect.
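A minimal Python sketch of this step is given below, assuming the configuration is held as a dictionary that is later serialized to prometheus.yml. The helper names, job naming and example addresses are hypothetical. The `authorization` block and the `/-/reload` endpoint are standard Prometheus features; the endpoint only responds when Prometheus is started with `--web.enable-lifecycle`.

```python
import urllib.request


def add_cluster_job(config, cluster, address, token):
    """Append a scrape job for a new cluster's collector endpoint to the config."""
    job = {
        "job_name": f"{cluster}-collectors",          # hypothetical naming scheme
        "scheme": "https",
        "authorization": {"type": "Bearer", "credentials": token},
        "static_configs": [
            {"targets": [address], "labels": {"cluster": cluster}},
        ],
    }
    config.setdefault("scrape_configs", []).append(job)
    return job


def reload_request(base_url):
    """Build the POST request that asks Prometheus to reload its configuration."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")
```

After the updated configuration has been written to disk, the request would be sent with `urllib.request.urlopen(reload_request("http://prometheus.example:9090"))`, where the host name is a placeholder.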
Fig. 4 is a flowchart of a container monitoring alarm method based on multiple clusters according to a preferred embodiment of the present invention. As shown in fig. 4, a container monitoring and alarming method based on multiple clusters includes:
Step 01: configuring, through the Prometheus configuration file prometheus.yml, the access addresses (capture addresses) of the indexes to be captured for all resources and the alarm rules for all resources.
Wherein the access addresses include: the index access address of the host index collector node-exporter deployed on each node of each cluster; the index access address of the container index collector cAdvisor deployed on each node of each cluster; the index access address of the cluster state index collector kube-state-metrics deployed on each cluster; and the index access address of each middleware collector deployed on each cluster.
Step 02: deploying the monitoring component of at least one cluster to be monitored through the cluster deployment file yaml, wherein the monitoring component periodically captures instantaneous index data of each resource operation in the cluster according to a preset capture rule.
Deploying, by the monitoring module, the monitoring components of at least one cluster to be monitored comprises: deploying the index capture storage component Prometheus and the alarm management component Alertmanager on a first cluster; deploying a host index collector node-exporter and a container index collector cAdvisor on each node of each cluster to be monitored; deploying a cluster state index collector kube-state-metrics on each cluster to be monitored; and deploying a middleware collector corresponding to each specified middleware on each cluster to be monitored, each middleware corresponding to an independent middleware collector.
Step 03: when the instantaneous index data of any running resource captured by the monitoring module triggers an alarm rule, the alarm information is sent to the message notification module through the Alertmanager.
The host index collector node-exporter and the container index collector cAdvisor collect the instantaneous index data of each node; the data are gathered into the index capture storage component Prometheus and matched against the alarm rules configured in advance in the Prometheus configuration file prometheus.yml. If an alarm rule is triggered, the alarm management component Alertmanager assembles the alarm information and sends it to the message notification module.
Step 04: the message notification module sends the alarm information to the corresponding subscription terminals.
The message notification module is configured with the account and password of each message sending channel, and routes different alarm information to the corresponding subscription terminals by adding topics and the subscription terminals of each topic. The message sending channel may be a mailbox, a short message, enterprise WeChat, a voice call notification, a QQ notification, and the like. The message notification module presets the topics subscribed to by the user, each topic covering the alarm information the user is interested in. When the captured instantaneous index data of any running resource triggers an alarm rule, the message notification module sends the alarm information associated with the topic to the subscription terminals through the configured message sending channel.
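The topic and subscription bookkeeping described above can be sketched in a few lines of Python. The class and method names are hypothetical, and actual delivery is replaced by returning the list of deliveries that would be made, so this is an outline of the routing logic rather than of the module itself.

```python
from collections import defaultdict


class MessageNotifier:
    """Routes alarm text to the terminals subscribed to a topic."""

    def __init__(self):
        self._channels = {}                       # channel name -> (account, password)
        self._subscriptions = defaultdict(list)   # topic -> [(channel, terminal)]

    def configure_channel(self, channel, account, password):
        """Register the account and password of a message sending channel."""
        self._channels[channel] = (account, password)

    def subscribe(self, topic, channel, terminal):
        """Add a subscription terminal to a topic on an already configured channel."""
        if channel not in self._channels:
            raise ValueError(f"channel {channel!r} is not configured")
        self._subscriptions[topic].append((channel, terminal))

    def dispatch(self, topic, alarm_text):
        """Return the (channel, terminal, text) deliveries for one alarm."""
        return [
            (channel, terminal, alarm_text)
            for channel, terminal in self._subscriptions.get(topic, [])
        ]


# Placeholder credentials and terminals, for illustration only.
notifier = MessageNotifier()
notifier.configure_channel("email", "ops@example.com", "secret")
notifier.configure_channel("sms", "sms-account", "secret")
notifier.subscribe("container-memory", "email", "alice@example.com")
notifier.subscribe("container-memory", "sms", "+8613800000000")
deliveries = notifier.dispatch("container-memory", "memory usage above 80%")
```

Keeping channels and topics in separate tables means a terminal can be added to a topic without touching the channel credentials.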
In a specific application scenario, the barrier to writing configuration files is high. Taking the yaml file as an example, a user must be thoroughly familiar with the attributes (such as name and deployment unit) of each container on the cluster to be monitored and with the meaning of the various data indexes before a correct yaml file can be written; the operation is complex and reduces monitoring efficiency. Therefore, in the present application, a user can create an alarm rule through the UI visualization module, which generates a configuration page for the alarm rule. A policy instruction issued through the configuration page generates a first alarm policy, and the Prometheus configuration file prometheus.yml is updated according to the first alarm policy so that the updated file contains the first alarm policy. The alarm rule is then activated by the Prometheus configuration reload mechanism.
For example, a user may add an alarm rule through the UI visualization module that monitors all container instances (resource) under all clusters and notifies the terminals subscribed to a specified topic when the memory usage (index) is greater than (condition) 80% (threshold). The alarm module records the alarm rule created by the user, modifies the Prometheus configuration file, and activates the alarm rule by the Prometheus configuration reload mechanism.
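The translation from UI fields (resource, index, condition, threshold) to a Prometheus alerting rule might look like the Python sketch below. The function name, the `topic` label, and the PromQL expression are assumptions; the expression uses the cAdvisor metrics `container_memory_working_set_bytes` and `container_spec_memory_limit_bytes` as one plausible definition of memory usage. In stock Prometheus, alerting rules are usually kept in rule files listed under `rule_files` in prometheus.yml, which is the structure produced here.

```python
OPERATORS = {"greater than": ">", "less than": "<", "equal to": "=="}

# Hypothetical mapping from a UI index name to a PromQL expression in percent.
INDEX_EXPRESSIONS = {
    "memory usage": (
        "container_memory_working_set_bytes"
        " / container_spec_memory_limit_bytes * 100"
    ),
}


def build_alert_rule(name, index, condition, threshold, topic, hold="1m"):
    """Turn the fields of a UI alarm rule into a Prometheus alerting rule dict."""
    expr = f"{INDEX_EXPRESSIONS[index]} {OPERATORS[condition]} {threshold}"
    return {
        "alert": name,
        "expr": expr,
        "for": hold,                    # how long the condition must hold
        "labels": {"topic": topic},     # label name is an assumption
        "annotations": {"summary": f"{index} {condition} {threshold}%"},
    }


rule = build_alert_rule(
    "ContainerMemoryHigh", "memory usage", "greater than", 80, "container-memory"
)
# Structure of a rule file that prometheus.yml would reference via rule_files.
rule_group = {"groups": [{"name": "ui-generated", "rules": [rule]}]}
```

Containers without a memory limit report a limit of zero, so a production expression would likely need to filter those series out.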
In addition, once the instantaneous index data of any running resource triggers an alarm rule, the user can also view the alarm information through the UI visualization module.
Specifically, a flow chart of creating the alarm rule is shown in fig. 5.
In the foregoing, the alarm information includes: cluster dimension alarm items, node dimension alarm items and container group dimension alarm items.
Wherein the cluster dimension alarm item may include: the CPU utilization of the cluster exceeds 80%, the memory utilization of the cluster exceeds 80%, the local storage usage across all nodes of the cluster exceeds 80%, the resource usage of a namespace exceeds 80%, and the state of a container group (pod) in the cluster is abnormal.
Wherein the node dimension alarm item may include: the CPU utilization of the node exceeds 80%, the memory utilization of the node exceeds 80%, and the local storage usage of the node exceeds 80%.
Wherein the container group dimension alarm item may include: the CPU utilization rate of the container group (pod) exceeds 80%, and the memory utilization rate of the container group (pod) exceeds 80%.
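The percentage-based alarm items of the three dimensions can be captured in a small lookup table, as in the Python sketch below. The item keys and the function name are hypothetical, and the pod state alarm item of the cluster dimension is left out because it is not a percentage threshold.

```python
THRESHOLD_PERCENT = 80.0

# Percentage-based alarm items per dimension, following the lists above.
ALARM_ITEMS = {
    "cluster": ["cpu", "memory", "local_storage", "namespace_resources"],
    "node": ["cpu", "memory", "local_storage"],
    "pod": ["cpu", "memory"],
}


def triggered_items(dimension, usage_percent):
    """Return the alarm items of `dimension` whose usage exceeds the threshold.

    `usage_percent` maps an item key to its current usage in percent;
    items with no reading are treated as 0 and never trigger.
    """
    return [
        item
        for item in ALARM_ITEMS[dimension]
        if usage_percent.get(item, 0.0) > THRESHOLD_PERCENT
    ]


node_alarms = triggered_items("node", {"cpu": 91.5, "memory": 42.0, "local_storage": 80.0})
pod_alarms = triggered_items("pod", {"cpu": 85.0, "memory": 97.2})
```

A reading of exactly 80% does not trigger here, since the text speaks of usage that exceeds 80%.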
Referring to fig. 6, the operation principle of the container monitoring and warning system of the present application is as follows:
1) The monitoring module maintains the index access addresses and index capture rules of each cluster's collectors in prometheus.yml, and deploys collectors for a new cluster through the UI visualization module.
2) The alarm module maintains the alarm rule expressions in prometheus.yml, and adds and modifies alarm rules through the UI visualization module.
3) Prometheus loads the configuration and periodically captures the instantaneous indexes of each collector according to the index access addresses and index capture rules; the collectors do not store data but expose instantaneous indexes for Prometheus to capture.
4) Prometheus periodically evaluates, according to the alarm rules, whether each alarm rule expression reaches the required index threshold.
5) When an alarm rule expression satisfies its condition, for example when the memory usage of a container instance is greater than 80%, Prometheus pushes the alarm to Alertmanager.
6) Aggregation and alarm pushing: after the alarms are aggregated in Alertmanager, the alarm information is sent to the message notification module according to the Alertmanager configuration file.
7) The message notification module is pre-configured with the accounts and passwords of the message sending channels (short message, mailbox, enterprise WeChat, and the like), and routes different alarms to different subscription terminals by adding topics and the terminals (mobile phone numbers, mailbox addresses, and the like) subscribed to each topic. Once an alarm rule is triggered, the user receives a notification through the preset sending channel, topic, and subscription terminal.
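Step 6) leaves open how Alertmanager hands alarms to the message notification module. One common arrangement, assumed here and not stated in the application, is an Alertmanager webhook receiver that posts a JSON body containing an `alerts` list, where each alert has `status`, `labels`, and `annotations`. The Python sketch below extracts the topic and alarm text from such a body; the `topic` label and the function name are hypothetical.

```python
import json


def alarms_from_webhook(body):
    """Extract (topic, text) pairs for firing alerts from a webhook body."""
    payload = json.loads(body)
    alarms = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts are skipped in this sketch
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        topic = labels.get("topic", "default")  # "topic" label is an assumption
        text = annotations.get("summary", labels.get("alertname", "unnamed alert"))
        alarms.append((topic, text))
    return alarms


# Hand-written example body; real payloads carry additional fields.
example_body = json.dumps(
    {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "ContainerMemoryHigh", "topic": "container-memory"},
                "annotations": {"summary": "memory usage above 80%"},
            },
            {
                "status": "resolved",
                "labels": {"alertname": "NodeCpuHigh", "topic": "node-cpu"},
                "annotations": {"summary": "CPU usage above 80%"},
            },
        ],
    }
)
alarms = alarms_from_webhook(example_body)
```

Each resulting pair can then be passed to whatever component looks up the terminals subscribed to that topic.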
The present application further provides a multi-cluster-based container monitoring and alarming device, which may specifically be a client on which a Kubernetes platform is deployed. As shown in fig. 7, the container monitoring and alarming device includes a memory 31 and a processor 32, where the memory 31 stores a computer program and the processor 32 is configured to execute the computer program in the memory 31 so as to implement the steps of the multi-cluster container monitoring and alarming method described above.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for multi-cluster container monitoring alarm as described above.
In summary, the present application discloses a container monitoring and alarming method, system, device and storage medium based on multiple clusters. The monitoring module and the alarm module monitor the operation indexes of each node of the multiple clusters and raise alarms on abnormal conditions in time, which facilitates reasonable adjustment and allocation of system resources and improves the overall performance of the clusters. The container monitoring and alarming system for Kubernetes clusters can be deployed automatically without complex configuration. The method simplifies and optimizes the large number of Kubernetes-based resource monitoring indexes. The alarm rules and the pushing of alarm information can be customized, so that operation and maintenance staff and developers can monitor and receive alarms for the application services they care about without any knowledge of Prometheus or Kubernetes.
The embodiments of the present invention have been described in detail above, but they are merely examples, and the present invention is not limited to them. Any equivalent modification or substitution made by those skilled in the art also falls within the scope of the present invention. Accordingly, equivalent changes and modifications made without departing from the spirit and scope of the present invention should be covered by the present invention.

Claims (10)

1. A multi-cluster-based container monitoring alarm method, characterized by comprising the following steps:
maintaining, through a monitoring module, the Prometheus configuration file prometheus.yml, configuring in prometheus.yml the capture rules for the indexes of all set resources, and deploying the monitoring components of at least one cluster to be monitored, wherein the monitoring components periodically capture the instantaneous index data of each running resource in the cluster according to the preset capture rules;
maintaining, through an alarm module, the Prometheus configuration file prometheus.yml, configuring in prometheus.yml the alarm rules for all set resources, and configuring, through the alarm management component Alertmanager, the sending of alarm information to a message notification module;
configuring, through the message notification module, the account and password of a message sending channel, and routing different alarm information to the corresponding subscription terminals by adding topics and the subscription terminals of each topic;
when the instantaneous index data of any running resource captured by the monitoring module triggers an alarm rule, sending the alarm information to the message notification module through the Alertmanager, the message notification module sending the alarm information to the corresponding subscription terminals.
2. The multi-cluster-based container monitoring alarm method according to claim 1, wherein deploying the monitoring components of at least one cluster to be monitored by the monitoring module comprises: deploying an index capture storage component Prometheus and an alarm management component Alertmanager on a first cluster; deploying a host index collector node-exporter and a container index collector cAdvisor on each node of each cluster to be monitored; deploying a cluster state index collector kube-state-metrics on each cluster to be monitored; and,
deploying a middleware collector corresponding to each specified middleware on each cluster to be monitored, wherein each middleware corresponds to an independent middleware collector.
3. The multi-cluster-based container monitoring alarm method according to claim 2, wherein the index data collected by the host index collector node-exporter and the container index collector cAdvisor are gathered into the index capture storage component Prometheus and matched against the alarm rules preconfigured in the Prometheus configuration file prometheus.yml, and, if an alarm rule is triggered, the alarm management component Alertmanager sends the alarm information to the message notification module.
4. The multi-cluster-based container monitoring alarm method according to claim 2, wherein, in the Prometheus configuration file prometheus.yml, the capture addresses of the indexes to be captured comprise:
the index access address of the host index collector node-exporter deployed on each node of each cluster;
the index access address of the container index collector cAdvisor deployed on each node of each cluster;
the index access address of the cluster state index collector kube-state-metrics deployed on each cluster; and,
the index access address of each middleware collector deployed on each cluster.
5. The multi-cluster-based container monitoring alarm method according to claim 2, wherein, when at least one second cluster needs to join the monitoring, the first cluster records the capture address and the access token for capturing the indexes of the second cluster, the capture address and the access token are added to the cluster deployment file yaml, and, after the configuration is completed, the reload configuration interface of Prometheus is called so that the configuration takes effect; wherein the first cluster and the second cluster are different clusters.
6. The multi-cluster-based container monitoring alarm method according to claim 1, further comprising:
generating a first alarm strategy according to a strategy instruction input by a user;
updating the Prometheus configuration file prometheus.yml according to the first alarm policy, wherein the updated prometheus.yml comprises the first alarm policy; and calling the reload configuration interface of Prometheus so that the configuration takes effect.
7. The multi-cluster-based container monitoring alarm method according to claim 1, further comprising: presetting a topic subscribed to by a user, wherein the topic covers the alarm information the user is interested in; and, when the captured instantaneous index data of any running resource triggers an alarm rule, sending the alarm information associated with the topic through a configured message sending channel.
8. A multi-cluster-based container monitoring alarm system, comprising a monitoring module, an alarm module and a message notification module, wherein:
the monitoring module comprises:
an index capture rule maintenance unit, used for configuring the capture rules for the indexes of all set resources in the Prometheus configuration file prometheus.yml;
a monitoring component deployment unit, used for deploying the monitoring components of at least one cluster to be monitored through a cluster deployment file yaml, the monitoring components being used for periodically capturing the instantaneous index data of each running resource in the cluster according to preset capture rules;
the alarm module comprises:
an alarm rule maintenance unit, used for configuring the alarm rules for all set resources in the Prometheus configuration file prometheus.yml;
a receiving unit, used for receiving the alarm information sent by the monitoring module and pushing the alarm information to the alarm management component Alertmanager when the monitoring module determines that the instantaneous index data captured on the cluster to be monitored triggers an alarm rule;
a sending unit, used for sending the alarm information in the alarm management component Alertmanager to the message notification module;
and the message notification module is used for sending the alarm information to the corresponding subscription terminals according to the preset account and password of the message sending channel, the preset topic and the preset subscription terminals of the topic.
9. A multi-cluster based container monitoring and warning device, comprising:
a memory having a computer program stored therein;
a processor for executing all computer programs in said memory for implementing the steps of the multi-cluster based container monitoring alarm method according to any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the multi-cluster based container monitoring alarm method according to any of the claims 1 to 7.
CN202011251413.9A 2020-11-09 2020-11-09 Container monitoring alarm method, system, equipment and storage medium based on multiple clusters Active CN112511339B (en)

Publications (2)

Publication Number Publication Date
CN112511339A true CN112511339A (en) 2021-03-16
CN112511339B CN112511339B (en) 2023-04-07

Family

ID=74957795

Country Status (1)

Country Link
CN (1) CN112511339B (en)

Also Published As

Publication number Publication date
CN112511339B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant