CN112511339A - Container monitoring alarm method, system, equipment and storage medium based on multiple clusters - Google Patents
- Publication number
- CN112511339A CN202011251413.9A
- Authority
- CN
- China
- Prior art keywords
- cluster
- alarm
- monitoring
- index
- prometheus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/08—Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Environmental & Geological Engineering (AREA)
- Debugging And Monitoring (AREA)
- Alarm Systems (AREA)
Abstract
The application discloses a multi-cluster-based container monitoring and alarm method, system, device and storage medium. The method comprises: configuring, through a monitoring module, capture rules for the indexes of all set resources in prometheus.yml, and deploying monitoring components for at least one cluster to be monitored, the monitoring components periodically capturing instantaneous index data of each running resource in the cluster according to the preset capture rules; configuring, through an alarm module, alarm rules for all set resources in prometheus.yml, and configuring alarm information through the alarm management component Alertmanager so that it is sent to a message notification module; and, when the instantaneous index data of any running resource captured by the monitoring module triggers an alarm rule, sending the alarm information to the message notification module through the Alertmanager. The method and device can monitor the operating indexes of each node of multiple clusters and raise alarms on abnormal conditions in time.
Description
Technical Field
The present invention relates to a cluster technology, and in particular, to a container monitoring and warning method, system, device, and storage medium based on multiple clusters.
Background
With the popularization of container technology, more and more enterprises develop applications with a microservice framework, deliver code as images, and deploy and run services in containers, so operation and maintenance monitoring is shifting from traditional virtual machines to containers. Currently, the mainstream container monitoring scheme combines exporters (collection) + Prometheus (pulling and storing) + Grafana (chart display) + Alertmanager (threshold alarm).
This combination of exporters, Prometheus, Grafana and Alertmanager places high technical demands on operation and maintenance personnel and is complicated to configure: they need to understand technical details such as Prometheus and PromQL query statements, as well as the meanings of the various running states and indexes of the various Kubernetes (K8s for short) resources. In addition, without a simplified set of indexes, excessive storage space is wasted, and monitoring and alarming in a multi-cluster environment requires maintaining multiple sets of configuration. The excessive configuration greatly increases the learning and usage cost for operation and maintenance personnel, and is especially unfriendly to developers who want to customize threshold alarms for their applications.
Disclosure of Invention
The present invention is directed to a container monitoring and alarming method, system, device and storage medium based on multiple clusters, so as to solve the problems set forth in the foregoing technical background.
In order to achieve the purpose, the invention adopts the following technical scheme:
the first aspect of the present application provides a container monitoring and alarming method based on multiple clusters, including:
maintaining the Prometheus configuration file prometheus.yml through a monitoring module, configuring capture rules for the indexes of all set resources in prometheus.yml, and deploying monitoring components for at least one cluster to be monitored, wherein the monitoring components periodically capture instantaneous index data of each running resource in the cluster according to the preset capture rules;
maintaining the Prometheus configuration file prometheus.yml through an alarm module, configuring alarm rules for all set resources in prometheus.yml, and configuring alarm information through the alarm management component Alertmanager so that the alarm information is sent to a message notification module;
configuring, through the message notification module, the account and password of a message sending channel, and managing the delivery of different alarm information to the corresponding subscription terminals by adding topics and the subscription terminals of each topic;
when the instantaneous index data of any running resource captured by the monitoring module triggers an alarm rule, sending the alarm information to the message notification module through the Alertmanager, the message notification module sending the alarm information to the corresponding subscription terminal.
Preferably, the cluster is a K8s cluster.
Preferably, the resource comprises one or more of a cluster, a host, a namespace, an application, a container.
Preferably, the index includes one or more of a CPU, a memory, a storage disk, and a network.
Preferably, the capture rules include one or more of capture address, capture period, and index relabeling.
Preferably, deploying, by the monitoring module, the monitoring components of at least one cluster to be monitored includes: deploying an index capture and storage component Prometheus and an alarm management component Alertmanager on a first cluster; deploying a host index collector node-exporter and a container index collector cAdvisor on each node of each cluster to be monitored; deploying a cluster state index collector kube-state-metrics on each cluster to be monitored; and,
deploying a middleware collector corresponding to the specified middleware on each cluster to be monitored, wherein each middleware corresponds to an independent middleware collector.
More preferably, the instantaneous index data of each running node (node) is collected by the host index collector node-exporter and the container index collector cAdvisor, gathered into the index capture and storage component Prometheus, and matched against the alarm rules pre-configured in the Prometheus configuration file prometheus.yml; if an alarm rule is triggered, the alarm management component Alertmanager sends the alarm information to the message notification module.
More preferably, in the Prometheus configuration file prometheus.yml, the capture addresses of the captured indexes include:
the index access address of the host index collector node-exporter deployed on each node of each cluster;
the index access address of the container index collector cAdvisor deployed on each node of each cluster;
the index access address of the cluster state index collector kube-state-metrics deployed on each cluster; and,
the index access address of each middleware collector deployed on each cluster.
More preferably, when at least one second cluster needs to join the monitoring, the first cluster records the capture address and the access token for capturing the indexes of the second cluster, the capture address and the access token are added to the cluster deployment file yaml, and after configuration is completed, the reload configuration interface of Prometheus is called to make the configuration take effect; wherein the first cluster and the second cluster are different clusters.
Preferably, the capture rules comprise: taking cluster/host/namespace/application/container instances as the resource dimensions, pulling and storing only the indexes that users care about most, such as CPU/memory/network/storage disk, and filtering out the large number of indexes that are useless to users.
Preferably, the method further comprises:
generating a first alarm strategy according to a strategy instruction input by a user;
updating the Prometheus configuration file prometheus.yml according to the first alarm policy, wherein the updated prometheus.yml comprises the first alarm policy; and calling the reload configuration interface of Prometheus to make the configuration take effect.
Preferably, after the alarm rule is triggered by the instantaneous index data of any captured resource operation, the method further includes: and the user views the alarm information through the UI visualization module.
Preferably, the message sending channel configured by the message notification module comprises one or more of a mailbox, a short message, an enterprise WeChat, a voice telephone notification and a QQ notification.
Preferably, the method further comprises: presetting a topic subscribed to by a user, wherein the topic comprises the alarm information the user is interested in; and when the captured instantaneous index data of any running resource triggers an alarm rule, sending the alarm information associated with the topic through the configured message sending channel.
Preferably, the alarm information includes: cluster dimension alarm items, node dimension alarm items and container group dimension alarm items.
More preferably, the cluster dimension alarm item includes at least one of: the utilization rate of the CPU exceeds 80%, the utilization rate of the memory exceeds 80%, the local storage of all nodes of the cluster exceeds 80%, the resource utilization of a namespace exceeds 80%, and the state of a cluster container group (pod) is abnormal.
More preferably, the node dimension alarm item includes at least one of: the utilization rate of the CPU of the node (node) exceeds 80%, the memory utilization rate of the node (node) exceeds 80%, and the local storage utilization condition of the node (node) exceeds 80%.
More preferably, the container group dimension alarm item includes at least one of: the CPU utilization rate of the container group (pod) exceeds 80%, and the memory utilization rate of the container group (pod) exceeds 80%.
A second aspect of the present application provides a container monitoring and warning system based on multiple clusters, including: monitoring module, alarm module and message notice module, wherein:
the monitoring module comprises:
the index capture rule maintenance unit is used for configuring capture rules for the indexes of all set resources in the Prometheus configuration file prometheus.yml;
the monitoring component deployment unit is used for deploying the monitoring components of at least one cluster to be monitored through a cluster deployment file yaml, and the monitoring components are used for periodically capturing instantaneous index data of running of each resource in the cluster according to a preset capturing rule;
the alarm module comprises:
the alarm rule maintenance unit is used for configuring alarm rules for all set resources in the Prometheus configuration file prometheus.yml;
the receiving unit is used for receiving the alarm information sent by the monitoring module and pushing the alarm information to the alarm management component Alertmanager when the monitoring module determines that the instantaneous index data captured on the cluster to be monitored triggers an alarm rule;
the sending unit is used for sending the alarm information in the alarm management component Alertmanager to the message notification module;
and the message notification module is used for sending the alarm information to the corresponding subscription terminal according to the preset account password of the message sending channel, the preset topic and the preset subscription terminal of the topic.
Preferably, the alarm module further comprises: an alarm rule updating unit, used for recording a policy instruction input by a user, generating a first alarm policy, and updating the Prometheus configuration file prometheus.yml according to the first alarm policy, wherein the updated prometheus.yml comprises the first alarm policy.
Preferably, the multi-cluster-based container monitoring and warning system further includes: and the UI visualization module is used for inquiring and/or displaying the alarm information sent by the alarm module and/or the instantaneous index data monitored by the monitoring module.
More preferably, the UI visualization module may present the information as dashboard charts.
Preferably, the message sending channel configured by the message notification module comprises one or more of a mailbox, a short message, an enterprise WeChat, a voice telephone notification and a QQ notification.
Preferably, the cluster is a K8s cluster.
Preferably, the monitoring assembly comprises:
the index capture and storage component Prometheus is used for being deployed in the first cluster;
the alarm management component Alertmanager is used for being deployed in the first cluster;
the host index collector node-exporter and the container index collector cAdvisor are used for being deployed at each node (node) of each cluster to be monitored;
the cluster state index collector kube-state-metrics is used for being deployed in each cluster to be monitored; and,
the middleware collector is used for being deployed in each cluster to be monitored, and each middleware collector corresponds to an independent middleware.
More preferably, in the Prometheus configuration file prometheus.yml, the capture addresses of the captured indexes include:
the index access address of the host index collector node-exporter deployed on each node of each cluster;
the index access address of the container index collector cAdvisor deployed on each node of each cluster;
the index access address of the cluster state index collector kube-state-metrics deployed on each cluster; and,
the index access address of each middleware collector deployed on each cluster.
Preferably, the capture rules comprise: taking cluster/host/namespace/application/container instances as the resource dimensions, pulling and storing only the indexes that users care about most, such as CPU/memory/network/storage disk, and filtering out the large number of indexes that are useless to users.
Preferably, the alarm information includes: cluster dimension alarm items, node dimension alarm items and container group dimension alarm items.
More preferably, the cluster dimension alarm item includes at least one of: the utilization rate of the CPU exceeds 80%, the utilization rate of the memory exceeds 80%, the local storage of all nodes of the cluster exceeds 80%, the resource utilization of a namespace exceeds 80%, and the state of a cluster container group (pod) is abnormal.
More preferably, the node dimension alarm item includes at least one of: the utilization rate of the CPU of the node (node) exceeds 80%, the memory utilization rate of the node (node) exceeds 80%, and the local storage utilization condition of the node (node) exceeds 80%.
More preferably, the container group dimension alarm item includes at least one of: the CPU utilization rate of the container group (pod) exceeds 80%, and the memory utilization rate of the container group (pod) exceeds 80%.
The third aspect of the present application provides a container monitoring and warning device based on multiple clusters, including:
a memory having a computer program stored therein;
a processor for executing all computer programs in said memory for implementing the steps of said multi-cluster based container monitoring alarm method of the first aspect disclosed herein.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the multi-cluster based container monitoring alarm method of the first aspect disclosed herein.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
the application discloses a container monitoring and alarming method, a system, equipment and a storage medium based on multiple clusters, wherein the operation indexes of each node and container of the multiple clusters can be monitored through a monitoring module and an alarming module, and abnormal conditions are alarmed in time, so that reasonable adjustment and distribution of system resources are facilitated, and the overall performance of the clusters is improved;
the container monitoring and alarming system based on the Kubernetes cluster can be automatically deployed without complex configuration;
the method simplifies and optimizes a large amount of resource monitoring indexes based on Kubernetes;
the method and the device can customize the alarm rule and the push of the alarm information, so that operation and maintenance and developers can smoothly realize monitoring and alarm of the concerned application service on the premise of completely not knowing Prometheus and Kubernetes technologies.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a block diagram of a multi-cluster based container monitoring and warning system according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of cluster deployment in a preferred embodiment of the present invention;
FIG. 3 is a diagram of cluster deployment results in accordance with a preferred embodiment of the present invention;
FIG. 4 is a flow chart of a multi-cluster based container monitoring alarm method according to a preferred embodiment of the present invention;
FIG. 5 is a flowchart of a user creating alert rules in accordance with a preferred embodiment of the present invention;
FIG. 6 is a functional block diagram of a multi-cluster based container monitoring alarm system in accordance with a preferred embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a container monitoring and warning device based on multiple clusters according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, it being understood that the data so used may be interchanged under appropriate circumstances. Furthermore, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
A Kubernetes cluster (hereinafter referred to as a cluster) is composed of a plurality of host nodes. All the applications are managed by the cluster in a container form and distributed and deployed on the nodes through the cluster container orchestration function. The container monitoring and warning system can be deployed on a main cluster and supports monitoring of a plurality of clusters.
Fig. 1 is a block diagram of a container monitoring and warning system based on multiple clusters according to a preferred embodiment of the present invention. As shown in fig. 1, a multi-cluster-based container monitoring and alarming system includes: monitoring module 1, warning module 2, message notification module 3 and UI visualization module 4, wherein:
the monitoring module 1 includes:
the index capture rule maintenance unit is used for configuring capture rules for the indexes of all set resources in the Prometheus configuration file prometheus.yml;
the monitoring component deployment unit is used for deploying the monitoring components of at least one cluster to be monitored through a cluster deployment file yaml, and the monitoring components are used for periodically capturing instantaneous index data of running of each resource in the cluster according to a preset capturing rule;
the alarm module 2 includes:
the alarm rule maintenance unit is used for configuring alarm rules for all set resources in the Prometheus configuration file prometheus.yml;
the receiving unit is used for receiving the alarm information sent by the monitoring module 1 and pushing the alarm information to the alarm management component Alertmanager when the monitoring module 1 determines that the instantaneous index data captured on the cluster to be monitored triggers an alarm rule;
the sending unit is used for sending the alarm information in the alarm management component Alertmanager to the message notification module 3;
the message notification module 3 is used for sending the alarm information to the corresponding subscription terminal according to the preset account and password of the message sending channel, the preset topic and the preset subscription terminals of the topic;
and the UI visualization module 4 is used for inquiring and/or displaying the alarm information sent by the alarm module 2 and/or the instantaneous index data monitored by the monitoring module 1.
The monitoring component in the above includes:
1) the index capture and storage component Prometheus is used for being deployed in the main cluster;
2) the alarm management component Alertmanager is used for being deployed in the main cluster;
3) a core collector:
a host index collector node-exporter for being deployed at each node (node) of each cluster to be monitored;
a container index collector cAdvisor for being deployed at each node (node) of each cluster to be monitored;
the cluster state index collector kube-state-metrics is used for being deployed in each cluster to be monitored;
4) other collectors:
middleware collectors can be customized for various middleware, such as MySQL, MongoDB and Redis; only the cluster deployment file yaml needs to be provided under the path specified by the monitoring module. Each middleware instance is served by its own independent middleware collector: for example, if the cluster has three MySQL instances, three middleware collectors need to be deployed, each responsible for one MySQL instance.
When a plurality of clusters need to be added into monitoring, the main cluster needs to add information such as access addresses and access tokens of other clusters so as to normally access each cluster and deploy monitoring components.
Fig. 2 is a flow chart of cluster deployment in the present application, and a deployment result chart is shown with reference to fig. 3.
As shown in fig. 2, the deployment process of the cluster is:
step S01: judging whether the basic component (namely the monitoring component) is deployed, if so, executing the step S11, otherwise, executing the step S02;
step S02: generating a main cluster deployment file yaml;
step S11: judging whether to deploy a new cluster at the same time, if so, executing the step S12, otherwise, executing the step S21;
step S12: inputting an access address (grab address) and an access token of the new cluster, and executing step S13;
step S13: judging whether the networks are connected, if so, executing the step S14, otherwise, executing the step S12;
step S14: judging whether a collector of a new cluster is deployed, if so, executing the step S15, otherwise, executing the step S21;
step S15: generating a new cluster deployment file yaml;
step S21: judging whether a new deployment file is generated, if so, executing the step S31, otherwise, ending the deployment process;
step S31: starting to run the deployment file and ending the deployment process.
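The branching in steps S01 to S31 can be sketched as a small planning function. This is only an illustrative reading of the flow, not the patented implementation: the function name `plan_deployment`, the file names `main-cluster.yaml` and `new-cluster.yaml`, and the `is_reachable` callback are hypothetical, and the sketch assumes that after step S02 the flow continues with step S11, which the text does not state explicitly.

```python
from typing import Callable, Iterable, List, Tuple


def plan_deployment(
    base_deployed: bool,
    new_cluster_inputs: Iterable[Tuple[str, str]],
    is_reachable: Callable[[str], bool],
    deploy_collectors: bool = True,
) -> List[str]:
    """Return the deployment files that step S31 would run (hypothetical names)."""
    files: List[str] = []

    # S01/S02: generate the main-cluster file only if the base components are missing.
    # Assumption: the flow then continues with S11.
    if not base_deployed:
        files.append("main-cluster.yaml")

    # S11: a new cluster is being added if any (address, token) input is supplied.
    # S12/S13: keep reading inputs until one address is reachable.
    for address, _token in new_cluster_inputs:
        if is_reachable(address):
            # S14/S15: generate the new-cluster file if its collectors are wanted.
            if deploy_collectors:
                files.append("new-cluster.yaml")
            break

    # S21: an empty list means no file was generated, so nothing is run.
    return files
```

A caller would run the returned files in order (step S31) and simply end the process when the list is empty.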
In the above, the access addresses of the captured indexes of all resources recorded in prometheus.yml specifically include:
1) the index access address of the host index collector node-exporter deployed on each node of each cluster;
2) the index access address of the container index collector cAdvisor deployed on each node of each cluster;
3) the index access address of the cluster state index collector kube-state-metrics deployed on each cluster;
4) the index access address of each middleware collector deployed on each cluster.
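As a rough illustration of how these four kinds of access addresses could be turned into Prometheus scrape jobs, the following sketch builds one job per collector per cluster. The input dictionary layout (`name`, `token`, `node_exporter_targets` and so on) and the function name are assumptions made for this example; the output keys (`job_name`, `scrape_interval`, `scheme`, `authorization`, `static_configs`) are standard fields of a Prometheus scrape configuration.

```python
from typing import Dict, List


def build_scrape_configs(clusters: List[dict], interval: str = "30s") -> List[dict]:
    """Build one scrape job per collector per cluster (input layout is hypothetical)."""
    configs: List[dict] = []
    for cluster in clusters:
        # Core collectors: host, container and cluster-state indexes.
        groups: Dict[str, List[str]] = {
            "node-exporter": cluster["node_exporter_targets"],
            "cadvisor": cluster["cadvisor_targets"],
            "kube-state-metrics": cluster["kube_state_metrics_targets"],
        }
        # One independent middleware collector per middleware instance.
        groups.update(cluster.get("middleware_targets", {}))

        for collector, targets in groups.items():
            configs.append({
                "job_name": f"{cluster['name']}-{collector}",
                "scrape_interval": interval,
                "scheme": "https",
                # Access token recorded for the cluster, sent as a bearer token.
                "authorization": {"type": "Bearer", "credentials": cluster["token"]},
                "static_configs": [{
                    "targets": list(targets),
                    # Label every series with its cluster so multi-cluster data stays separable.
                    "labels": {"cluster": cluster["name"]},
                }],
            })
    return configs
```

The returned list corresponds to the `scrape_configs` section of prometheus.yml once serialized to YAML; adding a second cluster only means appending another entry to `clusters`.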
The capture rules in the above are: the indexes from the various collectors are filtered and recalculated, and, taking cluster/host/namespace/application/container instances as the resource dimensions, only the indexes that users care about most, such as CPU/memory/network/disk, are pulled and stored. This eliminates a large number of indexes that are useless to users, reduces storage pressure, and greatly improves query performance for users.
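One plausible way to express such filtering is a `keep` rule under `metric_relabel_configs`, which Prometheus applies to the `__name__` label of each scraped series. The sketch below builds such a rule and mimics its effect in plain Python. The prefix list is an assumption chosen for illustration (the text does not list the retained index names), and `fullmatch` is used because Prometheus relabeling regexes are fully anchored.

```python
import re
from typing import Iterable, List

# Hypothetical allowlist of CPU/memory/network/disk index name prefixes.
KEPT_PREFIXES = [
    "node_cpu_", "node_memory_", "node_network_", "node_filesystem_",
    "container_cpu_", "container_memory_",
]


def keep_rule(prefixes: Iterable[str]) -> dict:
    """Build a metric_relabel_configs entry that keeps only the given name prefixes."""
    regex = "(" + "|".join(re.escape(p) for p in prefixes) + ").*"
    return {"source_labels": ["__name__"], "regex": regex, "action": "keep"}


def filter_metrics(names: Iterable[str], rule: dict) -> List[str]:
    """Mimic the keep rule locally: retain names the anchored regex matches."""
    pattern = re.compile(rule["regex"])
    return [name for name in names if pattern.fullmatch(name)]
```

Everything the rule does not match is dropped before storage, which is where the reduction in storage pressure would come from.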
In the above, when a new cluster is added, after the main cluster records the access address and the access token of the new cluster, the monitoring module adds the index access address and the access token for accessing the new cluster collector in the configuration file, and after the configuration is completed, calls the reloading configuration interface of Prometheus to enable the configuration to take effect.
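For reference, stock Prometheus exposes its configuration reload as an HTTP `POST` to the `/-/reload` path, which is only served when Prometheus is started with `--web.enable-lifecycle`. A minimal sketch of preparing that call with the Python standard library follows; the helper name and the example host are hypothetical.

```python
import urllib.request


def build_reload_request(prometheus_url: str) -> urllib.request.Request:
    """Prepare the POST request that asks Prometheus to reload its configuration."""
    url = prometheus_url.rstrip("/") + "/-/reload"
    # An empty body with an explicit POST method; no payload is needed.
    return urllib.request.Request(url, data=b"", method="POST")
```

Sending it is then a matter of `urllib.request.urlopen(build_reload_request("http://prometheus.example:9090"), timeout=5)` after the updated configuration file has been written.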
Fig. 4 is a flowchart of a container monitoring alarm method based on multiple clusters according to a preferred embodiment of the present invention. As shown in fig. 4, a container monitoring and alarming method based on multiple clusters includes:
Step 01: configuring, through the Prometheus configuration file prometheus.yml, the access addresses (capture addresses) of the captured indexes of all resources and the alarm rules of all resources.
Wherein the access addresses include: the index access address of the host index collector node-exporter deployed on each node of each cluster; the index access address of the container index collector cAdvisor deployed on each node of each cluster; the index access address of the cluster state index collector kube-state-metrics deployed on each cluster; and the index access address of each middleware collector deployed on each cluster.
Step 02: deploying the monitoring component of at least one cluster to be monitored through the cluster deployment file yaml, wherein the monitoring component periodically captures instantaneous index data of each resource operation in the cluster according to a preset capture rule.
Deploying, by the monitoring module, the monitoring components of at least one cluster to be monitored comprises: deploying the index capture and storage component Prometheus and the alarm management component Alertmanager on a first cluster; deploying a host index collector node-exporter and a container index collector cAdvisor on each node (node) of each cluster to be monitored; deploying a cluster state index collector kube-state-metrics on each cluster to be monitored; and deploying a middleware collector corresponding to the specified middleware on each cluster to be monitored, each middleware corresponding to an independent middleware collector.
Step 03: when the instantaneous index data of any resource operation captured by the monitoring module triggers an alarm rule, the alarm information is sent to the message notification module through the Alertmanager.
The instantaneous index data of each running node (node) is collected by the host index collector node-exporter and the container index collector cAdvisor, gathered into the index capture and storage component Prometheus, and matched against the alarm rules pre-configured in the Prometheus configuration file prometheus.yml; if an alarm rule is triggered, the alarm management component Alertmanager configures the alarm information and sends it to the message notification module.
Step 04: and the message notification module sends the alarm information to the corresponding subscription terminal.
And the message notification module is configured with an account password of a message sending channel, and manages different alarm information to be sent to the corresponding subscription terminal by adding a theme and the subscription terminal of the theme. The message sending channel configured by the message notification module can be a mailbox, a short message, an enterprise WeChat, a voice telephone notification, a QQ notification and the like. The message notification module presets a topic subscribed by the user, wherein the topic comprises the warning information interested by the user. And when the captured instantaneous index data of any resource operation triggers an alarm rule, the message notification module sends alarm information associated with the theme to the subscription terminal through the configured message sending channel.
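A minimal sketch of the channel, topic and subscription bookkeeping described here, assuming each channel is represented by a callable that already holds its own account credentials. The class name `NotificationModule` and its methods are hypothetical and are not taken from the application.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

# A sender takes (terminal address, message text); credentials stay inside it.
Sender = Callable[[str, str], None]


class NotificationModule:
    def __init__(self) -> None:
        self._channels: Dict[str, Sender] = {}
        # topic -> list of (channel name, terminal address)
        self._subs: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_channel(self, name: str, sender: Sender) -> None:
        """Register a message sending channel such as mail or short message."""
        self._channels[name] = sender

    def subscribe(self, topic: str, channel: str, terminal: str) -> None:
        """Attach a subscription terminal to a topic on a known channel."""
        if channel not in self._channels:
            raise KeyError(f"unknown channel: {channel}")
        self._subs[topic].append((channel, terminal))

    def publish(self, topic: str, message: str) -> int:
        """Send the message to every terminal of the topic; return the count."""
        subscribers = self._subs.get(topic, [])
        for channel, terminal in subscribers:
            self._channels[channel](terminal, message)
        return len(subscribers)
```

Publishing to a topic nobody subscribed to simply delivers nothing, which keeps alarm rules and subscriptions independent of each other.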
In a specific application scenario, the threshold for writing the configuration file is high. Taking the yaml file as an example, a user must be thoroughly familiar with the attributes (such as name and deployment unit) of each container on the cluster to be monitored and with the meaning of the various data indexes before a correct yaml file can be written; the operation is complex and reduces monitoring efficiency. Therefore, in the present application, a user can create an alarm rule through the UI visualization module, which generates a configuration page for the alarm rule. A policy instruction issued through the configuration page generates a first alarm policy, and the Prometheus yml configuration file is updated according to the first alarm policy so that the updated file contains the first alarm policy. The alarm rule is then activated by using the Prometheus configuration reload mechanism.
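One possible way to perform the "update the file, then reload" step is sketched below. It relies on the Prometheus HTTP reload endpoint `/-/reload`, which accepts POST requests only when the server is started with `--web.enable-lifecycle`. The helper names, the file path, and the base URL are assumptions for illustration; the application does not specify how the reload is invoked.

```python
import os
import tempfile
import urllib.request


def write_config_atomically(path: str, text: str) -> None:
    """Write text to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # os.replace swaps the file in one step, so Prometheus never
        # reads a half-written configuration.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def apply_config(path: str, text: str, base_url: str, timeout: float = 5.0) -> int:
    """Persist the new configuration and ask Prometheus to reload it."""
    write_config_atomically(path, text)
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Writing to a temporary file first means a failed write leaves the previous configuration untouched.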
For example, a user may add an alarm rule through the UI visualization module that monitors all container instances (resource) under all clusters and sends an alarm to the subscription terminals subscribed to a specified topic when the memory usage rate (index) is greater than (condition) 80% (threshold). The alarm module records the alarm rule created by the user, modifies the Prometheus configuration file, and activates the alarm rule by using the Prometheus configuration reload mechanism.
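The (resource, index, condition, threshold) tuple from this example could be translated into a Prometheus alerting rule roughly as follows. The keys `alert`, `expr`, `for`, `labels` and `annotations` are the standard fields of a Prometheus alerting rule. The helper `build_alert_rule`, the rule name, the `topic` label, and the PromQL expression built from the cAdvisor metrics `container_memory_working_set_bytes` and `container_spec_memory_limit_bytes` are assumptions for this sketch.

```python
# UI wording -> PromQL comparison operator (illustrative subset).
OPERATORS = {"greater than": ">", "less than": "<", "equal to": "=="}


def build_alert_rule(name, expr, condition, threshold, topic, hold="1m"):
    """Return a dict shaped like one entry of a Prometheus alerting rule group."""
    if condition not in OPERATORS:
        raise ValueError(f"unsupported condition: {condition}")
    return {
        "alert": name,
        "expr": f"({expr}) {OPERATORS[condition]} {threshold}",
        "for": hold,  # how long the condition must hold before firing
        "labels": {"topic": topic},  # assumed label used later for routing
        "annotations": {"summary": f"{name}: value {condition} {threshold}"},
    }


# Assumed expression for container memory usage in percent.
MEMORY_USAGE_PERCENT = (
    "100 * container_memory_working_set_bytes"
    " / container_spec_memory_limit_bytes"
)

rule = build_alert_rule(
    "ContainerMemoryHigh", MEMORY_USAGE_PERCENT, "greater than", 80, "container-memory"
)
```

Containers without a memory limit report a limit of 0, so a production expression would need to filter those out. Prometheus itself conventionally loads alerting rules from separate files listed under `rule_files` in prometheus.yml.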
In addition, after the alarm rule is triggered by the instantaneous index data of any resource operation, the user can also check alarm information through the UI visualization module.
Specifically, a flow chart of creating the alarm rule is shown in fig. 5.
In the foregoing, the alarm information includes: cluster dimension alarm items, node dimension alarm items and container group dimension alarm items.
Wherein the cluster dimension alarm item may include: the utilization rate of the CPU exceeds 80%, the utilization rate of the memory exceeds 80%, the local storage of all nodes of the cluster exceeds 80%, the resource utilization of a namespace exceeds 80%, and the state of a cluster container group (pod) is abnormal.
Wherein the node dimension alarm item may include: the utilization rate of the CPU of the node (node) exceeds 80%, the memory utilization rate of the node (node) exceeds 80%, and the local storage utilization condition of the node (node) exceeds 80%.
Wherein the container group dimension alarm item may include: the CPU utilization rate of the container group (pod) exceeds 80%, and the memory utilization rate of the container group (pod) exceeds 80%.
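The three alarm dimensions and their shared 80% threshold can be captured in a small lookup table. This is only a sketch: the item keys are invented for the example, and the non-percentage item "abnormal container group (pod) state" is left out because it is not a threshold comparison.

```python
THRESHOLD_PERCENT = 80.0

# Percentage-based alarm items per dimension; key names are illustrative.
ALARM_ITEMS = {
    "cluster": ["cpu_usage", "memory_usage", "local_storage_usage",
                "namespace_resource_usage"],
    "node": ["cpu_usage", "memory_usage", "local_storage_usage"],
    "pod": ["cpu_usage", "memory_usage"],
}


def triggered_items(dimension, samples, threshold=THRESHOLD_PERCENT):
    """Return the items of a dimension whose sampled percentage exceeds threshold."""
    return [
        item for item in ALARM_ITEMS[dimension]
        if samples.get(item, 0.0) > threshold
    ]
```

Because the comparison is strict, a sample of exactly 80% does not trigger an alarm, which matches the "exceeds 80%" wording.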
Referring to fig. 6, the operation principle of the container monitoring and warning system of the present application is as follows:
1) The monitoring module maintains the index access addresses and index capture rules of each cluster's collectors in prometheus.yml, and deploys collectors for a new cluster through the UI visualization module.
2) The alarm module maintains the alarm rule expressions in prometheus.yml, and adds and modifies alarm rules through the UI visualization module.
3) Prometheus loads the configuration and periodically captures the instantaneous indexes of each collector according to the index access addresses and index capture rules. The collectors do not store data; they only expose the instantaneous indexes for Prometheus to capture.
4) Prometheus periodically evaluates, according to the alarm rules, whether each alarm rule expression has reached the required index threshold.
5) When an alarm rule expression is satisfied, for example when the memory usage of a container instance is greater than 80%, Prometheus pushes the alarm to Alertmanager.
6) Aggregation and push: after the alarms are aggregated in Alertmanager, the alarm information is sent to the message notification module according to the Alertmanager configuration file.
7) The message notification module is pre-configured with the account passwords of the message sending channels (short message, mailbox, enterprise WeChat, and the like), and routes different alarms to different subscription terminals by adding topics and the terminals (mobile phone numbers, mailbox addresses, and the like) subscribed to each topic. Once an alarm rule is triggered, the user receives a notification through the preset sending channel, topic, and subscription terminal.
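Steps 3) to 6) above can be mimicked in a few lines to show the data flow. This is a toy simulation, not Prometheus or Alertmanager code: collectors are modelled as plain callables, and `scrape`, `evaluate`, `run_cycle` and the `memory_percent` metric name are hypothetical.

```python
from typing import Callable, Dict, List

# A collector stores nothing; it only reports current values when asked.
Collector = Callable[[], Dict[str, float]]


def scrape(collectors: Dict[str, Collector]) -> Dict[str, Dict[str, float]]:
    """One capture cycle: pull the instantaneous samples from every collector."""
    return {name: collect() for name, collect in collectors.items()}


def evaluate(samples, metric: str, threshold: float) -> List[dict]:
    """Return one alert per target whose metric exceeds the threshold."""
    alerts = []
    for target, values in sorted(samples.items()):
        value = values.get(metric)
        if value is not None and value > threshold:
            alerts.append({"target": target, "metric": metric, "value": value})
    return alerts


def run_cycle(collectors, metric, threshold, notify) -> List[dict]:
    """Scrape, evaluate, and hand every firing alert to notify."""
    alerts = evaluate(scrape(collectors), metric, threshold)
    for alert in alerts:
        notify(alert)  # stands in for the push to Alertmanager
    return alerts
```

Calling `run_cycle` on a timer would correspond to the periodic capture and evaluation; the `notify` callable is where Alertmanager and the message notification module would take over.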
The present application further provides a multi-cluster-based container monitoring and alarming device, which may specifically be a client on which a Kubernetes platform is deployed. As shown in fig. 7, the container monitoring and alarming device includes a memory 31 and a processor 32, where the memory 31 stores a computer program, and the processor 32 is configured to execute all the computer programs in the memory 31, so as to implement the steps of the multi-cluster container monitoring and alarming method described above.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for multi-cluster container monitoring alarm as described above.
In summary, the present application discloses a container monitoring and alarming method, system, device and storage medium based on multiple clusters, which can monitor the operation index of each node of the multiple clusters through a monitoring module and an alarming module, and alarm the abnormal condition in time, thereby facilitating reasonable adjustment and allocation of system resources and improving the overall performance of the clusters; the container monitoring and alarming system based on the Kubernetes cluster can be automatically deployed without complex configuration; the method simplifies and optimizes a large amount of resource monitoring indexes based on Kubernetes; the method and the device can customize the alarm rule and the push of the alarm information, so that operation and maintenance and developers can smoothly realize monitoring and alarm of the concerned application service on the premise of completely not knowing Prometheus and Kubernetes technologies.
The embodiments of the present invention have been described in detail, but the embodiments are merely examples, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions to those skilled in the art are also within the scope of the present invention. Accordingly, equivalent changes and modifications made without departing from the spirit and scope of the present invention should be covered by the present invention.
Claims (10)
1. The container monitoring and alarming method based on the multiple clusters is characterized by comprising the following steps:
maintaining the Prometheus configuration file prometheus.yml through a monitoring module, configuring capture rules for the indexes of all set resources in prometheus.yml, and deploying monitoring components of at least one cluster to be monitored, wherein the monitoring components periodically capture instantaneous index data of each running resource in the cluster according to the preset capture rules;
maintaining the Prometheus configuration file prometheus.yml through an alarm module, configuring alarm rules of all set resources in prometheus.yml, and configuring alarm information through the alarm management component Alertmanager so that the alarm information is sent to a message notification module;
configuring account passwords of a message sending channel through the message notification module, and routing different alarm information to the corresponding subscription terminals by adding topics and the subscription terminals of each topic;
when the alarm rule is triggered by the instantaneous index data of any resource operation captured by the monitoring module, the alarm information is sent to the message notification module through the Alertmanager, and the message notification module sends the alarm information to the corresponding subscription terminal.
2. The multi-cluster-based container monitoring alarm method according to claim 1, wherein deploying the monitoring component of at least one cluster to be monitored by the monitoring module comprises: deploying an index capture storage component Prometheus and an alarm management component Alertmanager on a first cluster, deploying a host index collector node-exporter and a container index collector cAdvisor on each node of each cluster to be monitored respectively, deploying a cluster state index collector kube-state-metrics on each cluster to be monitored respectively, and,
and deploying a middleware collector corresponding to the specified middleware on each cluster to be monitored, wherein each middleware corresponds to an independent middleware collector.
3. The multi-cluster-based container monitoring alarm method according to claim 2, wherein the index data collected by the host index collector node-exporter and the container index collector cAdvisor is fed into the index capture storage component Prometheus and matched against the alarm rules preconfigured in the Prometheus yml configuration file prometheus.yml, and if an alarm rule is triggered, the alarm management component Alertmanager sends alarm information to the message notification module.
4. The multi-cluster-based container monitoring alarm method according to claim 2, wherein in the Prometheus yml configuration file prometheus.yml, the capture addresses of the indexes to be captured comprise:
index access addresses of the host index collector node-exporter deployed on each node of each cluster;
index access addresses of the container index collector cAdvisor deployed on each node of each cluster;
index access addresses of the cluster state index collector kube-state-metrics deployed on each cluster; and,
index access addresses of each middleware collector deployed on each cluster.
5. The multi-cluster-based container monitoring alarm method according to claim 2, wherein when at least one second cluster needs to join in monitoring, the first cluster records the grab address and the access token of the grab index of the second cluster, the grab address and the access token of the grab index of the second cluster are added to the cluster deployment file yaml, and after configuration is completed, a reloading configuration interface of Prometheus is called to enable configuration to take effect; wherein the first cluster and the second cluster are different clusters.
6. The multi-cluster-based container monitoring alarm method according to claim 1, further comprising:
generating a first alarm strategy according to a strategy instruction input by a user;
updating the Prometheus yml configuration file prometheus.yml according to the first alarm policy, wherein the updated prometheus.yml comprises the first alarm policy; and calling the reload configuration interface of Prometheus to make the configuration take effect.
7. The multi-cluster-based container monitoring alarm method according to claim 1, further comprising: presetting a topic subscribed to by a user, wherein the topic comprises the alarm information the user is interested in; and when the captured instantaneous index data of any running resource triggers an alarm rule, sending the alarm information associated with the topic through a configured message sending channel.
8. A multi-cluster based container monitoring alarm system, comprising: monitoring module, alarm module and message notice module, wherein:
the monitoring module comprises:
the index capture rule maintenance unit is used for configuring capture rules of indexes of all set resources in an yml configuration file prometheus.yml of Prometheus;
the monitoring component deployment unit is used for deploying the monitoring components of at least one cluster to be monitored through a cluster deployment file yaml, and the monitoring components are used for periodically capturing instantaneous index data of running of each resource in the cluster according to a preset capturing rule;
the alarm module comprises:
the alarm rule maintenance unit is used for configuring alarm rules of all set resources in the Prometheus yml configuration file prometheus.yml;
the receiving unit is used for receiving the alarm information sent by the monitoring module and pushing the alarm information to the alarm management component Alertmanager when the monitoring module determines that the instantaneous index data captured on the cluster to be monitored triggers an alarm rule;
the sending unit is used for sending the alarm information in the alarm management component Alertmanager to the message notification module;
and the message notification module is used for sending the alarm information to the corresponding subscription terminal according to the preset account password of the message sending channel, the preset topic and the preset subscription terminal of the topic.
9. A multi-cluster based container monitoring and warning device, comprising:
a memory having a computer program stored therein;
a processor for executing all computer programs in said memory for implementing the steps of the multi-cluster based container monitoring alarm method according to any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the multi-cluster based container monitoring alarm method according to any of the claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011251413.9A CN112511339B (en) | 2020-11-09 | 2020-11-09 | Container monitoring alarm method, system, equipment and storage medium based on multiple clusters |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011251413.9A CN112511339B (en) | 2020-11-09 | 2020-11-09 | Container monitoring alarm method, system, equipment and storage medium based on multiple clusters |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112511339A (en) | 2021-03-16 |
CN112511339B (en) | 2023-04-07 |
Family
ID=74957795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011251413.9A Active CN112511339B (en) | 2020-11-09 | 2020-11-09 | Container monitoring alarm method, system, equipment and storage medium based on multiple clusters |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112511339B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110417901A (en) * | 2019-07-31 | 2019-11-05 | 北京金山云网络技术有限公司 | Data processing method, device and gateway server |
CN110780918A (en) * | 2019-10-28 | 2020-02-11 | 江苏满运软件科技有限公司 | Middleware container processing method and device, electronic equipment and storage medium |
CN110941531A (en) * | 2019-11-15 | 2020-03-31 | 北京浪潮数据技术有限公司 | Monitoring alarm method, device and equipment for monitoring alarm management platform |
CN111045901A (en) * | 2019-12-11 | 2020-04-21 | 东软集团股份有限公司 | Container monitoring method and device, storage medium and electronic equipment |
CN111459749A (en) * | 2020-03-18 | 2020-07-28 | 平安科技(深圳)有限公司 | Prometheus-based private cloud monitoring method and device, computer equipment and storage medium |
CN111459763A (en) * | 2020-04-03 | 2020-07-28 | 中国建设银行股份有限公司 | Cross-Kubernetes cluster monitoring system and method |
US20200267212A1 (en) * | 2019-02-15 | 2020-08-20 | International Business Machines Corporation | Method for managing and allocating resources in a clustered computing environment |
Non-Patent Citations (1)
Title |
---|
田贞朗: "Kubernetes基于Prometheus弹性伸缩POD的方法" (A method for elastically scaling Pods in Kubernetes based on Prometheus) * |
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112671602A (en) * | 2020-12-14 | 2021-04-16 | 北京金山云网络技术有限公司 | Data processing method, device, system, equipment and storage medium of edge node |
CN112925649A (en) * | 2021-03-31 | 2021-06-08 | 中国人民解放军国防科技大学 | Unified monitoring method for virtual network functions |
CN112925649B (en) * | 2021-03-31 | 2021-09-14 | 中国人民解放军国防科技大学 | Unified monitoring method for virtual network functions |
CN113242150A (en) * | 2021-06-03 | 2021-08-10 | 上海天旦网络科技发展有限公司 | Calico network plug-in-based data packet capturing method and system in K8s |
CN113377617B (en) * | 2021-06-11 | 2023-06-16 | 重庆农村商业银行股份有限公司 | Monitoring system |
CN113377617A (en) * | 2021-06-11 | 2021-09-10 | 重庆农村商业银行股份有限公司 | Monitoring system |
CN113419818A (en) * | 2021-06-23 | 2021-09-21 | 北京达佳互联信息技术有限公司 | Basic component deployment method, device, server and storage medium |
CN113542068A (en) * | 2021-07-15 | 2021-10-22 | 中国银行股份有限公司 | Redis multi-instance monitoring system and method |
CN113778614A (en) * | 2021-08-03 | 2021-12-10 | 科大国创云网科技有限公司 | Cluster abnormity monitoring and warning system and method facing enterprise service bus |
CN113377626A (en) * | 2021-08-11 | 2021-09-10 | 上海领健信息技术有限公司 | Visual unified alarm method, device, equipment and medium based on service tree |
CN113704065A (en) * | 2021-08-31 | 2021-11-26 | 平安普惠企业管理有限公司 | Monitoring method, device, equipment and computer storage medium |
CN113791954B (en) * | 2021-09-17 | 2023-09-22 | 上海道客网络科技有限公司 | Container bare metal server and method and system for coping physical environment risk of container bare metal server |
CN113791954A (en) * | 2021-09-17 | 2021-12-14 | 上海道客网络科技有限公司 | Container bare metal server and method and system for coping with physical environment risks thereof |
CN113821416A (en) * | 2021-09-18 | 2021-12-21 | 中国电信股份有限公司 | Monitoring alarm method, device, storage medium and electronic equipment |
CN114189423A (en) * | 2021-12-08 | 2022-03-15 | 兴业银行股份有限公司 | Intelligent inquiry alarm system, method and medium with comprehensive compatibility and expansion |
CN114189423B (en) * | 2021-12-08 | 2024-08-06 | 兴业银行股份有限公司 | Fully compatible extended intelligent inquiry and alarm system, method and medium |
CN114253807A (en) * | 2021-12-20 | 2022-03-29 | 深圳前海微众银行股份有限公司 | Alarm information notification method and device |
CN114328107A (en) * | 2021-12-28 | 2022-04-12 | 北京易华录信息技术股份有限公司 | Monitoring method and system for optomagnetic fusion storage server cluster and electronic equipment |
CN115150292A (en) * | 2022-05-17 | 2022-10-04 | 深圳萨摩耶数字科技有限公司 | Monitoring method and device for k8s cluster, electronic equipment and storage medium |
CN114884838B (en) * | 2022-05-20 | 2023-05-12 | 远景智能国际私人投资有限公司 | Monitoring method and server of Kubernetes component |
CN114884838A (en) * | 2022-05-20 | 2022-08-09 | 远景智能国际私人投资有限公司 | Monitoring method of Kubernetes component and server |
CN114926288A (en) * | 2022-06-06 | 2022-08-19 | 中信建投证券股份有限公司 | Intelligent strategy monitoring cloud platform and intelligent strategy monitoring method and device |
CN115022196A (en) * | 2022-06-14 | 2022-09-06 | 启明信息技术股份有限公司 | Method and system for predicting software operation problems and giving alarm |
WO2024002190A1 (en) * | 2022-06-30 | 2024-01-04 | 中兴通讯股份有限公司 | Monitor-based container adjustment method and device, and storage medium |
CN114860510A (en) * | 2022-07-08 | 2022-08-05 | 飞狐信息技术(天津)有限公司 | Data monitoring method and system of micro-service system |
CN114860510B (en) * | 2022-07-08 | 2022-12-02 | 飞狐信息技术(天津)有限公司 | Data monitoring method and system of micro-service system |
CN114944980A (en) * | 2022-07-26 | 2022-08-26 | 上海有孚智数云创数字科技有限公司 | System method, apparatus, medium, and program product for monitoring alarms |
CN114944980B (en) * | 2022-07-26 | 2022-10-21 | 上海有孚智数云创数字科技有限公司 | System method, apparatus, and medium for monitoring alarms |
CN115473783A (en) * | 2022-08-04 | 2022-12-13 | 浪潮软件集团有限公司 | Prometheus-based index alarm management system and method |
CN115080366B (en) * | 2022-08-22 | 2022-11-15 | 深圳依时货拉拉科技有限公司 | Alarm method, alarm device, computer equipment and storage medium |
CN115080366A (en) * | 2022-08-22 | 2022-09-20 | 深圳依时货拉拉科技有限公司 | Alarm method, alarm device, computer equipment and storage medium |
CN115801539A (en) * | 2022-11-16 | 2023-03-14 | 浪潮云信息技术股份公司 | Tenant-side container monitoring, collecting and alarming method and system under container cloud scene |
CN115801541B (en) * | 2022-11-18 | 2024-03-22 | 湖南长银五八消费金融股份有限公司 | Method and device for alarming slow access in full-link tracking platform and computer equipment |
CN115801541A (en) * | 2022-11-18 | 2023-03-14 | 湖南长银五八消费金融股份有限公司 | Slow access warning method and device in full-link tracking platform and computer equipment |
CN115996180A (en) * | 2022-12-01 | 2023-04-21 | 深圳前海环融联易信息科技服务有限公司 | Monitoring alarm system, method, equipment and computer storage medium |
CN115996180B (en) * | 2022-12-01 | 2024-09-20 | 深圳前海环融联易信息科技服务有限公司 | Monitoring alarm system, method, equipment and computer storage medium |
CN116232965A (en) * | 2022-12-23 | 2023-06-06 | 中国联合网络通信集团有限公司 | Cluster host monitoring system, method and storage medium |
CN115827393B (en) * | 2023-02-21 | 2023-10-20 | 德特赛维技术有限公司 | Server cluster monitoring and alarming system |
CN115827393A (en) * | 2023-02-21 | 2023-03-21 | 德特赛维技术有限公司 | Server cluster monitoring and warning system |
CN116346904A (en) * | 2023-05-19 | 2023-06-27 | 北京奇虎科技有限公司 | Information pushing method, device, equipment and storage medium |
CN116346904B (en) * | 2023-05-19 | 2023-09-22 | 北京奇虎科技有限公司 | Information pushing method, device, equipment and storage medium |
CN118095494A (en) * | 2024-03-28 | 2024-05-28 | 暗物智能科技(广州)有限公司 | Model training method, device, computer equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112511339B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112511339B (en) | Container monitoring alarm method, system, equipment and storage medium based on multiple clusters | |
CN109714192B (en) | Monitoring method and system for monitoring cloud platform | |
WO2021017301A1 (en) | Management method and apparatus based on kubernetes cluster, and computer-readable storage medium | |
US7152104B2 (en) | Method and apparatus for notifying administrators of selected events in a distributed computer system | |
CN102652410B (en) | Cloud computing supervision and management system | |
CN108234168A (en) | A kind of method for exhibiting data and system based on service topology | |
EP3018577A1 (en) | Interface call system and method | |
CN111190794A (en) | Operation and maintenance monitoring and management system | |
CN112511580B (en) | Message pushing method, device, storage medium and equipment | |
CN103501237A (en) | Device management method, management platform, device and system | |
WO2019153532A1 (en) | Deployment method and apparatus for monitoring system, and computer device and storage medium | |
CN113377626B (en) | Visual unified alarm method, device, equipment and medium based on service tree | |
CN110138753B (en) | Distributed message service system, method, apparatus, and computer-readable storage medium | |
CN102089749B (en) | Method and apparatus for managing binding information about a bundle installed remotely in an osgi service platform | |
CN113626286A (en) | Multi-cluster instance processing method and device, electronic equipment and storage medium | |
CN112162821A (en) | Container cluster resource monitoring method, device and system | |
CN114518934A (en) | Unified operation and maintenance platform architecture system | |
CN112055061A (en) | Distributed message processing method and device | |
CN112698929B (en) | Information acquisition method and device | |
CN115934464A (en) | Information platform monitoring and collecting system | |
CN113037549A (en) | Operation and maintenance environment warning method | |
US9922539B1 (en) | System and method of telecommunication network infrastructure alarms queuing and multi-threading | |
CN113094053A (en) | Product delivery method and device and computer storage medium | |
US8494931B2 (en) | Management of actions based on priority levels and calendar entries | |
CN114168297A (en) | Method, device, equipment and medium for scheduling collection tasks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||