CN115729783A - Failure risk monitoring method, device, storage medium and program product - Google Patents
Failure risk monitoring method, device, storage medium and program product
- Publication number
- CN115729783A (application CN202211520954.6A)
- Authority
- CN
- China
- Prior art keywords
- historical
- fault
- current
- matching
- states
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Debugging And Monitoring (AREA)
Abstract
Embodiments of the present application provide a failure risk monitoring method, device, storage medium, and program product. The method includes: obtaining the current operating state of a system, where the current operating state includes current values of multiple operating indicators; matching the current operating state against each of multiple historical operating states to obtain a matching degree for each historical operating state, where different historical operating states correspond to different sampling times and each historical operating state includes the historical values of the operating indicators at its sampling time; and obtaining the fault event under the historical operating state that corresponds to the maximum of the matching degrees, and generating a fault risk prompt according to that fault event. The method provided in this embodiment can improve the comprehensiveness and accuracy of monitoring.
Description
Technical field
The embodiments of the present application relate to the technical field of software testing, and in particular to a failure risk monitoring method, device, storage medium, and program product.
Background
To keep software running normally, production operation and maintenance personnel usually monitor the servers, databases, networks, and clients on which an application system is deployed, and use the monitored indicator data to trace and locate problems.
In the related art, an early warning is usually issued when one or more indicators exceed a threshold.
However, in the course of implementing this application, the inventors found at least the following problem in the prior art: existing approaches can only warn about the indicators that are monitored, and they often fail to issue a fault warning when a fault risk exists, so the comprehensiveness and accuracy of monitoring are low.
Summary of the invention
Embodiments of the present application provide a failure risk monitoring method, device, storage medium, and program product, so as to improve the comprehensiveness and accuracy of monitoring.
In a first aspect, an embodiment of the present application provides a failure risk monitoring method, including:
obtaining the current operating state of a system, where the current operating state includes current values of multiple operating indicators;
matching the current operating state against each of multiple historical operating states to obtain a matching degree for each historical operating state, where different historical operating states correspond to different sampling times, and each historical operating state includes the historical values of the operating indicators at its sampling time;
obtaining the fault event under the historical operating state that corresponds to the maximum of the matching degrees, and generating a fault risk prompt according to that fault event.
In a possible design, matching the current operating state against the multiple historical operating states includes:
if none of the current values exceeds its corresponding preset threshold range, matching the current operating state against each of the multiple historical operating states.
In a possible design, after obtaining the current operating state of the system, the method further includes:
if at least one of the current values exceeds its corresponding preset threshold range, generating a corresponding fault warning;
screening the multiple historical operating states to obtain multiple operating states to be matched, where the fault events corresponding to the operating states to be matched include the fault event corresponding to the fault warning;
in which case matching the current operating state against the multiple historical operating states to obtain their matching degrees includes:
matching the current operating state against each of the operating states to be matched to obtain a matching degree for each operating state to be matched.
In a possible design, matching the current operating state against the multiple historical operating states to obtain their matching degrees includes:
obtaining a first vector of the current operating state and a second vector of each of the historical operating states;
for the second vector of each historical operating state, calculating the Mahalanobis distance between the first vector and that second vector, and determining the matching degree of that historical operating state according to the Mahalanobis distance.
In a possible design, before matching the current operating state against the multiple historical operating states, the method further includes:
dividing a preset collection period into multiple time intervals;
for each time interval, collecting historical operating states at the sampling frequency corresponding to that time interval, and storing each collected historical operating state in association with the fault events that occurred at its collection time.
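The collection step above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the `Snapshot` record, the `read_metrics` and `events_at` callables, and the convention that an interval's sampling frequency is a number of evenly spaced samples within that interval are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Snapshot:
    """One historical operating state (hypothetical record layout)."""
    sampled_at: float                       # sampling time, measured from the period start
    metrics: Dict[str, float]               # operating indicator name -> value
    fault_events: List[str] = field(default_factory=list)


def sample_times(intervals: List[Tuple[float, float, int]]) -> List[float]:
    """Spread each interval's sample count evenly over [start, end)."""
    times: List[float] = []
    for start, end, count in intervals:
        step = (end - start) / count if count else 0.0
        times.extend(start + k * step for k in range(count))
    return times


def collect(intervals: List[Tuple[float, float, int]],
            read_metrics: Callable[[float], Dict[str, float]],
            events_at: Callable[[float], List[str]]) -> List[Snapshot]:
    """Take one snapshot per sampling time and attach the fault events seen at that time."""
    return [Snapshot(t, read_metrics(t), events_at(t)) for t in sample_times(intervals)]
```

Each entry of `intervals` is `(start, end, count)`, so a busier interval simply carries a larger `count` and contributes more snapshots to the stored history.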
In a possible design, before collecting historical operating states at the sampling frequency corresponding to each time interval, the method further includes:
determining the required number of fault events according to a confidence requirement and a sampling error requirement;
collecting multiple fault events according to the required number;
assigning the multiple fault events to multiple different time intervals;
for each time interval, obtaining the number of fault events that occurred within that time interval;
determining the sampling frequency corresponding to each time interval according to the number of fault events corresponding to each time interval.
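The text does not say how the required number of fault events is derived from the confidence and sampling error requirements. One common choice, assumed here purely for illustration, is Cochran's sample-size formula n = z² · p(1 − p) / e² with the conservative proportion p = 0.5. The function names and the six-hour interval width below are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import Iterable


def required_event_count(confidence: float, margin_of_error: float, p: float = 0.5) -> int:
    """Cochran's formula n = z^2 * p * (1 - p) / e^2, rounded up (an assumed choice)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)    # two-sided critical value
    return math.ceil(z * z * p * (1.0 - p) / margin_of_error ** 2)


def count_events_per_interval(event_hours: Iterable[float], interval_hours: int = 6) -> Counter:
    """Bucket fault events (hour of day, 0-23) into fixed-width time intervals."""
    return Counter(int(hour) // interval_hours for hour in event_hours)
```

Under these assumptions, a 95% confidence level with a 5% sampling error calls for 385 fault events, which are then counted per interval to drive the sampling frequency.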
In a possible design, determining the sampling frequency corresponding to each time interval according to the number of fault events corresponding to each time interval includes:
determining the number of collections corresponding to the preset collection period;
for each time interval, calculating the ratio of the number of fault events in that time interval to the total number of fault events, and determining the sampling frequency corresponding to that time interval according to the ratio and the number of collections.
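A minimal sketch of this allocation, assuming that "according to the ratio and the number of collections" means multiplying the two (proportional allocation). The function name and the rounding rule are assumptions, not taken from the source.

```python
from typing import List, Sequence


def sampling_counts(events_per_interval: Sequence[int], total_collections: int) -> List[int]:
    """Give each interval a share of the collections proportional to its fault events."""
    total_events = sum(events_per_interval)
    if total_events == 0:
        raise ValueError("at least one fault event is needed to allocate collections")
    return [round(total_collections * n / total_events) for n in events_per_interval]
```

With 10, 30, 40, and 20 fault events in four intervals and 50 collections per period, the intervals receive 5, 15, 20, and 10 collections. Because each share is rounded independently, the shares may not sum exactly to `total_collections` in general.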
In a possible design, before collecting, for each time interval, historical operating states at the sampling frequency corresponding to that time interval, the method further includes:
determining a target duration according to the software iteration frequency;
in which case collecting, for each time interval, historical operating states at the sampling frequency corresponding to that time interval includes:
for each time interval within the multiple preset collection periods covered by the target duration, collecting historical operating states at the sampling frequency corresponding to that time interval, and adding the collected historical operating states to a sample set;
and matching the current operating state against the multiple historical operating states includes:
matching the current operating state against each of the historical operating states in the sample set.
In a second aspect, an embodiment of the present application provides a failure risk monitoring device, including:
an acquisition module, configured to obtain the current operating state of a system, where the current operating state includes current values of multiple operating indicators;
a matching module, configured to match the current operating state against each of multiple historical operating states to obtain a matching degree for each historical operating state, where different historical operating states correspond to different sampling times, and each historical operating state includes the historical values of the operating indicators at its sampling time;
a generating module, configured to obtain the fault event under the historical operating state that corresponds to the maximum of the matching degrees, and to generate a fault risk prompt according to that fault event.
In a third aspect, an embodiment of the present application provides a failure risk monitoring device, including at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the method described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the method described in the first aspect and the various possible designs of the first aspect is implemented.
In a fifth aspect, an embodiment of the present application provides a computer program product, including a computer program; when the computer program is executed by a processor, the method described in the first aspect and the various possible designs of the first aspect is implemented.
In the failure risk monitoring method, device, storage medium, and program product provided by this embodiment, the method includes: obtaining the current operating state of a system, where the current operating state includes current values of multiple operating indicators; matching the current operating state against each of multiple historical operating states to obtain a matching degree for each historical operating state, where different historical operating states correspond to different sampling times and each historical operating state includes the historical values of the operating indicators at its sampling time; and obtaining the fault event under the historical operating state that corresponds to the maximum of the matching degrees, and generating a fault risk prompt according to that fault event. The method obtains the current operating indicators, matches them against pre-collected historical indicators, and generates a fault risk prompt based on the fault event that occurred under the historical operating state with the largest matching degree, which improves the comprehensiveness and accuracy of monitoring.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario of the failure risk monitoring method provided by an embodiment of the present application;
FIG. 2 is a first schematic flowchart of the failure risk monitoring method provided by an embodiment of the present application;
FIG. 3 is a second schematic flowchart of the failure risk monitoring method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of sampling for an application server cluster provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of the failure risk monitoring device provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the hardware structure of the failure risk monitoring device provided by an embodiment of the present application.
Detailed description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
To keep software running normally, production operation and maintenance personnel usually monitor the servers, databases, networks, and clients on which an application system is deployed, provide early warning of fault risks by setting performance monitoring indicators and indicator thresholds, and use the monitored indicator data to trace and locate problems.
Related performance risk early-warning systems usually issue a warning when one or more indicators exceed a threshold, but they have two shortcomings in practice. First, an existing early-warning system can only warn about the indicators it monitors, and some performance indicators are too difficult to monitor comprehensively; the potential risks arising from these monitoring gaps cannot be covered by a traditional threshold-based early-warning system, so monitoring is incomplete. Second, when all monitored indicator values are below their thresholds, a threshold-based early-warning system treats the application as running normally and issues no warning, even though a fault may still occur, so monitoring accuracy is low.
To solve the above technical problems, the inventors found that the operating indicator values of the application system over a past period, together with data on the faults that occurred, can be collected as samples. The matching degree between the application system's current operating indicator values and each sample is calculated, and the sample with the largest matching degree is taken to be the one closest to the current operating state of the application system. If that sample contains a fault event, the system can be judged to be at risk of a fault, and the solution and details of the sample's fault event can be provided to the relevant personnel for reference. On this basis, an embodiment of the present application provides a failure risk monitoring method that can improve the comprehensiveness and accuracy of monitoring.
FIG. 1 is a schematic diagram of an application scenario of the failure risk monitoring method provided by an embodiment of the present application. As shown in FIG. 1, a server 101 is communicatively connected to a monitoring device 102.
In a specific implementation, the server 101 collects its current operating state and sends it to the monitoring device 102. The monitoring device 102 obtains the current operating state, which includes current values of multiple operating indicators; matches the current operating state against each of multiple historical operating states to obtain a matching degree for each historical operating state, where different historical operating states correspond to different sampling times and each historical operating state includes the historical values of the operating indicators at its sampling time; and obtains the fault event under the historical operating state that corresponds to the maximum of the matching degrees and generates a fault risk prompt according to that fault event. The method obtains the current operating indicators, matches them against pre-collected historical indicators, and generates a fault risk prompt based on the fault event that occurred under the historical operating state with the largest matching degree, which improves the comprehensiveness and accuracy of monitoring.
The monitoring device 102 may be a terminal device or a server.
The server 101 may be a single type of server or multiple types of servers. When there are multiple types, whether to monitor them separately can be decided based on how their operating indicators differ.
For example, the server 101 may be an application server, whose current operating state may include multiple operating indicators, such as: service resources: CPU usage, disk space usage, disk inode usage, memory usage, and number of zombie processes; network: bandwidth utilization and TCP connection status; application programming interface (API): API correctness rate and API response time; business volume: number of users currently online; and so on. The server 101 may also be a database server, whose current operating state may include multiple operating indicators, such as: service resources: CPU usage, disk space usage, disk inode usage, memory usage, and number of zombie processes; network: bandwidth utilization and TCP connection status; database: time consumed by slow SQL statements, tablespace usage, ratio of active connections to the maximum number of connections, and number of deadlock errors (ORA-60) in the alert log; and so on.
If the server 101 comprises both application servers and database servers, the two need different indicator items collected, so they can be considered separately during sampling and risk assessment, and each fault event can be attributed clearly to one of them. Note that, to make problems easier to capture, events can be classified by where the error is reported, that is, by the symptom of the problem rather than its root cause: problems found on the application server side and problems found on the database server side.
In addition, fault events occur in two ways. In the first, one of the system's operating indicators exceeds its set threshold, that is, the indicator behaves abnormally, and the early-warning system automatically issues a fault warning. Every such event corresponds to a specific abnormal indicator item, so whether it is an application-server-side or a database-server-side problem can be determined directly from the side to which the indicator item belongs. In the second, a fault event is reported by operation and maintenance personnel or by users while all indicators appear normal; such events must be classified by manual judgment.
It should be noted that the scenario shown in FIG. 1 is only an example. The failure risk monitoring method and scenario described in the embodiments of the present application are intended to explain the technical solutions more clearly and do not limit them. Those of ordinary skill in the art will understand that, as systems evolve and new business scenarios emerge, the technical solutions provided in the embodiments of the present application apply equally to similar technical problems.
The technical solution of the present application is described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
FIG. 2 is a first schematic flowchart of the failure risk monitoring method provided by an embodiment of the present application. As shown in FIG. 2, the method includes:
201. Obtain the current operating state of the system; the current operating state includes current values of multiple operating indicators.
This embodiment may be executed by a terminal device or a server, such as the monitoring device 102 shown in FIG. 1.
The system in this embodiment may be the system installed and running on the server 101 shown in FIG. 1.
202. Match the current operating state against each of multiple historical operating states to obtain a matching degree for each historical operating state; different historical operating states correspond to different sampling times; each historical operating state includes the historical values of the operating indicators at its sampling time.
Specifically, the current operating indicators of the application system can be traversed. When an operating indicator exceeds its threshold, an early warning is triggered automatically, the warning event is saved to an event list, and the front-end page displays the indicator item and indicator value that triggered the warning. Suppose m indicator items exceeding their thresholds are detected; subsequent processing then depends on the value of m.
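The threshold traversal can be sketched as below. The dictionary layouts, the `check_thresholds` name, and the treatment of a threshold as a closed `(low, high)` range are assumptions for illustration; the count m is simply the length of the returned alert list.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]


def check_thresholds(current: Dict[str, float], limits: Dict[str, Range]) -> List[dict]:
    """Return one alert record per indicator whose current value leaves its preset range."""
    alerts: List[dict] = []
    for name, value in current.items():
        if name not in limits:          # indicator without a configured threshold
            continue
        low, high = limits[name]
        if not (low <= value <= high):
            alerts.append({"indicator": name, "value": value, "range": (low, high)})
    return alerts
```

An empty result corresponds to m = 0, in which case no threshold warning is raised and the full matching path applies.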
In this embodiment, the current operating state can be matched against the historical operating states in several ways. In some embodiments, matching the current operating state against the multiple historical operating states to obtain their matching degrees may include: obtaining a first vector of the current operating state and a second vector of each of the historical operating states; and, for the second vector of each historical operating state, calculating the Mahalanobis distance between the first vector and that second vector, and determining the matching degree of that historical operating state according to the Mahalanobis distance.
Specifically, the Mahalanobis distance can be calculated as shown in formula (1):
$$D_M(x, y) = \sqrt{(x - y)^{T}\, \Sigma^{-1}\, (x - y)} \tag{1}$$
where $x$ and $y$ are the vector of the current system's operating indicator values and the vector of a given sample's operating indicator values, respectively, and $D_M(x, y)$ is the Mahalanobis distance between them. $\Sigma$ is the covariance matrix of the sample set $S$, and $\Sigma^{-1}$ is the inverse of the covariance matrix.
The covariance matrix can be calculated with reference to formula (2):
$$\Sigma = \begin{bmatrix} \operatorname{cov}(c_1, c_1) & \cdots & \operatorname{cov}(c_1, c_n) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(c_n, c_1) & \cdots & \operatorname{cov}(c_n, c_n) \end{bmatrix} \tag{2}$$
where $m$ is the sample size (not the count of over-threshold indicators used in step 202), $n$ is the number of operating indicators, $c_i$ denotes the $i$-th indicator item, $\operatorname{cov}(c_i, c_j) = E\big([c_i - E(c_i)][c_j - E(c_j)]\big)$, $E(c_i)$ is the mean of $c_i$ over the $m$ samples, and $\operatorname{cov}(c_i, c_j)$ is the covariance between the $i$-th and $j$-th indicators.
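The Mahalanobis matching can be sketched in plain Python as follows. The helper names are hypothetical, the covariance divides by the sample count m (treating E as a plain average, which is an assumption), and a small Gauss-Jordan solver stands in for an explicit matrix inverse so that the example needs no third-party library.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def covariance_matrix(samples: Sequence[Vector]) -> List[List[float]]:
    """cov(ci, cj) = E([ci - E(ci)][cj - E(cj)]), with E taken as the mean over all samples."""
    m, n = len(samples), len(samples[0])
    means = [sum(row[i] for row in samples) / m for i in range(n)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / m
             for j in range(n)] for i in range(n)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mahalanobis(x: Vector, y: Vector, cov: List[List[float]]) -> float:
    """D_M(x, y) = sqrt((x - y)^T * inverse(cov) * (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)                 # z equals inverse(cov) applied to diff
    return math.sqrt(sum(d * v for d, v in zip(diff, z)))


def best_match(current: Vector, samples: Sequence[Vector]) -> int:
    """Index of the sample with the smallest distance, i.e. the largest matching degree."""
    cov = covariance_matrix(samples)
    return min(range(len(samples)), key=lambda k: mahalanobis(current, samples[k], cov))
```

Unlike a plain Euclidean distance, this scales each indicator by its variance and accounts for correlation between indicators, so a percentage-valued indicator and a count-valued one can be compared in the same vector. The solver raises an error when the covariance matrix is singular, for example when two indicators are perfectly correlated in the sample set.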
In some embodiments, when no threshold warning has been triggered, all operating indicators can be fully matched for the sake of comprehensive monitoring. Specifically, matching the current operating state against the multiple historical operating states may include: if none of the current values exceeds its corresponding preset threshold range, that is, m = 0, matching the current operating state against each of the multiple historical operating states. For example, the Mahalanobis distance between the application system's current operating indicator values and every sample in the sample set S can be calculated, and the ID of the sample with the smallest distance to the current operating indicators (the largest matching degree) is recorded.
In some embodiments, when a threshold warning has been triggered, matching may be performed, in order to save computing resources, only for the operating indicators whose current values exceed the threshold range. Specifically, after obtaining the current operating state of the system, the method may further include: if at least one of the current values exceeds its corresponding preset threshold range, generating a corresponding fault warning; and screening the multiple historical operating states to obtain multiple operating states to be matched, where the fault events corresponding to the operating states to be matched include the fault event corresponding to the fault warning. Matching the current operating state against the multiple historical operating states to obtain their matching degrees may then include: matching the current operating state against each of the operating states to be matched to obtain a matching degree for each operating state to be matched.
For example, when a threshold warning has been triggered, that is, an event has already occurred, it is only necessary to calculate the Mahalanobis distance between the system's current operating indicator values and every sample in the sample set S whose "event flag" is "Y", and to record the ID of the sample with the smallest distance to the current operating indicators.
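The two matching paths can be sketched as a single selection function. The sample layout (`id`, `event_flag`, `metrics`) and the function name are assumptions, and the distance function is passed in as a parameter so that the sketch stays independent of how the distance is computed.

```python
from typing import Callable, List, Optional, Sequence


def nearest_sample_id(current: Sequence[float],
                      samples: List[dict],
                      exceeded_count: int,
                      distance: Callable[[Sequence[float], Sequence[float]], float]
                      ) -> Optional[str]:
    """Pick the closest sample; after a threshold warning, search only flagged samples."""
    if exceeded_count == 0:
        pool = samples                                       # m = 0: full matching
    else:
        pool = [s for s in samples if s["event_flag"] == "Y"]
    if not pool:
        return None
    return min(pool, key=lambda s: distance(current, s["metrics"]))["id"]
```

Restricting the pool after a warning reduces the number of distance computations, at the cost of never returning a fault-free sample on that path.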
203. Obtain the fault event associated with the historical operating state that has the highest matching degree, and generate a failure risk prompt according to that fault event.
For example, once the ID of the minimum-distance sample (the sample with the highest matching degree) has been obtained, the sample's "event flag" is used to determine whether the application system currently faces a failure risk. An "event flag" of "Y" indicates a risk, in which case the front-end page displays a risk warning together with the details of the matched sample and the corresponding event resolution analysis. An "event flag" of "N" indicates no risk, in which case the front-end page displays no risk prompt.
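The decision in step 203 can be sketched as a small function. The payload keys and the function name are hypothetical; the source only states that a risk warning, the matched sample details, and the event resolution details are shown when the flag is "Y".

```python
def build_risk_prompt(sample_id, event_flag, events):
    """Return a payload for the front-end page, or None when no risk is indicated."""
    if event_flag != "Y":
        return None  # "N": the front-end page shows no risk prompt
    return {
        "type": "risk_warning",           # hypothetical field names
        "matched_sample_id": sample_id,
        # Event details, including any recorded resolution analysis.
        "events": list(events),
    }
```

Returning `None` for the "N" case keeps the caller's check simple: a prompt is rendered only when a payload exists.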
The failure risk monitoring method provided in this embodiment acquires multiple current operating indicators, matches them against previously collected historical indicators, and generates a failure risk prompt based on the fault event that occurred in the historical operating state with the highest matching degree, which improves the comprehensiveness and accuracy of monitoring.
FIG. 3 is a second schematic flowchart of the failure risk monitoring method provided in an embodiment of the present application. As shown in FIG. 3, building on the above embodiments, for example the embodiment shown in FIG. 2, this embodiment illustrates how the historical operating states, that is, the risk assessment sample set, are generated and maintained. The method includes:
301. Divide a preset collection period into multiple time intervals.
302. For each time interval, collect historical operating states at the sampling frequency corresponding to that time interval, and store each collected historical operating state in association with the fault events that occurred at the corresponding collection time.
Specifically, the operating indicator values of an application system change continuously over time. Experience shows that faults are more likely during the daily business peak and while batch tasks are running than at other times. To make the collected samples more representative, the production events that recently occurred in the system should therefore be pre-surveyed first, and a reasonable sampling frequency set according to how those events are distributed over time.
In some embodiments, different sampling frequencies are used in different time intervals to save computing resources. The sampling frequencies may be determined as follows: determine the required number of fault events according to the confidence requirement and the sampling error requirement; collect multiple fault events according to that required number; assign the fault events to multiple different time intervals; for each time interval, obtain the number of fault events that occurred within it; and determine the sampling frequency of each time interval according to the numbers of fault events in the respective time intervals.
In some embodiments, to set the sampling frequency more reasonably, it may be determined based on parameters such as the fluctuation period of the software's functions. Specifically, determining the sampling frequency of each time interval according to the numbers of fault events in the respective time intervals may include: determining the number of collections corresponding to the preset collection period; and, for each time interval, calculating the ratio of the number of fault events in that interval to the total number of fault events, and determining the sampling frequency of that interval according to the ratio and the number of collections.
In some embodiments, the length of time covered by the sample set may be determined with reference to the software iteration frequency, which makes the choice more reasonable. Specifically, a target duration may be determined according to the software iteration frequency. Collecting historical operating states for each time interval at the corresponding sampling frequency may then include: for each time interval in the multiple preset collection periods covered by the target duration, collecting historical operating states at the sampling frequency corresponding to that time interval, and adding the collected historical operating states to the sample set. Matching the current operating state against the multiple historical operating states may include: matching the current operating state against each of the historical operating states in the sample set.
For example, first, the number of pre-survey samples may be estimated with the population estimation formula for categorical variables, as shown in formula (3):
$$n = \frac{z^{2}\,p(1-p)}{\delta^{2}} \tag{3}$$
where n is the sample size, z is obtained from a z-value table according to the confidence level, p is the expected proportion in the target population, and δ is the sampling error range.
For example, with a confidence level of 95% (z = 1.96), a sampling error range of 4%, and p(1-p) at its maximum value of 0.25, the resulting sample size n is 600.25. That is, at a 95% confidence level and a 4% sampling error range, data for the 601 most recent production events of the software must be collected for investigation and analysis.
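The worked numbers above can be checked with a short Python sketch. The function name is hypothetical, and the expression n = z²·p(1−p)/δ² is the standard sample-size estimate for a proportion, which reproduces the stated value of 600.25.

```python
import math


def required_sample_size(z, p, delta):
    """Sample size for estimating a proportion: z^2 * p * (1 - p) / delta^2."""
    return z ** 2 * p * (1 - p) / delta ** 2


# 95% confidence level (z = 1.96), 4% sampling error, p = 0.5 so p(1 - p) = 0.25.
n = required_sample_size(z=1.96, p=0.5, delta=0.04)
events_needed = math.ceil(n)  # round up to a whole number of production events
```

Choosing p = 0.5 maximizes p(1−p), so the estimate is conservative when the true proportion is unknown.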
Second, the pre-survey samples may be collected according to the sample size estimated above and divided into multiple time intervals over the 24 hours of a day, for example into 24 time intervals: [0:00, 1:00), [1:00, 2:00), [2:00, 3:00), ..., [23:00, 24:00). The number of events occurring in each time interval is counted, and the proportion of the total sample that falls in each interval is calculated: b1, b2, b3, b4, ..., b24.
Third, the sampling frequency for each time interval may be set according to the event proportions b1, b2, b3, b4, ..., b24 obtained above. A higher proportion of events indicates that the system is more prone to faults in that time interval, so the sampling frequency there should be higher. It is reasonable to assume that the software's performance indicators are sampled once every n minutes on average, for example n = 5 (this n is the sampling period, not the sample size in formula (3)). Within n minutes the performance indicators generally do not fluctuate much; even if an indicator does fluctuate significantly, the change can be captured in the next n-minute period, and a short-lived spike in an indicator value can be treated as noise. With n = 5, a total of 288 samples are taken per day, and the sampling frequencies of the intervals are b1×288, b2×288, ..., b24×288, denoted f1, f2, f3, ..., f24.
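The allocation of the daily sampling budget can be sketched as follows, assuming 24 hourly intervals and a 5-minute average sampling period as in the example. The function name and the use of `datetime` timestamps for the pre-survey events are assumptions.

```python
from collections import Counter
from datetime import datetime


def interval_sampling_counts(event_times, period_minutes=5):
    """Return (b, f): hourly event proportions and per-interval sample counts."""
    if not event_times:
        raise ValueError("at least one pre-survey event is required")
    samples_per_day = 24 * 60 // period_minutes  # 288 for a 5-minute period
    per_hour = Counter(t.hour for t in event_times)
    # b[h] is the share of pre-survey events that fell in [h:00, h+1:00).
    b = [per_hour[h] / len(event_times) for h in range(24)]
    # f[h] = b[h] * 288: the interval's share of the daily sampling budget.
    f = [share * samples_per_day for share in b]
    return b, f
```

Under this rule an interval with no pre-survey events receives no samples at all. Whether a minimum rate should apply in that case is not addressed in the source.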
Fourth, apart from the automatic warnings triggered when a threshold is exceeded, other fault events require their event information to be maintained manually or obtained by interfacing with an existing operation and maintenance management system. Based on the time at which a non-automatic warning event occurred, the risk assessment system can match the sample collected closest to that time and add the event information to the event list associated with that sample.
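Matching a manually recorded event to the nearest sample in time can be sketched with a binary search. The dictionary keys and function name are hypothetical. Setting the event flag to "Y" when an event is attached is an inference from the flag's described meaning, not a step the source spells out.

```python
import bisect
from datetime import datetime


def attach_event(samples, event_time, event_info):
    """Attach `event_info` to the sample whose timestamp is closest to `event_time`.

    `samples` is a list of dicts sorted by their "time" key.
    """
    if not samples:
        raise ValueError("no samples to match against")
    times = [s["time"] for s in samples]
    pos = bisect.bisect_left(times, event_time)
    # Only the samples just before and just after the event can be nearest.
    neighbours = [i for i in (pos - 1, pos) if 0 <= i < len(samples)]
    nearest = min(neighbours, key=lambda i: abs(times[i] - event_time))
    samples[nearest]["events"].append(event_info)
    samples[nearest]["event_flag"] = "Y"
    return samples[nearest]
```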
Finally, the time period covered by the samples may be determined by jointly considering the software iteration frequency and the number of faults, for example by collecting samples of the system's operation over the last 15 days and samples of fault events over the last 180 days. This setting is chosen because the system runs normally most of the time: with multiple servers each collecting data every 5 minutes, the samples collected in 15 days are sufficient to represent the system's normal operating state (one server yields 4320 samples in 15 days). Fault events, by contrast, occur only occasionally, so the sampling window must be lengthened to obtain enough samples. Each time a new day's samples have been collected, the risk assessment system automatically removes expired samples, completing the automatic update of the sample set S.
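The daily cleanup can be sketched as a filter with two retention windows, 15 days for normal samples and 180 days for samples associated with fault events, as in the example. The dictionary keys and function name are hypothetical.

```python
from datetime import datetime, timedelta


def prune_samples(samples, now, normal_days=15, fault_days=180):
    """Keep normal samples for `normal_days` and fault samples for `fault_days`."""
    kept = []
    for sample in samples:
        # Fault samples (flag "Y") are rare, so they get the longer window.
        limit = fault_days if sample["event_flag"] == "Y" else normal_days
        if now - sample["time"] <= timedelta(days=limit):
            kept.append(sample)
    return kept


# One server sampled every 5 minutes: 288 samples per day, 4320 in 15 days.
samples_per_server = (24 * 60 // 5) * 15
```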
303. Acquire the current operating state of the system, where the current operating state includes the current values of multiple operating indicators.
304. Match the current operating state against each of multiple historical operating states to obtain the matching degree of each historical operating state, where different historical operating states correspond to different sampling times, and each historical operating state includes the historical values of the operating indicators at the corresponding sampling time.
305. Obtain the fault event associated with the historical operating state that has the highest matching degree, and generate a failure risk prompt according to that fault event.
Steps 303 to 305 in this embodiment are similar to steps 201 to 203 in the above embodiment and are not described again here.
The failure risk monitoring method provided in this embodiment uses different sampling frequencies for different time intervals, which makes the collected samples more representative. This reduces the amount of data collected, makes sample collection more targeted, and improves efficiency and saves computing resources in subsequent calculations.
To explain the composition of the sample set S more clearly, the following describes, as an example, how samples are collected from an application server cluster.
FIG. 4 is a schematic diagram of sampling for an application server cluster provided in an embodiment of the present application. As shown in FIG. 4, an application is usually deployed on multiple server nodes at the same time, and each server node can be sampled separately. The sample information includes the sampling time, the operating indicator values, and the corresponding fault events; in the figure, the indicator list and the event list are associated through the sample ID. The "event flag" in the indicator list marks whether a sample has a corresponding event: "Y" if it does, "N" if it does not. The information stored in the event list can be extended as needed, for example by entering the details of event analysis and resolution manually or obtaining them from an existing operation and maintenance system, to provide decision support for investigating and resolving identified failure risks. Database server clusters are sampled in a similar way.
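The relationship between the indicator list and the event list can be sketched with two small in-memory tables joined on the sample ID. All field names and values here are invented for illustration, since FIG. 4 is not reproduced in this text.

```python
# Indicator list: one row per sample, per server node (illustrative values).
indicator_list = [
    {"sample_id": "S0001", "node": "app-01", "sampled_at": "2022-11-01 09:00",
     "cpu_pct": 35.0, "mem_pct": 61.0, "event_flag": "N"},
    {"sample_id": "S0002", "node": "app-02", "sampled_at": "2022-11-01 09:00",
     "cpu_pct": 97.5, "mem_pct": 88.0, "event_flag": "Y"},
]

# Event list: zero or more rows per sample, extensible with analysis details.
event_list = [
    {"sample_id": "S0002", "event": "CPU threshold exceeded",
     "resolution": "placeholder resolution notes"},
]


def events_for(sample_id):
    """Look up the events associated with a sample through its sample ID."""
    return [e for e in event_list if e["sample_id"] == sample_id]


def flags_consistent():
    """Check that a sample is flagged "Y" exactly when it has events."""
    flagged = {r["sample_id"] for r in indicator_list if r["event_flag"] == "Y"}
    with_events = {e["sample_id"] for e in event_list}
    return flagged == with_events
```

Keeping events in a separate list lets one sample carry several events and lets the event record grow without changing the indicator rows.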
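FIG. 5 is a schematic structural diagram of a failure risk monitoring device provided in an embodiment of the present application. As shown in FIG. 5, the failure risk monitoring device 50 includes an acquisition module 501, a matching module 502, and a generation module 503.
The acquisition module 501 is configured to acquire the current operating state of the system, where the current operating state includes the current values of multiple operating indicators.
The matching module 502 is configured to match the current operating state against each of multiple historical operating states to obtain the matching degree of each historical operating state, where different historical operating states correspond to different sampling times, and each historical operating state includes the historical values of the operating indicators at the corresponding sampling time.
The generation module 503 is configured to obtain the fault event associated with the historical operating state that has the highest matching degree, and to generate a failure risk prompt according to that fault event.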
The failure risk monitoring device provided in this embodiment of the present application acquires multiple current operating indicators, matches them against previously collected historical indicators, and generates a failure risk prompt based on the fault event that occurred in the historical operating state with the highest matching degree, which improves the comprehensiveness and accuracy of monitoring.
The failure risk monitoring device provided in this embodiment of the present application can be used to perform the above method embodiments. Its implementation principle and technical effects are similar and are not described again here.
FIG. 6 is a schematic diagram of the hardware structure of the failure risk monitoring device provided in an embodiment of the present application. The device may be a terminal device or a server.
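The device 60 may include one or more of the following components: a processing component 601, a memory 602, a power component 603, a multimedia component 604, an audio component 605, an input/output (I/O) interface 606, a sensor component 607, and a communication component 608.
The processing component 601 generally controls the overall operation of the device 60, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 601 may include one or more processors 609 that execute instructions to complete all or part of the steps of the above method. In addition, the processing component 601 may include one or more modules that facilitate interaction between the processing component 601 and other components. For example, the processing component 601 may include a multimedia module to facilitate interaction between the multimedia component 604 and the processing component 601.
The memory 602 is configured to store various types of data to support operation of the device 60. Examples of such data include instructions for any application or method operating on the device 60, contact data, phonebook data, messages, pictures, videos, and so on. The memory 602 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 603 supplies power to the various components of the device 60. The power component 603 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 60.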
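The multimedia component 604 includes a screen that provides an output interface between the device 60 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 604 includes a front camera and/or a rear camera. When the device 60 is in an operating mode, such as a photographing mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 605 is configured to output and/or input audio signals. For example, the audio component 605 includes a microphone (MIC) that is configured to receive external audio signals when the device 60 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 602 or sent via the communication component 608. In some embodiments, the audio component 605 further includes a speaker for outputting audio signals.
The I/O interface 606 provides an interface between the processing component 601 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.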
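The sensor component 607 includes one or more sensors that provide status assessments of various aspects of the device 60. For example, the sensor component 607 can detect the open/closed state of the device 60 and the relative positioning of components, such as the display and keypad of the device 60. The sensor component 607 can also detect a change in position of the device 60 or of one of its components, the presence or absence of user contact with the device 60, the orientation or acceleration/deceleration of the device 60, and changes in the temperature of the device 60. The sensor component 607 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 607 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 607 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 608 is configured to facilitate wired or wireless communication between the device 60 and other devices. The device 60 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 608 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 608 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 60 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above method.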
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 602 including instructions, where the instructions can be executed by the processor 609 of the device 60 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The above computer-readable storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. A readable storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. The readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may be located in an application-specific integrated circuit (ASIC). The processor and the readable storage medium may also exist in the device as discrete components.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be completed by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
An embodiment of the present application further provides a computer program product including a computer program that, when executed by a processor, implements the failure risk monitoring method performed by the failure risk monitoring device described above.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some or all of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211520954.6A CN115729783B (en) | 2022-11-30 | 2022-11-30 | Fault risk monitoring method, device, storage medium and program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115729783A (en) | 2023-03-03 |
CN115729783B (en) | 2025-01-21 |
Family
ID=85299515
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211520954.6A Active CN115729783B (en) | 2022-11-30 | 2022-11-30 | Fault risk monitoring method, device, storage medium and program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115729783B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN115729783B (en) | 2025-01-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |