
CN113918376B - Fault detection method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN113918376B
Authority
CN
China
Prior art keywords
data
operation data
abnormal
fault
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111519411.8A
Other languages
Chinese (zh)
Other versions
CN113918376A (en)
Inventor
杨平
李奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smart Grain Safety Technology Hunan Co ltd
Original Assignee
Hunan Tianyun Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Tianyun Software Technology Co ltd
Priority to CN202111519411.8A
Publication of CN113918376A
Application granted
Publication of CN113918376B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the invention provide a fault detection method, apparatus, device, and computer-readable storage medium. The fault detection method comprises: obtaining first operation data of a server device at the current time point; obtaining, from the first operation data, second operation data that lies outside a target confidence interval, the target confidence interval comprising a confidence interval determined from historical operation data at a historical time point corresponding to the current time point; inputting the second operation data into a trained anomaly detection model, which detects whether the second operation data is abnormal data and yields a detection result; when the detection result indicates that the second operation data is abnormal data, determining whether the abnormal data meets a preset fault condition; and, when the abnormal data meets the preset fault condition, determining that the server device has failed. Embodiments of the application can improve the accuracy of fault detection for server devices.

Description

Fault detection method, device, equipment and computer readable storage medium
Technical Field
The invention belongs to the field of intelligent operation and maintenance, and particularly relates to a fault detection method, a fault detection device, fault detection equipment and a computer readable storage medium.
Background
With the spread of the internet, the volume of data generated by internet applications keeps growing, and large-scale data centers have emerged to store and process that data.
In large data centers, servers are the key units for storing and processing data, so fault detection for server devices is extremely important.
At present, server faults are mainly detected by setting a fixed fault baseline: when collected operation data of a server device exceeds the baseline, the data is judged to be fault data, and the server device that produced it is determined to have failed. However, the baseline is usually set from experience and is therefore inaccurate, which makes the identification of fault data, and in turn the fault detection of the server device, inaccurate.
Disclosure of Invention
The embodiment of the invention provides a fault detection method, a fault detection device, equipment and a computer readable storage medium, which can improve the accuracy of server equipment detection.
In a first aspect, an embodiment of the present invention provides a fault detection method, where the method includes:
acquiring first operation data of server equipment at a current time point;
acquiring second operation data which is positioned outside a target confidence interval in the first operation data, wherein the target confidence interval comprises a confidence interval determined according to historical operation data of a historical time point corresponding to the current time point;
inputting the second operation data into a trained anomaly detection model, and detecting, through the anomaly detection model, whether the second operation data is abnormal data, to obtain a detection result;
determining whether the abnormal data meets a preset fault condition or not under the condition that the detection result indicates that the second operation data is the abnormal data;
and under the condition that the abnormal data meet the preset fault condition, determining that the server equipment fails.
In some embodiments, prior to obtaining the second operating data of the first operating data that is outside of the confidence interval, the method further comprises:
acquiring historical operating data of a plurality of historical time points corresponding to the current time point; the current time point and the historical time point are time points located at the same position in different periods;
and determining the confidence interval of the historical operating data under the probability distribution as a target confidence interval.
In some embodiments, determining whether the abnormal data meets a preset fault condition specifically includes:
recording abnormal data under the condition that the difference value between the abnormal data and the reference value meets a first preset threshold value;
and under the condition that the recorded abnormal data quantity meets a second preset threshold value, determining that the abnormal data meets a preset fault condition.
In some embodiments, after obtaining the first operation data of the server device at the current time point, the fault detection method further includes:
acquiring third operation data positioned in the target confidence interval in the first operation data;
and modifying the fault state into a normal state when the server equipment corresponding to the third operation data is in the fault state.
In some embodiments, after determining that the server device fails, the failure detection method further includes:
and sending the server equipment identifier with the fault and the fault information to a target system so that the target system generates alarm information according to the server equipment identifier with the fault and the fault information.
In a second aspect, an embodiment of the present invention provides a fault detection apparatus, including:
the first acquisition module is used for acquiring first operating data of the server equipment at the current time point;
the second acquisition module is used for acquiring second operation data which is positioned outside a target confidence interval in the first operation data, and the target confidence interval comprises a confidence interval determined according to historical operation data of a historical time point corresponding to the current time point;
the input module is used for inputting the second operation data into a trained anomaly detection model, and detecting, through the anomaly detection model, whether the second operation data is abnormal data, to obtain a detection result;
the first determining module is used for determining whether the abnormal data meet a preset fault condition or not under the condition that the detection result indicates that the second operation data are the abnormal data;
and the second determining module is used for determining that the server equipment fails under the condition that the abnormal data meets the preset failure condition.
In some embodiments, the fault detection device further comprises:
the third acquisition module is used for acquiring historical operating data of a plurality of historical time points corresponding to the current time point before acquiring second operating data which is positioned outside the confidence interval in the first operating data; the current time point and the historical time point are time points located at the same position in different periods;
and the third determining module is used for determining the confidence interval of the historical operating data under the probability distribution as the target confidence interval.
In some embodiments, the second determining module comprises:
the recording unit is used for recording the abnormal data under the condition that the difference value between the abnormal data and the reference value meets a first preset threshold value;
and the determining unit is used for determining that the abnormal data meet the preset fault condition under the condition that the recorded abnormal data quantity meets a second preset threshold value.
In some embodiments, the fault detection device further comprises:
the third acquisition module is used for acquiring third operating data which is positioned in the target confidence interval in the first operating data after acquiring the first operating data of the server equipment at the current time point;
and the modification module is used for modifying the fault state into a normal state when the server equipment corresponding to the third operation data is in the fault state.
In some embodiments, the fault detection device further comprises:
and the sending module is used for sending the server equipment identifier with the fault and the fault information to the target system after the server equipment is determined to have the fault, so that the target system generates alarm information according to the server equipment identifier with the fault and the fault information.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory storing computer program instructions;
the steps of the fault detection method as in any of the embodiments of the first aspect are implemented when the processor executes the computer program instructions.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which computer program instructions are stored, and when executed by a processor, the computer program instructions implement the steps of the fault detection method as in any one of the embodiments of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer program product, where instructions in the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform the fault detection method as in any one of the embodiments of the first aspect.
According to the fault detection method, apparatus, device, and computer-readable storage medium of the embodiments, historical operation data at a historical time point corresponding to the current time point is obtained, and a confidence interval is determined from it. The second operation data, that is, the part of the first operation data lying outside the confidence interval, is input into a trained anomaly detection model. When the model determines that the second operation data is abnormal data, whether the abnormal data meets a fault condition is determined, and from that whether the server device has failed. Because the confidence interval screens the first operation data, only the second operation data reaches the anomaly detection model, which avoids the excessive computation that would result from feeding all collected first operation data into the model. Because the anomaly detection model then judges whether the second operation data is abnormal, abnormal data is identified more accurately. Together these improve both the efficiency and the accuracy of fault detection for the server device.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of an embodiment of a fault detection method according to the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of a fault detection apparatus provided in the embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
At present, when detecting a fault of a server device, a baseline of server operation data is usually defined by the experience of operation and maintenance personnel, and when the server device operation data exceeds the set baseline, it is determined that the server device is faulty. However, since the baseline of the server operation data defined by the experience of the operation and maintenance personnel is not accurate, and the operation data of the server equipment at different time periods are different, the fault detection of the server equipment is not accurate in the current fault detection process of the server equipment.
In order to solve the problem of the prior art, embodiments of the present invention provide a fault detection method, apparatus, device, and computer storage medium.
The following first introduces a fault detection method provided by an embodiment of the present invention.
Fig. 1 is a schematic flow chart illustrating a fault detection method according to an embodiment of the present invention. As shown in fig. 1, the method may include the steps of:
S110, acquiring first operation data of the server device at the current time point;
S120, acquiring second operation data that lies outside a target confidence interval in the first operation data, wherein the target confidence interval comprises a confidence interval determined according to historical operation data of a historical time point corresponding to the current time point;
S130, inputting the second operation data into a trained anomaly detection model, and detecting, through the anomaly detection model, whether the second operation data is abnormal data, to obtain a detection result;
S140, when the detection result indicates that the second operation data is abnormal data, determining whether the abnormal data meets a preset fault condition;
S150, when the abnormal data meets the preset fault condition, determining that the server device has failed.
In this way, historical operation data at a historical time point corresponding to the current time point is obtained, and a confidence interval is determined from it. The second operation data, that is, the part of the first operation data lying outside the confidence interval, is input into a trained anomaly detection model, and when the model determines that the second operation data is abnormal data, whether the abnormal data meets a fault condition is determined, and from that whether the server device has failed. Because the confidence interval screens the first operation data, the excessive computation that would result from feeding all collected first operation data into the anomaly detection model is avoided; because the anomaly detection model judges whether the second operation data is abnormal, abnormal data is identified more accurately. Together these improve both the efficiency and the accuracy of fault detection for the server device.
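Steps S110 to S150 can be sketched as a small pipeline. This is a minimal illustration only, not the patented implementation: the function name `detect_fault`, the callables `is_anomalous` and `meets_fault_condition`, and the sample readings are all hypothetical stand-ins for the trained model and the preset fault condition described above.

```python
from typing import Callable, Iterable, List, Tuple


def detect_fault(
    first_data: Iterable[float],
    interval: Tuple[float, float],
    is_anomalous: Callable[[float], bool],
    meets_fault_condition: Callable[[List[float]], bool],
) -> bool:
    """Return True if the server device is judged to have failed."""
    low, high = interval
    # S120: keep only readings outside the target confidence interval.
    second_data = [x for x in first_data if x < low or x > high]
    # S130: let the (already trained) model label the screened readings.
    abnormal = [x for x in second_data if is_anomalous(x)]
    # S140/S150: a fault is declared only if the abnormal data meets the condition.
    return bool(abnormal) and meets_fault_condition(abnormal)


# Stand-in callables; both are placeholders chosen for illustration.
readings = [41.0, 40.5, 97.0, 98.5, 99.0]
failed = detect_fault(
    readings,
    interval=(39.7, 42.3),
    is_anomalous=lambda x: x > 90.0,
    meets_fault_condition=lambda xs: len(xs) >= 3,
)
```

Only the three readings outside the interval reach the model stand-in, which reflects the stated motivation for screening before running the anomaly detection model.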
In some embodiments, in S110, the server device may include a server device of a data center, and the server device of the data center may include at least one of a server, a virtual machine, a network device, a storage device, a power distribution cabinet, and a host. The first operation data may include operation data generated by a server, a virtual machine, a network device, a storage device, a power distribution cabinet, or a host in operation.
In some embodiments, the fault detection device may receive the first operation data sent by the server device through network transmission or wired transmission.
In some embodiments, in order to obtain a more accurate confidence interval so as to filter the first operation data at the current time point, before S120, the fault detection method may further include:
acquiring historical operating data of a plurality of historical time points corresponding to the current time point; the current time point and the historical time point are time points located at the same position in different periods;
and determining the confidence interval of the historical operating data under the probability distribution as a target confidence interval.
Here, the current time point and a historical time point being located at the same position in different periods may mean that a preset time span is divided into a plurality of periods, and the position of the current time point within the current period corresponds to the position of the historical time point within its historical period.
In some embodiments, the historical operating data of the plurality of historical time points corresponding to the current time point may be obtained through a time-series model, where the time-series model may include a KNN (k-nearest neighbors) model.
In some embodiments, the probability distribution may include a t-distribution. For example, the historical operating data acquired for the plurality of historical time points may be modeled with a t-distribution, a confidence interval of that data under the t-distribution may then be obtained, and the obtained confidence interval may be determined as the target confidence interval.
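As an illustration of "the same position in different periods", the sketch below lists the historical time points that match a current time point. The one-day period, the function name `historical_time_points`, and the example timestamp are assumptions made for the example; the text does not fix a period length.

```python
from datetime import datetime, timedelta
from typing import List


def historical_time_points(current: datetime, period: timedelta, count: int) -> List[datetime]:
    """Time points at the same position as `current` in the `count` preceding periods."""
    return [current - period * k for k in range(1, count + 1)]


# Assumed period of one day: 09:30 today corresponds to 09:30 on earlier days.
now = datetime(2021, 12, 13, 9, 30)
points = historical_time_points(now, timedelta(days=1), 3)
```

With a weekly period, the same function would instead pair the current time point with the same weekday and time in earlier weeks.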
Therefore, the confidence interval of the historical operating data under the probability distribution is determined as the target confidence interval according to the historical operating data of the historical time point corresponding to the current time point, and the determined target confidence interval can be more accurate.
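One possible reading of the t-distribution step is the classical two-sided interval for the mean, mean ± t·s/√n. The sketch below assumes that reading; the function name, the sample history, and the choice of a 95% level are illustrative. The critical value 2.447 is the two-sided 95% t quantile for 6 degrees of freedom and is passed in by hand here; in practice it would typically come from a statistics library.

```python
import math
import statistics
from typing import Sequence, Tuple


def t_confidence_interval(samples: Sequence[float], t_crit: float) -> Tuple[float, float]:
    """Two-sided interval mean +/- t_crit * s / sqrt(n) for the given history."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two historical samples")
    mean = statistics.fmean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (n - 1 in the denominator)
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Seven historical readings taken at the same position in seven earlier periods.
history = [40.0, 42.0, 41.0, 39.0, 43.0, 40.0, 42.0]
T_CRIT_95_DF6 = 2.447  # two-sided 95% critical value, 6 degrees of freedom
low, high = t_confidence_interval(history, T_CRIT_95_DF6)
```

An interval for the mean is fairly narrow. A screening rule aimed at individual new readings might instead widen the half-width to t·s·√(1 + 1/n); the source does not say which form is intended.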
In some embodiments, after S110, the fault detection method may further include:
acquiring third operation data positioned in the target confidence interval in the first operation data;
and modifying the fault state into a normal state when the server equipment corresponding to the third operation data is in the fault state.
In some embodiments, the third operational data may include the first operational data within the target confidence interval.
In some embodiments, the operation data of a server device is continuous over a period of time. Data from before the current time point may indicate that the server device has failed; if the operation data at the current time point then falls within the confidence interval, the fault state of that server device may be changed to the normal state.
In this way, a server device whose operation data has returned to normal is moved from the fault state to the normal state in a timely manner, which avoids the inaccurate detection that would result from treating a normally operating server device as still being in the fault state.
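The state reset described above might look like the following sketch. The state labels "fault" and "normal", the device identifiers, and the dictionary used as a state store are hypothetical; treating the interval bounds as inside the interval is also an assumption.

```python
from typing import Dict, Tuple


def refresh_state(states: Dict[str, str], device_id: str, reading: float,
                  interval: Tuple[float, float]) -> None:
    """Reset a device marked 'fault' to 'normal' when its reading is inside the interval."""
    low, high = interval
    if low <= reading <= high and states.get(device_id) == "fault":
        states[device_id] = "normal"


states = {"srv-01": "fault", "srv-02": "fault"}
refresh_state(states, "srv-01", 41.2, (39.7, 42.3))  # inside the interval: reset
refresh_state(states, "srv-02", 97.0, (39.7, 42.3))  # outside the interval: unchanged
```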
In some embodiments, in S120, the first operating data may include second operating data, and the second operating data may be data of the first operating data that is outside the target confidence interval.
In some embodiments, the fault detection device may filter the first operation data by the target confidence interval to obtain second operation data located outside the target confidence interval in the first operation data.
In some embodiments, in S130, the anomaly detection model may include a binary classification model. The detection result may be either that the second operation data is abnormal data or that the second operation data is normal data.
In some embodiments, prior to S130, the fault detection method may include training an anomaly detection model. Wherein training the anomaly detection model may comprise:
and training the anomaly detection model based on the multiple groups of training samples to obtain the trained anomaly detection model. Wherein each set of training samples may include: historical operating data and historical abnormal data corresponding to the historical operating data.
In some embodiments, training the anomaly detection model based on the plurality of sets of training samples may include:
for each group of training samples, the following steps are respectively carried out:
inputting each group of training samples into an anomaly detection model to obtain predicted anomaly data corresponding to historical operating data;
determining a loss function value of the anomaly detection model according to each piece of predicted anomaly data and historical anomaly data;
and when the loss function value does not meet the training stop condition, adjusting the model parameters of the anomaly detection model and training the parameter-adjusted anomaly detection model with the training samples, until the training stop condition is met, to obtain the trained anomaly detection model.
Here, the training stop condition may include a condition set by a user, and an exemplary training stop condition may include that the loss function value is less than a certain threshold value or that the number of iterations of training reaches a certain specific value.
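The training procedure above (predict, compute the loss, adjust parameters, stop on a loss threshold or an iteration limit) can be sketched with a very small binary classifier. The source does not name a model type; the one-feature logistic regression, the learning rate, and the toy samples below are assumptions chosen to keep the example self-contained.

```python
import math
from typing import List, Tuple


def train(samples: List[Tuple[float, int]], lr: float = 0.5,
          loss_threshold: float = 0.1, max_iters: int = 5000) -> Tuple[float, float, float, int]:
    """Fit p(abnormal | x) = sigmoid(w * x + b) by batch gradient descent.

    Stops when the mean cross-entropy loss drops below `loss_threshold`
    or after `max_iters` iterations, whichever comes first.
    Returns (w, b, last evaluated loss, iterations used).
    """
    w, b = 0.0, 0.0
    loss = float("inf")
    n = len(samples)
    for iteration in range(1, max_iters + 1):
        grad_w = grad_b = loss = 0.0
        for x, y in samples:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            grad_w += (p - y) * x
            grad_b += p - y
        loss /= n
        if loss < loss_threshold:  # training stop condition met
            return w, b, loss, iteration
        w -= lr * grad_w / n  # otherwise adjust the model parameters
        b -= lr * grad_b / n
    return w, b, loss, max_iters


# Toy training samples: (standardized reading, label), label 1 = historical abnormal data.
samples = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (1.0, 1), (1.5, 1), (2.0, 1)]
w, b, final_loss, iterations = train(samples)
```

Both stop conditions mentioned in the text appear here: the loss threshold ends training early, and `max_iters` bounds the number of iterations when the threshold is never reached.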
Therefore, because the anomaly detection model is trained in advance, the trained model can identify abnormal operation data more accurately.
In some embodiments, in S140, the preset fault condition may include a fault condition set by a user.
In some embodiments, determining whether the abnormal data satisfies a preset fault condition may include:
recording abnormal data under the condition that the difference value between the abnormal data and the reference value meets a first preset threshold value;
and under the condition that the recorded abnormal data quantity meets a second preset threshold value, determining that the abnormal data meets a preset fault condition.
Here, the reference value may include a reference value set empirically by a user, wherein the reference value may be determined based on historical abnormal operation data. The first preset threshold and the second preset threshold can be set in a user-defined mode according to the requirements of users.
In some specific examples, a user first determines a reference value from historical abnormal operation data. When the first operation data acquired at the current time point is not within the target confidence interval, it is determined whether that data exceeds the reference value; the data is recorded when it exceeds the reference value and the difference between it and the reference value meets the first preset threshold. Detection of the first operation data continues, and the fault condition is determined to be met once the number of recorded items of first operation data exceeds the second preset threshold.
In this way, the first operation data is screened by the target confidence interval to obtain the second operation data, and the second operation data is then checked by the anomaly detection model. Once abnormal operation data has been identified in the second operation data, the difference between the abnormal operation data and the reference value is evaluated; abnormal operation data whose difference meets the first preset threshold is recorded, and the fault condition is judged to be met when the amount of recorded abnormal operation data meets the second preset threshold. Because the abnormal data is further checked against the reference value, and the fault condition is met only after the preset thresholds are satisfied, faults are determined more accurately.
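The two-threshold fault condition can be sketched as follows. How a value "meets" each threshold is not spelled out in the text, so the comparisons used here (difference at least the first threshold, count at least the second) are assumptions, as are the function name and the sample values.

```python
from typing import Iterable, List


def meets_fault_condition(abnormal_data: Iterable[float], reference: float,
                          first_threshold: float, second_threshold: int) -> bool:
    """Record abnormal readings far enough above the reference; fault once enough are recorded."""
    recorded: List[float] = []
    for value in abnormal_data:
        # Assumption: "meets the first threshold" means value - reference >= first_threshold.
        if value - reference >= first_threshold:
            recorded.append(value)
    # Assumption: "meets the second threshold" means at least that many recorded readings.
    return len(recorded) >= second_threshold


abnormal = [95.0, 97.0, 98.5, 91.0]
is_fault = meets_fault_condition(abnormal, reference=90.0,
                                 first_threshold=5.0, second_threshold=3)
```

Requiring several recorded readings before declaring a fault means a single abnormal spike does not by itself mark the server device as failed.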
In some embodiments, in S150, when the fault detection device determines that the abnormal data satisfies the preset fault condition, it determines that the server device corresponding to the first operation data has failed.
In some embodiments, after S150, the fault detection method may further include: and sending the server equipment identifier with the fault and the fault information to a target system so that the target system generates alarm information according to the server equipment identifier with the fault and the fault information.
In some embodiments, the target system may include a system capable of generating alert information, and may include, for example, a detection system or an alert system.
After the fault detection device determines that the server equipment has the fault, the fault detection device sends the identifier of the server equipment with the fault and the information with the fault to the target system, so that the target system generates alarm information according to the acquired identifier of the server equipment with the fault and the information with the fault. After generating the alarm information, the target system may display the generated alarm information, so that the user may obtain the alarm information in time.
Therefore, the user can timely know the server equipment with the fault and the fault information, and the user can timely solve the fault according to the known information.
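The hand-off to the target system might be sketched as below. The JSON message layout, the field names `device_id`, `fault`, `metric`, and `value`, and both function names are hypothetical; the text only states that the device identifier and the fault information are sent and that the target system generates alarm information from them.

```python
import json
from typing import Dict


def build_fault_message(device_id: str, fault_info: Dict[str, object]) -> str:
    """Serialize the faulty device's identifier and fault information for the target system."""
    return json.dumps({"device_id": device_id, "fault": fault_info}, sort_keys=True)


def generate_alarm(message: str) -> str:
    """What a target system might do on receipt: turn the message into alarm text."""
    payload = json.loads(message)
    fault = payload["fault"]
    return f"ALARM: device {payload['device_id']} failed ({fault['metric']}={fault['value']})"


message = build_fault_message("srv-01", {"metric": "cpu_temp_c", "value": 98.5})
alarm = generate_alarm(message)
```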
It should be noted that the application scenarios described above are intended to illustrate the technical solutions of the embodiments of the disclosure more clearly and do not limit them. As a person of ordinary skill in the art will appreciate, as new application scenarios emerge, the technical solutions provided in the embodiments of the disclosure are equally applicable to similar technical problems.
Based on the same inventive concept, the embodiment of the present application further provides a fault detection apparatus, and the following describes in detail the fault detection apparatus provided by the embodiment of the present application with reference to fig. 2:
Fig. 2 shows a schematic structural diagram of an embodiment of a fault detection apparatus 200 provided in the present application.
As shown in fig. 2, the fault detection apparatus 200 may include:
a first obtaining module 201, which may be configured to obtain first operation data of a server device at a current time point;
a second obtaining module 202, which may be configured to obtain second operation data, namely the part of the first operation data that lies outside a target confidence interval, where the target confidence interval includes a confidence interval determined according to historical operation data of a historical time point corresponding to the current time point;
an input module 203, which may be configured to input the second operation data into a trained anomaly detection model and to detect, through the anomaly detection model, whether the second operation data is abnormal data, so as to obtain a detection result;
a first determining module 204, which may be configured to determine whether the abnormal data meets a preset fault condition when the detection result indicates that the second operation data is abnormal data;
a second determining module 205, which may be configured to determine that the server device has failed when the abnormal data meets the preset fault condition.
In this way, historical operation data at the historical time point corresponding to the current time point is obtained, and a confidence interval is determined from that historical operation data. The second operation data, that is, the part of the first operation data lying outside the confidence interval, is input into the trained anomaly detection model, and when the model determines that the second operation data is abnormal data, it is determined whether the abnormal data meets the fault condition and thus whether the server device has failed. Because the first operation data is screened by the confidence interval, the excessive computation that would result from inputting all collected first operation data into the anomaly detection model is avoided. Because the anomaly detection model judges whether the second operation data is abnormal data, abnormal data is identified more accurately. Together, these improve the efficiency and accuracy of fault detection for the server device.
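The module chain described above can be sketched, under stated assumptions, as a short pipeline. The function names are hypothetical, the operation data is assumed to be a list of numeric samples, and the anomaly detection model and the fault condition are passed in as plain callables because the disclosure does not fix their form here.

```python
from typing import Callable, List, Sequence, Tuple


def screen_outside_interval(
    first_data: Sequence[float], interval: Tuple[float, float]
) -> List[float]:
    """Return the 'second operation data': samples outside the interval."""
    low, high = interval
    return [x for x in first_data if x < low or x > high]


def detect_fault(
    first_data: Sequence[float],
    interval: Tuple[float, float],
    is_abnormal: Callable[[float], bool],
    meets_fault_condition: Callable[[List[float]], bool],
) -> bool:
    # Step 1: cheap screening, so the model only sees out-of-interval samples.
    second_data = screen_outside_interval(first_data, interval)
    # Step 2: the (assumed) trained model labels each screened sample.
    abnormal = [x for x in second_data if is_abnormal(x)]
    if not abnormal:
        return False
    # Step 3: the server device is deemed failed only if the fault condition holds.
    return meets_fault_condition(abnormal)
```

For example, with the interval `(20, 90)`, a stand-in model `lambda x: x > 95`, and a condition requiring at least two abnormal samples, the input `[50, 97, 99, 10]` is reported as a fault, while `[50, 97]` is not.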
In some embodiments, in order to make the determined target confidence interval more accurate, the fault detection apparatus further comprises:
a third obtaining module, which may be configured to obtain, before the second operation data lying outside the confidence interval in the first operation data is obtained, historical operation data of a plurality of historical time points corresponding to the current time point, where the current time point and the historical time points are time points located at the same position in different periods;
a third determining module, which may be configured to determine a confidence interval of the historical operation data under a probability distribution as the target confidence interval.
Therefore, because the target confidence interval is the confidence interval, under a probability distribution, of the historical operation data at the historical time points corresponding to the current time point, the determined target confidence interval can be more accurate.
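A minimal sketch of this step is given below. It assumes, purely for illustration, that the historical data is a flat list sampled at a fixed rate, that the probability distribution is a normal distribution fitted to the historical samples, and that the interval is the central range covering the chosen probability mass. The disclosure does not name a specific distribution or confidence level, and the function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import List, Sequence, Tuple


def same_position_history(
    series: Sequence[float], period: int, position: int
) -> List[float]:
    """Pick the samples at the same position in each period of a flat series."""
    return list(series[position::period])


def target_confidence_interval(
    history: Sequence[float], confidence: float = 0.95
) -> Tuple[float, float]:
    """Central interval of a normal distribution fitted to the history.

    Requires at least two historical samples (needed by stdev).
    """
    mu = mean(history)
    sigma = stdev(history)
    # Two-sided quantile of the standard normal, about 1.96 for 95 percent.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mu - z * sigma, mu + z * sigma
```

With daily periods, for instance, `same_position_history` would collect the readings taken at the same time of day on previous days, and `target_confidence_interval` would turn them into the bounds used to screen the current reading.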
In some embodiments, to make the detection of the fault more accurate, the second determining module includes:
a recording unit, which may be configured to record the abnormal data when the difference between the abnormal data and a reference value meets a first preset threshold;
and a determining unit, which may be configured to determine that the abnormal data meets the preset fault condition when the quantity of recorded abnormal data meets a second preset threshold.
Therefore, because the anomaly detection model is trained in advance, the trained model can identify abnormal operation data more accurately.
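The recording unit and determining unit described above can be sketched as follows. The sketch assumes that "meets a threshold" means "is greater than or equal to" and that the difference is taken as an absolute value; both are interpretations rather than statements from the disclosure, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def check_fault_condition(
    abnormal_values: Sequence[float],
    reference: float,
    diff_threshold: float,   # first preset threshold (assumed: minimum deviation)
    count_threshold: int,    # second preset threshold (assumed: minimum count)
) -> Tuple[bool, List[float]]:
    # Recording unit: keep only abnormal values far enough from the reference.
    recorded = [v for v in abnormal_values if abs(v - reference) >= diff_threshold]
    # Determining unit: the fault condition holds once enough values are recorded.
    return len(recorded) >= count_threshold, recorded
```

Requiring both a minimum deviation and a minimum count means a single borderline outlier does not trigger a fault on its own.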
In some embodiments, in order for the fault detection to be more accurate, the fault detection apparatus further includes:
the third obtaining module may be configured to obtain third operation data, namely the part of the first operation data that lies within the target confidence interval, after the first operation data of the server device at the current time point is obtained;
and a modification module, which may be configured to change the fault state to a normal state when the server device corresponding to the third operation data is in the fault state.
Therefore, the state of a server device whose operation data is normal is changed from the fault state to the normal state, so that the device state is updated in time and the inaccurate detection that would result from treating a normally operating server device as faulty is avoided.
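A hedged sketch of this state reset follows. It assumes one numeric reading per server device, device states stored as the strings `"fault"` and `"normal"`, and a hypothetical function name; none of these representations come from the disclosure.

```python
from typing import Dict, Tuple


def reset_recovered_devices(
    states: Dict[str, str],
    first_data: Dict[str, float],
    interval: Tuple[float, float],
) -> Dict[str, str]:
    """Set devices back to 'normal' when their reading is inside the interval."""
    low, high = interval
    for device_id, value in first_data.items():
        # 'Third operation data': readings that fall within the target interval.
        if low <= value <= high and states.get(device_id) == "fault":
            states[device_id] = "normal"
    return states
```

Devices whose readings remain outside the interval keep their current state and continue through the anomaly detection path.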
In some embodiments, to make fault determination more accurate, the fault detection apparatus further includes:
a sending module, which may be configured to send the identifier of the failed server device and the fault information to the target system after it is determined that the server device has failed, so that the target system generates alarm information according to that identifier and fault information.
In this way, the first operation data is screened by the set target confidence interval to obtain the second operation data, and the second operation data is then checked by the anomaly detection model. Once abnormal operation data has been identified in the second operation data, the difference between the abnormal operation data and a reference value is evaluated; when that difference meets the first preset threshold, the abnormal operation data is recorded, and when the recorded abnormal operation data meets the second preset threshold, the fault condition is judged to be met. Because the abnormal data is checked against the reference value and the fault condition is deemed met only after the preset condition is satisfied, whether a fault has occurred can be judged more accurately.
Fig. 3 shows a hardware structure diagram of an embodiment of the electronic device provided in the present application.
The electronic device 300 may comprise a processor 301 and a memory 302 in which computer program instructions are stored.
Specifically, the processor 301 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 302 may include mass storage for data or instructions. By way of example, and not limitation, memory 302 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 302 may include removable or non-removable (or fixed) media, where appropriate. The memory 302 may be internal or external to the electronic device 300, where appropriate. In a particular embodiment, the memory 302 is a non-volatile solid-state memory.
The memory may include Read Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors), it is operable to perform operations described with reference to the methods according to an aspect of the application.
The processor 301 implements any of the fault detection methods in the above embodiments by reading and executing computer program instructions stored in the memory 302.
In some examples, electronic device 300 may also include a communication interface 303 and a bus 310. As shown in fig. 3, the processor 301, the memory 302, and the communication interface 303 are connected via a bus 310 to complete communication therebetween.
The communication interface 303 may be mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 310 includes hardware, software, or both, coupling the components of the electronic device 300 to each other. By way of example, and not limitation, bus 310 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 310 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
Illustratively, as the payment terminal, the electronic device 300 may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like. As the code scanning terminal, the electronic device 300 may be a Point of sale (POS), a code scanner, or the like.
The electronic device may execute the fault detection method in the embodiment of the present application, so as to implement the fault detection method and apparatus described in conjunction with fig. 1 to 2.
In addition, in combination with the fault detection method in the foregoing embodiments, the embodiments of the present application may be implemented by providing a computer-readable storage medium. The computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement any of the fault detection methods in the above embodiments. Examples of computer-readable storage media include non-transitory computer-readable storage media such as portable disks, hard disks, Random Access Memories (RAMs), Read Only Memories (ROMs), erasable programmable read only memories (EPROMs) or flash memories, portable compact disk read only memories (CD-ROMs), optical storage devices, magnetic storage devices, and so forth.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the present application are programs or code segments that may be used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As will be apparent to those skilled in the art, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (7)

1. A method of fault detection, the method comprising:
acquiring first operation data of server equipment at a current time point;
acquiring second operation data which is positioned outside a target confidence interval in the first operation data, wherein the target confidence interval comprises a confidence interval determined according to historical operation data of a historical time point corresponding to the current time point;
inputting the second operation data into a trained abnormal detection model, and detecting whether the second operation data is abnormal data or not through the abnormal detection model to obtain a detection result;
determining whether the abnormal data meets a preset fault condition or not under the condition that the detection result indicates that the second operation data is abnormal data;
determining that the server equipment fails under the condition that the abnormal data meet the preset failure condition;
before the obtaining of the second operation data outside the confidence interval in the first operation data, the method further comprises:
acquiring historical operating data of a plurality of historical time points corresponding to the current time point; the current time point and the historical time point are time points which are located at the same position in different periods;
determining a confidence interval of the historical operating data under probability distribution as a target confidence interval;
the determining whether the abnormal data meets a preset fault condition specifically includes:
recording the abnormal data under the condition that the difference value between the abnormal data and a reference value meets a first preset threshold value;
and under the condition that the recorded abnormal data quantity meets a second preset threshold value, determining that the abnormal data meets a preset fault condition.
2. The method of claim 1, wherein after the obtaining the first operational data of the server device at the current point in time, the method further comprises:
acquiring third operation data in the first operation data, wherein the third operation data is positioned in the target confidence interval;
and modifying the fault state into a normal state when the server equipment corresponding to the third operation data is in the fault state.
3. The method of claim 1, wherein after said determining that the server device is down, the method further comprises:
and sending the server equipment identifier with the fault and the fault information to a target system so that the target system generates alarm information according to the server equipment identifier with the fault and the fault information.
4. A fault detection device, characterized in that the device comprises:
the first acquisition module is used for acquiring first operating data of the server equipment at the current time point;
the second acquisition module is used for acquiring second operation data which is positioned outside a target confidence interval in the first operation data, wherein the target confidence interval comprises a confidence interval determined according to historical operation data of a historical time point corresponding to the current time point;
the input module is used for inputting the second operation data into a trained abnormal detection model, and detecting whether the second operation data is abnormal data or not through the abnormal detection model to obtain a detection result;
the first determining module is used for determining whether the abnormal data meet a preset fault condition or not under the condition that the detection result indicates that the second operation data are abnormal data;
determining that the server equipment fails under the condition that the abnormal data meet the preset failure condition;
the device further comprises:
a third obtaining module, configured to obtain historical operating data of a plurality of historical time points corresponding to the current time point before obtaining second operating data, which is located outside a confidence interval, in the first operating data; the current time point and the historical time point are time points which are located at the same position in different periods;
the second determination module is used for determining a confidence interval of the historical operating data under probability distribution as a target confidence interval;
the second determining module includes:
the recording unit is used for recording the abnormal data under the condition that the difference value between the abnormal data and the reference value meets a first preset threshold value;
and the determining unit is used for determining that the abnormal data meet the preset fault condition under the condition that the recorded abnormal data quantity meets a second preset threshold value.
5. An electronic device, characterized in that the device comprises: a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the fault detection method of any one of claims 1-3.
6. A computer-readable storage medium having computer program instructions stored thereon which, when executed by a processor, implement the fault detection method of any one of claims 1-3.
7. A computer program product, wherein instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the fault detection method of any of claims 1-3.
CN202111519411.8A 2021-12-14 2021-12-14 Fault detection method, device, equipment and computer readable storage medium Active CN113918376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111519411.8A CN113918376B (en) 2021-12-14 2021-12-14 Fault detection method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111519411.8A CN113918376B (en) 2021-12-14 2021-12-14 Fault detection method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113918376A CN113918376A (en) 2022-01-11
CN113918376B true CN113918376B (en) 2022-03-04

Family

ID=79248849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111519411.8A Active CN113918376B (en) 2021-12-14 2021-12-14 Fault detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113918376B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114844810B (en) * 2022-05-30 2024-04-26 中国建设银行股份有限公司 Heartbeat data processing method, device, equipment and medium
CN115017211A (en) * 2022-06-15 2022-09-06 平安国际融资租赁有限公司 Method and device for determining abnormality detection object, storage medium and computer equipment
CN115134164B (en) * 2022-07-18 2024-02-23 深信服科技股份有限公司 Uploading behavior detection method, system, equipment and computer storage medium
CN115499302B (en) * 2022-08-17 2024-08-06 中国电信股份有限公司 Monitoring method and device of business system, readable storage medium and electronic equipment
CN115238831B (en) * 2022-09-21 2023-04-14 中国南方电网有限责任公司超高压输电公司广州局 Fault prediction method, device, computer equipment and storage medium
CN115914575A (en) * 2022-11-11 2023-04-04 菲尼克斯(南京)智能制造技术工程有限公司 Equipment working condition capturing system and method
CN116208532B (en) * 2022-12-30 2024-10-18 苏州浪潮智能科技有限公司 Abnormality detection method, abnormality detection device, storage medium, and electronic apparatus
CN115858311A (en) * 2023-03-04 2023-03-28 北京神州光大科技有限公司 Operation and maintenance monitoring method and device, electronic equipment and readable storage medium
CN116796229A (en) * 2023-06-21 2023-09-22 北京优特捷信息技术有限公司 Equipment fault detection method, device, equipment and storage medium
CN117310394A (en) * 2023-11-29 2023-12-29 天津市英环信诚科技有限公司 Big data-based power failure detection method and device, electronic equipment and medium
CN118094450B (en) * 2024-04-26 2024-07-09 江苏中天互联科技有限公司 Fault early warning method and related equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933500A (en) * 2019-03-27 2019-06-25 新奥数能科技有限公司 Equipment fault alarm method, device, readable medium and electronic equipment
CN112162878A (en) * 2020-09-30 2021-01-01 深圳前海微众银行股份有限公司 Database fault discovery method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548253A (en) * 2016-11-08 2017-03-29 中国地质大学(武汉) Method based on the wind power prediction of nonparametric probability
CN110631624B (en) * 2019-09-04 2020-12-15 精英数智科技股份有限公司 Method, device and system for identifying abnormal operation data of mine sensor
CN111669123B (en) * 2020-05-11 2021-12-17 国家能源集团新能源技术研究院有限公司 Method and device for fault diagnosis of photovoltaic string
CN112504505A (en) * 2020-08-31 2021-03-16 中国能源建设集团安徽省电力设计院有限公司 High-voltage tunnel cable overheating early warning method based on multivariate state estimation

Also Published As

Publication number Publication date
CN113918376A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
CN113918376B (en) Fault detection method, device, equipment and computer readable storage medium
CN111126824B (en) Multi-index correlation model training method and multi-index anomaly analysis method
CN108235303B (en) Method, device, equipment and medium for identifying shared flow users
CN111224807B (en) Distributed log processing method, device, equipment and computer storage medium
CN109995555B (en) Monitoring method, device, equipment and medium
CN112214577B (en) Method, device, equipment and computer storage medium for determining target user
CN114331046A (en) Alarm event processing method, device, equipment and computer storage medium
CN114143036A (en) Alarm method, device, equipment and computer storage medium
CN117692216A (en) Abnormal login behavior management method and device, storage medium and electronic equipment
CN115392812B (en) Abnormal root cause positioning method, device, equipment and medium
CN111611097B (en) Fault detection method, device, equipment and storage medium
CN111064719B (en) Method and device for detecting abnormal downloading behavior of file
CN114844762B (en) Alarm authenticity detection method and device
CN106681906A (en) Method and device for detecting abnormal operation of fingerprint module and terminal device
CN115705413A (en) Method and device for determining abnormal log
CN113986659A (en) Fault analysis method, device, equipment and computer storage medium
CN115225170A (en) Method and device for testing shielding effect of shielding device
CN114741690A (en) Network security monitoring method, device, equipment and computer storage medium
CN114240446A (en) Data processing method, device, equipment and computer storage medium
CN114928467A (en) Network security operation and maintenance association analysis method and system
CN114241752A (en) Method, device and equipment for prompting field end congestion and computer readable storage medium
CN113515507B (en) Method and system applied to dam water seepage detection
CN114297072A (en) Code analysis method, system, device, equipment and computer readable storage medium
CN112287035A (en) Data loading method, device, equipment and storage medium
CN117876113A (en) Transaction system processing method, device, equipment, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 410000 Room 301, R&D Headquarters, Central South University Science Park, Yuelu Street, Yuelu District, Changsha City, Hunan Province

Patentee after: Tianyun Software Technology Co.,Ltd.

Address before: 410000 Room 301, R&D Headquarters, Central South University Science Park, Yuelu Street, Yuelu District, Changsha City, Hunan Province

Patentee before: Hunan Tianyun Software Technology Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Fault detection methods, devices, equipment, and computer-readable storage media

Effective date of registration: 20230524

Granted publication date: 20220304

Pledgee: Changsha Rural Commercial Bank Co Ltd University City Science and Technology Branch

Pledgor: Tianyun Software Technology Co.,Ltd.

Registration number: Y2023430000016

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230807

Granted publication date: 20220304

Pledgee: Changsha Rural Commercial Bank Co Ltd University City Science and Technology Branch

Pledgor: Tianyun Software Technology Co.,Ltd.

Registration number: Y2023430000016

TR01 Transfer of patent right

Effective date of registration: 20231221

Address after: Room 601, 6th Floor, Hengyang Blockchain Industrial Park, Building 5, Emerging Financial Center, High tech Zone, Hengyang City, Hunan Province, 421000

Patentee after: Smart Grain Safety Technology (Hunan) Co.,Ltd.

Address before: 410000 Room 301, R&D Headquarters, Central South University Science Park, Yuelu Street, Yuelu District, Changsha City, Hunan Province

Patentee before: Tianyun Software Technology Co.,Ltd.