CN114765575A

CN114765575A - Network fault cause prediction method and device and electronic equipment

Info

Publication number: CN114765575A
Application number: CN202110001432.4A
Authority: CN
Inventors: 周永庆; 花小磊; 朱琳
Original assignee: China Mobile Communications Group Co Ltd; Research Institute of China Mobile Communication Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; Research Institute of China Mobile Communication Co Ltd
Priority date: 2021-01-04
Filing date: 2021-01-04
Publication date: 2022-07-19
Anticipated expiration: 2041-01-04
Also published as: CN114765575B

Abstract

The invention provides a method and a device for predicting the cause of a network fault, and electronic equipment, addressing the low accuracy of conventional network fault cause prediction. The method comprises: obtaining classification feature vectors from a fault work order, the classification feature vectors comprising a first-class feature vector and a second-class feature vector; obtaining the target fault cause category to which the fault work order belongs according to the first-class feature vector and a first classification prediction model; and obtaining the target fault cause sub-category of the fault work order within the target fault cause category according to the second-class feature vector and a second classification prediction model corresponding to the target fault cause category. By predicting in two steps, first the broad fault cause category and then the sub-category within it, the number of classes predicted at each step is reduced and the accuracy of the prediction result is improved.

Description

Network fault cause prediction method, device and electronic equipment

Technical field

The invention relates to the technical field of artificial intelligence, and in particular to a method, a device and electronic equipment for predicting the cause of a network fault.

Background

A network system contains many kinds of network elements and has a complex structure, so various faults inevitably occur during operation. After a fault occurs, network operation and maintenance personnel must troubleshoot it, find its cause, and then take corresponding measures to help the live network resume operation.

Specifically, when a fault occurs on the live network, the network device generates an alarm and reports it to the network management system. The network management system dispatches a work order to operation and maintenance personnel based on the received alarms and certain dispatching rules. The personnel investigate the fault cause using the alarms and other information, then take corresponding measures according to the cause. After the fault is resolved, they record the fault cause and the handling measures on the corresponding work order.

In existing fault cause prediction schemes, there are many fault cause categories, some of which are quite similar, so predicting the cause directly yields low accuracy.

Summary of the invention

The purpose of the present invention is to provide a method, a device and electronic equipment for predicting the cause of a network fault, so as to solve the problem of low accuracy in existing network fault cause prediction.

To achieve the above purpose, the present invention provides a method for predicting the cause of a network fault, including:

obtaining classification feature vectors from a fault work order, where the classification feature vectors include a first-class feature vector and a second-class feature vector;

obtaining, according to the first-class feature vector and a first classification prediction model, the target fault cause category to which the fault work order belongs;

obtaining, according to the second-class feature vector and a second classification prediction model corresponding to the target fault cause category, the target fault cause sub-category of the fault work order within the target fault cause category.

Wherein, obtaining the classification feature vectors from the fault work order includes:

obtaining a fault work order to be processed, where the fields of the fault work order include an alarm title, a network element name, a network element type and a fault occurrence time;

extracting the classification feature vectors from the fault work order based on a correspondence between the fields of the fault work order and feature vectors and/or a feature extraction model.

Wherein, the classification feature vectors include:

a first feature vector characterizing the alarm title;

a second feature vector characterizing the fault cause category corresponding to the alarm title;

a third feature vector characterizing the network element type;

a fourth feature vector characterizing the fault cause category corresponding to the network element type;

a fifth feature vector characterizing the alarm information associated with the fault work order;

a sixth feature vector characterizing the fault cause sub-category corresponding to the alarm title; and a seventh feature vector characterizing the fault cause sub-category corresponding to the network element type;

wherein the first-class feature vector includes the first, second, third, fourth and fifth feature vectors;

and the second-class feature vector includes the first, second, third, fourth, fifth, sixth and seventh feature vectors.
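The composition of the two inputs can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only says which feature vectors each class "includes", so joining them by concatenation is an assumption, and the names `concat_features` and `build_inputs` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Which of the seven feature vectors feed each model, per the text above.
FIRST_CLASS_IDS = (1, 2, 3, 4, 5)
SECOND_CLASS_IDS = (1, 2, 3, 4, 5, 6, 7)


def concat_features(features: Dict[int, Sequence[float]], ids: Sequence[int]) -> Vector:
    """Join the selected feature vectors end to end (assumed combination rule)."""
    combined: Vector = []
    for feature_id in ids:
        combined.extend(features[feature_id])
    return combined


def build_inputs(features: Dict[int, Sequence[float]]) -> Tuple[Vector, Vector]:
    """Return (first-class feature vector, second-class feature vector)."""
    first = concat_features(features, FIRST_CLASS_IDS)
    second = concat_features(features, SECOND_CLASS_IDS)
    return first, second
```

Under this assumption the second-class feature vector is the first-class one extended by the two sub-category features, so the second-step models see everything the first-step model saw.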

Wherein, obtaining the target fault cause category to which the fault work order belongs according to the first-class feature vector and the first classification prediction model includes:

classifying the first-class feature vector with the first classification prediction model to obtain a probability value for each fault cause category;

determining the fault cause category with the largest probability value as the target fault cause category to which the fault work order belongs.

Wherein, obtaining the target fault cause sub-category of the fault work order within the target fault cause category according to the second-class feature vector and the second classification prediction model corresponding to the target fault cause category includes:

classifying the second-class feature vector with the second classification prediction model to obtain a probability value for each fault cause sub-category of the fault work order within the target fault cause category;

determining the fault cause sub-category with the largest probability value as the target fault cause sub-category.

Wherein, the method further includes:

obtaining a plurality of historical fault work orders and a plurality of pieces of historical alarm information, where the fields of each historical fault work order include an alarm title, a network element name, a network element type, a fault occurrence time, a fault cause category and a fault cause sub-category under that fault cause category, and the fields of each piece of historical alarm information include an alarm title, a network element name and an alarm start time;

obtaining classification feature vectors according to the fields of the historical fault work orders and the fields of the historical alarm information, where the classification feature vectors include: a first feature vector characterizing the alarm title, a second feature vector characterizing the fault cause category corresponding to the alarm title, a third feature vector characterizing the network element type, a fourth feature vector characterizing the fault cause category corresponding to the network element type, a fifth feature vector characterizing the alarm information associated with the fault work order, a sixth feature vector characterizing the fault cause sub-category corresponding to the alarm title, and a seventh feature vector characterizing the fault cause sub-category corresponding to the network element type;

performing model training according to the first, second, third, fourth and fifth feature vectors and the category labels of the fault cause categories, to obtain the first classification prediction model.

Wherein, after the classification feature vectors are obtained according to the fields of the historical fault work orders and the fields of the historical alarm information, the method further includes:

grouping the plurality of historical fault work orders by fault cause category to obtain multiple groups of historical fault work order data;

performing model training separately on each group of historical fault work order data, according to the category labels of the fault cause sub-categories and the classification feature vectors of that group, to obtain a plurality of second classification prediction models.
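The grouping step can be illustrated with a short sketch. The patent does not name a learning algorithm, so training is left to a caller-supplied function; `train_fn`, `train_second_models` and the sample layout are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

# One training sample: (classification feature vector, category, sub-category).
Sample = Tuple[Sequence[float], Hashable, Hashable]


def train_second_models(
    samples: Iterable[Sample],
    train_fn: Callable[[List[Sequence[float]], List[Hashable]], Any],
) -> Dict[Hashable, Any]:
    """Group samples by fault cause category and train one model per group.

    train_fn(features, sub_category_labels) stands in for whatever
    classifier is actually used; it is an assumption of this sketch.
    """
    groups: Dict[Hashable, Tuple[list, list]] = defaultdict(lambda: ([], []))
    for features, category, sub_category in samples:
        xs, ys = groups[category]
        xs.append(features)
        ys.append(sub_category)
    # The resulting dict maps each fault cause category to its own model.
    return {category: train_fn(xs, ys) for category, (xs, ys) in groups.items()}
```

Because each model only sees work orders from one category, its label space is limited to that category's sub-categories, which is the reduction in class count the method relies on.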

The present invention also provides a network fault cause prediction device, including:

a first obtaining module, configured to obtain classification feature vectors from a fault work order, where the classification feature vectors include a first-class feature vector and a second-class feature vector;

a first fault cause prediction module, configured to obtain, according to the first-class feature vector and a first classification prediction model, the target fault cause category to which the fault work order belongs;

a second fault cause prediction module, configured to obtain, according to the second-class feature vector and a second classification prediction model corresponding to the target fault cause category, the target fault cause sub-category of the fault work order within the target fault cause category.

The present invention also provides electronic equipment including a processor and a transceiver, where the transceiver receives and transmits data under the control of the processor, and the processor is configured to perform the following operations:

obtaining, according to the second-class feature vector and the second classification prediction model, the target fault cause sub-category of the fault work order within the target fault cause category.

Wherein, the processor is further configured for:

a sixth feature vector characterizing the fault cause sub-category corresponding to the alarm title; and

a seventh feature vector characterizing the fault cause sub-category corresponding to the network element type;

Wherein, the processor is further configured for:

The present invention also provides electronic equipment including a memory, a processor, and a program stored in the memory and executable on the processor; when the processor executes the program, the network fault cause prediction method described above is implemented.

The present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the network fault cause prediction method described above are implemented.

The above technical solutions of the present invention have at least the following beneficial effects:

In the embodiments of the present invention, classification feature vectors, including a first-class feature vector and a second-class feature vector, are obtained from a fault work order; the target fault cause category to which the fault work order belongs is obtained according to the first-class feature vector and a first classification prediction model; and the target fault cause sub-category of the fault work order within the target fault cause category is obtained according to the second-class feature vector and a second classification prediction model corresponding to the target fault cause category. With this two-step prediction, the broad fault cause category is predicted first and the sub-category within it second, which reduces the number of classes predicted at each step and improves the accuracy of the prediction result.

Brief description of the drawings

FIG. 1 is the first schematic flowchart of the network fault cause prediction method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the model training process of the first classification prediction model and the second classification prediction model according to an embodiment of the present invention;

FIG. 3 is the second schematic flowchart of the network fault cause prediction method according to an embodiment of the present invention;

FIG. 4 is a schematic block diagram of the network fault cause prediction device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of the electronic equipment according to an embodiment of the present invention.

Detailed description

To make the technical problems to be solved, the technical solutions and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.

To address the low accuracy of existing network fault cause prediction, the present invention provides a method, a device and electronic equipment for predicting the cause of a network fault.

FIG. 1 is a schematic flowchart of the network fault cause prediction method provided by an embodiment of the present invention. The method specifically includes:

Step 101: obtain classification feature vectors from a fault work order, where the classification feature vectors include a first-class feature vector and a second-class feature vector.

In this step, the fault work order is generated from the received alarm information according to preset dispatching rules. It is a new work order, that is, the fault cause has not yet been recorded on it.

Step 102: obtain, according to the first-class feature vector and a first classification prediction model, the target fault cause category to which the fault work order belongs.

In this step, the first classification prediction model is a pre-trained model. The first-class feature vector of the fault work order is fed into the first classification prediction model, which outputs the target fault cause category to which the fault work order belongs, that is, its broad fault cause category.

The first classification prediction model is trained on historical fault work orders and alarm data.

Step 103: obtain, according to the second-class feature vector and a second classification prediction model corresponding to the target fault cause category, the target fault cause sub-category of the fault work order within the target fault cause category.

In this step, the second classification prediction model is a pre-trained model.

Here, the second-class feature vector of the fault work order is fed into the second classification prediction model, which outputs the target fault cause sub-category of the fault work order within the target fault cause category, that is, the fine-grained class within the broad fault cause category.

It should be noted that historical fault work orders include the fault causes and handling measures recorded by operation and maintenance personnel after faults were resolved, and therefore capture a great deal of operational experience. The information in the historical fault work orders is mined to train machine learning models, yielding the first classification prediction model and the second classification prediction models. The fault cause is then predicted automatically from the work order information and alarm information of a newly dispatched work order, saving the time personnel spend troubleshooting the fault cause.

In the network fault cause prediction method of the embodiment of the present invention, classification feature vectors, including a first-class feature vector and a second-class feature vector, are obtained from a fault work order; the target fault cause category to which the fault work order belongs is obtained according to the first-class feature vector and the first classification prediction model; and the target fault cause sub-category within the target fault cause category is obtained according to the second-class feature vector and the second classification prediction model corresponding to the target fault cause category. With this two-step prediction, the broad fault cause category is predicted first and the sub-category within it second, which reduces the number of classes predicted at each step and improves the accuracy of the prediction result.
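Steps 102 and 103 can be sketched together as follows. This is an illustrative outline under stated assumptions: each model is represented as a plain callable that maps a feature vector to a dictionary of class probabilities, and the names `argmax_label` and `predict_fault_cause` are hypothetical.

```python
from typing import Callable, Dict, Mapping, Sequence, Tuple

# Assumed model interface: feature vector in, {label: probability} out.
Model = Callable[[Sequence[float]], Mapping[str, float]]


def argmax_label(probabilities: Mapping[str, float]) -> str:
    """Return the label with the largest probability value."""
    return max(probabilities, key=lambda label: probabilities[label])


def predict_fault_cause(
    first_vec: Sequence[float],
    second_vec: Sequence[float],
    first_model: Model,
    second_models: Dict[str, Model],
) -> Tuple[str, str]:
    """Two-step prediction: broad category first, then its sub-category."""
    # Step 102: pick the most probable fault cause category.
    category = argmax_label(first_model(first_vec))
    # Step 103: use the second model that belongs to that category.
    sub_category = argmax_label(second_models[category](second_vec))
    return category, sub_category
```

The dictionary lookup `second_models[category]` is what ties the second step to the outcome of the first: a different predicted category selects a different sub-category classifier.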

As an optional implementation, step 101 of the embodiment of the present invention may specifically include:

In this step, alarm information is generated after a network fault occurs. Based on the received alarm information, the electronic equipment generates a fault work order to be processed according to preset dispatching rules.

Generally, the fields of the fault work order to be processed include, but are not limited to, the alarm title, network element name, network element type and fault occurrence time. It should be noted that at this point the fault work order does not yet record the cause of the network fault or the handling measures taken.

Optionally, the classification feature vectors include:

Wherein, the first, second and sixth feature vectors are all obtained from correspondences between the alarm title of the fault work order and feature vectors.

That is, a first correspondence exists between an alarm title and the feature vector characterizing the title itself, and it yields the first feature vector of the fault work order; a second correspondence exists between an alarm title and the feature vector characterizing the fault cause category corresponding to that title, and it yields the second feature vector; a third correspondence exists between an alarm title and the feature vector characterizing the fault cause sub-category corresponding to that title, and it yields the sixth feature vector.

Wherein, the third, fourth and seventh feature vectors are all obtained from correspondences between the network element type of the fault work order and feature vectors.

That is, a fourth correspondence exists between a network element type and the feature vector characterizing the type itself, and it yields the third feature vector of the fault work order; a fifth correspondence exists between a network element type and the feature vector characterizing the fault cause category corresponding to that type, and it yields the fourth feature vector; a sixth correspondence exists between a network element type and the feature vector characterizing the fault cause sub-category corresponding to that type, and it yields the seventh feature vector.

It should be specially pointed out that the fifth feature vector is obtained with feature extraction models. Specifically: first, the alarm information associated with the fault work order is extracted, giving m alarms, and each alarm is passed through a first feature extraction model (for example, the CBOW model of word2vec) to obtain a corresponding vector; the mean of these m vectors, vec_cbow, is then computed; next, each alarm is passed through a second feature extraction model (for example, the Skip-gram model of word2vec) to obtain a corresponding vector; the mean of these m vectors, vec_sg, is then computed; finally, vec_cbow and vec_sg are concatenated to obtain the fifth feature vector.
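The construction of the fifth feature vector can be sketched as below. The two embedding functions are passed in as callables, since the trained word2vec models are outside the scope of this example. The names `embed_cbow`, `embed_sg`, `mean_vector` and `fifth_feature` are hypothetical, and raising an error when no alarm is associated is an assumption, because the text does not say how that case is handled.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Embedder = Callable[[str], Sequence[float]]  # alarm text -> embedding


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of m equally sized vectors."""
    m = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / m for i in range(dim)]


def fifth_feature(alarms: Sequence[str], embed_cbow: Embedder, embed_sg: Embedder) -> Vector:
    """Concatenate the mean CBOW vector and the mean Skip-gram vector."""
    if not alarms:
        # Assumption: the empty case is not specified in the text.
        raise ValueError("no alarm information associated with the work order")
    vec_cbow = mean_vector([embed_cbow(alarm) for alarm in alarms])
    vec_sg = mean_vector([embed_sg(alarm) for alarm in alarms])
    return vec_cbow + vec_sg  # list concatenation
```

Averaging makes the result independent of m, so work orders with different numbers of associated alarms still produce vectors of the same length.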

It should be noted that a piece of alarm information is determined to be associated with the fault work order when it satisfies both a first condition and a second condition.

Here, the first condition is that the alarm start time t₂ of the alarm information lies in the interval (t₁ - t_A, t₁ + t_B), where t₁ is the fault occurrence time of the fault work order and t_A and t_B are preset time values.

That is, the alarm start time falls between a preset period before and a preset period after the fault occurrence time of the fault work order.

The second condition is that the network element name in the alarm information is the same as the network element name in the fault work order.
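The two association conditions translate directly into a predicate. A minimal sketch, assuming the times are `datetime` values and the window bounds are `timedelta` values; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def is_associated(
    alarm_start: datetime,   # t2: alarm start time
    alarm_ne_name: str,
    fault_time: datetime,    # t1: fault occurrence time of the work order
    ticket_ne_name: str,
    t_a: timedelta,          # preset window before the fault
    t_b: timedelta,          # preset window after the fault
) -> bool:
    """True when the alarm satisfies both conditions from the text."""
    # First condition: t1 - t_A < t2 < t1 + t_B (open interval, as written).
    in_window = fault_time - t_a < alarm_start < fault_time + t_b
    # Second condition: same network element name.
    same_element = alarm_ne_name == ticket_ne_name
    return in_window and same_element
```

For instance, with a fault at 08:30 and both window values set to 10 minutes, an alarm on the same network element starting at 08:25 is associated, while one starting at 08:45 is not.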

Here, word2vec, given a corpus, uses an optimized training model to express a word as a vector quickly and effectively, providing a new tool for applied research in natural language processing. word2vec relies on the Skip-gram model or the continuous bag-of-words (CBOW) model to build neural word embeddings.

As an optional implementation, step 102 of the method of the embodiment of the present invention, obtaining the target fault cause category to which the fault work order belongs according to the first-class feature vector and the first classification prediction model, may include:

Here, the fault cause categories are the broad fault cause categories to which the fault work order may belong.

It should be noted that, when the first classification prediction model classifies the first-class feature vector, the fault work order may correspond to many broad fault cause categories; the most likely one is determined by comparing the probability values.

With this implementation, the target fault cause category to which the fault work order belongs can be predicted.

As an optional implementation, step 103 of the method of the embodiment of the present invention, obtaining the target fault cause sub-category of the fault work order within the target fault cause category according to the second-class feature vector and the second classification prediction model corresponding to the target fault cause category, includes:

In this step, it should be noted that the second classification prediction model is selected according to the target fault cause category to which the fault work order belongs. That is, different fault cause categories correspond to different second classification prediction models.

This implementation is similar to the determination of the target fault cause category described above: the most likely fault cause sub-category is determined by comparing the probability values.

As the above description shows, the accuracy of predicting the fault cause of a fault work order depends chiefly on the classification prediction models. To train good classification prediction models, as an optional implementation, the method of the embodiment of the present invention may further include:

Here, "a plurality of" historical fault work orders and pieces of historical alarm information can be understood as a large number. It should be noted that these historical fault work orders and this historical alarm information are valid data, that is, they contain no empty fields.

Starting from a large number of historical fault work orders and a large amount of historical alarm information: first, records containing empty fields are removed; next, regular-expression matching is applied to the alarm titles in the filtered historical fault work orders and historical alarm information to extract the alarm content, for example, for a title of the form "xxx" occurs "yyy alarm", the "yyy alarm" part is extracted; finally, the fault occurrence time and the alarm start time are converted to a unified time format, such as datetime64.
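The three cleaning steps can be sketched as follows. Several details here are assumptions made for illustration only: the field name `alarm_title`, the regular expression (which treats everything after the word "occurs" as the alarm content), and the two input time formats. The text mentions datetime64 as an example target format; this sketch uses the standard library `datetime` type instead to stay dependency-free.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List

# Hypothetical title layout: "<source> occurs <alarm content>".
ALARM_PATTERN = re.compile(r"occurs\s+(?P<content>.+)$")
# Hypothetical input time formats found in the raw data.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M")


def has_no_empty_field(record: Dict[str, object]) -> bool:
    """A record is valid only if none of its fields is empty."""
    return all(value not in (None, "") for value in record.values())


def extract_alarm_content(title: str) -> str:
    """Keep only the alarm content; fall back to the whole title."""
    match = ALARM_PATTERN.search(title)
    return match.group("content").strip() if match else title.strip()


def parse_time(text: str) -> datetime:
    """Convert a time string in any known format to a datetime."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized time format: {text!r}")


def clean(records: Iterable[Dict[str, str]], time_field: str) -> List[dict]:
    """Drop incomplete records, normalize titles and unify time values."""
    cleaned = []
    for record in records:
        if not has_no_empty_field(record):
            continue
        row = dict(record)
        row["alarm_title"] = extract_alarm_content(row["alarm_title"])
        row[time_field] = parse_time(row[time_field])
        cleaned.append(row)
    return cleaned
```

The same `clean` function can be applied to work orders (with `time_field` set to the fault occurrence time field) and to alarm records (with the alarm start time field), so both sources end up with comparable titles and timestamps.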

In this step, the first, second and sixth feature vectors are obtained from the alarm title field of the historical fault work orders.

Specifically, the historical fault work orders are processed as follows:

1) One-hot encode the alarm titles of the historical fault work orders to obtain one-hot vectors; store each alarm title and its one-hot vector as the key and value of a dictionary, denoted dict_1.

The one-hot vector here is the first feature vector characterizing the alarm title.
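A dictionary such as dict_1 can be built as in the sketch below. The ordering of the vocabulary (sorted) is an assumption, since the text does not specify it, and `build_one_hot_dict` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def build_one_hot_dict(values: Iterable[str]) -> Dict[str, List[float]]:
    """Map each distinct value to a one-hot vector over the vocabulary."""
    vocabulary = sorted(set(values))  # assumed ordering: alphabetical
    size = len(vocabulary)
    return {
        value: [1.0 if position == index else 0.0 for position in range(size)]
        for index, value in enumerate(vocabulary)
    }


# dict_1 for three work orders with two distinct alarm titles.
dict_1 = build_one_hot_dict(["link down", "power loss", "link down"])
```

At prediction time the first feature vector of a new work order is then a plain lookup, for example `dict_1["link down"]`. How titles unseen during training are handled is not covered by the text.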

2) For each alarm title in the historical fault work orders, count how many times each fault cause category (major category) occurs with that title, and normalize the counts to obtain the vector corresponding to the alarm title; store the alarm title and its vector as a key and value in a dictionary, denoted dict_2.

Here, this vector is the second feature vector, which characterizes the fault cause categories corresponding to the alarm title.

For example, among all work orders, four contain alarm title A. The alarm titles and fault cause categories in these four work orders are as follows:

Alarm title A, fault cause category 1

Alarm title A, fault cause category 1

Alarm title A, fault cause category 3

Alarm title A, fault cause category 4

Then the vector corresponding to alarm title A is [2/4, 0, 1/4, 1/4, 0, ...], and its dimension equals the number of fault cause categories.
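The count-and-normalize step can be sketched as follows, using four work orders consistent with the vector stated in the example. The function name, the field names, and the choice of five categories are assumptions made for illustration; categories are assumed to be numbered 1 to N.

```python
from collections import Counter, defaultdict


def build_distribution_dict(work_orders, key_field, label_field, num_classes):
    """For each key, return the normalized frequency of every label 1..num_classes."""
    counts = defaultdict(Counter)
    for order in work_orders:
        counts[order[key_field]][order[label_field]] += 1
    result = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Position c-1 holds the share of work orders with this key that carry label c.
        result[key] = [counter.get(c, 0) / total for c in range(1, num_classes + 1)]
    return result


orders = [
    {"alarm_title": "A", "category": 1},
    {"alarm_title": "A", "category": 1},
    {"alarm_title": "A", "category": 3},
    {"alarm_title": "A", "category": 4},
]
dict_2 = build_distribution_dict(orders, "alarm_title", "category", num_classes=5)
```

Because only the key field and the label field change, the same helper could also produce the network element type dictionary and the two sub-category dictionaries.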

3) One-hot encode the network element types of the historical fault work orders to obtain one-hot vectors; store each network element type and its one-hot vector as a key and value in a dictionary, denoted dict_3.

The one-hot vector here is the third feature vector, which characterizes the network element type.

4) For each network element type in the historical fault work orders, count how many times each fault cause category (major category) occurs with that type, and normalize the counts to obtain the vector corresponding to the network element type; store the network element type and its vector as a key and value in a dictionary, denoted dict_4.

Here, this vector is the fourth feature vector, which characterizes the fault cause categories corresponding to the network element type.

For example, among all work orders, four contain network element type A. The network element types and fault cause categories in these four work orders are as follows:

Network element type A, fault cause category 1

Network element type A, fault cause category 1

Network element type A, fault cause category 3

Network element type A, fault cause category 4

Then the vector corresponding to network element type A is [2/4, 0, 1/4, 1/4, 0, ...], and its dimension equals the number of fault cause categories.

5) The fifth feature vector, which characterizes the alarm information associated with the fault work order, is obtained as follows.

The fault work order here refers to a historical fault work order.

First, for each historical fault work order, the alarm information associated with it is extracted.

Here, a piece of alarm information is determined to be associated with the historical fault work order when it satisfies both a first condition and a second condition.

The first condition is that the alarm start time t₂ of the alarm information lies in the interval (t₁ - t_A, t₁ + t_B), where t₁ is the fault occurrence time of the historical fault work order, and t_A and t_B are preset time values.

That is, the alarm start time falls within a window that extends from a preset period before the fault occurrence time to a preset period after it.

The second condition is that the network element name in the alarm information is the same as the network element name in the historical fault work order.

Next, the alarm information associated with each historical fault work order is sorted by alarm start time, and the alarm title of each piece of alarm information is extracted and treated as a word; together these words form an alarm sentence.

Here, the alarm sentence describes the ordered series of alarms generated on the network element around the time the fault of the historical fault work order occurred.
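The association rule and the construction of the alarm sentence can be sketched as follows. The field names and the 30-minute values chosen for t_A and t_B are hypothetical; the interval is treated as open, as written in the first condition.

```python
from datetime import datetime, timedelta


def associated_alarms(work_order, alarms, t_a, t_b):
    """Return the alarms that satisfy both association conditions for this work order."""
    t1 = work_order["fault_time"]
    return [
        alarm for alarm in alarms
        if t1 - t_a < alarm["start_time"] < t1 + t_b   # first condition: time window
        and alarm["ne_name"] == work_order["ne_name"]  # second condition: same network element
    ]


def alarm_sentence(work_order, alarms, t_a, t_b):
    """Sort the associated alarms by start time and use each alarm title as one word."""
    matched = sorted(
        associated_alarms(work_order, alarms, t_a, t_b),
        key=lambda alarm: alarm["start_time"],
    )
    return [alarm["alarm_title"] for alarm in matched]


order = {"ne_name": "NE-1", "fault_time": datetime(2021, 1, 4, 12, 0)}
alarms = [
    {"alarm_title": "power loss", "ne_name": "NE-1", "start_time": datetime(2021, 1, 4, 12, 5)},
    {"alarm_title": "link down", "ne_name": "NE-1", "start_time": datetime(2021, 1, 4, 11, 50)},
    {"alarm_title": "link down", "ne_name": "NE-2", "start_time": datetime(2021, 1, 4, 11, 55)},
    {"alarm_title": "fan fault", "ne_name": "NE-1", "start_time": datetime(2021, 1, 4, 9, 0)},
]
sentence = alarm_sentence(order, alarms, timedelta(minutes=30), timedelta(minutes=30))
```

In this example the NE-2 alarm fails the second condition and the 09:00 alarm fails the first, so only two titles remain, ordered by start time.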

Then, the alarm sentences of all historical fault work orders are combined into a document that serves as the corpus. This corpus is used to train two word2vec models, a CBOW model and a Skip-gram model, yielding two vector representations of each alarm; the trained CBOW model, model_cbow, and Skip-gram model, model_sg, are saved.

Finally, for each historical fault work order, the vector of every alarm title in its alarm sentence is looked up and the vectors are averaged; this is done separately with the CBOW model and with the Skip-gram model. The two averaged vectors are concatenated to form the feature vector of the alarm information matched to the historical fault work order, that is, the fifth feature vector.
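The averaging and concatenation can be sketched as follows. To keep the example self-contained, two plain dictionaries with made-up two-dimensional vectors stand in for the trained model_cbow and model_sg; training the word2vec models themselves is not shown. The text does not say how a work order with no associated alarms is handled, so this sketch simply raises an error in that case.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one alarm vector is required")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fifth_feature(sentence, cbow_vectors, sg_vectors):
    """Average the per-title vectors of each model, then concatenate the two means."""
    vec_cbow = mean_vector([cbow_vectors[title] for title in sentence])
    vec_sg = mean_vector([sg_vectors[title] for title in sentence])
    return vec_cbow + vec_sg  # list concatenation


# Hypothetical lookup tables standing in for the trained CBOW and Skip-gram models.
cbow_vectors = {"link down": [1.0, 0.0], "power loss": [3.0, 2.0]}
sg_vectors = {"link down": [0.0, 4.0], "power loss": [2.0, 0.0]}

vec_5 = fifth_feature(["link down", "power loss"], cbow_vectors, sg_vectors)
```

The resulting vector has twice the embedding dimension, since the two means are joined end to end.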

6) For each alarm title in the historical fault work orders, count how many times each fault cause sub-category (that is, each subdivided category of the fault cause) occurs with that title, and normalize the counts to obtain the vector corresponding to the alarm title; store the alarm title and its vector as a key and value in a dictionary, denoted dict_5.

Here, this vector is the sixth feature vector, which characterizes the fault cause sub-categories corresponding to the alarm title.

7) For each network element type in the historical fault work orders, count how many times each fault cause sub-category (that is, each subdivided category of the fault cause) occurs with that type, and normalize the counts to obtain the vector corresponding to the network element type; store the network element type and its vector as a key and value in a dictionary, denoted dict_6.

Here, this vector is the seventh feature vector, which characterizes the fault cause sub-categories corresponding to the network element type.

It should be noted that, optionally, the category labels of the fault cause categories are obtained by numerically encoding the fault cause categories of the historical fault work orders; that is, the numbers 1 to N identify the N fault cause categories and serve as category labels.

Here, the first, second, third, fourth, and fifth feature vectors are input to a preset classification model, which outputs a classification result, that is, a category label. The predicted category labels are compared with the corresponding actual category labels, and the parameters of the preset classification model are adjusted continually to reduce the difference between them, until the difference falls within a preset range or the preset classification model converges to a minimum.

Here, the preset classification model used for training is a GBDT model or an XGBoost model.

Here, the GBDT (Gradient Boosting Decision Tree) model is an additive model. It trains a set of CARTs (Classification and Regression Trees) serially and sums the predictions of all the regression trees to obtain a strong learner; each new tree fits the negative gradient of the current loss function.
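As a hedged illustration of this additive structure, gradient boosting is commonly written as below. The symbols F_m (the model after m trees), h_m (the m-th regression tree), L (the loss function), M (the number of trees), and the shrinkage factor ν are notation assumed here and do not appear in the original text; setting ν = 1 gives the plain sum of tree predictions described above.

```latex
\begin{aligned}
r_{i,m} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}
  && \text{(negative gradient for sample } i\text{)} \\
h_m &\approx \arg\min_{h} \sum_{i} \bigl(r_{i,m} - h(x_i)\bigr)^2
  && \text{(new tree fitted to the negative gradients)} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x)
  && \text{(serial, additive update)} \\
F_M(x) &= F_0(x) + \nu \sum_{m=1}^{M} h_m(x)
  && \text{(final strong learner)}
\end{aligned}
```

For multi-class problems such as the one here, implementations typically maintain one such additive score per class and convert the scores to probabilities with a softmax; that detail is also an assumption about common practice rather than something stated in the text.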

The XGBoost (Extreme Gradient Boosting) model likewise generates its component models serially and takes the sum of all of them as the output.

Here, the trained first classification prediction model, model_1, is saved.

Further, after the classification feature vectors are obtained from the fields of the historical fault work orders and the fields of the historical alarm information, the method further includes:

Here, all the data, that is, all the historical fault work orders, are grouped by fault cause category to obtain multiple groups of historical fault work order data.

Here, different groups of historical fault work order data correspond to different fault cause categories.
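The grouping step can be sketched as follows. The sample layout, a tuple of (feature vector, fault cause category, fault cause sub-category), and the label strings are hypothetical; each resulting group would then be used to train one second classification prediction model.

```python
from collections import defaultdict


def group_by_category(samples):
    """Split samples into per-category training sets of (feature vectors, sub-category labels)."""
    groups = defaultdict(lambda: ([], []))
    for features, category, sub_category in samples:
        feature_list, label_list = groups[category]
        feature_list.append(features)
        label_list.append(sub_category)  # within a group, the label is the sub-category
    return dict(groups)


samples = [
    ([0.1, 0.9], 1, "1-a"),
    ([0.4, 0.6], 2, "2-a"),
    ([0.3, 0.7], 1, "1-b"),
]
groups = group_by_category(samples)
```

Training one classifier per key of `groups` yields as many second-level models as there are fault cause categories in the data.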

It should be noted that, assuming there are N fault cause categories, A¹, ..., A^N, every work order in the first group of data has fault cause category (major category) A¹, and the labels of the first group of data are fault cause sub-categories. Assuming that major category A¹ has n_1 fault cause sub-categories (subdivided categories of the fault cause), the first group of data has n_1 distinct labels in total.

Here, the classification feature vectors specifically refer to the first, second, third, fourth, fifth, sixth, and seventh feature vectors.

One model is trained on each group of historical fault work order data, that is, on the classification feature vectors of the historical fault work orders in that group. The preset classification model used for training is a GBDT model or an XGBoost model.

Here, the N trained second classification prediction models, model_2_1 to model_2_N, are saved.

Here, for the specific training process of the first classification prediction model and the second classification prediction models, reference may be made to FIG. 2.

The implementation of the method of the embodiment of the present invention is described below with an example, as shown in FIG. 3.

S1: A work order to be predicted is received.

Here, the work order to be predicted is a new work order that includes four fields: alarm title, network element name, network element type, and fault occurrence time.

It should be noted that the fault cause sub-category (that is, the subdivided category of the fault cause) field of the work order to be predicted is predicted from these four fields and the alarm information associated with the work order.

S2: The feature vectors for the first-level classification of the work order are extracted.

Specifically: 1) One-hot feature of the alarm title: using the alarm title in the work order as the key, look up the corresponding feature vector vec_1, that is, the first feature vector, in dictionary dict_1.

2) Distribution of fault cause major categories for the alarm title: using the alarm title in the work order as the key, look up the corresponding feature vector vec_2, that is, the second feature vector, in dictionary dict_2.

3) One-hot feature of the network element type: using the network element type of the work order as the key, look up the corresponding feature vector vec_3, that is, the third feature vector, in dictionary dict_3.

4) Distribution of fault cause major categories for the network element type: using the network element type of the work order as the key, look up the corresponding feature vector vec_4, that is, the fourth feature vector, in dictionary dict_4.

5) word2vec feature of the alarms associated with the work order:

First, the alarm information associated with the work order is extracted, using the same extraction method as in the training phase. Assuming that h alarms are associated, the word2vec CBOW model is used to obtain the vector of each alarm, and the mean of the h vectors, vec_cbow, is computed. Then the Skip-gram model is used to obtain the vector of each alarm, and the mean of the h vectors, vec_sg, is computed. Finally, vec_cbow and vec_sg are concatenated to obtain the feature vector vec_5, that is, the fifth feature vector.

6) Distribution of fault cause sub-categories for the alarm title: using the alarm title in the work order as the key, look up the corresponding feature vector vec_6, that is, the sixth feature vector, in dictionary dict_5.

7) Distribution of fault cause sub-categories for the network element type: using the network element type in the work order as the key, look up the corresponding feature vector vec_7, that is, the seventh feature vector, in dictionary dict_6.

Here, vec_1, vec_2, vec_3, vec_4, and vec_5 are the feature vectors for the first-level classification, that is, the first-class feature vector in the foregoing embodiment.
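The dictionary lookups and concatenation can be sketched as follows. The field names, the tiny dictionaries, and the precomputed vec_5 are hypothetical values for illustration. The text does not say how an alarm title or network element type that never appeared in training is handled; in this sketch such a lookup raises a KeyError.

```python
def extract_features(order, dicts, vec_5):
    """Return (first-level features, second-level features) for one work order."""
    title, ne_type = order["alarm_title"], order["ne_type"]
    vec_1 = dicts["dict_1"][title]    # one-hot of the alarm title
    vec_2 = dicts["dict_2"][title]    # major-category distribution for the title
    vec_3 = dicts["dict_3"][ne_type]  # one-hot of the network element type
    vec_4 = dicts["dict_4"][ne_type]  # major-category distribution for the type
    vec_6 = dicts["dict_5"][title]    # sub-category distribution for the title
    vec_7 = dicts["dict_6"][ne_type]  # sub-category distribution for the type
    first_level = vec_1 + vec_2 + vec_3 + vec_4 + vec_5
    second_level = first_level + vec_6 + vec_7
    return first_level, second_level


dicts = {
    "dict_1": {"link down": [1, 0]},
    "dict_2": {"link down": [0.5, 0.5]},
    "dict_3": {"router": [0, 1]},
    "dict_4": {"router": [1.0, 0.0]},
    "dict_5": {"link down": [0.2, 0.8, 0.0]},
    "dict_6": {"router": [0.0, 1.0, 0.0]},
}
order = {"alarm_title": "link down", "ne_type": "router"}
first_level, second_level = extract_features(order, dicts, vec_5=[0.3, 0.7])
```

The second-level vector reuses the first five feature vectors and appends the two sub-category distributions.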

S3: The feature vectors for the first-level classification are input to model_1 for classification.

Here, the extracted feature vectors vec_1, vec_2, vec_3, vec_4, and vec_5 are concatenated and input to model_1 for classification, yielding the probabilities p_1, ..., p_N that the work order belongs to each fault cause major category.

S4: The fault cause major category i with the highest probability in the prediction result is selected.

Here, the prediction result is the set of probabilities p_1, ..., p_N for the fault cause major categories, and the category i (1≤i≤N) with the largest probability is the category to which the work order belongs in the first classification step.

S5: The feature vectors for the second-level classification of the work order are extracted.

The feature vectors vec_1, vec_2, vec_3, vec_4, vec_5, vec_6, and vec_7 are the feature vectors for the second-level classification, that is, the second-class feature vector in the foregoing embodiment. The extraction process is described in S2 and is not repeated here.

S6: The feature vectors for the second-level classification are input to model_2_i for classification.

Here, the extracted feature vectors vec_1, vec_2, vec_3, vec_4, vec_5, vec_6, and vec_7 are concatenated and input to model_2_i for classification, yielding the probability of each fault cause sub-category within fault cause major category i.

S7: The fault cause sub-category j with the highest probability in the prediction result is selected.

Here, the prediction result is the set of probabilities of the fault cause sub-categories within fault cause major category i. The sub-category j (1≤j≤n_i) with the largest probability is the category to which the work order belongs in the second classification step, that is, the final fault cause sub-category.
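Steps S3 to S7 can be sketched as a two-stage selection. In this illustration the models are hypothetical callables that return a list of probabilities, standing in for the trained GBDT or XGBoost models; categories and sub-categories are assumed to be numbered from 1, and ties are broken in favor of the lowest index, which the text does not specify.

```python
def argmax(probabilities):
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def predict_fault_cause(first_level, second_level, model_1, second_models):
    """Return (major category i, sub-category j) using the two-step prediction."""
    p = model_1(first_level)               # S3: probabilities p_1..p_N
    i = argmax(p) + 1                      # S4: major category with the highest probability
    q = second_models[i](second_level)     # S6: probabilities within category i
    j = argmax(q) + 1                      # S7: sub-category with the highest probability
    return i, j


# Hypothetical stand-ins for model_1 and model_2_1..model_2_N.
model_1 = lambda features: [0.1, 0.7, 0.2]
second_models = {
    1: lambda features: [0.9, 0.1],
    2: lambda features: [0.3, 0.1, 0.6],
    3: lambda features: [0.5, 0.5],
}
i, j = predict_fault_cause([0.0], [0.0, 0.0], model_1, second_models)
```

Only the second-level model selected by the first step is evaluated, which is what keeps the number of candidate classes small at each step.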

As shown in FIG. 4, an embodiment of the present invention further provides an apparatus for predicting the cause of a network fault, the apparatus including:

A first obtaining module 401, configured to obtain classification feature vectors in a fault work order, the classification feature vectors including a first-class feature vector and a second-class feature vector;

A first fault cause prediction module 402, configured to obtain, according to the first-class feature vector and a first classification prediction model, the target fault cause category to which the fault work order belongs;

A second fault cause prediction module 403, configured to obtain, according to the second-class feature vector and a second classification prediction model corresponding to the target fault cause category, the target fault cause sub-category of the fault work order within the target fault cause category.

Optionally, the first obtaining module 401 includes:

A first obtaining unit, configured to obtain a fault work order to be processed, the fields of the fault work order including an alarm title, a network element name, a network element type, and a fault occurrence time;

A feature extraction unit, configured to extract the classification feature vectors from the fault work order based on the correspondence between the fields of the fault work order and feature vectors and/or a feature extraction model.

Optionally, the classification feature vectors include:

Optionally, the first fault cause prediction module 402 includes:

A first processing unit, configured to classify the first-class feature vector with the first classification prediction model to obtain a probability value for each fault cause category;

A second processing unit, configured to determine the fault cause category with the largest probability value as the target fault cause category to which the fault work order belongs.

Optionally, the second fault cause prediction module 403 includes:

A third processing unit, configured to classify the second-class feature vector with the second classification prediction model to obtain a probability value for each fault cause sub-category of the fault work order within the target fault cause category;

A fourth processing unit, configured to determine the fault cause sub-category with the largest probability value as the target fault cause sub-category.

Optionally, the apparatus further includes:

A second obtaining module, configured to obtain multiple historical fault work orders and multiple pieces of historical alarm information, where the fields of each historical fault work order include an alarm title, a network element name, a network element type, a fault occurrence time, a fault cause category, and a fault cause sub-category under that fault cause category, and the fields of each piece of historical alarm information include an alarm title, a network element name, and an alarm start time;

A first processing module, configured to obtain classification feature vectors from the fields of the historical fault work orders and the fields of the historical alarm information, the classification feature vectors including: a first feature vector characterizing the alarm title, a second feature vector characterizing the fault cause category corresponding to the alarm title, a third feature vector characterizing the network element type, a fourth feature vector characterizing the fault cause category corresponding to the network element type, a fifth feature vector characterizing the alarm information associated with the fault work order, a sixth feature vector characterizing the fault cause sub-category corresponding to the alarm title, and a seventh feature vector characterizing the fault cause sub-category corresponding to the network element type;

A first model training module, configured to perform model training according to the first feature vector, the second feature vector, the third feature vector, the fourth feature vector, the fifth feature vector, and the category labels of the fault cause categories, to obtain the first classification prediction model.

Optionally, the apparatus further includes:

A second processing module, configured to group the multiple historical fault work orders by fault cause category to obtain multiple groups of historical fault work order data;

A second model training module, configured to perform model training separately on each group of historical fault work order data, according to the category labels of the fault cause sub-categories and the classification feature vectors of that group, to obtain multiple second classification prediction models.

The apparatus for predicting the cause of a network fault in the embodiment of the present invention obtains classification feature vectors in a fault work order, the classification feature vectors including a first-class feature vector and a second-class feature vector; obtains, according to the first-class feature vector and a first classification prediction model, the target fault cause category to which the fault work order belongs; and obtains, according to the second-class feature vector and a second classification prediction model corresponding to the target fault cause category, the target fault cause sub-category of the fault work order within the target fault cause category. With this two-step prediction method, in which the fault cause major category is predicted first and the subdivided fault cause category within that major category is predicted second, the number of classes predicted at each step is effectively reduced and the accuracy of the prediction result is improved.

It should be noted here that the apparatus provided by the embodiment of the present invention can implement all the method steps of the foregoing method embodiment and achieve the same technical effects; the parts and beneficial effects that are the same as in the method embodiment are not described again here.

To better achieve the above purpose, as shown in FIG. 5, an embodiment of the present invention further provides an electronic device including a processor 500 and a transceiver 510, where the transceiver 510 receives and sends data under the control of the processor, and the processor 500 is configured to perform the following process:

Obtaining, according to the second-class feature vector and a second classification prediction model corresponding to the target fault cause category, the target fault cause sub-category of the fault work order within the target fault cause category.

Optionally, the processor 500 is further configured to:

Obtain a fault work order to be processed, the fields of the fault work order including an alarm title, a network element name, a network element type, and a fault occurrence time;

The electronic device of the embodiment of the present invention obtains classification feature vectors in a fault work order, the classification feature vectors including a first-class feature vector and a second-class feature vector; obtains, according to the first-class feature vector and a first classification prediction model, the target fault cause category to which the fault work order belongs; and obtains, according to the second-class feature vector and a second classification prediction model, the target fault cause sub-category of the fault work order within the target fault cause category. With this two-step prediction method, in which the fault cause major category is predicted first and the subdivided fault cause category within that major category is predicted second, the number of classes predicted at each step is effectively reduced and the accuracy of the prediction result is improved.

An embodiment of the present invention further provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, each process of the foregoing embodiment of the method for predicting the cause of a network fault is implemented and the same technical effects are achieved; to avoid repetition, details are not described again here.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When the program is executed by a processor, each process of the foregoing embodiment of the method for predicting the cause of a network fault is implemented and the same technical effects are achieved; to avoid repetition, details are not described again here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

本领域内的技术人员应明白，本申请的实施例可提供为方法、系统或计算机程序产品。因此，本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且，本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可读存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by one skilled in the art, the embodiments of the present application may be provided as a method, a system or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media having computer-usable program code embodied therein, including but not limited to disk storage, optical storage, and the like.

本申请是参照根据本申请实施例的方法、设备(系统)和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其它可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其它可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow and/or a block or blocks of the flowchart.

这些计算机程序指令也可存储在能引导计算机或其它可编程数据处理设备以特定方式工作的计算机可读存储介质中，使得存储在该计算机可读存储介质中的指令产生包括指令装置的纸制品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable storage medium capable of directing a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce a paper product comprising the instruction means, The instruction means implements the functions specified in the flow or flows of the flowcharts and/or the block or blocks of the block diagrams.

这些计算机程序指令也可装载到计算机或其它可编程数据处理设备上，使得计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他科编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded onto a computer or other programmable data processing device to cause the computer or other programmable device to perform a series of operational steps to produce a computer-implemented process, whereby the instructions to be executed on the computer or other programmable device Steps are provided for implementing the functions specified in a flow or flows of the flowcharts and/or a block or blocks of the block diagrams.

The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for predicting a cause of a network fault, comprising:

obtaining a classification feature vector in a fault work order, wherein the classification feature vector comprises a first class feature vector and a second class feature vector;

obtaining a target fault cause category to which the fault work order belongs according to the first class feature vector and a first classification prediction model;

and obtaining a target fault cause sub-category of the fault work order in the target fault cause category according to the second class feature vector and a second classification prediction model corresponding to the target fault cause category.
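
For illustration only, the two-step prediction recited in claim 1 can be sketched as follows. This is a minimal sketch, not the patented implementation: the `Classifier` type, the function `predict_fault_cause`, and the toy models, labels, and probabilities are all hypothetical stand-ins. It assumes each prediction model can be called with a feature vector and returns a probability per label.

```python
from typing import Callable, Dict, Sequence, Tuple

# Assumption: a model is any callable mapping a feature vector to {label: probability}.
Classifier = Callable[[Sequence[float]], Dict[str, float]]


def predict_fault_cause(
    first_class_features: Sequence[float],
    second_class_features: Sequence[float],
    first_model: Classifier,
    second_models: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Predict the fault cause category first, then the sub-category within it."""
    category_probs = first_model(first_class_features)
    category = max(category_probs, key=category_probs.get)
    # Pick the second-stage model that corresponds to the predicted category.
    sub_probs = second_models[category](second_class_features)
    sub_category = max(sub_probs, key=sub_probs.get)
    return category, sub_category


# Hypothetical toy models; labels and probabilities are invented for the example.
def toy_first_model(features: Sequence[float]) -> Dict[str, float]:
    return {"power": 0.7, "transmission": 0.3}


toy_second_models: Dict[str, Classifier] = {
    "power": lambda features: {"mains outage": 0.6, "battery fault": 0.4},
    "transmission": lambda features: {"fiber cut": 0.9, "board fault": 0.1},
}

result = predict_fault_cause([1.0, 0.0], [1.0, 0.0, 0.5], toy_first_model, toy_second_models)
print(result)
```

Because each second-stage model only has to separate the sub-categories of a single category, the number of classes handled in either step is smaller than in a single flat classifier, which is the effect the abstract describes.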

2. The method of claim 1, wherein obtaining the classification feature vector in the fault work order comprises:

acquiring a fault work order to be processed, wherein fields of the fault work order comprise an alarm title, a network element name, a network element type and fault occurrence time;

and extracting the classification feature vector from the fault work order based on a correspondence between the fields of the fault work order and feature vectors, and/or based on a feature extraction model.

3. The method of claim 2, wherein the classification feature vector comprises:

a first feature vector for characterizing the alarm title;

a second feature vector for characterizing the fault cause category corresponding to the alarm title;

a third feature vector for characterizing the network element type;

a fourth feature vector for characterizing a fault cause category corresponding to the network element type;

a fifth feature vector used for representing the alarm information associated with the fault work order;

a sixth feature vector used for characterizing the fault cause subcategory corresponding to the alarm title; and

a seventh feature vector for characterizing a fault cause subcategory corresponding to the network element type;

wherein the first class of feature vectors comprises: the first feature vector, the second feature vector, the third feature vector, the fourth feature vector, and the fifth feature vector;

the second class of feature vectors comprises: the first feature vector, the second feature vector, the third feature vector, the fourth feature vector, the fifth feature vector, the sixth feature vector, and the seventh feature vector.
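
As a hedged illustration of how the two classes of claim 3 relate, the sketch below assembles them from seven already-computed feature vectors. Joining the vectors by concatenation is an assumption (the claim only states which vectors each class comprises), and `assemble_feature_classes` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def assemble_feature_classes(
    vectors: Dict[int, List[float]],
) -> Tuple[List[float], List[float]]:
    """Build both feature classes from feature vectors indexed 1..7.

    Assumption: each class is formed by concatenating its member vectors in order.
    """
    # First class: feature vectors 1 to 5 (category-level information only).
    first_class = [value for index in range(1, 6) for value in vectors[index]]
    # Second class: feature vectors 1 to 7 (adds the sub-category-level vectors 6 and 7).
    second_class = [value for index in range(1, 8) for value in vectors[index]]
    return first_class, second_class


# Hypothetical values; real vectors would come from the fault work order fields.
example_vectors = {
    1: [1.0, 0.0],  # alarm title
    2: [0.8, 0.2],  # fault cause category corresponding to the alarm title
    3: [0.0, 1.0],  # network element type
    4: [0.6, 0.4],  # fault cause category corresponding to the network element type
    5: [3.0],       # alarm information associated with the work order
    6: [0.5, 0.5],  # fault cause sub-category corresponding to the alarm title
    7: [0.1, 0.9],  # fault cause sub-category corresponding to the network element type
}

first_class, second_class = assemble_feature_classes(example_vectors)
```

Under this assumption the first class is a prefix of the second, so the sub-category model sees everything the category model saw plus the two sub-category-level vectors.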

4. The method according to claim 1, wherein obtaining the target fault cause category to which the fault work order belongs according to the first class feature vector and the first classification prediction model comprises:

classifying the first class feature vector through the first classification prediction model to obtain a probability value for each fault cause category;

and determining the fault cause category corresponding to the maximum probability value among the probability values of all fault cause categories as the target fault cause category to which the fault work order belongs.
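
The selection rule in claim 4 amounts to taking the label with the largest probability. The sketch below assumes, purely for illustration, that the model emits raw scores that are turned into probabilities with a softmax; the claim does not say how the probability values are produced, and the names `softmax`, `most_probable`, and the sample scores are hypothetical.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-category scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def most_probable(probabilities: Dict[str, float]) -> str:
    """Return the category whose probability value is the maximum."""
    return max(probabilities, key=probabilities.get)


# Hypothetical raw scores for three fault cause categories.
category_probs = softmax({"power": 1.0, "transmission": 3.0, "equipment": 2.0})
target_category = most_probable(category_probs)
```

The same rule applies unchanged to the sub-category probabilities of claim 5, with the second classification prediction model supplying the values.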

5. The method of claim 1, wherein obtaining the target fault cause subcategory of the fault work order in the target fault cause category according to the second class feature vector and a second classification prediction model corresponding to the target fault cause category comprises:

classifying the second class feature vector through the second classification prediction model to obtain probability values of the fault cause sub-categories of the fault work order in the target fault cause category;

and determining the fault cause sub-category corresponding to the maximum probability value among the probability values of the fault cause sub-categories as the target fault cause sub-category.

6. The method of claim 1, further comprising:

acquiring a plurality of historical fault work orders and a plurality of historical alarm information, wherein the field of each historical fault work order comprises an alarm title, a network element name, a network element type, fault occurrence time, a fault reason category and a fault reason subcategory corresponding to the fault reason category, and the field of each historical alarm information comprises an alarm title, a network element name and alarm starting time;

obtaining a classification feature vector according to the field of the historical fault work order and the field of the historical alarm information, wherein the classification feature vector comprises: a first feature vector used for characterizing the alarm title, a second feature vector used for characterizing a fault cause category corresponding to the alarm title, a third feature vector used for characterizing the network element type, a fourth feature vector used for characterizing the fault cause category corresponding to the network element type, a fifth feature vector used for characterizing alarm information associated with the fault work order, a sixth feature vector used for characterizing a fault cause sub-category corresponding to the alarm title, and a seventh feature vector used for characterizing the fault cause sub-category corresponding to the network element type;

and performing model training according to the first feature vector, the second feature vector, the third feature vector, the fourth feature vector, the fifth feature vector and the category label of the fault cause category to obtain the first classification prediction model.

7. The method of claim 6, wherein after obtaining the classification feature vector according to the fields of the historical fault work order and the fields of the historical alarm information, the method further comprises:

according to the fault reason category, grouping a plurality of historical fault work orders to obtain a plurality of groups of historical fault work order data;

and performing model training separately on each group of historical fault work order data, according to the category labels of the fault cause sub-categories and the classification feature vectors corresponding to that group, to obtain a plurality of second classification prediction models.
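
The grouping and per-group training of claim 7 can be sketched as below. This is an assumption-laden illustration: the record keys (`category`, `sub_category`, `features`), the function names, and the frequency-counting trainer are hypothetical, and the trainer is a deliberately trivial stand-in for whatever classification algorithm is actually used.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[Tuple[float, ...], str]
Model = Callable[[Tuple[float, ...]], Dict[str, float]]


def train_frequency_model(samples: List[Sample]) -> Model:
    """Stand-in trainer: memorize label frequencies per exact feature tuple."""
    per_features: Dict[Tuple[float, ...], Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for features, label in samples:
        per_features[features][label] += 1
        overall[label] += 1

    def predict(features: Tuple[float, ...]) -> Dict[str, float]:
        # Fall back to the group's overall label frequencies for unseen features.
        counts = per_features.get(features) or overall
        total = sum(counts.values())
        return {label: count / total for label, count in counts.items()}

    return predict


def train_second_stage_models(work_orders: List[dict]) -> Dict[str, Model]:
    """Group historical work orders by fault cause category, then train one model per group."""
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for order in work_orders:
        groups[order["category"]].append((tuple(order["features"]), order["sub_category"]))
    return {category: train_frequency_model(samples) for category, samples in groups.items()}


# Hypothetical historical work orders.
history = [
    {"category": "power", "sub_category": "mains outage", "features": [1, 0]},
    {"category": "power", "sub_category": "battery fault", "features": [0, 1]},
    {"category": "transmission", "sub_category": "fiber cut", "features": [1, 0]},
]
second_models = train_second_stage_models(history)
```

Each resulting model is keyed by its fault cause category, so at prediction time the model matching the first-stage result can be looked up directly.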

8. A network failure cause prediction apparatus, comprising:

the first obtaining module is used for obtaining a classification feature vector in the fault work order, wherein the classification feature vector comprises a first class feature vector and a second class feature vector;

the first fault cause prediction module is used for obtaining the target fault cause category to which the fault work order belongs according to the first class feature vector and a first classification prediction model;

and the second fault cause prediction module is used for obtaining a target fault cause sub-category of the fault work order in the target fault cause category according to the second class of feature vectors and a second classification prediction model corresponding to the target fault cause category.

9. An electronic device comprising a processor and a transceiver, the transceiver receiving and transmitting data under control of the processor, characterized in that the processor is adapted to:

and obtaining a target fault cause sub-category of the fault work order in the target fault cause category according to the second class feature vector and a second classification prediction model.

10. The electronic device of claim 9, wherein the processor is further configured to:

11. The electronic device of claim 10, wherein the classification feature vector comprises:

a first feature vector for characterizing the alarm title;

a third feature vector for characterizing the network element type;

a fourth feature vector used for characterizing the fault reason category corresponding to the network element type;

wherein the first class of feature vectors comprises: the first, second, third, fourth, and fifth feature vectors;

the second class of feature vectors includes: the first feature vector, the second feature vector, the third feature vector, the fourth feature vector, the fifth feature vector, the sixth feature vector, and the seventh feature vector.

12. The electronic device of claim 9, wherein the processor is further configured to:

13. The electronic device of claim 9, wherein the processor is further configured to:

14. The electronic device of claim 9, wherein the processor is further configured to:

obtaining a classification feature vector according to the fields of the historical fault work order and the fields of the historical alarm information, wherein the classification feature vector comprises: a first feature vector for characterizing the alarm title, a second feature vector for characterizing the fault cause category corresponding to the alarm title, a third feature vector for characterizing the network element type, a fourth feature vector for characterizing the fault cause category corresponding to the network element type, a fifth feature vector for characterizing the alarm information associated with the fault work order, a sixth feature vector for characterizing the fault cause sub-category corresponding to the alarm title, and a seventh feature vector for characterizing the fault cause sub-category corresponding to the network element type;

and performing model training according to the first feature vector, the second feature vector, the third feature vector, the fourth feature vector, the fifth feature vector and the category label of the fault cause category to obtain a first classification prediction model.

15. The electronic device of claim 14, wherein the processor is further configured to:

and performing model training separately on each group of historical fault work order data, according to the category labels of the fault cause sub-categories and the classification feature vectors corresponding to that group, to obtain a plurality of second classification prediction models.

16. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor; characterized in that the processor implements the method for predicting the cause of network failure according to any one of claims 1 to 7 when executing the program.

17. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method for predicting a cause of a network failure as set forth in any one of claims 1 to 7.