CN116010581A

CN116010581A - Knowledge graph question-answering method and system based on power grid hidden trouble shooting scene

Info

Publication number: CN116010581A
Application number: CN202310100993.9A
Authority: CN
Inventors: 黎峰; 于沺; 许新颖; 邵柄莱; 张永强; 贾玉强; 察兴坤
Original assignee: Jinxiandai Information Industry Co ltd
Current assignee: Jinxiandai Information Industry Co ltd
Priority date: 2023-02-08
Filing date: 2023-02-08
Publication date: 2023-04-25

Abstract

The invention belongs to the field of power grid monitoring and provides a knowledge graph question-answering method and system based on a power grid hidden trouble shooting scene. The method comprises: obtaining a problem corpus under the power grid hidden trouble shooting scene and sorting out an intention template and a problem template; performing named entity recognition on the problem corpus using a pre-constructed professional field model to obtain a named entity recognition result; judging whether specific entity categories and specific intention words exist in the input question, performing intention judgment by matching against the intention template, and classifying the question into the problem template; generating the corresponding Cypher statement for the Neo4j graph database according to the results returned by entity extraction and intention classification; connecting to the Neo4j graph database and running the generated Cypher statement to obtain the corresponding query result; and formatting the query result and returning it to the user. The invention adopts a BERT-BiLSTM-CRF deep learning model and trains it on power hidden danger data to obtain a professional field model; the entity recognition accuracy currently reaches 80 percent.

Description

Knowledge graph question-answering method and system based on power grid hidden trouble shooting scene

Technical Field

The invention belongs to the technical field of power grid monitoring, and particularly relates to a knowledge graph question-answering method and system based on a power grid hidden trouble shooting scene.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Traditional intelligent question-answering systems depend heavily on the quality of the underlying database and are built on a large number of question templates over unstructured text. With the rapid development of deep learning in natural language processing in recent years, the ability of computers to analyze text data has gradually improved, and knowledge graph technology has become better at constructing semantic networks and analyzing associations. Knowledge-base question answering (KBQA) has already been deployed in a number of industries, but the power field still lacks related products because annotated corpora for natural language processing are scarce, or the existing products answer questions too weakly to be considered intelligent.

Disclosure of Invention

To solve the above problems, the invention provides a knowledge graph question-answering method and system based on a power grid hidden trouble shooting scene. The aim is an intelligent question-answering method and system built on a knowledge graph of that scene: natural language processing is used to analyze the semantics of the question and identify the query intention, the question is routed to the corresponding template, entity linking and attribute linking are combined to convert it into a graph database query statement, the knowledge base is queried for the result, and the result is combined with the question-answering template to produce the answer.

According to some embodiments, the first scheme of the invention provides a knowledge graph question-answering method based on a power grid hidden trouble shooting scene, which adopts the following technical scheme:

a knowledge graph question-answering method based on a power grid hidden trouble shooting scene comprises the following steps:

acquiring a problem corpus under a power grid hidden trouble shooting scene, and sorting out an intention template and a problem template;

carrying out named entity recognition on the problem corpus by utilizing a pre-constructed professional field model to obtain a named entity recognition result;

judging whether specific entity categories and specific intention words exist in the input question, carrying out intention judgment by matching with an intention template, and classifying the intention judgment into the question template;

according to the results returned by entity extraction and intention classification, correspondingly generating a cypher statement of the Neo4j graph database;

according to the generated cypher statement, connecting a neo4j graph database, inquiring, and obtaining a corresponding inquiry result; and formatting the query result, outputting and returning to the user.

Further, acquiring the problem corpus under the power grid hidden trouble shooting scene and sorting out an intention template and a problem template specifically comprises:

according to the problem corpus in the power grid hidden trouble investigation scene, combing out the category of the problems related to the power hidden trouble;

determining a corresponding problem framework template based on the category of the power hidden trouble problems;

the intention template comprises equipment information, hidden danger information and team information;

the question templates include question frames corresponding to each of the intent templates.

Further, the construction process of the professional field model specifically comprises the following steps:

extracting entities and attributes from the power hidden danger investigation and management documents and carrying out knowledge construction to obtain a power hidden danger knowledge graph;

migrating the data of the power hidden danger investigation and management documents into a Neo4j graph database according to the graph design;

and labeling the power hidden danger data content as a corpus training set, training the deep learning model, and obtaining a professional field model for named entity recognition and relation extraction on power hidden danger data.

Further, the professional field model comprises a BERT module, a two-way long-short-term memory module and a CRF undirected graph module;

wherein, the MASK layer in the BERT module adopts dynamic MASK to avoid the repetition of training data;

the bidirectional long-short-term memory module adopts a stack-shaped deep bidirectional RNN network, and the pooling layer adopts a self-adaptive pooling layer, so that the original characteristic parameters are reduced for dimension reduction.

Further, the professional field model carries out named entity recognition on the problem corpus to obtain named entity recognition results, which are specifically as follows:

the BERT module extracts text information containing context for entity classification;

extracting text sequence information from text information containing context by using a two-way long-short-term memory module;

and the CRF undirected graph module classifies the text sequence information to obtain a named entity recognition result.

Further, the determining whether the specific entity category and the specific intention word exist in the input question sentence, matching with the intention template to perform intention determination, and classifying the intention determination into the question template includes:

acquiring an input question, and carrying out semantic analysis on the input question;

performing intention judgment based on the semantic analysis result and the intention template matching;

and classifying the problem template based on the intention judgment result matched with the content of the problem template.

Further, generating the corresponding Cypher statement of the Neo4j graph database according to the results returned by entity extraction and intention classification is specifically:

acquiring an input question sentence based on a question corpus in a power grid hidden trouble investigation scene, and performing word segmentation processing to acquire a keyword of the input question sentence;

classifying the input question into a question template in a corresponding intention classification result based on the keywords of the input question and the sentence pattern of the input question;

based on the classified problem templates and the corresponding graph database cypher statement templates, the keywords of the input question are put into the cypher statement logic, and the complete cypher query statement is obtained.

According to some embodiments, the second scheme of the invention provides a knowledge graph question-answering system based on a power grid hidden trouble shooting scene, which adopts the following technical scheme:

a knowledge graph question-answering system based on a power grid hidden trouble shooting scene comprises:

the problem corpus processing module is configured to acquire problem corpus under the power grid hidden trouble investigation scene and comb out an intention template and a problem template;

the entity identification module is configured to carry out named entity identification on the problem corpus by utilizing a pre-constructed professional field model to obtain a named entity identification result;

the intention judging module is configured to judge whether specific entity categories and specific intention words exist in the input question, match with the intention template, judge the intention and classify the intention into the question template;

the query statement determining module is configured to correspondingly generate a cypher statement of the Neo4j graph database according to the results returned by entity extraction and intention classification;

the query result feedback module is configured to connect the neo4j graph database according to the generated cypher statement, query the neo4j graph database and acquire a corresponding query result; and formatting the query result, outputting and returning to the user.

According to some embodiments, a third aspect of the present invention provides a computer-readable storage medium.

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in a knowledge graph question-answering method based on a power grid hidden trouble shooting scene as described in the first aspect above.

According to some embodiments, a fourth aspect of the invention provides a computer device.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in a knowledge graph question-answering method based on a grid hidden trouble shooting scenario according to the first aspect.

Compared with the prior art, the invention has the beneficial effects that:

the invention adopts the BERT-BiLSTM-CRF deep learning model, one of the most accurate models among current named entity recognition algorithms, and trains it on power hidden danger data to obtain a deep learning model for the professional field; the entity recognition accuracy currently reaches 80%. The BERT model is improved by replacing its static mask with a dynamic mask, which avoids repetition of training data as far as possible, amounts to a simple data augmentation, and has a certain regularization effect.

The whole formed by substations of various voltage levels and the power transmission and distribution lines in a power system is called a power network. It comprises three units, namely power transformation, power transmission and power distribution, and its overall physical structure is a huge and complex network. Given these characteristics of the physical structure of the power grid, a graph data structure is a very good fit for building a power grid data model; a graph data structure has no foreign keys, and association is expressed through relationships between nodes, so the data structure is more flexible. Performing knowledge question answering on top of the graph database with the support of deep learning greatly improves the breadth and accuracy of queries.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

Fig. 1 is a flowchart of a knowledge graph question-answering method based on a power grid hidden trouble shooting scene in an embodiment of the invention.

Detailed Description

The invention will be further described with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiments of the invention and features of the embodiments may be combined with each other without conflict.

Example 1

As shown in Fig. 1, this embodiment provides a knowledge graph question-answering method based on a power grid hidden trouble shooting scene. The embodiment is illustrated with the method applied to a server; it can be understood that the method can also be applied to a terminal, or to a system comprising a terminal and a server and implemented through interaction between the terminal and the server. The server can be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein. In this embodiment, the method includes the following steps:

Acquiring a problem corpus under the power grid hidden trouble shooting scene and sorting out an intention template and a problem template, specifically:

the intent templates comprise equipment information, hidden danger information, team information, admission team information and other hot spot problems related to hidden danger scenes;

The intention templates are classified according to what the question is about: some questions ask about hidden dangers, some ask about equipment, and some ask about teams, and an intention template is made for each class. For questions such as "what equipment is it" and "what hidden danger is it", the two are not mixed together; they are first divided into two templates and the sentence is then processed within its template, which gives higher accuracy and avoids confusing a piece of equipment with a hidden danger.

It should be noted that, in this embodiment, one intention template corresponds to more than ten question templates, and other numbers may be set as required. The question templates list the relevant question frameworks for each intention template, for example:

the template for hidden danger details is as follows:

How many defects and hidden dangers have occurred at the xx substation this year?

Which devices at the xx substation have serious defects?

......

The questions differ: one asks about the number of defects, the other asks which devices have defects. That is, when a question arrives it is classified first, and it can then be converted more accurately into the corresponding Cypher query statement for the graph database.

The construction process of the professional field model specifically comprises the following steps:

The professional field model comprises a BERT module, a two-way long-short-term memory module and a CRF undirected graph module;

The professional field model carries out named entity recognition on the problem corpus to obtain named entity recognition results, which are specifically as follows:

Judging whether specific entity category and specific intention word exist in the input question, carrying out intention judgment by matching with an intention template, and classifying the intention judgment into the question template, wherein the method comprises the following steps of:

The cypher statement corresponding to the Neo4j graph database is generated according to the results returned by entity extraction and intention classification, and specifically comprises the following steps:

First, word segmentation is performed on the acquired input question to obtain its keywords, with a stop word list and a professional word list used as aids to make the segmentation result more accurate;

second, a classification algorithm uses the sentence pattern of the question (for example, general questions and special questions) and the keywords obtained in the previous step to classify the question into one of the question templates summarized from the service scenes;

third, each summarized question template corresponds to a Cypher statement template for the graph database; the keywords obtained by word segmentation in the first step are put into the statement logic to obtain a complete query statement, the corresponding answer is retrieved, and the answer is then packaged and output in natural language.

For example: "How many hidden dangers have occurred at the xx substation in total?"

The word segmentation result is "xx substation", "hidden danger".

The corresponding Cypher statement is:

MATCH (n:`hidden danger`) WHERE n.name CONTAINS 'hidden danger' AND n.location CONTAINS 'xx substation' RETURN n, count(n)
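The third step, putting the segmented keywords into the Cypher statement template of the classified question template, can be sketched in Python as follows. This is only an illustrative sketch: the template identifiers, the second template, its `device` label and `defect_level` property, and the helper `build_query` are hypothetical and do not come from the patent. The sketch also passes keywords as Cypher parameters (`$keyword`, `$location`) instead of splicing them into the statement text, which is a design choice of the sketch and not something the patent specifies.

```python
import re

# Hypothetical table: one Cypher statement template per question template.
# Node labels and property names other than those in the example above are assumptions.
CYPHER_TEMPLATES = {
    "hidden_danger_count": (
        "MATCH (n:`hidden danger`) "
        "WHERE n.name CONTAINS $keyword AND n.location CONTAINS $location "
        "RETURN n"
    ),
    "device_serious_defect": (
        "MATCH (d:device) "
        "WHERE d.location CONTAINS $location AND d.defect_level = $level "
        "RETURN d"
    ),
}


def build_query(template_id, slots):
    """Return (cypher_text, parameters) for a question already classified into a template."""
    cypher = CYPHER_TEMPLATES[template_id]
    # Every $name placeholder in the template must be filled by a segmented keyword.
    required = set(re.findall(r"\$(\w+)", cypher))
    missing = required - set(slots)
    if missing:
        raise ValueError(f"missing keywords for template {template_id!r}: {sorted(missing)}")
    return cypher, {name: slots[name] for name in required}


# Keywords from the segmentation result of the example question.
query, params = build_query(
    "hidden_danger_count",
    {"keyword": "hidden danger", "location": "xx substation"},
)
```

Keeping the keywords as parameters lets the database driver handle quoting, so a keyword containing a quote character cannot change the structure of the statement.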

According to the generated cypher statement, connecting a neo4j graph database, inquiring, and obtaining a corresponding inquiry result; formatting the query result, outputting and returning to the user, specifically:

The specific hidden dangers and their count are queried according to the generated Cypher statement, and an answer packaged in natural language is returned. For example: "The xx substation has had count(n) hidden dangers in total, namely hidden danger 1, hidden danger 2, hidden danger 3, ......"

It should be noted that the result of the Cypher statement may be converted into natural language using any conversion method in the prior art; the method is not specifically limited and is not described further here.
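As a rough illustration of the query-and-format step, the Python sketch below runs a statement against Neo4j and packages the returned hidden dangers into a sentence. It assumes the official `neo4j` Python driver is used as the client and that the statement returns nodes under the key `n` with a `name` property; the patent names neither a client library nor a result layout, and the function names `run_query` and `format_answer` are hypothetical.

```python
def run_query(uri, user, password, cypher, params):
    """Run a Cypher statement and return the `name` property of each returned node `n`.

    Assumes the official `neo4j` Python driver is installed and a database is reachable.
    """
    from neo4j import GraphDatabase  # imported lazily so format_answer works without the driver

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(cypher, params)
            return [record["n"]["name"] for record in result]
    finally:
        driver.close()


def format_answer(location, hazard_names):
    """Package the query result as a natural-language answer."""
    count = len(hazard_names)
    if count == 0:
        return f"No hidden dangers were found for the {location}."
    noun = "hidden danger" if count == 1 else "hidden dangers"
    listed = ", ".join(hazard_names)
    return f"The {location} has had {count} {noun} in total: {listed}."


answer = format_answer("xx substation", ["hidden danger 1", "hidden danger 2"])
```

Counting the rows in the client, as done here, is one option; the count could equally be computed in the Cypher statement itself.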

First, the construction process of the knowledge graph of the power hidden danger scene

(1) Graph design: extracting entities and attributes from the power hidden danger investigation and management documents and carrying out knowledge construction;

(2) Graph construction: migrating the data into a Neo4j graph database according to the graph design;

(3) Model preparation: labeling the power hidden danger data content as a corpus and training a BERT-BiLSTM-CRF deep learning model on it, with the static mask of BERT replaced by a random dynamic mask to improve the recognition accuracy of the model, so as to obtain a professional field model capable of named entity recognition and relation extraction on power hidden danger data;

Second, the construction process of the question-answering system for the power hidden danger scene

(1) Template sorting: acquiring a problem corpus under the power grid hidden trouble shooting scene, and sorting out an intention template and a problem template;

(2) Entity extraction: applying the professional field model obtained in item (3) of the first part to carry out named entity recognition on the problem corpus to obtain a named entity recognition result;

(3) Intention classification: judging, through a naive Bayes classifier, whether specific entity categories and specific intention words exist in the input question, carrying out intention judgment by matching against the template, and classifying the result into a question template;

(4) Generating the Cypher query statement: according to the results returned by entity extraction and intention classification, correspondingly generating a Cypher statement of the Neo4j graph database;

(5) Querying and formatting the output: connecting to the Neo4j graph database and running the generated Cypher statement to obtain the corresponding result, then formatting the query result and returning it to the user.
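The intention classification step can be illustrated with a small multinomial naive Bayes classifier over segmented question keywords. The sketch below is written from scratch in Python with Laplace smoothing; the training questions, the intention names and the class `NaiveBayesIntentClassifier` are invented for illustration and are not the patent's data or implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    """Multinomial naive Bayes over keyword lists, with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of training questions per intention
        self.word_counts = defaultdict(Counter)  # keyword frequencies per intention
        self.vocab = set()

    def fit(self, samples):
        for tokens, intent in samples:
            self.class_counts[intent] += 1
            self.word_counts[intent].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_intent, best_score = None, -math.inf
        for intent, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[intent].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # keywords never seen in training are ignored
                    score += math.log((self.word_counts[intent][tok] + self.alpha) / denom)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent


# Hypothetical segmented training questions, two per intention template.
samples = [
    (["substation", "how", "many", "hidden", "danger"], "hidden_danger_info"),
    (["which", "hidden", "danger", "serious"], "hidden_danger_info"),
    (["which", "device", "defect"], "equipment_info"),
    (["device", "model", "transformer"], "equipment_info"),
    (["which", "team", "responsible"], "team_info"),
    (["team", "members", "inspection"], "team_info"),
]
classifier = NaiveBayesIntentClassifier().fit(samples)
intent = classifier.predict(["how", "many", "hidden", "danger", "substation"])
```

Once the intention is known, the question only has to be matched against the question templates of that one intention, which is the narrowing effect described in the steps above.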

The BERT-BiLSTM-CRF deep learning model is divided into three parts:

The BERT model is a Transformer-based multi-layer bidirectional encoder model proposed by Google in 2018. Its input consists of three parts: a word vector, a text vector and a position vector. The word vectors are randomly initialized during preprocessing; the text vector assigns different values to different sentences so that the sentences within a passage can be distinguished; the position vector distinguishes words at different positions. A vector representation fused with global semantic information is thereby obtained, which can be applied well to various downstream tasks.

A dynamic mask is used in place of the static mask of BERT. The original BERT uses static masking: the 15% of tokens to be masked are chosen randomly once at the start and stay unchanged for the whole training process, so the same masked tokens are seen in each of the following N epochs. This patent follows the practice of RoBERTa: the pre-training data is first duplicated into 10 copies, and in each copy a random 15% of the tokens are selected for masking. Each copy is then trained for N/10 epochs, which largely avoids repeating identical training data, amounts to a simple data augmentation, and has a certain regularization effect. The benefit is that, with little labeled data, the pre-trained model improves text information extraction. The output of BERT is passed to a deep bidirectional LSTM neural network, implemented as a stacked deep bidirectional RNN, which further extracts text sequence information.
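The difference between static masking and the 10-copy dynamic masking scheme can be sketched in Python as follows. This is a simplified illustration under stated assumptions: every selected token is replaced by `[MASK]`, whereas the original BERT recipe replaces only part of the selected tokens by `[MASK]`, and the function names and the `copy_for_epoch` schedule are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, rng, ratio=0.15):
    """Replace a random `ratio` of the tokens with [MASK] (at least one token)."""
    n_mask = max(1, round(len(tokens) * ratio))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def static_masking(corpus, seed=0):
    """One masked version of the corpus, reused unchanged in every epoch."""
    rng = random.Random(seed)
    return [mask_tokens(sentence, rng) for sentence in corpus]


def dynamic_masking(corpus, copies=10, seed=0):
    """Several copies of the corpus, each with its own random mask positions."""
    rng = random.Random(seed)
    return [[mask_tokens(sentence, rng) for sentence in corpus] for _ in range(copies)]


def copy_for_epoch(epoch, total_epochs, copies=10):
    """Index of the masked copy used in `epoch` when each copy serves total_epochs/copies epochs."""
    per_copy = max(1, total_epochs // copies)
    return min(epoch // per_copy, copies - 1)


corpus = [[f"tok{i}" for i in range(20)]]
masked_copies = dynamic_masking(corpus, copies=10)
```

With N = 40 epochs, for instance, `copy_for_epoch` assigns epochs 0 to 3 to the first copy, epochs 4 to 7 to the second, and so on, so the model sees each mask pattern for only a tenth of the training.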

BiLSTM, the bidirectional long short-term memory model, is an improvement on the LSTM model and consists of a forward LSTM and a backward LSTM. Whereas an LSTM can use information in one direction only, the BiLSTM processes the input in both the forward and backward directions and so makes use of context on both sides; in this model it extracts context-aware text information for entity classification. If labels were predicted directly from the hidden-layer vectors of the deep bidirectional LSTM neural network, a difficulty would arise: in the named entity recognition task the output labels are subject to certain constraint relations, and the deep bidirectional LSTM neural network cannot represent them. A CRF decoding layer is therefore used as the decoding layer of the preset network model, that is, a CRF model is embedded after the deep bidirectional LSTM neural network, and the output value is calculated by the CRF model.

CRF is an undirected graph model that improves on the MEMM: instead of directly computing the transition probabilities between states, it computes a normalized score from the product of the potential functions over the maximal cliques. It is also a discriminative model and belongs to the family of log-linear models, that is, given a sequence X it finds the probability of the corresponding sequence Y. The CRF layer can add constraints that ensure the final prediction is valid, and these constraints are learned automatically by the CRF layer from the training data. In the model, the CRF converts the consistency of label classification in the entity into T N classification problems.

In the named entity recognition process, the input of the BiLSTM model is the word vectors produced by the pre-trained BERT, and the output is a score for each category for every input word. If a Softmax layer were added directly after the BiLSTM, a label could be obtained for each input word, but the constraints within the sentence would be ignored, for example that a sentence may begin with B or O but may not begin with I. From all the outputs of the BiLSTM, the CRF can learn these sentence-level constraints automatically and output more reasonable predicted labels.
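How a CRF layer changes the decoded labels can be seen in a small Viterbi decoding sketch in Python. The emission scores stand in for BiLSTM outputs and are made up; the start and transition scores are set by hand to forbid a sentence-initial I and the transition O to I, whereas a trained CRF would learn such scores from data. The names `greedy_decode` and `viterbi_decode` are hypothetical.

```python
NEG = -1e4  # score used to make a start tag or transition effectively impossible
TAGS = ["B", "I", "O"]

# Hand-set scores for illustration; a trained CRF learns them from labeled data.
START = {"B": 0.0, "I": NEG, "O": 0.0}  # a sentence may not begin with I
TRANS = {(p, c): 0.0 for p in TAGS for c in TAGS}
TRANS[("O", "I")] = NEG                 # I may not directly follow O


def greedy_decode(emissions):
    """Softmax-style decoding: pick the best tag at each position independently."""
    return [max(TAGS, key=lambda t: em[t]) for em in emissions]


def viterbi_decode(emissions):
    """Find the tag sequence with the highest total emission plus transition score."""
    scores = {t: START[t] + emissions[0][t] for t in TAGS}
    backpointers = []
    for em in emissions[1:]:
        new_scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: scores[p] + TRANS[(p, cur)])
            new_scores[cur] = scores[prev] + TRANS[(prev, cur)] + em[cur]
            pointers[cur] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Walk back from the best final tag to recover the whole path.
    path = [max(TAGS, key=lambda t: scores[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up per-word scores in which I scores highest for the first word.
emissions = [
    {"B": 1.0, "I": 2.0, "O": 0.5},
    {"B": 0.2, "I": 1.5, "O": 0.3},
    {"B": 0.1, "I": 0.2, "O": 2.0},
]
greedy_path = greedy_decode(emissions)    # ['I', 'I', 'O'], starts with an invalid I
viterbi_path = viterbi_decode(emissions)  # ['B', 'I', 'O']
```

The per-position choice yields a sequence that starts inside an entity, while the constrained decoding repairs the first label to B because the sequence is scored as a whole.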

This embodiment improves the pooling layer of the BiLSTM model by adding an adaptive pooling layer, local importance-based pooling, in place of the common max pooling and average pooling. The extracted features are input into the pooling layer, which reduces the original feature parameters and thereby reduces dimensionality, removes redundancy and lowers the amount of computation while retaining relative position information. Max pooling outputs the maximum of the features in a neighborhood and average pooling outputs their mean, whereas the adaptive pooling layer brings the weights of the activation values into network training. This overcomes two shortcomings: max pooling cannot adapt to the importance of each feature's activation value, and average pooling blurs important features. Feeding the features into the BiLSTM model fuses spatial features with time sequence information and effectively improves the recognition accuracy for proper nouns related to power hidden dangers. A naive Bayes classifier is introduced mainly to replace traditional simple template matching and to provide a mechanism for judging intention, and the response time reaches 200 milliseconds.
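The contrast between max pooling, average pooling and importance-weighted pooling can be shown with a one-dimensional Python sketch. It assumes that each feature comes with an importance logit and that the pooled value is the softmax-weighted average of the features in a window; in a network the logits would be produced by a small trainable sub-network, whereas here they are passed in by hand. The patent does not give the pooling formula, so this is an assumed form of local importance-based pooling, and the function names are hypothetical.

```python
import math


def max_pool(features, size):
    """Output the maximum of each non-overlapping window."""
    return [max(features[i:i + size]) for i in range(0, len(features) - size + 1, size)]


def avg_pool(features, size):
    """Output the mean of each non-overlapping window."""
    return [sum(features[i:i + size]) / size for i in range(0, len(features) - size + 1, size)]


def importance_pool(features, logits, size):
    """Output the importance-weighted average of each window.

    Weights are exp(logit), normalized within the window (a softmax over the window).
    """
    pooled = []
    for i in range(0, len(features) - size + 1, size):
        weights = [math.exp(l) for l in logits[i:i + size]]
        window = features[i:i + size]
        pooled.append(sum(w * x for w, x in zip(weights, window)) / sum(weights))
    return pooled


features = [1.0, 3.0, 2.0, 8.0]
uniform = importance_pool(features, [0.0, 0.0, 0.0, 0.0], size=2)   # same as average pooling
peaked = importance_pool(features, [0.0, 20.0, 20.0, 0.0], size=2)  # close to [3.0, 2.0]
```

With equal logits the result coincides with average pooling, and with one dominant logit per window it selects that feature even when it is not the largest, which neither max pooling nor average pooling can do.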

Example two

This embodiment provides a knowledge graph question-answering system based on a power grid hidden trouble shooting scene, which comprises:

The above modules correspond to the steps of the first embodiment and share the same examples and application scenarios, but are not limited to what is disclosed in the first embodiment. It should be noted that the modules described above may be implemented as part of a system in a computer system, such as a set of computer-executable instructions.

Each of the foregoing embodiments has its own emphasis; for details not described in one embodiment, refer to the related description of another embodiment.

The proposed system may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into the modules described above is only a division by logical function, and other divisions are possible in an actual implementation, for example multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.

Example III

The present embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps in the knowledge graph question-answering method based on a power grid hidden trouble shooting scene according to the above embodiment.

Example IV

This embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the steps in the knowledge graph question-answering method based on a power grid hidden trouble shooting scene according to the above embodiment.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.

While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims

1. The knowledge graph question-answering method based on the power grid hidden trouble shooting scene is characterized by comprising the following steps of:

2. The knowledge graph question-answering method based on the power grid hidden trouble shooting scene as claimed in claim 1, wherein the obtaining of the problem corpus in the power grid hidden trouble shooting scene, and the combing of the intention template and the problem template are specifically as follows:

3. The knowledge graph question-answering method based on the power grid hidden trouble shooting scene as set forth in claim 1, wherein the construction process of the professional field model is specifically as follows:

4. The knowledge graph question-answering method based on the power grid hidden danger investigation scene of claim 3, wherein the professional field model comprises a BERT module, a two-way long-short-term memory module and a CRF undirected graph module;

5. The knowledge graph question answering method based on the power grid hidden danger investigation scene of claim 4, wherein the professional field model performs named entity recognition on the problem corpus to obtain a named entity recognition result, specifically:

6. The knowledge graph question-answering method based on the power grid hidden trouble shooting scene as claimed in claim 1, wherein the judging whether the specific entity category and the specific intention word exist in the input question sentence, the intention judging is carried out by matching with the intention template, and the judgment is classified into the question template, comprises the following steps:

7. The knowledge graph question-answering method based on the power grid hidden trouble shooting scene according to claim 1, wherein generating the corresponding Cypher statement of the Neo4j graph database according to the results returned by entity extraction and intention classification is specifically:

8. A knowledge graph question-answering system based on a power grid hidden trouble shooting scene is characterized by comprising:

9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of a knowledge graph question-answering method based on a power grid hidden trouble shooting scene according to any one of claims 1-7.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of a knowledge graph question-answering method based on a power grid hidden trouble shooting scene according to any one of claims 1-7.