CN110909174B - Knowledge graph-based method for improving entity link in simple question answering - Google Patents
- Publication number
- CN110909174B CN110909174B CN201911131171.7A CN201911131171A CN110909174B CN 110909174 B CN110909174 B CN 110909174B CN 201911131171 A CN201911131171 A CN 201911131171A CN 110909174 B CN110909174 B CN 110909174B
- Authority
- CN
- China
- Prior art keywords
- entity
- vector
- word
- entities
- level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
Abstract
The invention discloses a knowledge-graph-based method for improving entity linking in simple question answering, belonging to the technical field of natural language processing. The method comprises: establishing a central server and a question input client; establishing an entity detection module, an entity candidate set module, a knowledge graph retrieval module, and an entity matching module in the central server; detecting the question data and establishing an entity candidate set; encoding the question data; performing three-level encoding on the entities in the entity candidate set; and selecting the n entities with the highest matching scores with the question data from the entity candidate set. A distinctive question encoding scheme is adopted.
Description
Technical Field
The invention belongs to the technical field of big data and relates to a knowledge-graph-based method for improving entity linking in simple question answering.
Background
In recent years, more and more open-source knowledge graphs (KGs) containing large numbers of facts have emerged, such as Freebase, Yago, and DBpedia. Question answering with a knowledge graph as the answer source (KG-QA) has been a hot spot of recent research. There are two main ways to store knowledge graphs: RDF-based storage and graph-database-based storage.
Conventional KG-QA methods can be divided into three major classes. The first is semantic parsing, a largely linguistic approach: the natural-language question is converted into a series of formal logic forms, which are analyzed bottom-up to obtain a logic form expressing the semantics of the whole question; a corresponding query statement (similar to the lambda-calculus) is then used to query the knowledge base for the answer. The second is information extraction: an entity is extracted from the question and queried in the knowledge base to obtain a knowledge-base subgraph centered on that entity node; each node or edge in the subgraph is taken as a candidate answer; a question feature vector is obtained by extracting information from the question according to certain rules or templates; and a classifier, taking the question feature vector as input, screens the candidate answers to obtain the final answer. The third is vector modeling, whose idea is close to that of information extraction: candidate answers are obtained from the question, both the question and the candidate answers are mapped into distributed representations, and these representations are trained on training data so that the score (usually a dot product) between the vector representations of the question and the correct answer is as high as possible.
In general, simple question answering based on a knowledge graph (KG-SimpleQA) involves two key subtasks. (1) Entity linking: its purpose is to detect the entity mentioned in the question and link it to the KG. (2) Relation prediction: this subtask identifies the relation in the knowledge graph that the question asks about the entity. For example, for the question "what language is skope magazine written in?", it is necessary to find the expression of an entity in the question, "skope magazine", link it to the corresponding entity "m.03c14nk" in the knowledge graph, and identify the relation asked about that entity: "book/periodical/language".
Entity linking presents some unsolved problems, namely entity ambiguity and OOV (out-of-vocabulary: an entity in the question has no corresponding vector expression in the pre-trained word vector model). The entity ambiguity problem means that different entities in the knowledge graph share the same name, which creates a huge impediment to linking the entity in the question to the correct entity in the knowledge graph. For example, in the above example, the question involves the entity "skope magazine", but many entities in the knowledge graph bear that name, which creates the entity confusion problem. To address entity confusion and OOV, previous work has proposed several models. Lukovnikov et al. introduce character-level coding of each word in the question when vectorizing it, combined with word-level coding, as the vector representation of the question; this solves the OOV problem well, but because 92.9% of the words involved in OOV are entities or parts of entities, character-level coding loses the semantics of the entities, which is an information loss for entity linking. To address entity confusion, Dai et al. encode the type information of an entity as its vector representation: each dimension of the type vector is 1 or 0, indicating whether the entity is associated with a particular type, so the dimension of the vector is the number of entity types in the knowledge graph. This approach works well for entity confusion but does not take into account information about the entity itself. Yin et al., when encoding the question for entity linking, concatenate the character-level and word-level codes of each word as the code of the question, and when encoding an entity, jointly consider the character-level code of the entity name and the word-level code of the entity type.
However, the knowledge graph contains relatively little type information about entities; considering only one level of coding is insufficient to solve the entity confusion problem, and character-level coding of the question loses some important semantic information.
In recent years, neural network models incorporating an attention mechanism have been proposed. In entity linking, the main task of such a model is to make the vectorization of the question better reflect information related to the entity, so that the part of the question related to the entity can be exploited to the maximum. However, such models are generally complex and still do not handle the entity confusion problem well.
Disclosure of Invention
The invention aims to provide an improved method for entity linking in simple question answering based on a knowledge graph, which solves the OOV problem without losing semantic information and, by considering information at three levels of an entity, handles entity confusion well.
In order to achieve the purpose, the invention adopts the following technical scheme:
An improved method for entity linking in simple question answering based on a knowledge graph comprises the following steps:
step 1: establishing a central server and a question input client, wherein the question input client is used for collecting question data and transmitting the question data to the central server through the Internet for processing;
establishing an entity detection module, an entity candidate set module, a knowledge graph retrieval module and an entity matching module in the central server;
the knowledge graph retrieval module is used for interfacing with the open-source knowledge graph KG and providing retrieval services related to it;
step 2: after the central server receives the question data, the entity detection module detects the question data and predicts the subject words of the question in the question data, as follows:
step A1: establishing a BiLSTM-CRF model for the sequence labeling problem;
step A2: according to the BiLSTM-CRF model, labeling each word in the question data with one of two labels, "i" and "o", wherein "i" denotes that the corresponding word is part of the question subject word;
step A3: obtaining each question subject word in the question data through the methods of step A1 and step A2;
step 3: by retrieving the knowledge graph, the knowledge graph retrieval module transmits to the entity candidate set module all entities whose entity names perfectly match the question subject words;
step 4: the entity candidate set module establishes an entity candidate set, into which all the entities retrieved in step 3 are placed for storage after screening; that is, all entities whose entity names partially match the n-grams of the question subject word are kept in the entity candidate set, where the value of n goes from high to low; if an n-gram is not the question subject word itself and the number of matched entities exceeds 50, the n-gram is discarded;
step 5: the entity matching module reads the entity candidate set and selects from it the n entities with the highest matching scores with the question data, as follows:
step C1: encoding the question data at word level, obtaining the vector expression of the question through a pre-trained word vector model, taking the vector expression as the input of a BiLSTM, and finally performing max-pooling on the hidden vectors to obtain the final vector expression of the question data, i.e., the vector code of the question data;
step C2: acquiring the entity candidate set and performing three-level coding on the entities in it, namely word-level coding of the names of the entities, type-level coding of the types of the entities, and word-level coding of the types of the entities;
obtaining the type-level type vector code, the word-level name vector code, and the word-level type vector code of each entity in the entity candidate set;
step C3: performing similarity calculation between the vector code of the question data and, respectively, the type-level type vector code, the word-level name vector code, and the word-level type vector code of each entity, and taking the n candidate entities with the highest scores as the predicted entities.
Preferably, in performing step 4, the partial matching is limited as follows: the number of words in the entity name in the knowledge graph may exceed the number of words in the question subject word by at most one.
Preferably, in executing step C1, when the OOV problem is encountered, word-level coding is performed on the type of the entity matched for the word that lies outside the pre-trained model's vocabulary.
Preferably, in step C2, the type-level coding uses a bag-of-words model; that is, the vector dimension is the number of total entity types in the knowledge graph.
Preferably, in step C3, similarity calculation is performed between each of the three-level vector codes of the entities in the entity candidate set and the vector code of the question data, and the average is finally taken; after the BiLSTM is applied for the vector coding of the question data, each word has hidden-layer vectors in two directions, forward and backward, and the hidden-layer vector of each word is obtained by concatenating them.
Preferably, when step C3 is executed, the hidden-layer vector in the right direction of the last word in the question data is concatenated with the hidden-layer vector in the left direction of the first word to serve as the vector code of the question data, so that the coded information of all words in both directions is utilized; the specific calculation is as follows:
where qs(q) represents the vector code of the question data, et(s) represents the type-level type vector code of the entity, el(s) is the word-level vector code of the entity name, and ew(s) is the word-level vector code of the entity type.
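The formula itself did not survive in this text; a plausible form of the averaged three-level matching score, consistent with the definitions of qs(q), et(s), el(s), and ew(s) above (the similarity function sim, e.g. cosine similarity, is an assumption), is:

```latex
\mathrm{score}(q,s) \;=\; \frac{1}{3}\Big[\,\mathrm{sim}\big(qs(q),\,et(s)\big) \;+\; \mathrm{sim}\big(qs(q),\,el(s)\big) \;+\; \mathrm{sim}\big(qs(q),\,ew(s)\big)\Big]
```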
The invention relates to an improved method for entity linking in simple question answering based on a knowledge graph. It solves the OOV (out-of-vocabulary) problem without losing semantic information and, by considering information at three levels of an entity, handles the technical problem of entity confusion well. A distinctive question coding scheme is adopted: when coding the question, a word that cannot be represented by a vector in the word vector model is coded by its type, retaining the semantic information of the word while solving the OOV problem. A three-level entity coding method is provided to solve the entity confusion problem, making full use of the type information and name information of the entity and combining it with the question coding scheme, thereby effectively solving both entity confusion and OOV.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Fig. 1 shows an improved method for entity linking in simple question answering based on a knowledge graph, which includes the following steps:
step 1: establishing a central server and a question input client, wherein the question input client is used for collecting question data and transmitting the question data to the central server through the Internet for processing;
establishing an entity detection module, an entity candidate set module, a knowledge graph retrieval module and an entity matching module in the central server;
the knowledge graph retrieval module is used for interfacing with the open-source knowledge graph KG and providing retrieval services related to it;
step 2: after the central server receives the question data, the entity detection module detects the question data and predicts the subject words of the question in the question data, as follows:
step A1: establishing a BiLSTM-CRF model for the sequence labeling problem;
step A2: according to the BiLSTM-CRF model, labeling each word in the question data with one of two labels, "i" and "o", wherein "i" denotes that the corresponding word is part of the question subject word;
step A3: obtaining the subject words of the entity in the question through the methods of step A1 and step A2;
in the present embodiment, for example, the question data is "what language is skin map writer in? The term "topic word of the question in the question data is" sketch map ".
This subtask is treated as a sequence tagging problem, which the invention solves by training a BiLSTM-CRF model. Two labels, "i" and "o", are used to tag each word in the question, where "i" indicates that the corresponding word is part of the question subject word. Through this step, the question subject word in each item of question data can be obtained.
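As a minimal sketch of this tagging step (the function name and the tag sequence shown are illustrative, not from the patent), the subject word can be recovered from the "i"/"o" tags as follows:

```python
def extract_subject_words(tokens, tags):
    """Collect maximal runs of tokens tagged "i" as question subject words."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "i":
            current.append(token)
        elif current:               # an "o" tag closes the current span
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

# Tags as a BiLSTM-CRF tagger might emit them for the running example
tokens = ["what", "language", "is", "skope", "magazine", "written", "in"]
tags   = ["o",    "o",        "o",  "i",     "i",        "o",       "o"]
print(extract_subject_words(tokens, tags))  # ['skope magazine']
```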
step 3: by retrieving the knowledge graph, the knowledge graph retrieval module transmits to the entity candidate set module all entities whose entity names perfectly match the question subject words;
A knowledge graph contains millions of entities, and encoding all of them to compare similarity with the question is impractical, so the invention creates a candidate set based on the results of entity detection.
step 4: the entity candidate set module establishes an entity candidate set, into which all the entities retrieved in step 3 are placed for storage after screening; that is, all entities whose entity names partially match the n-grams of the question subject word are kept in the entity candidate set, where the value of n goes from high to low; if an n-gram is not the question subject word itself and the number of matched entities exceeds 50, the n-gram is discarded;
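Steps 3 and 4 can be sketched as follows; `name_index` is a hypothetical in-memory stand-in for the knowledge graph retrieval module, mapping an entity-name string to the set of entity ids bearing that name:

```python
def ngrams(words, n):
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_candidate_set(subject_word, name_index, max_matches=50):
    """Keep entities whose names match an n-gram of the subject word,
    taking n from high to low; an n-gram that is not the subject word
    itself and matches more than `max_matches` entities is discarded."""
    words = subject_word.split()
    candidates = set()
    for n in range(len(words), 0, -1):
        for gram in ngrams(words, n):
            matched = name_index.get(gram, set())
            if gram != subject_word and len(matched) > max_matches:
                continue
            candidates |= matched
    return candidates

# Toy index: the full name matches a single entity, while the bare word
# "magazine" matches too many entities and is therefore discarded.
name_index = {
    "skope magazine": {"m.03c14nk"},
    "magazine": {f"e{i}" for i in range(60)},
}
print(build_candidate_set("skope magazine", name_index))  # {'m.03c14nk'}
```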
and 5: the entity matching module reads the entity candidate set and selects n entities with the highest matching scores with the problem data from the entity candidate set, and the steps are as follows:
step C1: adopting word-level to encode the problem data, obtaining the vector expression of the problem through a pre-training word vector model, then taking the vector expression as the input of a BILSTM, and finally performing max-posing on the hidden vector to obtain the final vector expression of the problem data, namely, the vector encoding of the problem data;
performing word-level coding on the question data, and for words in the question data for which no corresponding vector can be obtained in the pre-trained vector model, matching their corresponding types in the KG (Freebase is used in this embodiment);
since 88.5% of question subject words that encounter the OOV problem can be matched to only one entity in Freebase, the influence of the entity confusion problem is small; if entity confusion is nonetheless encountered, the type of the entity with the highest frequency in the Freebase triples is used as the code of the question subject word;
as shown in fig. 1, for the question data "what language is skope magazine written in?", the word "skope" cannot find a corresponding vector in the pre-trained vector model and is part of the question subject word "skope magazine", so the vector of the type of the question subject word "skope magazine" is used as its vector representation; it is first necessary to match "skope magazine" to the unique entity "m.03c14nk" in Freebase.
Then the word-level information of the type of m.03c14nk is taken: the vectors obtained for its words in the pre-trained word vector model are concatenated as the vectorized expression of "skope magazine"; the vectorized expression of each word in the question data is taken as the input of the BiLSTM, the hidden vectors obtained in the forward order are concatenated with those obtained in the backward order, and finally a vector of fixed dimension is obtained through max-pooling.
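The shape bookkeeping of this encoding step can be illustrated with random stand-in hidden states (a NumPy sketch only; the actual BiLSTM outputs would replace `h_fwd`/`h_bwd`):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 7, 4                           # 7 words, hidden size 4 per direction
h_fwd = rng.standard_normal((seq_len, d))   # forward-order hidden vectors
h_bwd = rng.standard_normal((seq_len, d))   # backward-order hidden vectors

h = np.concatenate([h_fwd, h_bwd], axis=1)  # per-word concatenation: (7, 8)
q_vec = h.max(axis=0)                       # max-pooling over the sequence

print(q_vec.shape)  # (8,) -- fixed dimension, independent of seq_len
```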
step C2: acquiring the entity candidate set and performing three-level coding on the entities in it, namely word-level coding of the names of the entities, type-level coding of the types of the entities, and word-level coding of the types of the entities;
obtaining the type-level type vector code, the word-level name vector code, and the word-level type vector code of each entity in the entity candidate set;
in order to solve the entity confusion problem existing in the entity link, the type information of the entity needs to be utilized, but the type information about the entity in the FreeBase is not rich enough (the types of a plurality of entities in the FreeBase are simplified into common/topic), and the problem cannot be effectively solved by utilizing information of one layer alone, so that the type information of the entity needs to be enriched by utilizing multi-layer coding of the type. The method adopts three-level coding for entity names and entity types.
As shown in fig. 1, for the entity "m.03c14nk" in the entity candidate set, its name "skope magazine" is first obtained from its attribute "type.object.name" in Freebase, and the name is segmented into the sequence {skope, magazine}; since "skope" cannot obtain a word vector representation in the pre-trained model, its vector is obtained by random initialization. The sequence of vectors is used as the input of a BiLSTM, and the output hidden vectors are max-pooled to obtain a vector of fixed dimension; at this point the word-level code of the entity name is obtained. The entity attributes "type.object.type" and "common.topic.notable_types" are used to obtain the types of the entity, "book/magazine", "book/periodical", and "common/topic"; the entity types are tokenized on "/" and "_", yielding the sequence {book, magazine, periodical, common, topic}, and, similarly, vector representations are obtained from GloVe;
the word-level code of the entity types is then obtained through BiLSTM and max-pooling. For the type-level code of the entity types, the vector dimension is fixed to the number of entity types in Freebase, and the vector features are obtained without model training.
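A minimal sketch of the input side of these two type encodings (the toy type inventory and helper names are illustrative; the word-level branch would feed the token vectors into a BiLSTM as described):

```python
import re

# Toy stand-in for the full Freebase type inventory
ALL_TYPES = ["book/magazine", "book/periodical", "common/topic"]

def tokenize_type(t):
    """Tokenize an entity type on "/" and "_", as described above."""
    return [tok for tok in re.split(r"[/_]", t) if tok]

def type_level_vector(entity_types, type_inventory=ALL_TYPES):
    """Bag-of-words type-level code: one dimension per entity type in the
    KG, 1 if the entity bears that type, else 0; no training involved."""
    return [1 if t in entity_types else 0 for t in type_inventory]

ent_types = {"book/magazine", "book/periodical", "common/topic"}
print(type_level_vector(ent_types))  # [1, 1, 1]
# Tokens feeding the word-level branch:
print(sorted({tok for t in ent_types for tok in tokenize_type(t)}))
# ['book', 'common', 'magazine', 'periodical', 'topic']
```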
step C3: performing similarity calculation between the vector code of the question data and, respectively, the type-level type vector code, the word-level name vector code, and the word-level type vector code of each entity, and taking the n candidate entities with the highest scores as the predicted entities.
Preferably, in performing step 4, the partial matching is limited as follows: the number of words in the entity name in the knowledge graph may exceed the number of words in the question subject word by at most one.
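This limitation amounts to a one-line predicate (the function name is ours, not the patent's):

```python
def acceptable_partial_match(entity_name, subject_word):
    """Entity name may contain at most one word more than the subject word."""
    return len(entity_name.split()) <= len(subject_word.split()) + 1

print(acceptable_partial_match("skope magazine inc", "skope magazine"))          # True
print(acceptable_partial_match("the new skope magazine inc", "skope magazine"))  # False
```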
Preferably, in executing step C1, when the OOV problem is encountered, word-level coding is performed on the type of the entity matched for the word that lies outside the pre-trained model's vocabulary.
Preferably, in step C2, the type-level coding uses a bag-of-words model; that is, the vector dimension is the number of total entity types in the knowledge graph.
One traditional technical scheme concatenates the vectors of the three levels as the vector of the entity and then calculates its similarity with the question vector. Because the type-level vector of the entity type alone has 500 dimensions, the concatenated entity representation can reach thousands of dimensions once the other levels are appended, which introduces a large error.
Another traditional scheme concatenates the vectors of the individual words and then applies a pooling operation to obtain a fixed-dimension vector; this loses too much information and performs very poorly. A further alternative uses the three-level entity vectors to produce a question vector representation corresponding to each level and then calculates similarity; that method brings the coding of the question as close as possible to the entity information it contains, but it has a defect: its ability to distinguish candidate entities with very high similarity is very poor, because many entities in the entity candidate set share the same name and even the type information of some entities is very close.
Instead of these traditional technical schemes, the invention adopts the following improvements:
preferably, in step C3, similarity calculation is performed on the three-level vector codes of the entities in the entity candidate set and the vector code of the question data, and finally an average value is obtained, after BILSTM is performed on the vector codes of the question data, each word has hidden layer vectors in two directions, namely, forward and backward, and the hidden layer vectors of each word are obtained by splicing the hidden layer vectors.
Preferably, when step C3 is executed, the hidden-layer vector in the right direction of the last word in the question data is concatenated with the hidden-layer vector in the left direction of the first word to serve as the vector code of the question data, so that the coded information of all words in both directions is utilized; the specific calculation is as follows:
where qs(q) represents the vector code of the question data, et(s) represents the type-level type vector code of the entity, el(s) is the word-level vector code of the entity name, and ew(s) is the word-level vector code of the entity type.
This patent relates to simple question answering (SimpleQA), meaning that a question can be answered by reasoning over a single fact in the knowledge graph; the invention uses several deep-learning models (BiLSTM, BiGRU) to complete simple question answering based on the knowledge graph.
In summary, the invention solves the OOV (out-of-vocabulary) problem without losing semantic information and, by considering information at three levels of an entity, handles the technical problem of entity confusion well. A distinctive question coding scheme is adopted: when coding the question, a word that cannot be represented by a vector in the word vector model is coded by its type, retaining the semantic information of the word while solving the OOV problem. A three-level entity coding method is provided to solve the entity confusion problem, making full use of the type information and name information of the entity and combining it with the question coding scheme, thereby effectively solving both entity confusion and OOV.
Claims (6)
1. An improved method for entity linking in simple question answering based on a knowledge graph, characterized by comprising the following steps:
step 1: establishing a central server and a question input client, wherein the question input client is used for collecting question data and transmitting the question data to the central server through the Internet for processing;
establishing an entity detection module, an entity candidate set module, a knowledge graph retrieval module and an entity matching module in the central server;
the knowledge graph retrieval module is used for interfacing with the open-source knowledge graph KG and providing retrieval services related to it;
step 2: after the central server receives the question data, the entity detection module detects the question data and predicts the subject words of the question in the question data, as follows:
step A1: establishing a BiLSTM-CRF model for the sequence labeling problem;
step A2: according to the BiLSTM-CRF model, labeling each word in the question data with one of two labels, "i" and "o", wherein "i" denotes that the corresponding word is part of the question subject word;
step A3: obtaining the question subject words in the question data through the methods of step A1 and step A2;
step 3: by retrieving the knowledge graph, the knowledge graph retrieval module transmits to the entity candidate set module all entities whose entity names perfectly match the question subject words;
step 4: the entity candidate set module establishes an entity candidate set, into which all the entities retrieved in step 3 are placed for storage after screening; that is, all entities whose entity names partially match the n-grams of the question subject word are kept in the entity candidate set, where the value of n goes from high to low; if an n-gram is not the question subject word itself and the number of matched entities exceeds 50, the n-gram is discarded;
step 5: the entity matching module reads the entity candidate set and selects from it the n entities with the highest matching scores with the question data, as follows:
step C1: encoding the question data at word level, obtaining the vector expression of the question through a pre-trained word vector model, taking the vector expression as the input of a BiLSTM, and finally performing max-pooling on the hidden vectors to obtain the final vector expression of the question data, i.e., the vector code of the question data;
step C2: acquiring the entity candidate set and performing three-level coding on the entities in it, namely word-level coding of the names of the entities, type-level coding of the types of the entities, and word-level coding of the types of the entities;
obtaining the type-level type vector code, the word-level name vector code, and the word-level type vector code of each entity in the entity candidate set;
step C3: performing similarity calculation between the vector code of the question data and, respectively, the type-level type vector code, the word-level name vector code, and the word-level type vector code of each entity, and taking the n candidate entities with the highest scores as the predicted entities.
2. The method of claim 1, wherein in performing step 4 the partial match is subject to the following restriction: the number of words in an entity name in the knowledge graph may exceed the number of words in the question's subject phrase by at most one.
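The length restriction of claim 2 can be expressed as a one-line predicate; the function name and example strings below are illustrative assumptions:

```python
def within_length_limit(entity_name, subject):
    """Claim-2 constraint: the entity name may have at most one more
    word than the question's subject phrase."""
    return len(entity_name.split()) <= len(subject.split()) + 1

ok = within_length_limit("new york city", "new york")        # 3 <= 2 + 1
bad = within_length_limit("new york city hall plaza", "new york")  # 5 > 3
```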
3. The method of claim 1, wherein in performing step C1, when the out-of-vocabulary (OOV) problem is encountered, word-level encoding is performed on the entity types not covered by the pre-trained model.
4. The method of claim 1, wherein in performing step C2 the type-level encoding follows the bag-of-words model, i.e., the vector dimension equals the total number of entity types in the knowledge graph.
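Under the bag-of-words scheme of claim 4, the type-level encoding is a multi-hot vector over the knowledge graph's full type inventory. A minimal sketch, where the four-type inventory is an assumption for illustration:

```python
def type_level_encode(entity_types, all_types):
    """Multi-hot vector with one dimension per entity type in the
    knowledge graph; a 1 marks each type the entity carries."""
    index = {t: i for i, t in enumerate(all_types)}
    vec = [0] * len(all_types)
    for t in entity_types:
        vec[index[t]] = 1
    return vec

all_types = ["person", "location", "organization", "film"]  # assumed inventory
v = type_level_encode(["person", "film"], all_types)
```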
5. The method of claim 1, wherein in step C3 the three levels of vector encodings of each entity in the entity candidate set are each compared for similarity with the vector encoding of the question data, and the average of the three similarities is taken as the final score; in the vector encoding of the question data, each word has forward and backward hidden vectors after the BiLSTM, and these are concatenated to obtain the hidden vector of that word.
6. The method of claim 5, wherein in step C3 the forward (left-to-right) hidden vector of the last word of the question data is concatenated with the backward (right-to-left) hidden vector of the first word to form the vector encoding of the question data, so that the encoded information of all words in both directions is used; the score is computed as

score(q, s) = 1/3 [sim(qs(q), et(s)) + sim(qs(q), el(s)) + sim(qs(q), ew(s))]

where qs(q) represents the vector encoding of the question data, et(s) represents the type vector encoding of the type-level entity, el(s) is the vector encoding of the entity name at the word level, and ew(s) is the vector encoding of the entity type at the word level.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911131171.7A CN110909174B (en) | 2019-11-19 | 2019-11-19 | Knowledge graph-based method for improving entity link in simple question answering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110909174A CN110909174A (en) | 2020-03-24 |
CN110909174B true CN110909174B (en) | 2022-01-04 |
Family
ID=69818090
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911131171.7A Expired - Fee Related CN110909174B (en) | 2019-11-19 | 2019-11-19 | Knowledge graph-based method for improving entity link in simple question answering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110909174B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113535970A (en) * | 2020-04-22 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Information processing method and apparatus, electronic device, and computer-readable storage medium |
CN111797245B (en) * | 2020-07-27 | 2023-07-25 | 中国平安人寿保险股份有限公司 | Knowledge graph model-based information matching method and related device |
CN114691973A (en) * | 2020-12-31 | 2022-07-01 | 华为技术有限公司 | Recommendation method, recommendation network and related equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480125A (en) * | 2017-07-05 | 2017-12-15 | Chongqing University of Posts and Telecommunications | A relation linking method based on knowledge graphs |
CN108875051A (en) * | 2018-06-28 | 2018-11-23 | Global Tone Communication Technology Co., Ltd. | Automatic knowledge graph construction method and system for massive unstructured text |
CN109271524A (en) * | 2018-08-02 | 2019-01-25 | Institute of Computing Technology, Chinese Academy of Sciences | Entity linking method in knowledge base question answering systems |
US10331402B1 * | 2017-05-30 | 2019-06-25 | Amazon Technologies, Inc. | Search and knowledge base question answering for a voice user interface |
CN110298042A (en) * | 2019-06-26 | 2019-10-01 | Sichuan Changhong Electric Co., Ltd. | Film and television entity recognition method based on BiLSTM-CRF and knowledge graph |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11157540B2 (en) * | 2016-09-12 | 2021-10-26 | International Business Machines Corporation | Search space reduction for knowledge graph querying and interactions |
Non-Patent Citations (2)
Title |
---|
Entity linking based on the co-occurrence graph and entity probability; Alan Eckhardt et al.; Proceedings of the First International Workshop on Entity Recognition & Disambiguation; 2014-06-30; pp. 37-44 *
Research on knowledge graph construction methods based on multiple data sources; Wu Yunbing et al.; Journal of Fuzhou University (Natural Science Edition); 2017-06-30; Vol. 45, No. 3; pp. 329-335 *
Also Published As
Publication number | Publication date |
---|---|
CN110909174A (en) | 2020-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lin et al. | Traceability transformed: Generating more accurate links with pre-trained bert models | |
CN107748757B (en) | Question-answering method based on knowledge graph | |
CN110097085B (en) | Lyric text generation method, training method, device, server and storage medium | |
US20200301954A1 (en) | Reply information obtaining method and apparatus | |
CN117009490A (en) | Training method and device for generating large language model based on knowledge base feedback | |
Kumar et al. | A review on chatbot design and implementation techniques | |
CN109871538A | A Chinese electronic health record named entity recognition method | |
CN111159414B (en) | Text classification method and system, electronic equipment and computer readable storage medium | |
CN115393692A | Associative text-to-image generation method based on a generative pre-trained language model | |
CN109857846B (en) | Method and device for matching user question and knowledge point | |
CN110909174B (en) | Knowledge graph-based method for improving entity link in simple question answering | |
CN108491515B (en) | Sentence pair matching degree prediction method for campus psychological consultation | |
CN107491655A (en) | Liver diseases information intelligent consultation method and system based on machine learning | |
CN114528898A (en) | Scene graph modification based on natural language commands | |
CN113705196A (en) | Chinese open information extraction method and device based on graph neural network | |
US20240119075A1 (en) | Method and system for generating longform technical question and answer dataset | |
US20230281392A1 (en) | Computer-readable recording medium storing computer program, machine learning method, and natural language processing apparatus | |
CN114048301A (en) | Satisfaction-based user simulation method and system | |
CN115203388A (en) | Machine reading understanding method and device, computer equipment and storage medium | |
CN114372454B (en) | Text information extraction method, model training method, device and storage medium | |
CN118261163B (en) | Intelligent evaluation report generation method and system based on transformer structure | |
CN112016299A (en) | Method and device for generating dependency syntax tree by using neural network executed by computer | |
CN115617954B (en) | Question answering method and device, electronic equipment and storage medium | |
CN115617974B (en) | Dialogue processing method, device, equipment and storage medium | |
CN115688792A (en) | Problem generation method and device based on document and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20220104 |