
CN110909174B - Knowledge graph-based method for improving entity link in simple question answering - Google Patents


Info

Publication number
CN110909174B
Authority
CN
China
Prior art keywords
entity
vector
word
entities
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201911131171.7A
Other languages
Chinese (zh)
Other versions
CN110909174A (en
Inventor
陈凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN201911131171.7A priority Critical patent/CN110909174B/en
Publication of CN110909174A publication Critical patent/CN110909174A/en
Application granted granted Critical
Publication of CN110909174B publication Critical patent/CN110909174B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge-graph-based method for improving entity linking in simple question answering, belonging to the technical field of natural language processing. The method comprises: establishing a central server and a question input client; establishing an entity detection module, an entity candidate set module, a knowledge graph retrieval module and an entity matching module in the central server; detecting the question data; establishing an entity candidate set; encoding the question data; performing three-level encoding on the entities in the entity candidate set; and selecting the n entities with the highest matching scores against the question data from the entity candidate set, using a distinctive question encoding scheme.

Description

Knowledge graph-based method for improving entity link in simple question answering
Technical Field
The invention belongs to the technical field of big data, and relates to a knowledge-graph-based method for improving entity linking in simple question answering.
Background
In recent years, more and more open-source knowledge graphs (KGs) containing large numbers of facts have emerged, such as FreeBase, Yago, and DBpedia. Question answering with a knowledge graph as the answer source (KG-QA) has been a hot spot of recent research. There are two main ways to store knowledge graphs: RDF-based storage and graph-database-based storage.
Conventional KG-QA methods can be divided into three major classes. The first is semantic parsing: a largely linguistic approach whose main idea is to convert the natural-language question into a series of formal logic forms, analyze these forms bottom-up to obtain a logic form that expresses the semantics of the whole question, and issue a corresponding query statement (similar to lambda-calculus) against the knowledge base to obtain the answer. The second is information extraction: extract the entity in the question, query it in the knowledge base to obtain a knowledge-base subgraph centered on that entity node, treat each node or edge in the subgraph as a candidate answer, extract information from the question according to rules or templates to obtain a question feature vector, and feed that vector into a classifier that screens the candidate answers, thereby obtaining the final answer. The third is vector modeling, whose idea is close to that of information extraction: obtain candidate answers from the question, map both the question and the candidate answers into distributed representations, and train these representations on training data so that the score (usually a dot product) between the vector representations of the question and the correct answer is as high as possible.
In general, simple knowledge-graph-based question answering (KG-SimpleQA) involves two key subtasks. (1) Entity linking: detect the entities mentioned in the question and link them to the KG. (2) Relation prediction: identify the relation in the knowledge graph that the question asks about the entity. For example, for the question "what language is sketch magazine written in?", it is necessary to find the mention of an entity in the question, "sketch magazine", link it to the corresponding entity "m.03c14nk" in the knowledge graph, and identify the relation asked about that entity: "book/periodical/language".
Entity linking still presents some unsolved problems, namely entity ambiguity and OOV (the entity mention in the question has no corresponding vector representation in the pre-trained word vector model). The entity ambiguity problem means that different entities in the knowledge graph share the same name, which is a huge obstacle to linking the entity in the question to the correct entity in the knowledge graph. In the example above, the entity mentioned in the question is "sketch magazine", but many entities in the knowledge graph bear that name, which creates an entity confusion problem. To address entity confusion and OOV, previous work has proposed several models. Lukovnikov et al. introduce a character-level encoding of each word in the question when vectorizing it, combined with word-level encoding, as the vector representation of the question; this handles the OOV problem well, but because 92.9% of OOV words are entities or parts of entities, character-level encoding loses the semantics of the entities, which is an information loss for entity linking. To address entity confusion, Dai et al. encode the type information of an entity as its vector representation: each dimension of the type vector is 1 or 0, indicating whether the entity carries a particular type, so the dimension of the vector equals the number of entity types in the knowledge graph. This works well for entity confusion but ignores information about the entity itself. Yin et al., when encoding the question for entity linking, concatenate the character-level and word-level codes of each word as the question code, and when encoding an entity, jointly consider the character-level code of the entity name and the word-level code of the entity type.
However, the knowledge graph carries little type information about entities, considering only one level of encoding is insufficient to solve the entity confusion problem, and character-level encoding of the question loses important semantic information.
In recent years, neural network models incorporating an attention mechanism have been proposed. In entity linking, their main task is to make the vectorization of the question better reflect entity-related information, so that the part of the question related to the entity is exploited to the maximum; however, such models are generally complex and still do not handle the entity confusion problem well.
Disclosure of Invention
The invention aims to provide a knowledge-graph-based method for improving entity linking in simple question answering, which solves the OOV problem without losing semantic information and, by considering three levels of entity information, handles entity confusion well.
In order to achieve the purpose, the invention adopts the following technical scheme:
A knowledge-graph-based improved method for entity linking in simple question answering includes the following steps:
step 1: establishing a central server and a question input client, wherein the question input client is used for collecting question data and transmitting it to the central server through the Internet for processing;
establishing an entity detection module, an entity candidate set module, a knowledge graph retrieval module and an entity matching module in a central server;
the knowledge graph retrieval module is used to interface with the open-source knowledge graph KG and to provide retrieval services for it;
step 2: after the central server receives the question data, the entity detection module detects the question data and predicts the subject words of the question in the question data, and the steps are as follows:
step A1: establishing a BILSTM-CRF model for the sequence labeling problem;
step A2: according to the BILSTM-CRF model, assigning one of two labels, "i" and "o", to each word in the question data, where "i" denotes that the corresponding word is part of the question subject word;
step A3: obtaining each question subject word in the question data through the method of the step A1 and the step A2;
step 3: by retrieving the knowledge graph, the knowledge graph retrieval module transmits all entities corresponding to entity names that exactly match the question subject word to the entity candidate set module;
step 4: the entity candidate set module establishes an entity candidate set, and all entities retrieved in step 3 are screened and then stored in the entity candidate set; that is, all entities whose entity names partially match an n-gram of the question subject word are kept in the entity candidate set, where n runs from high to low, and an n-gram is discarded if it is not the question subject word itself and matches more than 50 entities;
step 5: the entity matching module reads the entity candidate set and selects the n entities with the highest matching scores with the question data from it; the steps are as follows:
step C1: encoding the question data at word level: obtaining the vector representation of the question through a pre-trained word vector model, feeding it into a BILSTM, and finally applying max-pooling to the hidden vectors to obtain the final vector representation of the question data, i.e., the vector code of the question data;
step C2: acquiring an entity candidate set, and performing three-level coding on entities in the entity candidate set, wherein the three-level coding comprises performing word-level coding on names of the entities, performing type-level coding on types of the entities and performing word-level coding on the types of the entities;
obtaining type vector codes of type-level entities in the entity candidate set, vector codes of names of word-level entities and vector codes of types of word-level entities;
step C3: calculating the similarity between the vector code of the question data and, respectively, the type vector code of the type-level entity, the vector code of the name of the word-level entity, and the vector code of the type of the word-level entity, and taking the n candidate entities with the highest scores as the predicted entities.
Preferably, when performing step 4, partial matching has the following limitation: the number of words in the entity name in the knowledge graph cannot exceed the number of words in the question subject word by more than one.
Preferably, in executing step C1, when the OOV problem is encountered, word-level encoding is performed on the type of the entity whose name falls outside the pre-trained model's vocabulary.
Preferably, in the step C2, the type-level coding is performed by using a bag-of-words model, i.e. the vector dimension is the number of total entity types in the knowledge graph.
Preferably, in step C3, similarity is calculated between each of the three levels of vector codes of an entity in the entity candidate set and the vector code of the question data, and the average of the three similarities is taken; after the BILSTM is applied to the question data, each word has hidden-layer vectors in two directions, forward and backward, and the hidden-layer vector of each word is obtained by splicing the two.
Preferably, when step C3 is executed, the forward hidden-layer vector of the last word in the question data is concatenated with the backward hidden-layer vector of the first word to serve as the vector code of the question data, so that the encoded information of all words in both directions is utilized; the specific calculation is as follows:
S(q, s) = (1/3) × [ sim(qs(q), et(s)) + sim(qs(q), el(s)) + sim(qs(q), ew(s)) ]
where qs (q) represents the vector code for the problem data, et(s) represents the type vector code for the entity of type-level, el(s) is the vector code for the name of the entity of word-level, and ew(s) is the vector code for the type of the entity of word-level.
The invention relates to a knowledge-graph-based method for improving entity linking in simple question answering. It solves the OOV (out-of-vocabulary) problem without losing semantic information and, by considering three levels of entity information, handles the technical problem of entity confusion well. It adopts a distinctive question encoding scheme: when encoding the question, a word that has no vector representation in the word vector model is encoded by its type, which preserves the word's semantic information while solving the OOV problem. It further provides a three-level entity encoding method to address entity confusion, making full use of the type information and name information of the entity and combining it with the question encoding scheme, thereby effectively solving both the entity confusion and OOV problems.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
Fig. 1 shows an improved knowledge-graph-based method for entity linking in simple question answering, which includes the following steps:
step 1: establishing a central server and a question input client, wherein the question input client is used for collecting question data and transmitting it to the central server through the Internet for processing;
establishing an entity detection module, an entity candidate set module, a knowledge graph retrieval module and an entity matching module in a central server;
the knowledge graph retrieval module is used to interface with the open-source knowledge graph KG and to provide retrieval services for it;
step 2: after the central server receives the question data, the entity detection module detects the question data and predicts the subject words of the question in the question data, and the steps are as follows:
step A1: establishing a BILSTM-CRF model for the sequence labeling problem;
step A2: according to the BILSTM-CRF model, assigning one of two labels, "i" and "o", to each word in the question data, where "i" denotes that the corresponding word is part of the question subject word;
step A3: obtaining the subject word of the question through the methods of steps A1 and A2;
in the present embodiment, for example, the question data is "what language is skin map writer in? The term "topic word of the question in the question data is" sketch map ".
This subtask is treated as a sequence labeling problem, which the invention solves by training a BILSTM-CRF model. Two labels, "i" and "o", are used to label each word in the question, "i" indicating that the corresponding word is part of the question subject word. Through this step, the subject word of the question can be obtained for each piece of question data.
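The i/o labeling scheme reduces subject-word detection to span extraction over the tagger's output. A minimal sketch of that decoding step follows; the token and tag sequences are illustrative stand-ins for what a trained BILSTM-CRF would actually emit:

```python
def extract_subject_words(tokens, tags):
    """Collect maximal runs of tokens labeled "i" as question subject words."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "i":
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

# Labels as a trained tagger might emit them for the sample question.
tokens = "what language is sketch magazine written in".split()
tags = ["o", "o", "o", "i", "i", "o", "o"]
print(extract_subject_words(tokens, tags))  # ['sketch magazine']
```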
step 3: by retrieving the knowledge graph, the knowledge graph retrieval module transmits all entities corresponding to entity names that exactly match the question subject word to the entity candidate set module;
a knowledge graph contains millions of entities, and it is impractical to encode all of them and compare their similarity to the question, so the invention creates a candidate set based on the results of entity detection.
step 4: the entity candidate set module establishes an entity candidate set, and all entities retrieved in step 3 are screened and then stored in the entity candidate set; that is, all entities whose entity names partially match an n-gram of the question subject word are kept in the entity candidate set, where n runs from high to low, and an n-gram is discarded if it is not the question subject word itself and matches more than 50 entities;
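The n-gram screening of step 4 can be sketched as follows. Here `name_index` is a hypothetical in-memory stand-in for the knowledge graph retrieval module, mapping an entity name to the set of entity IDs bearing it, and the one-extra-word length limit from the partial-matching restriction is folded in:

```python
def ngrams(words, n):
    """All contiguous n-grams of a word list, joined back into strings."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_candidate_set(subject_word, name_index, max_matches=50):
    """Keep entities whose names partially match an n-gram of the subject
    word, n from high to low; over-matched non-subject n-grams are dropped."""
    words = subject_word.split()
    candidates = set()
    for n in range(len(words), 0, -1):          # n runs from high to low
        for gram in ngrams(words, n):
            for name, ids in name_index.items():
                # entity name may be at most one word longer than the subject
                if gram in name and len(name.split()) <= len(words) + 1:
                    if gram != subject_word and len(ids) > max_matches:
                        continue                # discard over-matched n-grams
                    candidates |= ids
    return candidates

# Toy index: the real module would query FreeBase instead.
index = {
    "sketch magazine": {"m.03c14nk"},
    "sketch": {"m.0aaa1", "m.0aaa2"},
}
print(sorted(build_candidate_set("sketch magazine", index)))
```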
and 5: the entity matching module reads the entity candidate set and selects n entities with the highest matching scores with the problem data from the entity candidate set, and the steps are as follows:
step C1: encoding the question data at word level: obtaining the vector representation of the question through a pre-trained word vector model, feeding it into a BILSTM, and finally applying max-pooling to the hidden vectors to obtain the final vector representation of the question data, i.e., the vector code of the question data;
word-level encoding is performed on the question data, and for words in the question data that have no corresponding vector in the pre-trained vector model, the corresponding type is matched in the KG (FreeBase is used in this embodiment);
since 88.5% of question subject words that encounter the OOV problem can be matched to only one entity in FreeBase, the influence of the entity confusion problem is small; if entity confusion is nevertheless encountered, the type with the highest frequency in the FreeBase triples of the entity is used as the code of the question subject word;
as shown in fig. 1, for the question data "what language is sketch magazine written in?", the word "sketch" cannot find a corresponding vector in the pre-trained vector model and is part of the question subject word "sketch magazine", so the vector of the type of the subject word "sketch magazine" is used as its vector representation; to do this, "sketch magazine" must first be matched to the unique entity "m.03c14nk" in FreeBase.
Then the word-level information of the type of m.03c14nk is taken: the vectors obtained for its words in the pre-trained word vector model are connected as the vectorized representation of "sketch magazine"; the vectorized representation of each word in the question data is taken as the input of the BILSTM, the hidden vectors obtained in the forward order are connected with those obtained in the backward order, and finally a vector of fixed dimension is obtained through max-pooling.
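The question encoding just described — per-word vectors through a BILSTM, forward and backward hidden states connected per word, then max-pooling down to a fixed dimension — can be sketched with random stand-in hidden states; no trained BILSTM is assumed here, only the shape of its output:

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 7, 4                             # 7 words, hidden size 4 per direction
h_fwd = rng.standard_normal((T, H))     # stand-in for forward hidden states
h_bwd = rng.standard_normal((T, H))     # stand-in for backward hidden states

# Connect the two directions per word, then max-pool over the sequence
# to obtain a fixed-dimension question vector regardless of length T.
h = np.concatenate([h_fwd, h_bwd], axis=1)   # shape (T, 2H)
q_vec = h.max(axis=0)                        # element-wise max over time

print(q_vec.shape)  # (8,)
```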
Step C2: acquiring an entity candidate set, and performing three-level coding on entities in the entity candidate set, wherein the three-level coding comprises performing word-level coding on names of the entities, performing type-level coding on types of the entities and performing word-level coding on the types of the entities;
obtaining type vector codes of type-level entities in the entity candidate set, vector codes of names of word-level entities and vector codes of types of word-level entities;
in order to solve the entity confusion problem existing in the entity link, the type information of the entity needs to be utilized, but the type information about the entity in the FreeBase is not rich enough (the types of a plurality of entities in the FreeBase are simplified into common/topic), and the problem cannot be effectively solved by utilizing information of one layer alone, so that the type information of the entity needs to be enriched by utilizing multi-layer coding of the type. The method adopts three-level coding for entity names and entity types.
As shown in fig. 1, for the entity "m.03c14nk" in the entity candidate set, the name "sketch magazine" of the entity is first obtained from its attribute "type.object.name" in FreeBase, and the name is segmented into the sequence {sketch, magazine}; since "sketch" cannot obtain a word vector representation in the pre-trained model, a vector is obtained for it by random initialization; the sequence is used as input to a BILSTM, the output hidden vectors are processed by max-pooling to obtain a vector of fixed dimension, and at this point the word-level code of the entity name is obtained. The entity attributes "type/object/type" and "common/topic/notable_types" are used to obtain the types of the entity, "book/magazine", "book/periodical" and "common/topic"; "/" and "_" are used to tokenize the entity types, yielding the sequence {book, magazine, periodical, common, topic}, and, similarly, a vector representation is obtained from GloVe;
the word-level code of the entity types is then obtained through BILSTM and max-pooling; for the type-level code of the entity types, the vector dimension is fixed to the number of entity types in FreeBase, and the vector features are extracted without model training.
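The type tokenization and the training-free type-level bag-of-words code described above can be sketched as follows; the type inventory here is a toy stand-in for the full FreeBase type list:

```python
import re

def tokenize_types(types):
    """Split FreeBase-style type paths on "/" and "_" into words."""
    words = []
    for t in types:
        words.extend(w for w in re.split(r"[/_]", t) if w)
    return list(dict.fromkeys(words))   # de-duplicate, keeping first-seen order

def type_level_vector(entity_types, type_inventory):
    """Bag-of-words over the KG's type inventory: dimension i is 1 iff the
    entity carries type i — no model training is needed for this level."""
    carried = set(entity_types)
    return [1 if t in carried else 0 for t in type_inventory]

types = ["book/magazine", "book/periodical", "common/topic"]
print(tokenize_types(types))            # ['book', 'magazine', 'periodical', 'common', 'topic']

inventory = ["book/magazine", "book/periodical", "common/topic", "film/film"]
print(type_level_vector(types, inventory))  # [1, 1, 1, 0]
```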
Step C3: and respectively carrying out similarity calculation on the vector code of the problem data and the type vector code of the type-level entity, the vector code of the name of the word-level entity and the vector code of the type of the word-level entity, and taking n candidate entities with the highest scores as predicted entities.
Preferably, when performing step 4, partial matching has the following limitation: the number of words in the entity name in the knowledge graph cannot exceed the number of words in the question subject word by more than one.
Preferably, in executing step C1, when the OOV problem is encountered, word-level encoding is performed on the type of the entity whose name falls outside the pre-trained model's vocabulary.
Preferably, in the step C2, the type-level coding is performed by using a bag-of-words model, i.e. the vector dimension is the number of total entity types in the knowledge graph.
One traditional technical scheme splices the vectors of the three levels into a single entity vector and then computes similarity with the question vector; since the type-level vector of the entity type alone has 500 dimensions, adding the splices of the other levels pushes the entity vector representation to thousands of dimensions, which introduces large error.
Another traditional scheme concatenates the vectors of each word and then uses a pooling operation to obtain a fixed-dimension vector; this loses too much information and performs very poorly. Nor does the invention adopt the scheme of using the three levels of entity vectors to generate a corresponding question vector representation for each level and then computing similarity; that scheme makes the encoding of the question as close as possible to the entity information it contains, but its ability to distinguish candidate entities with very high similarity is poor, because many entities in the candidate set have the same name and some even have very similar type information.
In the invention, the following technical scheme is adopted for improvement instead of the traditional technical scheme:
preferably, in step C3, similarity calculation is performed on the three-level vector codes of the entities in the entity candidate set and the vector code of the question data, and finally an average value is obtained, after BILSTM is performed on the vector codes of the question data, each word has hidden layer vectors in two directions, namely, forward and backward, and the hidden layer vectors of each word are obtained by splicing the hidden layer vectors.
Preferably, when step C3 is executed, the forward hidden-layer vector of the last word in the question data is concatenated with the backward hidden-layer vector of the first word to serve as the vector code of the question data, so that the encoded information of all words in both directions is utilized; the specific calculation is as follows:
S(q, s) = (1/3) × [ sim(qs(q), et(s)) + sim(qs(q), el(s)) + sim(qs(q), ew(s)) ]
where qs (q) represents the vector code for the problem data, et(s) represents the type vector code for the entity of type-level, el(s) is the vector code for the name of the entity of word-level, and ew(s) is the vector code for the type of the entity of word-level.
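Ranking under the averaged three-level similarity described above can be sketched as follows. Cosine similarity and the toy vectors are illustrative assumptions (the patent does not fix the similarity function here), and all codes are taken to live in a common dimension for the comparison:

```python
import math

def cos(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def score(q_vec, e_type, e_name, e_word):
    """Average of the question vector's similarity to each of the three
    entity codes: type-level, name word-level, and type word-level."""
    return (cos(q_vec, e_type) + cos(q_vec, e_name) + cos(q_vec, e_word)) / 3.0

# Toy question vector and candidate codes (type-level, name, type word-level).
q = [0.2, 0.9, 0.1]
candidates = {
    "m.03c14nk": ([0.1, 0.8, 0.0], [0.3, 0.9, 0.2], [0.2, 0.7, 0.1]),
    "m.0aaa1":   ([0.9, 0.1, 0.0], [0.8, 0.0, 0.3], [0.7, 0.2, 0.1]),
}
ranked = sorted(candidates, key=lambda s: score(q, *candidates[s]), reverse=True)
print(ranked[0])  # m.03c14nk — all three of its codes point the same way as q
```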
This patent relates to simple question answering (SimpleQA), in which a question can be answered by reasoning over a single fact in the knowledge graph; the invention uses several deep learning models (BILSTM, BiGRU) to complete simple question answering based on the knowledge graph.
The invention relates to a knowledge-graph-based method for improving entity linking in simple question answering. It solves the OOV (out-of-vocabulary) problem without losing semantic information and, by considering three levels of entity information, handles the technical problem of entity confusion well. It adopts a distinctive question encoding scheme: when encoding the question, a word that has no vector representation in the word vector model is encoded by its type, which preserves the word's semantic information while solving the OOV problem. It further provides a three-level entity encoding method to address entity confusion, making full use of the type information and name information of the entity and combining it with the question encoding scheme, thereby effectively solving both the entity confusion and OOV problems.

Claims (6)

1. A knowledge-graph-based improved method for entity linking in simple question answering, characterized in that the method comprises the following steps:
step 1: establishing a central server and a question input client, wherein the question input client is used for collecting question data and transmitting it to the central server through the Internet for processing;
establishing an entity detection module, an entity candidate set module, a knowledge graph retrieval module and an entity matching module in a central server;
the knowledge graph retrieval module is used to interface with the open-source knowledge graph KG and to provide retrieval services for it;
step 2: after the central server receives the question data, the entity detection module detects the question data and predicts the subject words of the question in the question data, and the steps are as follows:
step A1: establishing a BILSTM-CRF model for the sequence labeling problem;
step A2: according to the BILSTM-CRF model, assigning one of two labels, "i" and "o", to each word in the question data, where "i" denotes that the corresponding word is part of the question subject word;
step A3: obtaining question subject words in the question data through the methods of the step A1 and the step A2;
step 3: by retrieving the knowledge graph, the knowledge graph retrieval module transmits all entities corresponding to entity names that exactly match the question subject word to the entity candidate set module;
step 4: the entity candidate set module establishes an entity candidate set, and all entities retrieved in step 3 are screened and then stored in the entity candidate set; that is, all entities whose entity names partially match an n-gram of the question subject word are kept in the entity candidate set, where n runs from high to low, and an n-gram is discarded if it is not the question subject word itself and matches more than 50 entities;
step 5: the entity matching module reads the entity candidate set and selects the n entities with the highest matching scores with the question data from it; the steps are as follows:
step C1: encoding the question data at word level: obtaining the vector representation of the question through a pre-trained word vector model, feeding it into a BILSTM, and finally applying max-pooling to the hidden vectors to obtain the final vector representation of the question data, i.e., the vector code of the question data;
step C2: acquiring an entity candidate set, and performing three-level coding on entities in the entity candidate set, wherein the three-level coding comprises performing word-level coding on names of the entities, performing type-level coding on types of the entities and performing word-level coding on the types of the entities;
obtaining type vector codes of type-level entities in the entity candidate set, vector codes of names of word-level entities and vector codes of types of word-level entities;
step C3: calculating the similarity between the vector code of the question data and, respectively, the type vector code of the type-level entity, the vector code of the name of the word-level entity, and the vector code of the type of the word-level entity, and taking the n candidate entities with the highest scores as the predicted entities.
2. The method of claim 1, wherein in step 4 the partial match is subject to the following restriction: the number of words in an entity name in the knowledge graph may not exceed the number of words in the question topic words by more than one.
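Under the reading that an entity name may be at most one word longer than the question topic words, claim 2's restriction reduces to a one-line check (the function name is an assumption):

```python
def partial_match_allowed(entity_name, topic_words):
    # entity name may be at most one word longer than the topic words
    return len(entity_name.split()) <= len(topic_words.split()) + 1

print(partial_match_allowed("new york city", "new york"))        # -> True
print(partial_match_allowed("new york city area", "new york"))   # -> False
```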
3. The method of claim 1, wherein in step C1, when the out-of-vocabulary (OOV) problem is encountered, i.e. a word lies outside the pre-trained model's vocabulary, word-level encoding of the entity type is performed instead.
4. The method of claim 1, wherein in step C2 the type-level encoding is based on the bag-of-words model, i.e. the vector dimension equals the total number of entity types in the knowledge graph.
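The bag-of-words type-level encoding of claim 4 can be sketched as below: the vector has one dimension per entity type in the knowledge graph, with an entity's types marked 1 (the type sets here are illustrative):

```python
def type_level_encode(entity_types, all_types):
    # one dimension per entity type in the knowledge graph, sorted for stability
    index = {t: i for i, t in enumerate(sorted(all_types))}
    vec = [0] * len(all_types)
    for t in entity_types:
        vec[index[t]] = 1
    return vec

all_types = {"person", "film", "location"}
print(type_level_encode({"film"}, all_types))   # -> [1, 0, 0]
```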
5. The method of claim 1, wherein in step C3 the similarity between each of the three-level vector codes of an entity in the candidate set and the vector code of the question data is computed separately, and the average of these similarities is taken; for the vector code of the question data, after the BiLSTM each word has hidden-layer vectors in both the forward and backward directions, which are concatenated to obtain the hidden-layer vector of that word.
6. The method of claim 5, wherein in step C3 the forward hidden-layer vector of the last word of the question data and the backward hidden vector of the first word are concatenated to form the vector code of the question data, so that the encoded information of all words in both directions is utilized; the specific calculation is as follows:
s(q) = [ sim(qs(q), et(s)) + sim(qs(q), el(s)) + sim(qs(q), ew(s)) ] / 3
where qs(q) denotes the vector code of the question data, et(s) the type-level type vector code of the entity, el(s) the word-level vector code of the entity's name, ew(s) the word-level vector code of the entity's type, and sim(·,·) the similarity function.
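The concatenation described in claim 6 can be sketched as below, assuming the per-word forward and backward BiLSTM hidden vectors are given as two arrays:

```python
import numpy as np

def question_vector(forward_h, backward_h):
    # forward_h, backward_h: (seq_len, hidden_dim) per-direction hidden vectors;
    # concatenate the last word's forward state with the first word's backward state
    return np.concatenate([forward_h[-1], backward_h[0]])

fwd = np.array([[0.1, 0.2], [0.3, 0.4]])
bwd = np.array([[0.5, 0.6], [0.7, 0.8]])
print(question_vector(fwd, bwd))   # -> [0.3 0.4 0.5 0.6]
```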
CN201911131171.7A 2019-11-19 2019-11-19 Knowledge graph-based method for improving entity link in simple question answering Expired - Fee Related CN110909174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911131171.7A CN110909174B (en) 2019-11-19 2019-11-19 Knowledge graph-based method for improving entity link in simple question answering

Publications (2)

Publication Number Publication Date
CN110909174A CN110909174A (en) 2020-03-24
CN110909174B true CN110909174B (en) 2022-01-04

Family

ID=69818090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911131171.7A Expired - Fee Related CN110909174B (en) 2019-11-19 2019-11-19 Knowledge graph-based method for improving entity link in simple question answering

Country Status (1)

Country Link
CN (1) CN110909174B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535970A (en) * 2020-04-22 2021-10-22 阿里巴巴集团控股有限公司 Information processing method and apparatus, electronic device, and computer-readable storage medium
CN111797245B (en) * 2020-07-27 2023-07-25 中国平安人寿保险股份有限公司 Knowledge graph model-based information matching method and related device
CN114691973A (en) * 2020-12-31 2022-07-01 华为技术有限公司 Recommendation method, recommendation network and related equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480125A * 2017-07-05 2017-12-15 Chongqing University of Posts and Telecommunications A knowledge-graph-based relation linking method
CN108875051A * 2018-06-28 2018-11-23 Global Tone Communication Technology Co., Ltd. Automatic knowledge graph construction method and system for massive unstructured text
CN109271524A * 2018-08-02 2019-01-25 Institute of Computing Technology, Chinese Academy of Sciences Entity linking method in a knowledge base question answering system
US10331402B1 * 2017-05-30 2019-06-25 Amazon Technologies, Inc. Search and knowledge base question answering for a voice user interface
CN110298042A * 2019-06-26 2019-10-01 Sichuan Changhong Electric Co., Ltd. Film and television entity recognition method based on BiLSTM-CRF and knowledge graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11157540B2 (en) * 2016-09-12 2021-10-26 International Business Machines Corporation Search space reduction for knowledge graph querying and interactions

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Entity linking based on the co-occurrence graph and entity probability;Alan Eckhardt et al.;《Proceedings of the First International Workshop on Entity Recognition & Disambiguation》;20140630;pp. 37-44 *
Research on a knowledge graph construction method based on multiple data sources;Wu Yunbing et al.;《Journal of Fuzhou University (Natural Science Edition)》;20170630;Vol. 45, No. 3, pp. 329-335 *

Also Published As

Publication number Publication date
CN110909174A (en) 2020-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220104