
WO2021096009A1 - Method and device for supplementing knowledge on basis of relation network - Google Patents

Method and device for supplementing knowledge on basis of relation network

Info

Publication number
WO2021096009A1
Authority
WO
WIPO (PCT)
Prior art keywords
relation network
knowledge
node
path
relationship
Prior art date
Application number
PCT/KR2020/006239
Other languages
French (fr)
Korean (ko)
Inventor
박영택
이완곤
바트셀렘자그바랄
노재승
Original Assignee
숭실대학교산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 숭실대학교산학협력단
Publication of WO2021096009A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/02: Knowledge representation; Symbolic representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9024: Graphs; Linked lists

Definitions

  • The present invention relates to a method and apparatus for supplementing knowledge on a knowledge graph using a relation network (RN).
  • RN: relation network
  • Knowledge graphs (KGs) are used as important resources in the fields of machine learning and data mining, and are particularly useful for solving problems such as question answering, fact checking, and link prediction.
  • A knowledge graph is a knowledge network composed of entity nodes and relationship edges, and may be expressed as a triple <h, r, t> in RDF format. Here, h is the head entity, t is the tail entity connected to h, and r is the relationship between h and t.
  • Although knowledge graphs are widely used in various tasks, their correctness and completeness are not guaranteed.
  • KGC: Knowledge Graph Completion
  • Related background art includes Korean Patent Laid-Open Publication No. 10-2016-0064826 (title of the invention: an apparatus and method for providing a semantic search service based on a knowledge graph; publication date: June 8, 2016).
  • An embodiment of the present invention aims to provide a knowledge supplementation method and apparatus that show excellent performance while solving the problems of existing KGC, by extracting relationship paths based on the Path Ranking Algorithm (PRA) and using them as training data for a relation network.
  • PRA: Path Ranking Algorithm
  • To achieve the above object, a knowledge supplementation method based on a relation network comprises: extracting, for a plurality of node pairs included in a knowledge graph, path information, which is information about a plurality of paths representing the relationship between the source node and the target node constituting each node pair; generating training data corresponding to each of the plurality of paths based on the path information; and training a relation network model using the training data.
  • The training data may include a context containing information on the relationship represented by each of at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
  • Training the relation network model may comprise: encoding a first triple, composed of the source node, the target node, and the relationship between them for each of the at least one link, by inputting it into a Long Short-Term Memory (LSTM) network; generating first result vectors by first training the relation network model using a plurality of second triples, each consisting of two of the encoded results and the question; and secondarily training the relation network model by summing the first result vectors element-wise.
  • LSTM: Long Short-Term Memory
  • The number of the at least one link may be less than or equal to a predetermined threshold.
  • The step of extracting the path information may extract the plurality of paths based on the Path Ranking Algorithm (PRA).
  • To achieve the above object, a knowledge supplement apparatus based on a relation network comprises:
  • a path extraction unit that extracts, for a plurality of node pairs included in a knowledge graph, path information, which is information about a plurality of paths representing the relationship between the source node and the target node constituting each node pair;
  • a data generation unit that generates training data corresponding to each of the plurality of paths based on the path information; and
  • a learning unit that trains a relation network model using the training data.
  • The training data may include a context containing information on the relationship represented by each of at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
  • The learning unit may encode the source node, the target node, and the relationship between them for each of the at least one link by inputting them into a Long Short-Term Memory (LSTM) network, generate first result vectors by first training the relation network model using a plurality of triples each consisting of two of the encoded results and the question, and secondarily train the relation network model by summing the first result vectors element-wise.
  • The number of the at least one link may be less than or equal to a predetermined threshold.
  • The path extraction unit may extract the plurality of paths based on the Path Ranking Algorithm (PRA).
  • By extracting relationship paths based on the PRA and using them as training data for the relation network, the method and apparatus for supplementing knowledge based on a relation network solve the problems of existing KGC and show excellent performance.
  • The method and apparatus also make it easy to extract meaningful information, such as customized services tailored to an individual user, and can therefore be used in various artificial-intelligence service fields (Q&A systems, recommendation systems, interactive agent systems, and so on).
  • FIG. 1 is a flowchart illustrating a method of supplementing knowledge based on a relation network according to an embodiment of the present invention.
  • FIG. 2 is a flowchart illustrating a method of learning a relation network model according to an embodiment of the present invention.
  • FIG. 3 is a block diagram illustrating an apparatus for supplementing knowledge based on a relation network according to an embodiment of the present invention.
  • FIG. 4 is a diagram showing a path matrix according to an embodiment of the present invention.
  • FIG. 5 is a diagram illustrating a path sequence according to an embodiment of the present invention.
  • FIG. 6 is a diagram for explaining training data according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating a learning process according to an embodiment of the present invention.
  • FIG. 1 is a flowchart illustrating a method of supplementing knowledge based on a relation network according to an embodiment of the present invention.
  • In step S110, the knowledge supplement apparatus extracts, for a plurality of node pairs included in the knowledge graph, path information, which is information about a plurality of paths representing the relationship between the source node and the target node constituting each node pair.
  • Here, when a plurality of nodes exist in the knowledge graph, the knowledge supplement apparatus may extract, for a plurality of node pairs each formed by pairing two of those nodes, path information about a plurality of paths representing the relationship between the source node and the target node.
  • For example, for a source node A and a target node B, the knowledge supplement apparatus may extract path information including information on the plurality of paths existing between A and B. More specifically, when A and B are connected through intermediate nodes C and D respectively, the knowledge supplement apparatus may extract information about the paths A-C-B and A-D-B as the path information.
  • In addition, when the source node is a subject and the target node is an object, the relationship between the source node and the target node may include content describing the relationship between that subject and object. For example, when the source node is lebron james and the target node is LA lakers, the relationship may be playsFor.
  • The knowledge supplement apparatus may extract the plurality of paths based on the Path Ranking Algorithm (PRA).
  • That is, the knowledge supplement apparatus may extract a plurality of paths for the source node and the target node from the knowledge graph using the PRA.
  • In doing so, the knowledge supplement apparatus can use a random-walk-on-graph algorithm, which starts from the source node and moves through intermediate nodes until it reaches the target node.
  • The knowledge supplement apparatus may generate a path matrix by calculating the random walk probability of every path for the plurality of node pairs.
  • Each cell value of the path matrix is the probability that the source node s_i reaches the target node t_i through the path π_i (the i-th column). This makes it possible to identify paths that are not helpful for learning. For example, π_2 in FIG. 4 may be classified as a poor path because it has a relatively low probability for most of the node pairs (s_i, t_i) compared to π_1. Alternatively, even if the probability for some node pairs is high, as with π_1, a path may be judged unhelpful for learning if the majority of node pairs are not connected through it or have a low probability.
  • The knowledge supplement apparatus may select paths for which the node pairs connected through the path account for more than 70% of the total. For example, if the proportion connected through path π_1 among all triples for a given relationship R is less than 50%, the column for path π_1 can be excluded from the path matrix.
  • In addition, after excluding paths whose cell values in the path matrix (the random walk probabilities for each path) average less than 5%, the knowledge supplement apparatus may select paths through which most of the nodes in the node pairs can be connected.
  • In step S120, the knowledge supplement apparatus generates training data corresponding to each of the plurality of paths based on the path information.
  • That is, the knowledge supplement apparatus may generate training data for each of the plurality of paths corresponding to an individual node pair.
  • The training data may include a context containing information on the relationship represented by each of at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
  • The context may include information on at least one link corresponding to a relationship sequence existing between the source node and the target node.
  • The relationship sequence may be divided into the individual relationship units (i.e., link units) that constitute it, each of which is represented as a triple consisting of a source node, a target node, and the relationship between them. That is, the context can be composed of this set of separated triples.
  • The question can be generated using the source node of the first triple of the triple set and the target relation.
  • For example, the relationship sequence extracted through the PRA may be playsFor → worksFor⁻¹ → playsIn.
  • Here, lebron james and NBA can be matched to Athlete and League, so the knowledge supplement apparatus can generate the question and the answer included in the training data as <lebron_james playsIn ?> and NBA, respectively.
  • The knowledge supplement apparatus may extract the context for the question in the form of triples from an instance matching the relationship sequence:
  • <lebron james playsFor LA lakers>
  • <LA lakers playsFor⁻¹ rajon rondo>
  • <rajon rondo playsIn NBA>
  • A story composed of the context, question, and answer of the training data may be constructed as shown in FIG. 6 and used as training data for the RN.
  • The number of the at least one link may be less than or equal to a predetermined threshold.
  • That is, the knowledge supplement apparatus may ensure that each of the plurality of paths includes no more than the threshold number of links. This is because, when the number of links in a path exceeds the threshold, the amount of computation required of the knowledge supplement apparatus may increase in proportion.
  • For example, the knowledge supplement apparatus may extract only paths that include three or fewer links.
  • In step S130, the knowledge supplement apparatus trains the relation network model using the training data.
  • The relation network, proposed by DeepMind, is a deep-learning model that infers relationships between objects.
  • The RN takes as input training data in the form of a story consisting of a context, a question, and an answer, and learns through two multi-layer perceptrons (MLPs).
  • MLPs: multi-layer perceptrons
  • The relation network model can be expressed by Equation 1 below.
  • [Equation 1] $a = f_{\phi}\left(\sum_{i,j} g_{\theta}(o_i, o_j, q)\right)$
  • Here, $o_i$ and $o_j$ are the i-th and j-th objects, respectively, $a$ is the answer, $q$ is the question, $g_{\theta}$ is the relation function, and $f_{\phi}$ is the function whose parameters predict the answer to the question based on the learned relationship information.
  • That is, each pair $(o_i, o_j)$ of individual sentences constituting the context (i.e., a source node, a target node, and their relationship) is merged with the question $q$, and the first MLP, $g_{\theta}$, learns the relationship. The second MLP, $f_{\phi}$, then learns parameters that predict the answer to the question based on the relationship information learned through $g_{\theta}$.
  • As described above, the knowledge supplementation method based on a relation network extracts relationship paths based on the Path Ranking Algorithm (PRA) and uses them as training data for the relation network, thereby solving the problems of existing KGC and showing excellent performance.
  • FIG. 2 is a flowchart illustrating a method of learning a relation network model according to an embodiment of the present invention.
  • In step S210, the knowledge supplement apparatus encodes a first triple, consisting of the source node, the target node, and the relationship between them for each of the at least one link, by inputting it into a Long Short-Term Memory (LSTM) network.
  • For example, the path may include three first triples.
  • The three first triples may be (h, R_1, e_1), (e_1, R_2, e_2), and (e_2, R_3, e_3).
  • The knowledge supplement apparatus may obtain C_1, C_2, and C_3, respectively, as the results of encoding the three first triples with the LSTM.
  • In step S220, the knowledge supplement apparatus generates first result vectors by first training the relation network model using a plurality of second triples, each consisting of two of the encoded results and the question.
  • For example, the knowledge supplement apparatus selects two of the encoded results C_1, C_2, and C_3 and combines them with the result of encoding the question q with an LSTM, generating a total of three second triples. The first result vectors can be generated by using these second triples as the input of the first MLP (the relation function) of the relation network model.
  • In step S230, the knowledge supplement apparatus secondarily trains the relation network model by summing the first result vectors element-wise.
  • That is, the knowledge supplement apparatus takes the element-wise sum of the first result vectors and uses it as the input of the second MLP included in the relation network model.
  • In the first MLP, each layer can consist of 256 units. All input data can be regarded as having the context and the question embedded together as it passes through the first MLP. The first result vectors produced by the first MLP are then summed and used as the input of the second MLP. The second MLP has a total of three fully connected layers; the first layer may consist of 256 units and the second layer of 512 units. The last layer is set to the overall vocabulary size, so a softmax value for the answer can be output.
  • Finally, the knowledge supplement apparatus may predict a relationship between missing nodes by using the trained relation network model. Furthermore, the knowledge supplement apparatus can provide various artificial-intelligence services by applying the trained relation network model to a Q&A system, a recommendation system, an interactive agent system, a chatbot system, and the like.
  • FIG. 3 is a block diagram illustrating an apparatus for supplementing knowledge based on a relation network according to an embodiment of the present invention.
  • A knowledge supplement apparatus 300 based on a relation network includes a path extraction unit 310, a data generation unit 320, and a learning unit 330.
  • The knowledge supplement apparatus 300 may be mounted on a desktop PC, a notebook PC, a smartphone, a tablet PC, or a server computer.
  • The path extraction unit 310 extracts, for a plurality of node pairs included in the knowledge graph, path information, which is information about a plurality of paths representing the relationship between the source node and the target node constituting each node pair.
  • The data generation unit 320 generates training data corresponding to each of the plurality of paths based on the path information.
  • The learning unit 330 trains the relation network model using the training data.
  • The training data may include a context containing information on the relationship represented by each of at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
  • The learning unit 330 may encode the source node, the target node, and the relationship between them for each of the at least one link by inputting them into a Long Short-Term Memory (LSTM) network, generate first result vectors by first training the relation network model using a plurality of triples each consisting of two of the encoded results and the question, and secondarily train the relation network model by summing the first result vectors element-wise.
  • The number of the at least one link may be less than or equal to a predetermined threshold.
  • The path extraction unit 310 may extract the plurality of paths based on the Path Ranking Algorithm (PRA).
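
The path matrix and the path selection rules described for step S110 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names (build_index, walk_probability, select_paths), the "^-1" suffix for inverse relations, and the uniform choice among neighbours are all assumptions. The 70% coverage and 5% mean cut-offs are the figures quoted in the text above.

```python
from collections import defaultdict


def build_index(triples):
    """Index neighbours by (node, relation); 'r^-1' denotes the inverse of r."""
    index = defaultdict(list)
    for h, r, t in triples:
        index[(h, r)].append(t)
        index[(t, r + "^-1")].append(h)
    return index


def walk_probability(index, source, target, relation_path):
    """Probability that a uniform random walk following relation_path ends at target."""
    dist = {source: 1.0}
    for rel in relation_path:
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbours = index.get((node, rel), [])
            for n in neighbours:
                nxt[n] += p / len(neighbours)
        dist = nxt
    return dist.get(target, 0.0)


def select_paths(matrix, min_coverage=0.7, min_mean=0.05):
    """Return the column indices that pass both the coverage and the mean cut-off."""
    keep = []
    for col in range(len(matrix[0])):
        column = [row[col] for row in matrix]
        coverage = sum(1 for v in column if v > 0) / len(column)
        mean = sum(column) / len(column)
        if coverage > min_coverage and mean >= min_mean:
            keep.append(col)
    return keep


triples = [
    ("lebron james", "playsFor", "LA lakers"),
    ("rajon rondo", "playsFor", "LA lakers"),
    ("rajon rondo", "playsIn", "NBA"),
]
index = build_index(triples)
prob = walk_probability(
    index, "lebron james", "NBA", ["playsFor", "playsFor^-1", "playsIn"]
)

# Rows are node pairs, columns are candidate paths (toy values).
kept = select_paths([[0.5, 0.0], [0.4, 0.01], [0.6, 0.0]])
```

In this toy graph the walk splits evenly between the two players at the second step, and only the branch through rajon rondo reaches NBA, so the cell value for this path is one half. The second column of the toy matrix is dropped because it connects too few node pairs.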

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed is a method for supplementing knowledge on the basis of a relation network. A method for supplementing knowledge on the basis of a relation network according to one embodiment of the present invention comprises: a step for extracting, for a plurality of node pairs included in a knowledge graph, path information which is information about a plurality of paths representing a relationship between a source node and a target node constituting the node pairs; a step for generating training data corresponding to each of the plurality of paths on the basis of the path information; and a step for training a relation network model by using the training data.

Description

Knowledge supplement method and device based on relation network
The present invention relates to a method and apparatus for supplementing knowledge on a knowledge graph using a relation network (RN).
Knowledge graphs (KGs) are used as important resources in the fields of machine learning and data mining, and are particularly useful for solving problems such as question answering, fact checking, and link prediction. A knowledge graph is a knowledge network composed of entity nodes and relationship edges, and may be expressed as a triple <h, r, t> in RDF format. Here, h is the head entity, t is the tail entity connected to h, and r is the relationship between h and t. Although knowledge graphs are widely used in various tasks, their correctness and completeness are not guaranteed.
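
As a rough illustration of the triple representation described above, the following sketch stores <h, r, t> facts and checks whether a fact is missing, which is the kind of gap that knowledge graph completion is meant to fill. The Triple and KnowledgeGraph classes and their method names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str       # h: head entity
    relation: str   # r: relationship between head and tail
    tail: str       # t: tail entity


class KnowledgeGraph:
    """Hypothetical in-memory triple store used only for illustration."""

    def __init__(self, triples):
        self.triples = set(triples)
        self._out = defaultdict(set)
        for h, r, t in self.triples:
            self._out[h].add((r, t))

    def tails(self, head, relation):
        """All tail entities t such that <head, relation, t> is stored."""
        return {t for r, t in self._out[head] if r == relation}

    def missing(self, head, relation):
        """True when no <head, relation, ?> fact is stored."""
        return not self.tails(head, relation)


kg = KnowledgeGraph([
    Triple("lebron james", "playsFor", "LA lakers"),
    Triple("rajon rondo", "playsFor", "LA lakers"),
    Triple("rajon rondo", "playsIn", "NBA"),
])
```

Here kg.missing("lebron james", "playsIn") reports a gap: the graph knows which team the player belongs to but not which league he plays in.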
It is therefore necessary to improve the quality of knowledge graphs through Knowledge Graph Completion (KGC), which can find the missing links or entities of a knowledge graph. Many algorithms for KGC have been developed recently, and the successful models all share one feature: they represent entities and relations as low-dimensional embedding vectors. Because these methods learn the entities and relations themselves, they are limited in learning the entire knowledge graph, and their prediction performance has stopped improving.
Related background art includes Korean Patent Laid-Open Publication No. 10-2016-0064826 (title of the invention: an apparatus and method for providing a semantic search service based on a knowledge graph; publication date: June 8, 2016).
An embodiment of the present invention aims to provide a knowledge supplementation method and apparatus that show excellent performance while solving the problems of existing KGC, by extracting relationship paths based on the Path Ranking Algorithm (PRA) and using them as training data for a relation network.
The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.
To achieve the above object, a knowledge supplementation method based on a relation network according to an embodiment of the present invention comprises: extracting, for a plurality of node pairs included in a knowledge graph, path information, which is information about a plurality of paths representing the relationship between the source node and the target node constituting each node pair; generating training data corresponding to each of the plurality of paths based on the path information; and training a relation network model using the training data.
Preferably, the training data may include a context containing information on the relationship represented by each of at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
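
The context, question, and answer structure described above can be sketched as follows, using the lebron james instance that appears elsewhere in this document. The build_story function, the dictionary keys, and the "^-1" notation for an inverse relation are assumptions made for illustration only.

```python
def build_story(path_triples, target_relation):
    """Turn one path instance into a (context, question, answer) story."""
    # Context: one sentence per link of the path.
    context = [f"{h} {r} {t}" for h, r, t in path_triples]
    source = path_triples[0][0]    # source node of the first triple
    answer = path_triples[-1][2]   # target node reached by the last link
    question = f"{source} {target_relation} ?"
    return {"context": context, "question": question, "answer": answer}


story = build_story(
    [
        ("lebron james", "playsFor", "LA lakers"),
        ("LA lakers", "playsFor^-1", "rajon rondo"),
        ("rajon rondo", "playsIn", "NBA"),
    ],
    target_relation="playsIn",
)
```

The resulting story pairs the three context sentences with the question "lebron james playsIn ?" and the answer "NBA".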
Preferably, training the relation network model may comprise: encoding a first triple, composed of the source node, the target node, and the relationship between them for each of the at least one link, by inputting it into a Long Short-Term Memory (LSTM) network; generating first result vectors by first training the relation network model using a plurality of second triples, each consisting of two of the encoded results and the question; and secondarily training the relation network model by summing the first result vectors element-wise.
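
The data flow of these steps can be sketched as a forward pass in plain Python. This is only an illustration under several assumptions: the LSTM encodings are replaced by random stand-in vectors, each MLP is reduced to a single fully connected layer with untrained random weights, and the names dense, init_layer, and relation_network are hypothetical. No training is performed.

```python
import math
import random
from itertools import combinations


def dense(weights, bias, x, relu=True):
    """One fully connected layer; weights holds one row per output unit."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relation_network(encoded, question, g_layer, f_layer):
    """Apply the first MLP to each (C_i, C_j, q), sum element-wise, apply the second MLP."""
    pair_outputs = [
        dense(*g_layer, c_i + c_j + question)      # list concatenation of the triple
        for c_i, c_j in combinations(encoded, 2)   # 3 pairs from C_1, C_2, C_3
    ]
    summed = [sum(values) for values in zip(*pair_outputs)]  # element-wise sum
    return softmax(dense(*f_layer, summed, relu=False))


rng = random.Random(0)
dim, hidden, vocab = 4, 8, 5   # toy sizes chosen for illustration
encoded = [[rng.random() for _ in range(dim)] for _ in range(3)]  # stand-ins for C_1..C_3
question = [rng.random() for _ in range(dim)]                     # stand-in for encoded q
g_layer = init_layer(rng, 3 * dim, hidden)
f_layer = init_layer(rng, hidden, vocab)
probs = relation_network(encoded, question, g_layer, f_layer)
```

The output is a probability distribution over a toy vocabulary of candidate answers. In an actual system the stand-in vectors would come from a trained encoder and both MLPs would be trained jointly.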
Preferably, the number of the at least one link may be less than or equal to a predetermined threshold.
Preferably, the step of extracting the path information may extract the plurality of paths based on the Path Ranking Algorithm (PRA).
In addition, to achieve the above object, a knowledge supplement apparatus based on a relation network according to an embodiment of the present invention comprises: a path extraction unit that extracts, for a plurality of node pairs included in a knowledge graph, path information, which is information about a plurality of paths representing the relationship between the source node and the target node constituting each node pair; a data generation unit that generates training data corresponding to each of the plurality of paths based on the path information; and a learning unit that trains a relation network model using the training data.
Preferably, the training data may include a context containing information on the relationship represented by each of at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
Preferably, the learning unit may encode the source node, the target node, and the relationship between them for each of the at least one link by inputting them into a Long Short-Term Memory (LSTM) network, generate first result vectors by first training the relation network model using a plurality of triples each consisting of two of the encoded results and the question, and secondarily train the relation network model by summing the first result vectors element-wise.
Preferably, the number of the at least one link may be less than or equal to a predetermined threshold.
Preferably, the path extraction unit may extract the plurality of paths based on the Path Ranking Algorithm (PRA).
Details of other embodiments are included in the detailed description and the accompanying drawings.
By extracting relationship paths based on the Path Ranking Algorithm (PRA) and using them as training data for the relation network, the method and apparatus for supplementing knowledge based on a relation network according to the present invention solve the problems of existing KGC and show excellent performance.
In addition, the method and apparatus according to the present invention make it easy to extract meaningful information, such as customized services tailored to an individual user, and can therefore be used in various artificial-intelligence service fields (Q&A systems, recommendation systems, interactive agent systems, and so on).
FIG. 1 is a flowchart illustrating a method of supplementing knowledge based on a relation network according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of learning a relation network model according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating an apparatus for supplementing knowledge based on a relation network according to an embodiment of the present invention.
FIG. 4 is a diagram showing a path matrix according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a path sequence according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining training data according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a learning process according to an embodiment of the present invention.
The advantages and/or features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the present invention pertains, and the present invention is defined only by the scope of the claims. The same reference numerals refer to the same elements throughout the specification.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a flowchart illustrating a method of supplementing knowledge based on a relation network according to an embodiment of the present invention.
In step S110, the knowledge supplement apparatus extracts, for a plurality of node pairs included in a knowledge graph, path information, which is information about a plurality of paths representing the relationship between the source node and the target node constituting each node pair.
Here, when a plurality of nodes exist in the knowledge graph, the knowledge supplement apparatus may extract, for a plurality of node pairs each formed by pairing two of those nodes, path information about the plurality of paths representing the relationship between the source node and the target node.
For example, for a source node A and a target node B, the path information may include information about a plurality of paths existing between A and B. More specifically, when A and B are connected through each of the intermediate nodes C and D, the knowledge supplement apparatus may extract information about the paths A-C-B and A-D-B as the path information.
In addition, the relationship between the source node and the target node may describe the relationship between a subject and an object, where the source node is the subject and the target node is the object. For example, when the source node is lebron james and the target node is LA lakers, the relationship may be playsFor.
In another embodiment, the knowledge supplement apparatus may extract the plurality of paths based on the Path Ranking Algorithm (PRA).
That is, the knowledge supplement apparatus may use PRA to extract, from the knowledge graph, a plurality of paths for the source node and the target node. In doing so, the apparatus may use a random walk on graph algorithm, a simple but efficient algorithm that starts from the source node and moves through other nodes until it reaches the target node.
More specifically, the knowledge supplement apparatus may generate a path matrix by calculating the random walk probability of every path for the plurality of node pairs.
Referring to FIG. 4, each cell value of the path matrix is the probability that source node $s_i$ reaches target node $t_i$ through path $\pi_j$ (the $j$-th column), which makes it possible to distinguish paths that help learning from paths that do not. For example, $\pi_2$ in FIG. 4 may be classified as a poor path because it has a lower probability than $\pi_1$ for most node pairs $(s_i, t_i)$. Likewise, a path such as $\pi_1$ that has a high probability for some node pairs may still be judged unhelpful for learning if the majority of node pairs are not connected by it or have only a low probability.
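The path matrix described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: it uses the usual PRA convention that a walker at a node picks uniformly among the outgoing edges carrying the required relation, it marks inverse relations with a `^-1` suffix, and the `teamPlaysIn` edge and all function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index (head, relation, tail) triples by (node, relation), adding inverse edges."""
    adjacency = defaultdict(list)
    for head, relation, tail in edges:
        adjacency[(head, relation)].append(tail)
        adjacency[(tail, relation + "^-1")].append(head)  # hypothetical inverse marker
    return adjacency


def random_walk_prob(adjacency, source, target, relation_path):
    """Probability that a walk from `source` following `relation_path` ends at `target`."""
    dist = {source: 1.0}
    for relation in relation_path:
        next_dist = defaultdict(float)
        for node, prob in dist.items():
            neighbours = adjacency.get((node, relation), [])
            for neighbour in neighbours:
                # Uniform choice among the edges with this relation (assumption).
                next_dist[neighbour] += prob / len(neighbours)
        dist = next_dist
    return dist.get(target, 0.0)


def path_matrix(edges, node_pairs, relation_paths):
    """Rows are node pairs, columns are relation paths, cells are walk probabilities."""
    adjacency = build_adjacency(edges)
    return [
        [random_walk_prob(adjacency, s, t, path) for path in relation_paths]
        for s, t in node_pairs
    ]


# Toy graph; the teamPlaysIn edge is invented for illustration only.
edges = [
    ("lebron james", "playsFor", "LA lakers"),
    ("rajon rondo", "playsFor", "LA lakers"),
    ("rajon rondo", "playsIn", "NBA"),
    ("LA lakers", "teamPlaysIn", "NBA"),
]
node_pairs = [("lebron james", "NBA"), ("rajon rondo", "NBA")]
relation_paths = [
    ["playsFor", "playsFor^-1", "playsIn"],
    ["playsFor", "teamPlaysIn"],
]
matrix = path_matrix(edges, node_pairs, relation_paths)
```

Probability mass that reaches a node with no outgoing edge of the required relation is simply dropped, so a row does not need to sum to one.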
In addition, because the number of extracted paths is very large, using all of them for learning is very inefficient. A process of selecting the paths to be used for learning is therefore essential. To this end, paths that can benefit learning may be selected according to two criteria.
First, the knowledge supplement apparatus may select paths for which the proportion of node pairs connected through the path is 70% or more of the total. For example, if, among all triples for a given relation R, the proportion connected through path $\pi_1$ is 50% or less, the column for path $\pi_1$ may be excluded from the path matrix.
Second, the knowledge supplement apparatus may exclude paths whose path-matrix cell values, that is, the random walk probabilities for the path, average 5% or less, and select paths through which most of the nodes in the node pairs can be connected.
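The two selection criteria can be sketched as a column filter over the path matrix. The sketch below assumes that a node pair counts as "connected" by a path when its cell value is greater than zero and that the average is taken over all node pairs; neither detail is stated in the text, and the function name and toy matrix are hypothetical.

```python
def select_paths(matrix, min_coverage=0.70, min_mean_prob=0.05):
    """Return the column indices of paths that satisfy both selection criteria."""
    if not matrix:
        return []
    n_pairs = len(matrix)
    kept = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        # Criterion 1: share of node pairs that the path connects at all.
        coverage = sum(1 for p in column if p > 0) / n_pairs
        # Criterion 2: average random walk probability over the node pairs.
        mean_prob = sum(column) / n_pairs
        if coverage >= min_coverage and mean_prob > min_mean_prob:
            kept.append(j)
    return kept


# Toy matrix: 4 node pairs (rows) x 3 candidate paths (columns).
toy_matrix = [
    [0.5, 0.01, 0.9],
    [0.4, 0.02, 0.0],
    [0.6, 0.01, 0.0],
    [0.0, 0.03, 0.8],
]
selected = select_paths(toy_matrix)
```

In this toy matrix the second column connects every pair but with a very low average probability, and the third column is strong for only half of the pairs, so only the first column survives.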
In step S120, the knowledge supplement apparatus generates training data corresponding to each of the plurality of paths based on the path information.
That is, the knowledge supplement apparatus may generate training data for each of the plurality of paths corresponding to an individual node pair.
In another embodiment, the training data may include a context containing information about the relationship represented by each of the at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
First, the training data of a relation network (RN) is required to take the form of a story consisting of a context, a question, and an answer.
Here, the context may include information about at least one link corresponding to a relation sequence existing between the source node and the target node. Furthermore, the relation sequence may be divided into its individual relations (that is, into links) and separated into triples, each consisting of a source node, a target node, and their relation. That is, the context may consist of the resulting triple set.
In addition, the question may be generated using the source node of the first triple in the triple set and the target relation.
For example, referring to FIG. 5, when the target relation is playsIn, the relation sequence extracted through PRA is playsFor → worksFor⁻¹ → playsIn. In this case, lebron james and NBA may be matched to Athlete and League, so the knowledge supplement apparatus may generate the question and answer included in the training data as <lebron_james playsIn ?> and NBA. The apparatus may also extract the context for the question, in triple form, from an instance matching the relation sequence. That is, starting from lebron james, the subject of the question, the context <lebron james playsFor LA lakers>, <LA lakers playsFor⁻¹ rajon rondo>, <rajon rondo playsIn NBA> may be constructed. In this way, a story consisting of a context, a question, and an answer may be constructed as shown in FIG. 6 and used as training data for training the RN.
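The story construction described above can be sketched as a small helper that turns one matched path instance into a context, a question, and an answer. The function name, the dictionary layout, and the `^-1` notation for inverse relations are assumptions made for illustration; the node and relation names come from the example in the text.

```python
def build_story(instance_nodes, relation_sequence, target_relation):
    """Build a (context, question, answer) story from one path instance.

    instance_nodes:    [n0, n1, ..., nk], the nodes visited along the path
    relation_sequence: [r1, ..., rk], the relation on each link
    target_relation:   the relation being asked about between n0 and nk
    """
    if len(instance_nodes) != len(relation_sequence) + 1:
        raise ValueError("a path with k links must visit k + 1 nodes")
    # One triple per link: (source node, relation, target node).
    context = [
        (instance_nodes[i], relation_sequence[i], instance_nodes[i + 1])
        for i in range(len(relation_sequence))
    ]
    # The question pairs the first source node with the target relation.
    question = (instance_nodes[0], target_relation, "?")
    answer = instance_nodes[-1]
    return {"context": context, "question": question, "answer": answer}


story = build_story(
    ["lebron james", "LA lakers", "rajon rondo", "NBA"],
    ["playsFor", "playsFor^-1", "playsIn"],
    "playsIn",
)
```

Keeping the context as a list of triples, rather than a single string, leaves each link available as a separate object for the relation network.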
In still another embodiment, the number of the at least one link may be less than or equal to a predetermined threshold number.
That is, when extracting the plurality of paths corresponding to an individual node pair, the knowledge supplement apparatus may ensure that each of the paths includes no more than the threshold number of links. This is because, when the number of links in a path exceeds the threshold number, the amount of computation required of the knowledge supplement apparatus may increase in proportion.
For example, the knowledge supplement apparatus may extract only paths that include three or fewer links.
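The link threshold can be illustrated with a depth-limited search that lists every simple path between a source node and a target node. Note the assumption: this is an exhaustive enumeration used only to show the effect of the threshold, not the random-walk-based extraction that PRA performs, and the relation names `r1` to `r4` are hypothetical.

```python
from collections import defaultdict


def enumerate_paths(edges, source, target, max_links=3):
    """List simple paths from source to target that use at most max_links links."""
    adjacency = defaultdict(list)
    for head, relation, tail in edges:
        adjacency[head].append((relation, tail))
        adjacency[tail].append((relation + "^-1", head))  # allow inverse traversal

    results = []

    def dfs(node, path, visited):
        if node == target and path:
            results.append(list(path))
            return
        if len(path) == max_links:
            return  # threshold reached: do not extend this path further
        for relation, neighbour in adjacency[node]:
            if neighbour in visited:
                continue
            path.append((node, relation, neighbour))
            visited.add(neighbour)
            dfs(neighbour, path, visited)
            visited.remove(neighbour)
            path.pop()

    dfs(source, [], {source})
    return results


# The A-C-B / A-D-B example from the description, with invented relation names.
toy_edges = [("A", "r1", "C"), ("C", "r2", "B"), ("A", "r3", "D"), ("D", "r4", "B")]
paths = enumerate_paths(toy_edges, "A", "B", max_links=3)
```

Lowering `max_links` to 1 returns no paths for this graph, since A and B are only connected through an intermediate node.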
Finally, in step S130, the knowledge supplement apparatus trains the relation network model using the training data.
Here, the relation network, proposed by DeepMind, is a deep-learning-based model that infers relationships between objects. An RN takes story-form training data consisting of a context, a question, and an answer as input, and trains the model through two multi-layer perceptrons (MLPs).
Here, the relation network model may be expressed by Equation 1 below.
[Equation 1]
$$a = f_\phi\left(\sum_{i,j} g_\theta(o_i, o_j, q)\right)$$
Here, $o_i$ and $o_j$ are the $i$-th and $j$-th objects, respectively, $a$ is the answer, $q$ is the question, $g_\theta$ is the relation function, and $f_\phi$ is the function whose parameters are learned to predict the answer to the question based on the learned relation information.
Here, the combination pairs $(o_i, o_j)$ of the individual sentences constituting the context (that is, source node, target node, and their relation) are merged with the question $q$, and the relation may be learned through the first MLP, $g_\theta$. In addition, through the second MLP, $f_\phi$, the parameters that predict the answer to the question based on the learned relation information may be learned.
As described above, the method of supplementing knowledge based on a relation network according to an embodiment of the present invention extracts relation paths based on the Path Ranking Algorithm (PRA) and uses them as training data for the relation network, thereby solving the problems of existing KGC approaches while achieving excellent performance.
FIG. 2 is a flowchart illustrating a method of training a relation network model according to an embodiment of the present invention.
In step S210, the knowledge supplement apparatus encodes a first triple, consisting of the source node, the target node, and the relation between them for each of the at least one link, by inputting it into a Long Short-Term Memory (LSTM) network.
For example, referring to FIG. 7, when the knowledge supplement apparatus extracts $h \rightarrow e_1 \rightarrow e_2 \rightarrow t$, a path connecting node $h$ and node $t$, the path may include three first triples. The three first triples are $(h, R_1, e_1)$, $(e_1, R_2, e_2)$, and $(e_2, R_3, t)$.
The knowledge supplement apparatus may then obtain $C_1$, $C_2$, and $C_3$ as the respective results of inputting the three first triples into the LSTM and encoding them.
In step S220, the knowledge supplement apparatus generates a first result vector by performing first training of the relation network model using a plurality of second triples, each consisting of two of the encoded results and the question.
For example, referring to FIG. 7, the knowledge supplement apparatus may select two of the encoded results $C_1$, $C_2$, and $C_3$, combine them with the result of encoding the question $q$ through the LSTM to form a total of three second triples, and use these as the input of $g_\theta$ in the relation network model for training, thereby generating the first result vectors.
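The construction of the second triples can be sketched as follows. The sketch assumes that the pairs are unordered, which is consistent with three encoded links yielding three second triples, and that each triple is concatenated into one flat vector before being fed to the first MLP; the function names and the two-dimensional toy encodings are hypothetical.

```python
from itertools import combinations


def build_second_triples(encoded_links, encoded_question):
    """Pair every two encoded links and attach the encoded question to each pair."""
    return [
        (c_i, c_j, encoded_question)
        for c_i, c_j in combinations(encoded_links, 2)
    ]


def flatten(second_triple):
    """Concatenate (C_i, C_j, q) into a single input vector for the first MLP."""
    return [value for vector in second_triple for value in vector]


# Toy stand-ins for the LSTM outputs C_1, C_2, C_3 and the encoded question q.
c1, c2, c3 = [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]
q = [1.0, 0.0]
second_triples = build_second_triples([c1, c2, c3], q)
first_input = flatten(second_triples[0])
```

With n encoded links this produces n(n-1)/2 second triples, so the path-length threshold also bounds the number of inputs passed through the first MLP.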
Finally, in step S230, the knowledge supplement apparatus performs second training of the relation network model by summing the first result vector element-wise.
For example, referring to FIG. 7, the knowledge supplement apparatus may take the element-wise sum of the first result vectors and use it as the input of $f_\phi$ in the relation network model for training.
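For the three-link path of FIG. 7, the two stages can be written out explicitly. The expansion below is a sketch under two assumptions: the second triples are the three unordered pairs of $C_1$, $C_2$, $C_3$, each combined with the encoded question $q$, and the symbols $g_\theta$ and $f_\phi$ denote the first and second MLPs of the relation network; the intermediate names $v_{12}$, $v_{13}$, $v_{23}$ are introduced here only for illustration.

```latex
\begin{aligned}
v_{12} &= g_\theta(C_1, C_2, q), \qquad
v_{13} = g_\theta(C_1, C_3, q), \qquad
v_{23} = g_\theta(C_2, C_3, q) \\
a &= f_\phi\left(v_{12} + v_{13} + v_{23}\right)
\end{aligned}
```

Because the sum is element-wise and order-independent, the prediction does not depend on the order in which the pairs are processed.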
Here, $g_\theta$ has a total of four fully connected layers, each of which may consist of 256 units. As all input data pass through $g_\theta$, the context and the question can be regarded as embedded together. The first result vectors of each $g_\theta$ are then combined by an element-wise sum and used as the input of $f_\phi$. $f_\phi$ has a total of three fully connected layers, of which the first may consist of 256 units and the second of 512 units. The last layer is sized to the entire vocabulary so that it can output softmax values over the answers.
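The layer layout described above can be sketched as a forward pass in plain Python. Only the layer counts and the 256/512 unit sizes come from the text; the ReLU activation, the weight initialization, the encoding size of 32, the vocabulary size of 50, and all function names are assumptions, and no training step is shown.

```python
import math
import random


def make_mlp(sizes, rng):
    """Create fully connected layers; sizes = [input, hidden..., output]."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x, final_activation=True):
    """Apply each layer; ReLU (an assumption) follows every layer except an unactivated last one."""
    for k, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if final_activation or k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rn_forward(g_layers, f_layers, second_triples):
    """Element-wise sum of g over all (C_i, C_j, q) triples, then f and softmax."""
    summed = None
    for c_i, c_j, q in second_triples:
        v = forward(g_layers, c_i + c_j + q)  # list concatenation of the triple
        summed = v if summed is None else [a + b for a, b in zip(summed, v)]
    return softmax(forward(f_layers, summed, final_activation=False))


rng = random.Random(0)
enc_dim, vocab_size = 32, 50  # hypothetical sizes
g_layers = make_mlp([3 * enc_dim, 256, 256, 256, 256], rng)  # four layers of 256 units
f_layers = make_mlp([256, 256, 512, vocab_size], rng)        # 256, 512, vocabulary size
encodings = [[rng.random() for _ in range(enc_dim)] for _ in range(3)]
question = [rng.random() for _ in range(enc_dim)]
triples = [
    (encodings[0], encodings[1], question),
    (encodings[0], encodings[2], question),
    (encodings[1], encodings[2], question),
]
probs = rn_forward(g_layers, f_layers, triples)
```

The output is a probability distribution over the vocabulary, and the index with the highest value would be read as the predicted answer.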
Furthermore, as forward and backward propagation of the above process is repeated over all of the training data, the relationship between the first triples and the question is learned, so that a model that predicts the target node of the second triple can be trained.
That is, the knowledge supplement apparatus may predict missing relationships between nodes using the trained relation network model. Furthermore, the knowledge supplement apparatus may provide a variety of artificial-intelligence-based services by applying the trained relation network model to a Q&A system, a recommendation system, an interactive agent system, a chatbot system, and the like.
FIG. 3 is a block diagram illustrating an apparatus for supplementing knowledge based on a relation network according to an embodiment of the present invention.
Referring to FIG. 3, the knowledge supplement apparatus 300 based on a relation network according to an embodiment of the present invention includes a path extraction unit 310, a data generation unit 320, and a learning unit 330.
Meanwhile, the knowledge supplement apparatus 300 based on a relation network according to an embodiment of the present invention may be mounted on a desktop PC, a notebook PC, a smartphone, a tablet PC, a server computer, and the like.
The path extraction unit 310 extracts, for a plurality of node pairs included in a knowledge graph, path information, which is information about a plurality of paths representing the relationship between the source node and the target node constituting each node pair.
The data generation unit 320 generates training data corresponding to each of the plurality of paths based on the path information.
The learning unit 330 trains the relation network model using the training data.
In another embodiment, the training data may include a context containing information about the relationship represented by each of the at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
In still another embodiment, the learning unit 330 may encode the source node, the target node, and the relation between them for each of the at least one link by inputting them into a Long Short-Term Memory (LSTM) network, generate a first result vector by performing first training of the relation network model using a plurality of triples each consisting of two of the encoded results and the question, and perform second training of the relation network model by summing the first result vector element-wise.
In still another embodiment, the number of the at least one link may be less than or equal to a predetermined threshold number.
In still another embodiment, the path extraction unit 310 may extract the plurality of paths based on the Path Ranking Algorithm (PRA).
Specific embodiments according to the present invention have been described so far, but various modifications are of course possible without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims set forth below and by their equivalents.
As described above, although the present invention has been described with reference to limited embodiments and drawings, the present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should be understood only from the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the present invention.

Claims (10)

  1. A method of supplementing knowledge based on a relation network, the method comprising:
    extracting, for a plurality of node pairs included in a knowledge graph, path information, which is information about a plurality of paths representing a relationship between a source node and a target node constituting each node pair;
    generating training data corresponding to each of the plurality of paths based on the path information; and
    training a relation network model using the training data.
  2. The method of claim 1,
    wherein the training data includes
    a context including information about a relationship represented by each of at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
  3. The method of claim 2,
    wherein the training of the relation network model comprises:
    encoding a first triple, consisting of a source node and a target node of each of the at least one link and a relationship between the source node and the target node, by inputting the first triple into a Long Short-Term Memory (LSTM);
    generating a first result vector by performing first training of the relation network model using a plurality of second triples each consisting of two of the encoded results and the question; and
    performing second training of the relation network model by summing the first result vector element-wise.
  4. The method of claim 2,
    wherein the number of the at least one link
    is less than or equal to a predetermined threshold number.
  5. The method of claim 1,
    wherein the extracting of the path information comprises
    extracting the plurality of paths based on a Path Ranking Algorithm (PRA).
  6. An apparatus for supplementing knowledge based on a relation network, the apparatus comprising:
    a path extraction unit configured to extract, for a plurality of node pairs included in a knowledge graph, path information, which is information about a plurality of paths representing a relationship between a source node and a target node constituting each node pair;
    a data generation unit configured to generate training data corresponding to each of the plurality of paths based on the path information; and
    a learning unit configured to train a relation network model using the training data.
  7. The apparatus of claim 6,
    wherein the training data includes
    a context including information about a relationship represented by each of at least one link constituting an individual path, a question about the relationship between the source node and the target node, and an answer to the question.
  8. The apparatus of claim 7,
    wherein the learning unit is configured to:
    encode a source node and a target node of each of the at least one link and a relationship between the source node and the target node by inputting them into a Long Short-Term Memory (LSTM);
    generate a first result vector by performing first training of the relation network model using a plurality of triples each consisting of two of the encoded results and the question; and
    perform second training of the relation network model by summing the first result vector element-wise.
  9. The apparatus of claim 7,
    wherein the number of the at least one link
    is less than or equal to a predetermined threshold number.
  10. The apparatus of claim 6,
    wherein the path extraction unit is configured to
    extract the plurality of paths based on a Path Ranking Algorithm (PRA).
PCT/KR2020/006239 2019-11-15 2020-05-12 Method and device for supplementing knowledge on basis of relation network WO2021096009A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2019-0146737 2019-11-15
KR20190146737 2019-11-15
KR1020190163590A KR102234850B1 (en) 2019-11-15 2019-12-10 Method and apparatus for complementing knowledge based on relation network
KR10-2019-0163590 2019-12-10

Publications (1)

Publication Number Publication Date
WO2021096009A1 (en) 2021-05-20

Family

ID=75466363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/006239 WO2021096009A1 (en) 2019-11-15 2020-05-12 Method and device for supplementing knowledge on basis of relation network

Country Status (2)

Country Link
KR (1) KR102234850B1 (en)
WO (1) WO2021096009A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360664B (en) * 2021-05-31 2022-03-25 电子科技大学 Knowledge graph complementing method
CN113641829B (en) * 2021-07-13 2023-11-24 北京百度网讯科技有限公司 Training and knowledge graph completion method and device for graph neural network
CN113672741B (en) * 2021-08-19 2024-06-21 支付宝(杭州)信息技术有限公司 Information processing method, device and equipment
CN115525773B (en) * 2022-10-10 2024-08-02 北京智源人工智能研究院 Training method and device for knowledge graph completion model
CN117575007B (en) * 2024-01-17 2024-04-05 清华大学 Large model knowledge completion method and system based on post-decoding credibility enhancement
CN117993497A (en) * 2024-03-15 2024-05-07 广州大学 Knowledge graph completion method based on meta-relation learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130226846A1 (en) * 2012-02-24 2013-08-29 Ming Li System and Method for Universal Translating From Natural Language Questions to Structured Queries
KR20180041200A (en) * 2015-08-19 2018-04-23 알리바바 그룹 홀딩 리미티드 Information processing method and apparatus
US20180239763A1 (en) * 2017-02-17 2018-08-23 Kyndi, Inc. Method and apparatus of ranking linked network nodes
KR20190092043A (en) * 2018-01-30 2019-08-07 연세대학교 산학협력단 Visual Question Answering Apparatus for Explaining Reasoning Process and Method Thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160064826A (en) 2014-11-28 2016-06-08 한국전자통신연구원 knowledge graph based on semantic search service providing apparatus and method therefor
US10681061B2 (en) * 2017-06-14 2020-06-09 International Business Machines Corporation Feedback-based prioritized cognitive analysis
US11853903B2 (en) * 2017-09-28 2023-12-26 Siemens Aktiengesellschaft SGCNN: structural graph convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Santoro, Adam; Raposo, David; Barrett, David; Malinowski, Mateusz; Pascanu, Razvan; Battaglia, Peter; Lillicrap, Timothy: "A simple neural network module for relational reasoning", arXiv.org, 5 June 2017 (2017-06-05), pages 1-16, XP080767624 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377955A (en) * 2021-06-11 2021-09-10 支付宝(杭州)信息技术有限公司 Text risk discovery method and system
CN113297500A (en) * 2021-06-23 2021-08-24 哈尔滨工程大学 Social network isolated node link prediction method
CN113672740A (en) * 2021-08-04 2021-11-19 支付宝(杭州)信息技术有限公司 Data processing method and device for relational network
CN113672740B (en) * 2021-08-04 2023-11-07 支付宝(杭州)信息技术有限公司 Data processing method and device for relational network
CN115391553A (en) * 2022-08-23 2022-11-25 西北工业大学 Method for automatically searching time sequence knowledge graph completion model
CN115391553B (en) * 2022-08-23 2023-10-13 西北工业大学 Method for automatically searching time sequence knowledge graph completion model
CN118171727A (en) * 2024-05-16 2024-06-11 神思电子技术股份有限公司 Method, device, equipment, medium and program product for generating triples

Also Published As

Publication number Publication date
KR102234850B1 (en) 2021-04-02

Similar Documents

Publication Title
WO2021096009A1 (en) Method and device for supplementing knowledge on basis of relation network
Kollias et al. Distribution matching for heterogeneous multi-task learning: a large-scale face study
WO2020122456A1 (en) System and method for matching similarities between images and texts
WO2021251690A1 (en) Learning content recommendation system based on artificial intelligence training, and operation method thereof
CN112528676B (en) Document-level event argument extraction method
WO2021054514A1 (en) User-customized question-answering system based on knowledge graph
WO2021095987A1 (en) Multi-type entity-based knowledge complementing method and apparatus
WO2018212584A2 (en) Method and apparatus for classifying class, to which sentence belongs, using deep neural network
CN110457523B (en) Cover picture selection method, model training method, device and medium
CN112989024B (en) Method, device and equipment for extracting relation of text content and storage medium
CN111860193B (en) Text-based pedestrian retrieval self-supervision visual representation learning system and method
WO2013157705A1 (en) Method for inferring interest of user through interests of social neighbors and topics of social activities in sns, and system therefor
Cui et al. DGEKT: a dual graph ensemble learning method for knowledge tracing
CN113011172A (en) Text processing method and device, computer equipment and storage medium
CN111522926A (en) Text matching method, device, server and storage medium
Zheng et al. Dynamic relevance graph network for knowledge-aware question answering
WO2022108206A1 (en) Method and apparatus for completing describable knowledge graph
CN110322959A (en) Knowledge-based deep medical question routing method and system
WO2019107625A1 (en) Machine translation method and apparatus therefor
WO2022149758A1 (en) Learning content evaluation device and system for evaluating question, on basis of predicted probability of correct answer for added question content that has never been solved, and operating method thereof
WO2022186539A1 (en) Image classification-based celebrity identification method and apparatus
CN113255701B (en) Small sample learning method and system based on absolute-relative learning framework
WO2023106870A1 (en) Re-structuralized convolutional neural network system using cmp and operation method thereof
WO2024101466A1 (en) Attribute-based missing person tracking apparatus and method
CN116881416A (en) Instance-level cross-modal retrieval method for relational reasoning and cross-modal independent matching network

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20888351

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08.09.2022)

122 EP: PCT application non-entry in European phase

Ref document number: 20888351

Country of ref document: EP

Kind code of ref document: A1