KR20200071821A

KR20200071821A - Detection method of fake news using grammatic transformation on neural network, computer readable medium and apparatus for performing the method

Info

Publication number: KR20200071821A
Application number: KR1020180152658A
Authority: KR
Inventors: 정창성; 서영경
Original assignee: 고려대학교 산학협력단
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-22
Also published as: KR102131641B1

Abstract

Disclosed are a method for detecting fake news using grammatical transformation on a neural network, and a recording medium and an apparatus for performing the same. The method comprises the steps of: embedding a proposition sentence, whose truth or falsity is determined by a news sentence, into a word vector; generating a context vector by inputting the word vector into a predetermined natural language processing neural network; inputting the context vector into a predetermined natural language processing neural network to generate a candidate sentence having the same meaning as the proposition sentence but different grammar; and comparing the candidate sentence with the news sentence to determine whether the proposition sentence is true or false and to detect whether the news sentence corresponds to fake news.

Description

DETECTION METHOD OF FAKE NEWS USING GRAMMATIC TRANSFORMATION ON NEURAL NETWORK, COMPUTER READABLE MEDIUM AND APPARATUS FOR PERFORMING THE METHOD

The present invention relates to a method for detecting fake news using grammatical transformation on a neural network, and to a recording medium and an apparatus for performing the method. More specifically, it relates to a method that generates sentences based on deep learning and uses them to determine whether news is authentic, and to a recording medium and an apparatus for performing the method.

With the arrival of the mobile era, in which high-speed mobile networks and smartphones are widespread, the use of social networking services (SNS) is increasing rapidly. In particular, as the use of SNS such as blogs, KakaoTalk, Line, Facebook, Twitter, Instagram, and Tumblr has surged in recent years, the delivery of information or news through SNS has also grown explosively.

However, it is also a reality that false information or news frequently spreads through SNS whenever there is a political event such as an election. More seriously, false information or news is often disseminated through SNS intentionally and for a specific purpose. For this reason, the delivery of information or news through SNS is likely to become a significant social problem in the future.

Meanwhile, fake news, that is, news intentionally manipulated for a specific purpose as described above, is mostly detected by people, who also determine its authenticity. Because this takes a great deal of time and effort, a new fake news detection model is needed that does not require human judgment and can screen fake news in real time.

The present invention provides a method for detecting fake news using grammatical transformation on a neural network, which generates candidate sentences by transforming the grammar of a proposition sentence based on deep learning and compares the candidate sentences with a news sentence to determine whether the news is authentic, together with a recording medium and an apparatus for performing the method.

A method for detecting fake news using grammatical transformation on a neural network according to the present invention includes: embedding a proposition sentence, whose truth or falsity is determined by a news sentence, into a word vector; generating a context vector by inputting the word vector into a predetermined natural language processing neural network; generating a candidate sentence, which has the same meaning as the proposition sentence but different grammar, by inputting the context vector into a predetermined natural language processing neural network; and comparing the candidate sentence with the news sentence to determine whether the proposition sentence is true or false and to detect whether the news sentence corresponds to fake news.

Meanwhile, the step of embedding the proposition sentence, whose truth or falsity is determined by the news sentence, into a word vector may include converting each word of the proposition sentence into a vector using one-hot encoding.

In addition, the step of generating the context vector by inputting the word vector into a predetermined natural language processing neural network may include generating the context vector by inputting the word vector into a long short-term memory (LSTM) neural network.

In addition, the step of generating the candidate sentence, which has the same meaning as the proposition sentence but different grammar, by inputting the context vector into a predetermined natural language processing neural network may include: generating an attention vector by calculating a weighted sum of the context vector according to the attention mechanism of a sequence-to-sequence learning model; predicting a matching vector by comparing the attention vector with a hidden state vector produced when the context vector is generated; and generating the candidate sentence by inputting the matching vector into an LSTM neural network.

In addition, the step of comparing the candidate sentence with the news sentence to determine whether the proposition sentence is true or false and to detect whether the news sentence corresponds to fake news may include: inputting the vector corresponding to each word of the candidate sentence into the softmax function of a sequence-to-sequence learning model to select and output a given number of words; generating a final candidate sentence by combining the output words of the softmax function using a beam search decoder; and comparing the final candidate sentence with the news sentence to determine whether the proposition sentence is true or false and to detect whether the news sentence corresponds to fake news.

In addition, the step of comparing the final candidate sentence with the news sentence to determine whether the proposition sentence is true or false and to detect whether the news sentence corresponds to fake news may include: embedding the final candidate sentence and the news sentence; calculating the cosine similarity between the embedded final candidate sentence and the embedded news sentence; and determining, using the cosine similarity, whether the proposition sentence is true or false according to the news sentence.

In addition, there may be provided a computer-readable recording medium on which a computer program for performing the method for detecting fake news using grammatical transformation on a neural network is recorded.

Meanwhile, an apparatus for detecting fake news using grammatical transformation on a neural network according to the present invention includes: a word embedding unit that embeds a proposition sentence, whose truth or falsity is determined by a news sentence, into a word vector; a context generation unit that generates a context vector by inputting the word vector into a predetermined natural language processing neural network; a matching unit that generates a candidate sentence, which has the same meaning as the proposition sentence but different grammar, by inputting the context vector into a predetermined natural language processing neural network; and an inference unit that compares the candidate sentence with the news sentence to determine whether the proposition sentence is true or false and to detect whether the news sentence corresponds to fake news.

Meanwhile, the word embedding unit may convert each word of the proposition sentence into a vector using one-hot encoding.

In addition, the context generation unit may generate the context vector by inputting the word vector into an LSTM neural network.

In addition, the matching unit may generate an attention vector by calculating a weighted sum of the context vector according to the attention mechanism of a sequence-to-sequence learning model, predict a matching vector by comparing the attention vector with a hidden state vector produced when the context vector is generated, and generate the candidate sentence by inputting the matching vector into an LSTM neural network.

In addition, the inference unit may input the vector corresponding to each word of the candidate sentence into the softmax function of a sequence-to-sequence learning model to select and output a given number of words, generate a final candidate sentence by combining the output words of the softmax function using a beam search decoder, and compare the final candidate sentence with the news sentence to determine whether the proposition sentence is true or false and to detect whether the news sentence corresponds to fake news.

In addition, the inference unit may embed the final candidate sentence and the news sentence, calculate the cosine similarity between the embedded final candidate sentence and the embedded news sentence, and determine, using the cosine similarity, whether the proposition sentence is true or false according to the news sentence.

According to the present invention, human judgment is not required and fake news can be screened in real time. In addition, screening fake news keeps it from spreading, which can prevent the social confusion that fake news may cause.

FIG. 1 is a block diagram of a fake news detection apparatus using grammatical transformation on a neural network according to an embodiment of the present invention.
FIG. 2 is a diagram showing the LSTM-based sequence-to-sequence learning model constructed by the fake news detection apparatus shown in FIG. 1.
FIG. 3 is a flowchart of a method for detecting fake news using grammatical transformation on a neural network according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention, which is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more other components, steps, or operations besides those mentioned.

FIG. 1 is a block diagram of a fake news detection apparatus using grammatical transformation on a neural network according to an embodiment of the present invention.

Referring to FIG. 1, the fake news detection apparatus (1) using grammatical transformation on a neural network according to an embodiment of the present invention may include a word embedding unit (10), a context generation unit (30), a matching unit (50), and an inference unit (70).

The fake news detection apparatus (1) can determine whether a news sentence is true or false: based on deep learning, it generates candidate sentences by transforming the grammar of a proposition sentence and uses them to determine whether the news is authentic. Accordingly, the apparatus (1) can provide a fast and efficient fake news detection model that works without human intervention.

Software (an application) for fake news detection may be installed and executed on the fake news detection apparatus (1), and the word embedding unit (10), the context generation unit (30), the matching unit (50), and the inference unit (70) may be controlled by that software.

The word embedding unit (10), the context generation unit (30), the matching unit (50), and the inference unit (70) may be formed as an integrated module or as one or more modules. Conversely, each component may be formed as a separate module.

The fake news detection apparatus (1) may be mobile or fixed. It may take the form of a computer, a server, or an engine, and may be referred to by other terms such as device, apparatus, terminal, user equipment (UE), mobile station (MS), mobile terminal (MT), user terminal (UT), subscriber station (SS), wireless device, personal digital assistant (PDA), wireless modem, or handheld device.

Hereinafter, each component of the fake news detection apparatus (1) shown in FIG. 1 is described in detail.

The word embedding unit (10) may embed a proposition sentence, whose truth or falsity is determined from a news sentence, into a word vector.

Here, the proposition sentence is a sentence used to determine the authenticity of the news under examination in this embodiment. For example, if the proposition sentence is determined to be false according to the news sentence, the news may be determined to be fake news.

The word embedding unit (10) may generate a word vector from the proposition sentence through embedding. Among embedding methods, it may adopt one-hot encoding to convert each word of the proposition sentence into a vector. This is because the present embodiment uses neural network grammatical transformation to generate candidate sentences that have the same meaning as the proposition sentence but different grammar, and raw text cannot be input directly into a neural network algorithm.
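As a minimal illustrative sketch of one-hot encoding, and not the disclosed implementation, the following Python converts each word of a sentence into a vector with a single 1 at the index of that word. The whitespace tokenizer, the per-sentence vocabulary, and the function name `one_hot_encode` are assumptions made for this example only.

```python
def one_hot_encode(sentence: str) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, one-hot vectors) for a whitespace-tokenized sentence."""
    tokens = sentence.lower().split()          # assumed tokenizer: split on whitespace
    vocab = sorted(set(tokens))                # assumed vocabulary: words of this sentence
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for token in tokens:
        vec = [0] * len(vocab)
        vec[index[token]] = 1                  # exactly one position is set per word
        vectors.append(vec)
    return vocab, vectors


vocab, vectors = one_hot_encode("the mayor opened the bridge")
```

In practice the vocabulary would be built from a training corpus rather than from a single sentence, so the vectors would be much longer and sparser.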

Meanwhile, in this embodiment, a sequence-to-sequence learning model based on a long short-term memory (LSTM) neural network may be used for the neural network grammatical transformation. The context generation unit (30), the matching unit (50), and the inference unit (70) described below may each constitute a layer of the LSTM-based sequence-to-sequence learning model to perform fake news detection.

The LSTM neural network is a structure that adds a cell state to the hidden state of a recurrent neural network (RNN), which is widely used to process the flow between characters and words. An RNN is a type of artificial neural network in which the hidden nodes of the hidden state are connected by directed edges to form a cyclic structure. The cell state added in the LSTM can influence the next cell using a formulation that allows gradients to propagate relatively well even after many states have elapsed.
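The specification does not give the LSTM formulation. As a hedged sketch, the commonly used LSTM update can be written for a scalar input and scalar state as follows; the gate names, the `params` layout, and the scalar setting are assumptions chosen to keep the example short. The additive update of the cell state `c` is the part that lets gradients pass across many steps.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float, params: dict) -> tuple[float, float]:
    """One step of a standard LSTM cell on scalars.

    params maps each gate name to (w_x, w_h, b); this layout is hypothetical.
    """
    def pre_activation(gate: str) -> float:
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much new information to write
    o = sigmoid(pre_activation("output"))       # how much of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value to write
    c = f * c_prev + i * g                      # additive cell-state update
    h = o * math.tanh(c)                        # new hidden state
    return h, c
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate equals 0, so the cell state is simply halved at each step.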

The sequence-to-sequence learning model is a deep learning algorithm implemented using LSTM neural networks, and it is well suited to data, such as text, whose input and output dimensions are not fixed. The model may receive a source sequence and a target sequence as input, and processing is divided broadly into two parts, an encoder and a decoder. The encoder converts the source sequence into a fixed-size vector, and the decoder converts the encoder's vector into the target sequence. When predicting the target sequence, the decoder may feed the result it predicted at the previous step in as the input of the next step. To increase the accuracy of sequence prediction, an attention mechanism that gives greater weight to important words may also be used.
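The feedback loop in the decoder, where each prediction becomes the next input, can be sketched as below. This is a toy illustration under stated assumptions: `next_token` stands in for a trained decoder step, `context` stands in for the encoder's fixed-size vector, and the `<s>` and `</s>` markers are hypothetical start and end tokens.

```python
from typing import Any, Callable, List


def greedy_decode(next_token: Callable[[Any, str], str], context: Any,
                  max_len: int = 10, bos: str = "<s>", eos: str = "</s>") -> List[str]:
    """Decode one token at a time, feeding each prediction back as the next input."""
    output: List[str] = []
    prev = bos
    for _ in range(max_len):
        prev = next_token(context, prev)   # previous prediction is the next input
        if prev == eos:
            break
        output.append(prev)
    return output


# Toy stand-in for a trained decoder: a lookup table keyed by the previous token.
transitions = {"<s>": "news", "news": "is", "is": "fake", "fake": "</s>"}
decoded = greedy_decode(lambda ctx, prev: ctx[prev], transitions)
```

A real decoder would compute a probability distribution over the vocabulary at each step; picking only the single best word is the greedy special case of the beam search described later.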

In addition, the sequence-to-sequence learning model includes a softmax layer and finally decodes words to generate sentences; here, a decoder may be used to output sentences composed of high-probability words. Several kinds of such decoders have been proposed, and the sequence-to-sequence learning model applied in this embodiment may be implemented with a beam search decoder. According to the beam size, the beam search decoder generates a given number of the highest-probability words and can build the sentence with the best combination.
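A generic beam search can be sketched as follows. This is an illustrative version of the general technique rather than the decoder of the embodiment; `step_probs` is a hypothetical callback returning next-word probabilities (such as a softmax output) for a given prefix, and `toy_model` is invented data.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(step_probs: Callable[[Sequence], Dict[str, float]],
                beam_size: int, max_len: int,
                eos: str = "</s>") -> List[Tuple[Sequence, float]]:
    """Keep the beam_size highest-scoring partial sentences at every step."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]   # (words so far, log-probability)
    for _ in range(max_len):
        candidates: List[Tuple[Sequence, float]] = []
        for seq, score in beams:
            if seq and seq[-1] == eos:                   # finished sentences are carried over
                candidates.append((seq, score))
                continue
            for word, p in step_probs(seq).items():
                if p > 0:
                    candidates.append((seq + (word,), score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return beams


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """Invented next-word probabilities for demonstration."""
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"bridge": 0.55, "mayor": 0.45},
        ("a",): {"bridge": 0.9, "mayor": 0.1},
    }
    return table.get(prefix, {"</s>": 1.0})


beams = beam_search(toy_model, beam_size=2, max_len=4)
```

In this toy example the best first word ("the") does not start the best sentence: with a beam size of 2 the search keeps "a" alive and ranks "a bridge" first, which a purely greedy decoder would miss.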

The context generation unit (30) may generate a context vector by inputting the word vector into a predetermined natural language processing neural network. Here, the natural language processing neural network may be an LSTM neural network, as described above.

The matching unit (50) may use the context vector to generate candidate sentences that have the same meaning as the proposition sentence but different grammar.

To increase the accuracy of candidate sentence generation, the matching unit (50) may apply the attention mechanism of the sequence-to-sequence learning model described above.

That is, the matching unit (50) may generate an attention vector from the context vector using the attention mechanism of the sequence-to-sequence learning model. According to the attention mechanism, the matching unit (50) may calculate a weighted sum of the context vector to generate an attention vector in which important words are given greater weight.
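The weighted sum can be illustrated with a small sketch. The specification does not state how the weights are obtained; the version below assumes dot-product scores against a query vector, normalized with softmax, which is one common choice. The names `attention_vector`, `query`, and `context_vectors` (one vector per word) are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                                   # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_vector(query: List[float],
                     context_vectors: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (weights, weighted sum of the context vectors)."""
    # Assumed scoring function: dot product between the query and each context vector.
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in context_vectors]
    weights = softmax(scores)
    dim = len(context_vectors[0])
    attended = [sum(w * vec[d] for w, vec in zip(weights, context_vectors))
                for d in range(dim)]
    return weights, attended
```

When the query aligns strongly with one context vector, that vector dominates the sum, which corresponds to giving greater weight to an important word.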

The matching unit (50) may predict a matching vector by performing a matching operation that compares the attention vector with the hidden state vector produced when the context generation unit (30) generates the context vector. As described above, the context generation unit (30) generates the context vector using an LSTM neural network, and the hidden state vector may be produced in the hidden state of that LSTM neural network.

The matching unit (50) may generate a candidate sentence by inputting the matching vector into a predetermined natural language processing neural network. Here, the natural language processing neural network may be an LSTM neural network, as described above.

The inference unit (70) may compare the candidate sentence with the news sentence to determine whether the proposition sentence is true or false and to detect whether the news sentence corresponds to fake news.

To increase the accuracy of fake news screening by comparing the best-combination candidate sentence with the news sentence, the inference unit (70) may use the softmax layer of the sequence-to-sequence learning model described above.

That is, the inference unit (70) may input the candidate sentence into the softmax function of the sequence-to-sequence learning model to select and output a given number of words. Here, the candidate sentence may correspond to the concatenated aggregated matching vector output by the matching unit (50), and the inference unit (70) may input the aggregated matching vector corresponding to each word of the candidate sentence into the softmax function to select and output a given number of words. The words output at this point may be words with a high probability of appearing in the news sentence.

The inference unit (70) may input the output of the softmax function into the beam search decoder to generate a final candidate sentence. As described above, the beam search decoder may generate a sentence by arranging the words output by the softmax function into a well-formed combination.

By constituting the softmax layer of the sequence-to-sequence learning model in this way, the inference unit (70) can generate better-formed candidate sentences.

To determine whether the proposition sentence is true or false according to the news sentence, the inference unit (70) may compare the news sentence with the candidate sentence, which is the proposition sentence with its grammar transformed.

To this end, the inference unit (70) may embed the news sentence and the candidate sentence output by the beam search decoder. For example, the inference unit (70) may use a Doc2Vec model to embed the news sentence and the candidate sentence and convert them into vectors. The beam search decoder may output a plurality of candidate sentences, in which case the inference unit (70) may collect them to form a paragraph (sentence) group.

The inference unit (70) may calculate the cosine similarity between the embedded news sentence and the embedded candidate sentence, and may use the cosine similarity to determine whether the proposition sentence is true or false according to the news sentence. For example, the cosine similarity may take a value between 0 and 1, and the inference unit (70) may determine that the proposition sentence is true when the cosine similarity is 0.5 or higher.

When a paragraph group has been formed, the inference unit (70) may calculate the cosine similarity between the news sentence and every candidate sentence in the group. The inference unit (70) may then use the highest of these cosine similarities to determine whether the proposition sentence is true or false according to the news sentence.
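The decision rule described above, taking the highest cosine similarity over the group and comparing it with 0.5, can be sketched as follows. The sentence vectors are assumed to have been computed already by some embedding model; the function names and the zero-vector handling are choices made for this example, not details from the specification.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0                       # assumed convention for zero vectors
    return dot / (norm_a * norm_b)


def judge_proposition(candidate_vecs: List[Sequence[float]],
                      news_vec: Sequence[float],
                      threshold: float = 0.5) -> Tuple[float, bool]:
    """Return (highest similarity over the group, whether the proposition is judged true)."""
    best = max(cosine_similarity(c, news_vec) for c in candidate_vecs)
    return best, best >= threshold


# Hypothetical 2-dimensional embeddings of three candidate sentences and one news sentence.
group = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]
best, is_true = judge_proposition(group, [1.0, 0.0])
```

Using the maximum means that a single close paraphrase in the group is enough for the proposition to be judged true, even if the other candidates are dissimilar to the news sentence.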

When the proposition sentence is determined to be true according to the news sentence, the inference unit 70 may classify the corresponding news as true as well, and when the proposition sentence is determined to be false according to the news sentence, the inference unit 70 may classify the corresponding news as fake news.

As described above, the fake news search apparatus 1 using grammatical transformation on a neural network according to an embodiment of the present invention may construct an LSTM-based sequence-to-sequence learning model, generate candidate sentences that have the same meaning as the proposition sentence but different grammar, and compare the candidate sentences with the news sentence to identify fake news.

The fake news search apparatus 1 using grammatical transformation on a neural network according to an embodiment of the present invention does not require human judgment and can screen for fake news in real time.

FIG. 2 is a diagram illustrating the LSTM-based sequence-to-sequence learning model constructed by the fake news search apparatus using grammatical transformation on a neural network according to the embodiment of the present invention shown in FIG. 1.

Referring to FIG. 2, the LSTM-based sequence-to-sequence learning model may consist of four layers: a word embedding layer, a context generation layer, a matching layer, and an inference layer.

The word embedding unit 10 constitutes the word embedding layer and may embed a proposition sentence, whose truth or falsity is to be determined from a news sentence, into word vectors.

The context generation unit 30 constitutes the context generation layer and may generate a context vector by inputting the word vectors into an LSTM neural network.

The matching unit 50 constitutes the matching layer and may generate a matching vector by performing a matching operation on the context vector. Specifically, the matching unit 50 may input the context vector to an attention function to generate an attention vector, and may generate the matching vector by performing a matching operation that compares the attention vector with the hidden state vector of the LSTM neural network included in the context generation layer. The matching unit 50 may then input the matching vector into an LSTM neural network to generate a concatenated aggregated matching vector corresponding to a candidate sentence, that is, a sentence that has the same meaning as the proposition sentence but different grammar.

The inference unit 70 constitutes the inference layer, and may obtain aggregated matching vectors from the concatenated aggregated matching vector and input them to a softmax function to select and output a given number of words. The inference unit 70 may then input the output of the softmax function to a beam search decoder to generate a plurality of candidate sentences. The inference unit 70 may compare the plurality of candidate sentences with the news sentences to determine whether the proposition sentence is true or false according to the news sentence, and may identify fake news according to the result.

Hereinafter, a method of searching for fake news using grammatical transformation on a neural network according to an embodiment of the present invention will be described.

FIG. 3 is a flowchart of a method of searching for fake news using grammatical transformation on a neural network according to an embodiment of the present invention.

The method of searching for fake news using grammatical transformation on a neural network according to an embodiment of the present invention may be carried out in substantially the same configuration as the fake news search apparatus 1 shown in FIG. 1. Accordingly, components identical to those of the fake news search apparatus 1 shown in FIG. 1 are given the same reference numerals, and repeated descriptions are omitted.

Referring to FIG. 3, the word embedding unit 10 may embed a proposition sentence into word vectors (S100).

The proposition sentence is the sentence whose truth or falsity is to be determined by the news sentence. The word embedding unit 10 may adopt one-hot encoding, from among the available embedding schemes, to convert each word constituting the proposition sentence into a vector.
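A minimal sketch of one-hot encoding is given below for illustration. The helpers `build_vocab` and `one_hot_encode`, the whitespace tokenization, and the sample sentence are assumptions of this sketch and are not specified in this description.

```python
def build_vocab(sentences):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot_encode(sentence, vocab):
    """Convert each word of the sentence into a one-hot vector."""
    vectors = []
    for word in sentence.split():
        vec = [0] * len(vocab)
        vec[vocab[word]] = 1  # exactly one position is set per word
        vectors.append(vec)
    return vectors


# Hypothetical proposition sentence, tokenized on whitespace.
vocab = build_vocab(["the sky is blue"])
print(one_hot_encode("sky blue", vocab))
```

Each resulting vector has the length of the vocabulary, so words outside the vocabulary would need separate handling (for example, a reserved unknown-word index), which this sketch omits.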

The context generation unit 30 may generate a context vector by inputting the word vectors into a neural network (S200).

Here, the neural network is a natural language processing neural network and may be an LSTM neural network.
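For illustration only, a single-layer LSTM that reads word vectors and emits one hidden state per word might be sketched as follows, using the standard LSTM gate equations. The random, untrained weights, the sizes, and the names `init_params`, `lstm_step`, and `encode` are all assumptions of this sketch; this description does not disclose the network dimensions or training procedure.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def init_params(input_size, hidden_size, seed=0):
    """Random untrained weights for the input, forget, output and cell gates."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]

    return {name: (mat(), [0.0] * hidden_size) for name in ("i", "f", "o", "g")}


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = list(x) + list(h_prev)  # concatenate input and previous hidden state
    gates = {}
    for name in ("i", "f", "o", "g"):
        weights, bias = params[name]
        pre = [p + b for p, b in zip(matvec(weights, z), bias)]
        activation = math.tanh if name == "g" else sigmoid
        gates[name] = [activation(p) for p in pre]
    c = [f * cp + i * g
         for f, cp, i, g in zip(gates["f"], c_prev, gates["i"], gates["g"])]
    h = [o * math.tanh(ci) for o, ci in zip(gates["o"], c)]
    return h, c


def encode(word_vectors, hidden_size, params):
    """Run the LSTM over the sentence and collect every hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    states = []
    for x in word_vectors:
        h, c = lstm_step(x, h, c, params)
        states.append(h)
    return states


params = init_params(input_size=4, hidden_size=3)
states = encode([[1, 0, 0, 0], [0, 0, 1, 0]], 3, params)
print(len(states), len(states[0]))
```

In this sketch the list of per-word hidden states plays the role of the context vector; a trained model would replace the random weights.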

The matching unit 50 may generate a candidate sentence by inputting the context vector into a neural network (S300).

The candidate sentence is a sentence that has the same meaning as the proposition sentence but different grammar.

The matching unit 50 may generate an attention vector from the context vector using the attention mechanism of the sequence-to-sequence learning model. In accordance with the attention mechanism, the matching unit 50 may calculate a weighted sum of the context vector to generate an attention vector that emphasizes important words.
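The weighted sum described above might be sketched as follows. This description does not specify how the attention weights are scored; the dot-product scoring against a query vector, the softmax normalization, and the names `attention_vector` and `query` are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_vector(context_vectors, query):
    """Weighted sum of per-word context vectors.

    Weights come from dot-product scores against `query` (an assumed
    scoring rule), so words more aligned with the query contribute more.
    """
    scores = [sum(a * b for a, b in zip(c, query)) for c in context_vectors]
    weights = softmax(scores)
    dim = len(context_vectors[0])
    attended = [sum(w * c[d] for w, c in zip(weights, context_vectors))
                for d in range(dim)]
    return attended, weights


# Hypothetical per-word context vectors and query.
contexts = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = attention_vector(contexts, query=[2.0, 0.0])
print(attended, weights)
```

Because the weights sum to one, the attention vector stays within the range spanned by the context vectors while leaning toward the highest-scoring words.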

The matching unit 50 may predict a matching vector by performing a matching operation that compares the attention vector with the hidden state vector produced when the context vector is generated.

The matching unit 50 may generate a candidate sentence by inputting the matching vector into an LSTM neural network.

The inference unit 70 may search for fake news by comparing the candidate sentence with the news sentence (S400).

The inference unit 70 may input the candidate sentence to the softmax function of the sequence-to-sequence learning model to select and output a given number of words. Here, the candidate sentence may correspond to the concatenated aggregated matching vector output by the matching unit 50, and the inference unit 70 may input the aggregated matching vector corresponding to each word of the candidate sentence to the softmax function to select and output a given number of words. The output words may be words that have a high probability of appearing in the news sentence.
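As a hedged illustration, selecting a given number of words from softmax probabilities might look like the following. The function `top_k_words`, the toy vocabulary, and the raw scores are hypothetical; in the described model the scores would be derived from the aggregated matching vectors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_words(logits, vocab, k):
    """Return the k most probable (word, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for a three-word vocabulary.
print(top_k_words([2.0, 0.5, 1.0], ["sky", "is", "blue"], k=2))
```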

The inference unit 70 may input the output of the softmax function to the beam search decoder to generate the final candidate sentences. As described above, the beam search decoder may generate sentences by arranging the words output by the softmax function into well-formed combinations.
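A simplified beam search over per-position word probabilities is sketched below for illustration. The name `beam_search`, the beam width, and the sample distributions are assumptions. In particular, this sketch treats each position's distribution as fixed, whereas a real sequence-to-sequence decoder would recompute the distribution for each partial sentence.

```python
import math


def beam_search(step_distributions, beam_width):
    """Keep the `beam_width` highest-scoring word sequences at every step.

    `step_distributions` is a list of dicts mapping word -> probability,
    one dict per output position. Scores are sums of log-probabilities.
    """
    beams = [([], 0.0)]  # (words so far, cumulative log-probability)
    for distribution in step_distributions:
        expanded = []
        for words, score in beams:
            for word, prob in distribution.items():
                if prob > 0.0:
                    expanded.append((words + [word], score + math.log(prob)))
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_width]  # prune to the best hypotheses
    return [(" ".join(words), score) for words, score in beams]


# Hypothetical softmax outputs for a two-word sentence.
steps = [{"the": 0.9, "a": 0.1}, {"sky": 0.6, "sea": 0.4}]
for sentence, score in beam_search(steps, beam_width=2):
    print(sentence, round(score, 3))
```

Returning several surviving hypotheses, rather than only the best one, matches the case described here in which the decoder outputs a plurality of candidate sentences.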

The inference unit 70 may embed the news sentence and the candidate sentence output by the beam search decoder. For example, the inference unit 70 may use a Doc2Vec model to embed the news sentence and the candidate sentence. The beam search decoder may output a plurality of candidate sentences, in which case the inference unit 70 may collect the plurality of candidate sentences into a paragraph (sentence) group.

When a paragraph group has been formed, the inference unit 70 may calculate the cosine similarity between the news sentence and each candidate sentence in the paragraph group. The inference unit 70 may then use the highest of these cosine similarities to determine whether the proposition sentence is true or false according to the news sentence.

The method of searching for fake news using grammatical transformation on a neural network according to the present invention may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the computer software field.

Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

1: Fake news search apparatus using grammatical transformation on a neural network
10: Word embedding unit
30: Context generation unit
50: Matching unit
70: Inference unit

Claims

A method of searching for fake news using grammatical transformation on a neural network, the method comprising:
embedding a proposition sentence, whose truth or falsity is determined by a news sentence, into a word vector;
generating a context vector by inputting the word vector into a predetermined natural language processing neural network;
generating a candidate sentence that has the same meaning as the proposition sentence but different grammar by inputting the context vector into a predetermined natural language processing neural network; and
comparing the candidate sentence with the news sentence to determine whether the proposition sentence is true or false, and searching whether the news sentence corresponds to fake news.

The method of claim 1,
wherein the embedding of the proposition sentence into a word vector comprises:
converting each word constituting the proposition sentence into a vector using a one-hot encoding scheme.

The method of claim 1,
wherein the generating of the context vector comprises:
generating the context vector by inputting the word vector into a Long Short-Term Memory (LSTM) neural network.

The method of claim 1,
wherein the generating of the candidate sentence comprises:
generating an attention vector by calculating a weighted sum of the context vector according to the attention mechanism of a sequence-to-sequence learning model;
predicting a matching vector by comparing the attention vector with a hidden state vector produced when the context vector is generated; and
generating the candidate sentence by inputting the matching vector into a Long Short-Term Memory (LSTM) neural network.

The method of claim 1,
wherein the comparing of the candidate sentence with the news sentence comprises:
selecting and outputting a given number of words by inputting a vector corresponding to each word constituting the candidate sentence into a softmax function of a sequence-to-sequence learning model;
generating a final candidate sentence by combining the words output by the softmax function using a beam search decoder; and
comparing the final candidate sentence with the news sentence to determine whether the proposition sentence is true or false, and searching whether the news sentence corresponds to fake news.

The method of claim 5,
wherein the comparing of the final candidate sentence with the news sentence comprises:
embedding the final candidate sentence and the news sentence;
calculating a cosine similarity between the embedded final candidate sentence and the embedded news sentence; and
determining, using the cosine similarity, whether the proposition sentence is true or false according to the news sentence.

A computer-readable recording medium on which is recorded a computer program for performing the method of searching for fake news using grammatical transformation on a neural network according to any one of claims 1 to 6.

A fake news search apparatus using grammatical transformation on a neural network, the apparatus comprising:
a word embedding unit that embeds a proposition sentence, whose truth or falsity is determined by a news sentence, into a word vector;
a context generation unit that generates a context vector by inputting the word vector into a predetermined natural language processing neural network;
a matching unit that generates a candidate sentence that has the same meaning as the proposition sentence but different grammar by inputting the context vector into a predetermined natural language processing neural network; and
an inference unit that compares the candidate sentence with the news sentence to determine whether the proposition sentence is true or false, and searches whether the news sentence corresponds to fake news.

The apparatus of claim 8,
wherein the word embedding unit
converts each word constituting the proposition sentence into a vector using a one-hot encoding scheme.

The apparatus of claim 8,
wherein the context generation unit
generates the context vector by inputting the word vector into a Long Short-Term Memory (LSTM) neural network.

The apparatus of claim 8,
wherein the matching unit
generates an attention vector by calculating a weighted sum of the context vector according to the attention mechanism of a sequence-to-sequence learning model, predicts a matching vector by comparing the attention vector with a hidden state vector produced when the context vector is generated, and generates the candidate sentence by inputting the matching vector into a Long Short-Term Memory (LSTM) neural network.

The apparatus of claim 8,
wherein the inference unit
selects and outputs a given number of words by inputting a vector corresponding to each word constituting the candidate sentence into a softmax function of a sequence-to-sequence learning model, generates a final candidate sentence by combining the words output by the softmax function using a beam search decoder, compares the final candidate sentence with the news sentence to determine whether the proposition sentence is true or false, and searches whether the news sentence corresponds to fake news.

The apparatus of claim 12,
wherein the inference unit
embeds the final candidate sentence and the news sentence, calculates a cosine similarity between the embedded final candidate sentence and the embedded news sentence, and uses the cosine similarity to determine whether the proposition sentence is true or false according to the news sentence.